Twitter announced on Friday that it is open-sourcing the code behind the recommendation algorithm it uses to select content for users’ timelines. Twitter’s engineering team revealed that tweets that end up in the For You timeline are chosen by a service called Home Mixer, which uses a pipeline process called candidate sourcing.
The pipeline fetches the best tweets from different recommendation sources, ranks each tweet using a machine learning model, and applies heuristics and filters.
However, the company did not release training data or model weights associated with the Twitter algorithm at this point.
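The three stages described above (fetching candidates from several sources, ranking them, then filtering) can be illustrated with a minimal Python sketch. This is not Twitter's code: the `Tweet` class, the function names, the hand-written scores standing in for the machine learning model, and the blocked-author filter are all hypothetical and exist only to show the shape of such a pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Tweet:
    """Hypothetical, simplified tweet record used only for this sketch."""
    tweet_id: int
    author: str


def source_candidates(sources: Iterable[Callable[[], List[Tweet]]]) -> List[Tweet]:
    """Stage 1: gather candidates from every source, dropping duplicates."""
    seen: Set[int] = set()
    candidates: List[Tweet] = []
    for source in sources:
        for tweet in source():
            if tweet.tweet_id not in seen:
                seen.add(tweet.tweet_id)
                candidates.append(tweet)
    return candidates


def rank(candidates: List[Tweet], score: Callable[[Tweet], float]) -> List[Tweet]:
    """Stage 2: order candidates by score, highest first."""
    return sorted(candidates, key=score, reverse=True)


def apply_filters(ranked: List[Tweet], blocked_authors: Set[str]) -> List[Tweet]:
    """Stage 3: a single example heuristic that drops blocked authors."""
    return [t for t in ranked if t.author not in blocked_authors]


def build_timeline(sources, score, blocked_authors, limit):
    """Run the three stages in order and keep the top `limit` tweets."""
    ranked = rank(source_candidates(sources), score)
    return apply_filters(ranked, blocked_authors)[:limit]


# Toy data: two sources that overlap on tweet 2.
def in_network() -> List[Tweet]:
    return [Tweet(1, "alice"), Tweet(2, "bob")]


def out_of_network() -> List[Tweet]:
    return [Tweet(2, "bob"), Tweet(3, "carol"), Tweet(4, "spammer")]


# Hand-written scores stand in for the real model's predictions.
toy_scores = {1: 0.2, 2: 0.7, 3: 0.9, 4: 0.95}

timeline = build_timeline(
    sources=[in_network, out_of_network],
    score=lambda t: toy_scores[t.tweet_id],
    blocked_authors={"spammer"},
    limit=2,
)
print([t.tweet_id for t in timeline])  # [3, 2]
```

In this toy run the highest-scoring tweet is removed by the filter stage, which shows why filtering is applied after ranking rather than being folded into the score.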
The end goal is for each user’s For You timeline to draw about 50% of its tweets from relevant and recent posts by accounts the user follows, and the other 50% from people outside their network, based on what the user would find interesting. The announcement follows a tweet from Twitter CEO Elon Musk promising to make the Twitter algorithm public.
Earlier this month, Twitter took down proprietary source code and internal tools that had been leaked on GitHub and had remained publicly available for at least several months.
In a DMCA infringement notice, the company also asked GitHub to provide information on the access history for the leaked code, likely to find out who downloaded it while it was available online.
Twitter is also attempting to use a subpoena filed with the U.S. District Court for the Northern District of California to force GitHub to share identifying information on the FreeSpeechEnthusiasm user who first published the files, as well as on anyone who accessed and distributed the leaked Twitter source code. That information could likely also be used for further legal action.
Twitter’s engineering team has published two separate GitHub repositories containing the source code for its recommendation algorithm and some of the machine learning models powering it.
However, the code made public today doesn’t include the parts behind advertising recommendations or anything that would endanger Twitter’s ability to keep threat actors’ attempts to manipulate the platform under control.