What is Phoenix, the Grok-based ranker?
Phoenix is the model that decides what goes in your For You feed. By X's own description it is the Grok-1 transformer architecture adapted for recommendation, and it works in two stages: retrieval narrows millions of posts down to hundreds of candidates, then a ranking transformer scores those candidates by predicting how likely you are to take each of nineteen actions on each one. A deliberate design choice means each post is scored in isolation: its score does not depend on which other posts happen to be in the batch beside it.
Behind the engagement weights and the spam screens sits the model that actually does the ranking. X open-sourced a representative version of it, and its own README is unusually direct about what it is.
It is Grok, adapted for your feed
The release states the lineage plainly: Phoenix's transformer is ported from the Grok-1 open release, adapted for recommendation with custom input embeddings and attention masking. The same model family that powers xAI's chatbot is the architecture ranking your timeline, which is why the content-understanding layer that reads your posts (the spam classifier, the multimodal embedder) is Grok-based too. The whole pipeline speaks one model's language.
Phoenix's transformer is ported from the Grok-1 open-source release, adapted for recommendation with custom input embeddings and attention masking; the release states it is representative of the model used internally except for specific scaling optimizations.
Two stages: retrieval, then ranking
The README describes a two-stage pipeline. Stage one, retrieval, uses a two-tower model to narrow millions of candidate posts down to hundreds using fast similarity search. Stage two, ranking, takes that smaller set and scores it with a more expressive transformer.
1. Retrieval: Efficiently narrow down millions of candidates to hundreds using approximate nearest neighbor (ANN) search
2. Ranking: Score and order the retrieved candidates using a more expressive transformer model
Phoenix operates in two stages: retrieval narrows millions of candidates to hundreds via approximate-nearest-neighbor similarity search (a two-tower model), then a more expressive transformer ranks the retrieved set.
This is why getting retrieved and getting ranked are two different problems. A post that never enters the candidate set in stage one never gets a ranking score at all — the two-tower retrieval has to surface you before the expressive model ever judges you.
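The retrieval step can be sketched in a few lines. This is a minimal illustration, not the released code: it assumes the user tower and the post tower have already produced embeddings, and it uses exact cosine-similarity search where the README describes approximate nearest neighbor search. The names `retrieve_top_k` and `posts`, and the toy two-dimensional vectors, are hypothetical.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def retrieve_top_k(user_embedding, post_embeddings, k):
    """Return the ids of the k posts most similar to the user.

    Exact brute-force search, standing in for the ANN index a real
    system would use at the scale of millions of posts.
    """
    user = normalize(user_embedding)
    scored = []
    for post_id, embedding in post_embeddings.items():
        post = normalize(embedding)
        similarity = sum(u * p for u, p in zip(user, post))
        scored.append((similarity, post_id))
    # nlargest returns the k best (similarity, id) pairs, best first.
    return [post_id for _, post_id in heapq.nlargest(k, scored)]


# Hypothetical post-tower outputs for four posts.
posts = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

retrieved = retrieve_top_k([1.0, 0.0], posts, k=2)
```

With `k=2`, posts `c` and `d` never reach the ranking stage, however well the ranking model might have scored them.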
Each post is scored in isolation
The ranking transformer carries a deliberate constraint: candidates cannot attend to one another. The README calls this out as a critical design choice.
...candidates cannot attend to each other during inference. This is a critical design choice that ensures the score for a candidate doesn't depend on which other candidates are in the batch
In the ranking transformer, candidates cannot attend to each other during inference — a deliberate design choice ensuring a candidate's score does not depend on which other candidates are in the batch. Candidates still attend to the user and the user's history.
Your post is judged on its own merits and your history — not against whatever else happened to be retrieved alongside it. Candidates still attend to you and your engagement history (so ranking is deeply personalized), but not to each other. There is no "the competition was tough that minute" effect inside the model itself; the head-to-head sorting happens later, in the weighted scorer and diversity passes.
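One way to picture candidate isolation is as an attention mask. The sketch below is an assumption-laden illustration rather than the released implementation: it assumes the input sequence places user and history tokens first and candidates after them, that context tokens attend to each other freely, and that each candidate may attend to itself. The function name `build_candidate_isolation_mask` is hypothetical.

```python
def build_candidate_isolation_mask(num_context, num_candidates):
    """Build a boolean attention mask for candidate isolation.

    mask[q][k] is True when query position q may attend to key position k.
    Positions 0 .. num_context - 1 hold user and history tokens (assumed
    layout); the remaining positions hold one candidate each.
    """
    total = num_context + num_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_context:
                # Every position may read the user/history context.
                mask[q][k] = True
            elif q == k:
                # A candidate may attend to itself (assumption), but
                # never to any other candidate.
                mask[q][k] = True
    return mask


# Three context tokens followed by two candidates (positions 3 and 4).
mask = build_candidate_isolation_mask(num_context=3, num_candidates=2)
```

Because the rows for positions 3 and 4 never allow each other's columns, swapping one candidate for a different post cannot change what the other candidate sees, which is the batch-independence property the README describes.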
It predicts nineteen actions at once
The ranking model's output is multi-action: for each candidate it emits a probability for each of nineteen engagement types simultaneously. The README's example decodes part of the action enum directly:
Action indices follow the proto ActionName enum: 1 = favorite, 4 = reply, 5 = quote, 6 = repost, 11 = dwell, 13 = video quality view.
The ranking model predicts multiple engagement types simultaneously per candidate (output shape [B, num_candidates, num_actions]); the mini config declares 19 action types, and the README decodes part of the proto ActionName enum: 1=favorite, 4=reply, 5=quote, 6=repost, 11=dwell, 13=video quality view.
Nineteen action types — the exact count the weighted scorer combines. Phoenix produces the per-action probabilities; the weighted scorer multiplies each by its weight and sums them into the single number that orders your feed. This is the join between the two pages: Phoenix predicts, the scorer weighs.
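The predict-then-weigh join can be sketched as a weighted sum over one candidate's action probabilities. The action indices below are the ones the README decodes; everything else is an assumption for illustration. In particular, the weight values are invented (the real ones are not published), the sketch assumes the enum value indexes directly into the 19-wide output, and `weighted_score` is a hypothetical name.

```python
NUM_ACTIONS = 19  # action count declared by the mini config

# Indices decoded in the README from the proto ActionName enum.
ACTION_INDEX = {
    "favorite": 1,
    "reply": 4,
    "quote": 5,
    "repost": 6,
    "dwell": 11,
    "video_quality_view": 13,
}

# Invented weights for illustration only; the published repository
# does not contain the real values.
HYPOTHETICAL_WEIGHTS = {"favorite": 1.0, "reply": 2.0, "repost": 1.5}


def weighted_score(action_probs, weights):
    """Combine one candidate's per-action probabilities into a single score."""
    return sum(
        weight * action_probs[ACTION_INDEX[name]]
        for name, weight in weights.items()
    )


# One candidate's slice of the [B, num_candidates, num_actions] output.
probs = [0.0] * NUM_ACTIONS
probs[ACTION_INDEX["favorite"]] = 0.5
probs[ACTION_INDEX["reply"]] = 0.1
probs[ACTION_INDEX["repost"]] = 0.2

score = weighted_score(probs, HYPOTHETICAL_WEIGHTS)
```

A negative weight on an action would subtract from the total in the same way, so a high predicted probability for that action would pull the post down the feed.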
Signal by signal
| in the code | in plain english | where xDoctor surfaces it |
|---|---|---|
| two-tower retrieval | You must be surfaced before you can be ranked. Relevance and similarity get you into the candidate set. | Coach · Niche fit |
| candidate isolation | Your post is judged on itself and your history, not against the batch around it. | — |
| multi-action output (19) | The model predicts every engagement type at once, including the negative ones — the same nineteen the scorer weighs. | Coach · Predicted actions |
| personalized to history | Ranking attends to the viewer's own engagement history — the same post scores differently for different people. | Coach · Audience fit |
What the code doesn't say
The production model's size and weights. The release is explicit that it is a mini version (128-dimensional, 4-layer) and a frozen checkpoint, while production Phoenix is larger, trained continuously on real-time data. The architecture and behavior are representative and code-current; the exact production parameters, the trained embedding values, and the per-action weights (which live in the separate withheld params module) are not in the open. The shape of the machine is public; its precise calibration is not.
The released Phoenix is a mini model (128-dim, 4-layer) and a frozen checkpoint; the README states production Phoenix uses a larger model with more layers and wider embeddings and is trained continuously on real-time data.
The numeric values of the current weights are not included in the open-source release: weighted_scorer.rs references a params module (e.g. p::FAVORITE_WEIGHT, p::REPLY_WEIGHT) whose values are not present anywhere in the published repository.
What to do with this
Think in two steps, because the model does. First earn retrieval: be similar and relevant enough to enter the candidate pool, which is where niche consistency and topic signals matter. Then earn ranking: give the model reasons to predict the positive actions among the nineteen and few reasons to predict the negative ones. Phoenix is not a black box anymore; it is a documented Grok transformer, and the signals it predicts are the exact terms xDoctor's diagnostics estimate from your own history.