What is the "post safety screen" in the X code?

It's an annotator, not a judge. The released PostSafetyDeluxeClassifier runs a critical-tier Grok vision model over your post and profile and returns only boolean metadata flags; the code hardcodes its verdict to non-positive with a score of exactly 0.0. It attaches safety facts to your post for other systems to consume. Which facts those are is decided by a prompt and a type definition that are not in the release.

What the code says

The whole classifier is small enough to read in one sitting, and its construction and verdict tell the story between them:

grox/classifiers/content/post_safety_screen_deluxe.py · L28–L29, L36–L38 @ 0bfc279

28  class PostSafetyScreenResult(BaseModel):
29      tweet_bool_metadata: TweetBoolMetadata
36          vlm_config = grox_config.get_model(ModelName.VLM_PRIMARY_CRITICAL)
37          vlm_config.temperature = 0.000001
38          vlm = VisionSampler(GrokModelConfig(**vlm_config.model_dump()))

grox/classifiers/content/post_safety_screen_deluxe.py · L83–L89 @ 0bfc279

83              return [
84                  ContentCategoryResult(
85                      category=ContentCategoryType.POST_SAFETY_SCREEN,
86                      positive=False,
87                      score=0.0,
88                      tweet_bool_metadata=result.tweet_bool_metadata,
89                  )

Lines 86–87 are the point: the result is always non-positive with a score of exactly zero. This screen never flags anything itself — it runs a critical-tier vision model purely to attach boolean metadata to the post, for downstream systems to act on.
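The annotator pattern described above can be sketched in a few lines. This is a minimal illustration, not the released code: `BoolMetadata`, `CategoryResult`, `annotate`, and the flag names are hypothetical stand-ins, and plain dataclasses are used here in place of the Pydantic models the excerpt shows.

```python
from dataclasses import dataclass, field


@dataclass
class BoolMetadata:
    # Hypothetical stand-in for TweetBoolMetadata; the real fields are not in the release.
    flags: dict[str, bool] = field(default_factory=dict)


@dataclass
class CategoryResult:
    # Hypothetical stand-in for ContentCategoryResult.
    category: str
    positive: bool
    score: float
    bool_metadata: BoolMetadata


def annotate(model_flags: dict[str, bool]) -> list[CategoryResult]:
    """Wrap model-produced flags in a result whose verdict is fixed."""
    return [
        CategoryResult(
            category="POST_SAFETY_SCREEN",
            positive=False,  # hardcoded, as on line 86 of the excerpt
            score=0.0,       # hardcoded, as on line 87 of the excerpt
            bool_metadata=BoolMetadata(flags=dict(model_flags)),
        )
    ]


# Flag names are invented for illustration only.
results = annotate({"example_flag_a": True, "example_flag_b": False})
```

Whatever the model reports, the verdict fields stay the same; only the attached metadata varies from post to post.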

CODE-CURRENT · 0bfc279 · verified 2026-06-12

The post safety screen is an annotator, not a judge: it runs a critical-tier vision model (VLM_PRIMARY_CRITICAL) and returns only boolean metadata, with the verdict hardcoded to positive=False and score=0.0 — it attaches safety facts for downstream systems rather than flagging posts itself.

xai-org/x-algorithm · grox/classifiers/content/post_safety_screen_deluxe.py, constructor (L36–L44) and verdict (L83–L90) · as of the May 15, 2026 release

Signal by signal

In the code	In plain English	Where xDoctor surfaces it
VLM_PRIMARY_CRITICAL	This runs on the critical model tier — safety metadata is expensive and X pays for it on purpose.	—
tweet_bool_metadata only	The output is a set of yes/no facts about your post, not a penalty. The penalty logic lives elsewhere.	Checkup · Flagged Posts
positive=False, score=0.0	Hardcoded. The annotator never "convicts" — it testifies.	—
UserRenderer in prompt	Annotated with your profile in frame, like every Grox classifier.	Coach · Account

CODE-CURRENT · 0bfc279 · verified 2026-06-12

Grox content classifiers render the post's AUTHOR into the judging prompt: UserRenderer.render(post.user) places the user's profile in the model's context alongside the post, in the PTOS, banger, and post-safety-screen classifiers alike.

xai-org/x-algorithm · safety_ptos.py L84, banger_initial_screen.py L68, post_safety_screen_deluxe.py L54 · as of the May 15, 2026 release
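To make "profile in frame" concrete, here is a rough sketch of a prompt context that contains both the author and the post. Only the call shape `UserRenderer.render(post.user)` comes from the source; the `User` and `Post` fields, the `render_user` and `render_post` helpers, and the output format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical profile fields; the real user type is not shown in the excerpts.
    handle: str
    bio: str
    followers: int


@dataclass
class Post:
    text: str
    user: User


def render_user(user: User) -> str:
    # Assumed format; stands in for UserRenderer.render(post.user).
    return f"AUTHOR @{user.handle} ({user.followers} followers)\nBIO: {user.bio}"


def render_post(post: Post) -> str:
    return f"POST: {post.text}"


def build_context(post: Post) -> str:
    # The author profile and the post land in the same model context.
    return "\n\n".join([render_user(post.user), render_post(post)])


post = Post(text="hello world", user=User(handle="example", bio="demo account", followers=42))
context = build_context(post)
```

The point of the sketch is only that the model never sees the post in isolation: anything in the rendered profile is available to it when it produces its output.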

What the code doesn't say

Which boolean flags exist. The prompt (PostSafetyDeluxe) is imported from grox.prompts.template, a module absent from the release, and the TweetBoolMetadata type definition lives in a data-types module that is likewise not in the release. We can prove the screen annotates rather than judges; the list of facts it annotates is withheld.

UNKNOWN0bfc279verified 2026-06-12

The actual spam criteria are withheld: spam.py imports its system prompt (SpamSystemLowFollower) from grox.prompts.template, and the entire grox/prompts/ module is absent from the public release — the classifier machinery is open, the rules it enforces are not.

xai-org/x-algorithm · whole-tree absence check: grox/prompts/ does not exist at the pinned commit (import at spam.py L10) · as of the May 15, 2026 release
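A whole-tree absence check of this kind is easy to reproduce against a local checkout. The sketch below is one way to do it with the standard library; `missing_modules` is a hypothetical helper, and the demo builds a throwaway directory tree rather than reading the real repository.

```python
import tempfile
from pathlib import Path


def missing_modules(repo_root: Path, module_dirs: list[str]) -> list[str]:
    """Return the module directories that do not exist under repo_root."""
    return [d for d in module_dirs if not (repo_root / d).is_dir()]


# Demo on a temporary tree that has grox/classifiers but no grox/prompts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "grox" / "classifiers").mkdir(parents=True)
    demo_missing = missing_modules(root, ["grox/classifiers", "grox/prompts"])
```

Against a real clone, you would pass the checkout path as `repo_root` after checking out the pinned commit, so the result reflects that commit rather than the current head.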

What to do about it

The architecture lesson: a "flag" on X is rarely one model's opinion — it is metadata attached here, consumed by enforcement logic elsewhere. That is why Checkup triages your history by surface rather than by a single score: the system being modeled works the same way.
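The split between attaching metadata and acting on it can be sketched as below. Everything here is hypothetical: the released code does not show the consumer, so the `Annotation` type, the `POLICY` table, the flag names, and the surface names are invented purely to illustrate the shape of the architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source: str             # which annotator attached the flags
    flags: dict[str, bool]  # hypothetical flag names


# Hypothetical policy table mapping a flag to the surface it affects.
POLICY: dict[str, str] = {
    "example_flag_a": "search",
    "example_flag_b": "recommendations",
}


def affected_surfaces(annotations: list[Annotation]) -> set[str]:
    """Collect the surfaces touched by any raised flag the policy knows about."""
    surfaces: set[str] = set()
    for annotation in annotations:
        for name, raised in annotation.flags.items():
            if raised and name in POLICY:
                surfaces.add(POLICY[name])
    return surfaces


annotations = [
    Annotation("post_safety_screen", {"example_flag_a": True, "example_flag_b": False}),
    Annotation("other_annotator", {"example_flag_c": True}),
]
surfaces = affected_surfaces(annotations)
```

In this toy version the annotators never decide anything; the outcome depends entirely on a policy that lives in a separate component, which is the relationship the article describes.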
