What is PTOS enforcement in the X algorithm?
PTOS — "Post Terms of Service" — is the safety enforcement layer in the released Grox code, and it names its categories in the open: seven policy areas, screened by a two-stage pipeline of Grok vision models. The policy texts themselves are withheld, and the released file carries visible redaction seams where clauses were stripped before publication.
The seven policies, named in the code
The enforcement taxonomy is not a leak or a guess — it is a Python set in the released file:
    SUPPORTED_POLICY_CATEGORIES = {
        SafetyPolicyCategory.ViolentMedia,
        SafetyPolicyCategory.AdultContent,
        SafetyPolicyCategory.Spam,
        SafetyPolicyCategory.IllegalAndRegulatedBehaviors,
        SafetyPolicyCategory.HateOrAbuse,
        SafetyPolicyCategory.ViolentSpeech,
        SafetyPolicyCategory.SuicideOrSelfHarm,
    }
The PTOS safety enforcement taxonomy is named in the released code: seven policy categories — ViolentMedia, AdultContent, Spam, IllegalAndRegulatedBehaviors, HateOrAbuse, ViolentSpeech, SuicideOrSelfHarm — defined as the SUPPORTED_POLICY_CATEGORIES set.
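To make the set concrete, here is a minimal sketch of how an enum and a supported-category set like this might be declared and queried. The member names come from the released set; the string values and the `is_supported` helper are assumptions for illustration, not part of the release.

```python
from enum import Enum


class SafetyPolicyCategory(Enum):
    # Member names follow the released set; the string values are assumed.
    ViolentMedia = "ViolentMedia"
    AdultContent = "AdultContent"
    Spam = "Spam"
    IllegalAndRegulatedBehaviors = "IllegalAndRegulatedBehaviors"
    HateOrAbuse = "HateOrAbuse"
    ViolentSpeech = "ViolentSpeech"
    SuicideOrSelfHarm = "SuicideOrSelfHarm"


SUPPORTED_POLICY_CATEGORIES = {
    SafetyPolicyCategory.ViolentMedia,
    SafetyPolicyCategory.AdultContent,
    SafetyPolicyCategory.Spam,
    SafetyPolicyCategory.IllegalAndRegulatedBehaviors,
    SafetyPolicyCategory.HateOrAbuse,
    SafetyPolicyCategory.ViolentSpeech,
    SafetyPolicyCategory.SuicideOrSelfHarm,
}


def is_supported(name: str) -> bool:
    """Hypothetical helper: True if `name` is a supported category name."""
    try:
        return SafetyPolicyCategory[name] in SUPPORTED_POLICY_CATEGORIES
    except KeyError:
        # Unknown names are simply not screened in this sketch.
        return False
```

Keeping the categories in a set makes membership checks cheap and makes the list easy to audit by name.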
A two-stage pipeline, on two different models
Enforcement is not one model call. A category classifier (SafetyPtosCategoryClassifier, running VLM_SAFETY) first decides whether a post violates anything at all; only then does a policy classifier (SafetyPtosPolicyClassifier, running VLM_PRIMARY_CRITICAL, a critical-tier model) re-judge the post against the specific policy detected. Both are vision models run at a near-zero temperature of 0.000001: images are inside the judgment, and the same post should get nearly the same verdict each time.
PTOS enforcement is a two-stage pipeline on two model tiers: SafetyPtosCategoryClassifier (VLM_SAFETY) detects whether a post violates anything, then SafetyPtosPolicyClassifier (VLM_PRIMARY_CRITICAL) re-judges it against the specific detected policy — both vision models at temperature 0.000001.
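The control flow of a detect-then-confirm pipeline can be sketched as follows. This is an illustration only: the `detect` and `judge` callables stand in for the two model calls, their signatures are assumptions, and treating stage one as returning a single category (or none) is a simplification. Only the temperature value is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NEAR_ZERO_TEMPERATURE = 0.000001  # value quoted in the text

# Hypothetical signatures for the two model calls.
# Stage 1: (post, temperature) -> detected category name, or None.
CategoryModel = Callable[[str, float], Optional[str]]
# Stage 2: (post, category, temperature) -> True if the violation is confirmed.
PolicyModel = Callable[[str, str, float], bool]


@dataclass
class Verdict:
    violating: bool
    category: Optional[str] = None


def screen_post(post: str, detect: CategoryModel, judge: PolicyModel) -> Verdict:
    """Run stage 1, and run stage 2 only if stage 1 detected a category."""
    category = detect(post, NEAR_ZERO_TEMPERATURE)
    if category is None:
        # Nothing detected: the second, more expensive model is never called.
        return Verdict(violating=False)
    confirmed = judge(post, category, NEAR_ZERO_TEMPERATURE)
    return Verdict(violating=confirmed, category=category if confirmed else None)
```

In this arrangement a flag requires both stages to agree, which matches the description of a flag as a two-model event.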
There is also a "deluxe" mode: when enabled, the two most consequential categories —
AdultContent and ViolentMedia — are routed to a dedicated reasoning model,
with a fallback to the standard vision model if that call fails.
In 'deluxe' mode, the two most consequential policy categories — AdultContent and ViolentMedia (DELUXE_4_2_CATEGORIES) — are routed to a dedicated reasoning model, with fallback to the standard vision model on failure.
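A route-with-fallback pattern like the one described might look like the sketch below. The constant name DELUXE_4_2_CATEGORIES is from the released code; the function, its parameters, and the plain-string categories are hypothetical. The released routing condition has a clause stripped from it, so the real rule may include conditions not modeled here.

```python
from typing import Callable

# Name from the released code; plain strings are used here for brevity.
DELUXE_4_2_CATEGORIES = {"AdultContent", "ViolentMedia"}

# Hypothetical signature: (post, category) -> True if the policy is violated.
Judge = Callable[[str, str], bool]


def judge_with_routing(
    post: str,
    category: str,
    deluxe: bool,
    reasoning_judge: Judge,
    vision_judge: Judge,
) -> bool:
    """Prefer the reasoning model for deluxe categories; fall back on failure."""
    if deluxe and category in DELUXE_4_2_CATEGORIES:
        try:
            return reasoning_judge(post, category)
        except Exception:
            # Assumed behavior: any failure of the reasoning call falls
            # through to the standard vision model below.
            pass
    return vision_judge(post, category)
```

The fallback keeps a verdict available even when the dedicated model is unavailable, at the cost of the two paths possibly disagreeing on the same post.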
Signal by signal
| In the code | In plain English | Where xDoctor surfaces it |
|---|---|---|
| 7 policy categories | The complete list of what safety enforcement screens for — by name, in a set literal. | Checkup · Policy-Sensitive |
| two-stage pipeline | Detection first, then a deeper re-judgment against the specific policy — a flag is a two-model event. | Checkup · Flagged Posts |
| UserRenderer in prompt | The model sees you — your profile is rendered into the judging context alongside the post. | Coach · Account |
| include_reply_to=True | The post you replied to is in the context. A reply is judged in its conversation, not in isolation. | Checkup · Promotional |
Grox content classifiers render the post's AUTHOR into the judging prompt: UserRenderer.render(post.user) places the user's profile in the model's context alongside the post, in the PTOS, banger, and post-safety-screen classifiers alike.
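As an illustration of what "rendering the author into the prompt" can mean, the sketch below assembles a judging context from the author, the parent post, and the post itself. Only the names UserRenderer.render and include_reply_to come from the released code; the data classes, their fields, the section labels, and `build_context` are assumptions, and the UserRenderer here is a stand-in rather than the released class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    handle: str  # hypothetical field
    bio: str     # hypothetical field


@dataclass
class Post:
    user: User
    text: str
    reply_to: Optional["Post"] = None


class UserRenderer:
    """Stand-in for the released renderer; the output format is invented."""

    @staticmethod
    def render(user: User) -> str:
        return f"@{user.handle}: {user.bio}"


def build_context(post: Post, include_reply_to: bool = True) -> str:
    """Assemble the text a classifier would see (hypothetical layout)."""
    parts = [f"AUTHOR\n{UserRenderer.render(post.user)}"]
    if include_reply_to and post.reply_to is not None:
        # The parent post is part of the context, so a reply is judged
        # together with the post it answers.
        parts.append(f"IN REPLY TO\n{post.reply_to.text}")
    parts.append(f"POST\n{post.text}")
    return "\n\n".join(parts)
```

With this layout, the same reply text can produce a different context, and potentially a different verdict, depending on who wrote it and what it replies to.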
What the code doesn't say
Every policy's actual text (HateOrAbusePolicy, ViolentSpeechPolicy, all seven) is imported from the absent grox.prompts.template module. The release also carries visible redaction seams: a restriction-lines set whose entries are all empty strings, and a boolean condition missing its first operand where a clause was stripped before publication. The category names are open; the criteria, and at least one routing condition, are deliberately withheld.
The released safety_ptos.py carries visible redaction seams: the _THINKING_RESTRICTION_LINES set contains only empty strings (L40–L43), and a boolean condition is missing its first operand (L241–L244: "if ( / and self.deluxe / and …") — clauses were removed from the file before publication.
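Seams like these can be checked mechanically. The sketch below uses Python's standard `ast` module to find set literals whose elements are all empty strings, and to test whether a source string parses at all. Both helper names are hypothetical, and this is one possible way to look for such seams, not a description of how they were originally found.

```python
import ast
from typing import List


def find_blank_string_sets(source: str) -> List[int]:
    """Return line numbers of set literals whose elements are all ''."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Set)
            and node.elts
            and all(
                isinstance(elt, ast.Constant) and elt.value == ""
                for elt in node.elts
            )
        ):
            hits.append(node.lineno)
    return sorted(hits)


def parses_cleanly(source: str) -> bool:
    """Return False if the source is not syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

A condition that opens with `and`, like the one quoted above, is not valid Python syntax, so `parses_cleanly` would report it; `find_blank_string_sets` only works on source that parses.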
The actual spam criteria are withheld: spam.py imports its system prompt (SpamSystemLowFollower) from grox.prompts.template, and the entire grox/prompts/ module is absent from the public release — the classifier machinery is open, the rules it enforces are not.
What to do about it
The taxonomy is the audit checklist. Seven named categories means seven concrete questions to ask of your own archive — which is exactly how xDoctor's Policy-Sensitive surface is organized: it triages your history against the enforcement categories the code names, rather than against folklore.