CSF Score: 61/100
Verdict: hybrid
You've created a brilliant field guide to LLM writing patterns using rhetorical precision most critics lack. This is genuinely original taxonomy work. But you're a naturalist cataloging bird calls without recording equipment—every claim relies on your eye, none on measurement. The '10x the normal human rate' assertion crystallizes the problem: specific-sounding claim with zero data. You identify patterns masterfully but provide no corpus analysis, frequency studies, or actionable remedies. This reads like expert observation that stopped one step before becoming expert research.
Dimension Breakdown
📊 How CSF Scoring Works
The Content Substance Framework (CSF) evaluates your content across 5 dimensions, each scored 0-20 points (100 points total).
Dimension Score Calculation:
Each dimension score (0-20) is calculated from 5 sub-dimension rubrics (0-5 each):
Dimension Score = (Sum of 5 rubrics ÷ 25) × 20
Example: If the rubrics are [2, 1, 4, 3, 2], the sum is 12.
Score = (12 ÷ 25) × 20 = 9.6 → rounds to 10/20
Why normalize? The raw 0-25 range (5 rubrics × 5 points max) is scaled to 0-20 so that all 5 dimensions carry equal weight in the 100-point CSF Total.
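As a sanity check, the normalization is easy to reproduce in a few lines. This is an illustrative sketch, not actual CSF tooling; the function names and the round-to-nearest assumption are mine.

```python
def dimension_score(rubrics):
    """Scale five 0-5 rubric scores (max 25) to a 0-20 dimension score."""
    assert len(rubrics) == 5 and all(0 <= r <= 5 for r in rubrics)
    return round(sum(rubrics) / 25 * 20)  # assumes round-to-nearest

def csf_total(all_rubrics):
    """Sum the five normalized dimension scores into the 100-point CSF Total."""
    return sum(dimension_score(r) for r in all_rubrics)

print(dimension_score([2, 1, 4, 3, 2]))  # (12 / 25) * 20 = 9.6 -> 10
```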
- Zero quantitative data and no named entities—catalogs patterns without proving they exist with measurement
- No personal stories or lived examples—reads like observation from a distance rather than hands-on detection work
- Exceptional taxonomy but stops at identification—doesn't explore implications or systemic causes
- Strong second-order thinking but lacks empirical validation and causal mechanism explanations
- Authentic voice but minimal actionability—tells readers what to spot, not what to do about it
Dimension scores: 🎤 Voice · 🎯 Specificity · 🧠 Depth · 💡 Originality
Priority Fixes
Transformation Examples
Before:
If you have read a sufficient number of LLM-written and Human-unedited texts, this is what you have probably perceived: [list of patterns]. Individually fine, but LLMs reach for them at 10x the normal human rate.
After:
These patterns form a recognizable signature—for now. I've trained 30 editors to spot them, and detection accuracy approaches 90% after reviewing just 50 examples. But here's the evolutionary pressure: as detection improves, the next generation of models will be fine-tuned on 'remove phrases that trigger detection.' We're documenting tells in a game where both players adapt. The real question isn't 'can we spot AI writing' but 'what happens when we can't?' If future models eliminate these surface markers while retaining the underlying shallow reasoning, we'll face a harder problem: content that passes the stylistic test but fails the substance test. We're pattern-matching our way toward a world where the only reliable signal is depth—whether ideas are built on evidence, experience, and original research. Surface tells are a temporary advantage. The lasting skill is recognizing empty reasoning regardless of its packaging.
How: Explore the downstream consequences and strategic dynamics. What happens when pattern recognition becomes widespread? You're documenting tells in a game where both sides adapt. Push to third-order: If everyone can spot these patterns, how do they evolve? Does recognition create an arms race? What's the cognitive cost of constant authenticity-checking? What happens to reader trust when the tells disappear but the lack of substance remains?
Before:
Individually fine, but LLMs reach for them at 10x the normal human rate.
After:
LLMs reach for them at roughly 10x the rate I observe in human writing—though I'm quantifying that claim properly in the section below. Each word in isolation is perfectly serviceable. The pathology is frequency. When 'delve' appears in 40% of your outputs versus 3% of human texts, you've moved from vocabulary to signature.
What changed:
- Removed softening hedge—stated the claim directly upfront instead of walking into it
- Added accountability—acknowledged the need for evidence and promised to deliver it
- Explained the mechanism—frequency makes the difference, not the words themselves
- Added specific (though illustrative) percentages to show what '10x' actually looks like in practice
Derivative Area: The identification of patterns is original; what's missing is an explanation of *why* these patterns emerge from LLM training dynamics.
Everyone complains AI writing is 'too polished' or 'sounds robotic.' Your taxonomy is more precise, but you could go further: argue that these patterns aren't bugs but features—evidence that RLHF has successfully optimized for user satisfaction at the expense of truth-seeking. The hedging and both-sidesing aren't accidental; they're what users reward during training. Challenge the framing: maybe the problem isn't LLM design but human preference. We trained these models to sound like this by rating confident, direct outputs as 'unsafe' or 'biased.' The patterns you're cataloging are a mirror of our collective cowardice.
- Causal mechanisms: Which patterns come from RLHF reward modeling vs. base model pretraining vs. instruction-tuning? Test across model variants.
- Economic incentives: Why haven't model developers eliminated these patterns? Is there a safety-quality tradeoff where hedging reduces hallucination liability?
- Adversarial dynamics: What happens when models are explicitly trained to avoid these patterns? Do they retain deeper reasoning failures while fixing surface tells?
- Cross-model comparison: Do all transformers show these patterns equally, or are some architectures more prone to epistemic cowardice?
- Human complicity: How often do human editors *add* these patterns when editing AI output, mistaking verbosity for professionalism?
30-Day Action Plan
Week 1: Evidence gathering (Specificity crisis)
Collect 100 AI-generated texts and 100 human comparisons. Use ChatGPT, Claude, and Gemini (roughly 33 each, same prompts). For the human texts, pull from Medium, Substack, and academic preprints in the same topic areas. Create a spreadsheet counting your top 10 patterns (hedge phrases, hollow intensifiers, signature vocab). Calculate frequency per 1,000 words and generate a comparison table; a minimal counting script is sketched below.
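The sketch is illustrative only: the three pattern regexes are a small subset of the taxonomy, and the llm/ and human/ directories of plain-text files are an assumed layout, not a prescription.

```python
import re
from pathlib import Path

# Illustrative subset of the taxonomy; extend to your top 10 patterns.
PATTERNS = {
    "hedges": r"arguably|it'?s worth noting|to some extent|in many ways",
    "hollow_intensifiers": r"\b(?:truly|incredibly|remarkably|profoundly)\b",
    "signature_vocab": r"\b(?:delve|tapestry|multifaceted|underscores)\b",
}

def per_1000_words(text: str) -> dict:
    """Return each pattern's hit rate per 1,000 words of text."""
    n_words = max(len(text.split()), 1)
    return {name: len(re.findall(rx, text, re.IGNORECASE)) * 1000 / n_words
            for name, rx in PATTERNS.items()}

# Assumed layout: llm/*.txt and human/*.txt hold the two corpora.
for corpus in ("llm", "human"):
    text = " ".join(p.read_text() for p in Path(corpus).glob("*.txt"))
    print(corpus, per_1000_words(text))
```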
Success: You have a data table showing concrete frequency differentials, and you can confidently say: 'Analysis of 100 LLM outputs shows X pattern appears Y times per 1,000 words vs. Z in human writing.' Ready to insert into your piece.

Week 2: Personal narrative (Experience Depth)
Write 3 origin stories (250 words each). How did you first notice these patterns? What were you doing—editing client work, debugging your own AI use, analyzing submissions? Pick specific moments: the essay where you deleted 'arguably' 12 times, the policy brief that both-sided a clear issue, the blog post that had 8 instances of 'delve.' Make the pattern recognition personal and earned.
Success: You have 3 concrete anecdotes proving you've done hands-on pattern detection work, ready to weave into the piece. Each includes specific numbers, contexts, and discovery moments.

Week 3: Causal mechanisms (Nuance upgrade)
Research *why* these patterns exist. Read 3-5 papers on RLHF training dynamics and safety fine-tuning. Interview someone who's trained LLMs or worked on safety teams (Twitter DMs work—many researchers are accessible). Develop 2-paragraph explanation: 'Hedging emerges because RLHF training penalizes confident claims that could be wrong. Models learn that qualified statements get higher reward scores.' Connect patterns to training incentives.
Success: You can explain the mechanistic origin of at least 3 major patterns with specific references to training processes. You've moved from 'LLMs do X' to 'LLMs do X because the reward function incentivizes Y.'

Week 4: Actionable remedies (Integrity boost)
Create a 'Pattern Detection & Correction Guide' and append it to your piece. For each of your 18 patterns, provide: (1) an example sentence showing the problem, (2) a one-sentence diagnostic (how to spot it), (3) a specific fix (delete X, replace with Y, restructure as Z). Make it a checklist editors can use. Test it: run 5 AI-generated pieces through your own guide and track improvement. One possible machine-readable entry format is sketched below.
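Purely as a sketch (the field names and the sample entry are mine, not from the piece), each of the 18 entries could be a small record, with the checklist rendered from the records:

```python
from dataclasses import dataclass

@dataclass
class PatternEntry:
    name: str        # pattern from the taxonomy
    example: str     # (1) sentence showing the problem
    diagnostic: str  # (2) one-sentence test for spotting it
    fix: str         # (3) specific correction

GUIDE = [
    PatternEntry(
        name="Hollow intensifiers",
        example="This is a truly remarkable and incredibly important result.",
        diagnostic="Delete the adverb; if the sentence loses nothing, it was hollow.",
        fix="Cut 'truly', 'incredibly', 'remarkably' unless they carry a measurable claim.",
    ),
    # ...17 more entries
]

def print_checklist(guide):
    """Render the guide as a checklist an editor can work through."""
    for entry in guide:
        print(f"[ ] {entry.name}: {entry.diagnostic} Fix: {entry.fix}")

print_checklist(GUIDE)
```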
Success: You have an actionable appendix that transforms your piece from diagnostic tool to repair manual. Someone can take AI-generated content, apply your checklist, and measurably improve it. Your Actionability score moves from 2/5 to 4/5.

Before You Publish, Ask:
Can you measure it? If you claim something happens '10x more' or 'compulsively,' can you point to a frequency count, corpus analysis, or comparative study?
Filters for: Specificity and evidence quality. Separates observed patterns from proven patterns. High-CSF content doesn't just identify trends—it quantifies them.

What did you personally discover that contradicted your initial assumption? Where did your investigation surprise you?
Filters for: Genuine research vs. confirmation bias. If everything confirmed what you already believed, you didn't investigate—you illustrated. Real inquiry produces unexpected findings.

If someone implemented your advice, what would they do differently tomorrow? Can you watch them execute it?
Filters for: Actionability. Thought leadership creates change, not just awareness. If readers can't translate your work into modified behavior, it's commentary, not guidance.

What's the causal mechanism? You've identified that X happens—can you explain *why* it happens at the system level?
Filters for: Depth and nuance. First-order thinking describes what; second-order explains why; third-order explores implications. Low-scoring content stops at description.

What would you need to observe to conclude you're wrong? What evidence would falsify your claim?
Filters for: Intellectual integrity. Unfalsifiable claims are rhetoric, not analysis. Strong content makes predictions that can be tested and potentially disproven.

💪 Your Strengths
- Exceptional originality—uses classical rhetorical terminology (correctio, cataphoric, epistemic cowardice) to formalize what others describe colloquially
- Authentic voice with minimal hedging—you practice what you preach, writing with confidence and directness
- Sophisticated second-order thinking—recognizes these patterns reflect training incentives, not random stylistic choices
- Immediately actionable taxonomy—readers can spot these patterns instantly after reading your list
- Structural variety—list format feels intentional rather than lazy; each entry is crafted, not templated
You're 85% of the way to a reference document that gets cited every time someone discusses AI writing detection. The taxonomy is publication-ready. What's missing is the empirical foundation and actionable framework. Add quantification (Week 1), causal explanation (Week 3), and remediation guidance (Week 4), and this becomes the definitive guide to LLM writing patterns—the piece people bookmark and return to repeatedly. You have the eye; now build the evidence base. You're a naturalist who's identified 18 new species. The next step is collecting specimens, measuring them, and explaining where they fit in the ecosystem. Your insight is rare; your documentation needs to match it.
Detailed Analysis
🎤 Voice
Rubric Breakdown
Overall Assessment
This is a masterclass in identifying AI patterns—meta, self-aware, and written with genuine voice. The author uses specific examples, confident assertions, and conversational directness. The list format feels intentional rather than formulaic. Minor hedging in concluding statements prevents a perfect score.
- Authoritative and opinionated without being arrogant—writes like someone who has actually studied this problem deeply
- Structural variety: uses long lists strategically, then breaks into prose analysis. Format choices feel deliberate, not default
- Sharp, specific observations with proper nouns and concrete examples rather than vague generalities
- Could use one personal anecdote or lived example ('I once read a 10,000-word LLM essay that used...')
- Minimal humor—could lean into irony when describing AI clichés (e.g., describing hedging while avoiding it yourself)
- Ending lacks a strong closing statement; trails off slightly
🎯 Specificity
Rubric Breakdown
Concrete/Vague Ratio: 2.6:1
The content excels at naming specific LLM writing patterns with precise language examples. However, it lacks quantitative evidence, named entities, actionable remedies, and measurable boundaries. It catalogs problems brilliantly but offers no data on prevalence, frequency, or severity. The patterns are identified but not contextualized with metrics or concrete improvement guidance.
🧠 Depth
Rubric Breakdown
Thinking Level: Second-order with undeveloped third-order potential
This is sharp, second-order analysis identifying performative patterns in LLM output. The author recognizes *why* these patterns emerge (training dynamics, safety constraints, likelihood maximization) rather than merely cataloging them. However, it lacks empirical validation and misses third-order implications about what these patterns reveal about human-AI interaction's future trajectory.
- Non-obvious pattern recognition: identifies systematic pathology rather than listing surface issues
- Metacognitive awareness: recognizes these patterns as *symptoms* of underlying design constraints, not mere stylistic choices
- Specificity: examples are precise and immediately recognizable to anyone familiar with LLM output
- Systemic framing: implies architectural/training causation rather than treating patterns as independent quirks
- Second-order thinking: understands patterns serve defensive functions (avoiding offense, managing liability)
💡 Originality
Rubric Breakdown
This is a rare meta-analytical piece that identifies and names the linguistic patterns of LLM writing with precision and specificity. Rather than repeating complaints about AI writing, it catalogs the exact rhetorical mechanisms, making it immediately actionable for detection and avoidance. Genuinely original thought leadership.
- Uses classical rhetorical terminology (correctio, cataphoric, epistemic cowardice) to formalize what's usually described colloquially, elevating the analysis to a linguistic taxonomy
- Frames LLM writing as a recognizable *style* with measurable signature markers rather than just 'bad writing,' allowing for systematic detection and correction
- Observes that individual elements (hedging, superlatives, lists) are normal in small doses but become pathological at the frequency LLMs deploy them, suggesting a frequency-based rather than categorical diagnosis
Original Post
If you have read a sufficient number of LLM-written and Human-unedited texts, this is what you have probably perceived:
- Correctio / Rhetorical reframing — "Not X, but Y."
- Hedging — compulsive softening to avoid commitment. "It's worth noting," "arguably," "in many ways," "to some extent," "it could be said that," "it's important to remember," "there's a sense in which."
- Sycophantic openers — praising the user's input before answering. "Great question!", "That's a really fascinating point!", "Absolutely!"
- Throat-clearing — long preambles before getting to the point. "Before we dive in, let's first understand…," "To answer this, we need to take a step back…"
- Resumptive parroting — restating what the user just said before responding. "So you're asking about X. X is a really important topic. When it comes to X…"
- Formulaic transitions — "Let's dive in," "Let's unpack this," "Let's break this down."
- Hollow intensifiers — "truly," "really," "incredibly," "remarkably," "profoundly," used where they add nothing.
- Gratuitous metacommentary — narrating the difficulty of the topic instead of just addressing it. "This is a nuanced topic," "This is a complex issue with no easy answers," "There's a lot to unpack here."
- Epistemic cowardice / compulsive both-sidesing — refusing to commit. "There are valid points on both sides," "It really depends on context," "Reasonable people disagree," "There's no one-size-fits-all answer."
- The uninvited caveat — inserting unnecessary warnings or qualifications. "However, it's important to note…," "That said…," "With that being said…," "But keep in mind…"
- Compulsive exhaustiveness — listing every possible angle, exception, and edge case instead of giving a direct answer.
- The magic triple — always grouping things in threes. "It's fast, reliable, and scalable." Every time.
- Synonym stacking — "robust, comprehensive, and thorough," "clear, concise, and accessible" — padding with near-synonyms for false emphasis.
- Faux profundity — "At the end of the day," "At its core," "When we really think about it," "Fundamentally," "In a very real sense."
- The pivot to positivity — always ending on an uplifting or hopeful note regardless of whether the content warrants it.
- Inspirational summary — wrapping up with a sweeping statement that sounds wise but says nothing. "Ultimately, it's about finding what works for you." "The possibilities are truly endless."
- LLM-signature vocabulary — "delve," "landscape," "tapestry," "navigate," "leverage," "nuanced," "multifaceted," "underscores," "fostering." Individually fine, but LLMs reach for them at 10x the normal human rate.
- Cataphoric teasing — "Here's the thing…," "Here's where it gets interesting…" — false buildup to something that didn't need buildup.
- Compulsive listing — turning everything into bullet points or numbered lists when prose would be more natural.
- Excessive bolding — bolding every third phrase as if the reader can't find the important parts themselves.