CritPost Analysis

Max Golikov

4d (at the time of analysis)


CSF Total: 65/100

Classification: hybrid

Strong analytical foundation hobbled by missing practitioner proof. You think clearly about the speed-learning tradeoff, but you present as a study interpreter rather than a field expert. The absence of specific examples, personal stories, and concrete actions keeps this in the hybrid zone despite solid reasoning.

Dimension Breakdown

📊 How CSF Scoring Works

The Content Substance Framework (CSF) evaluates your content across 5 dimensions, each scored 0-20 points (100 points total).

Dimension Score Calculation:

Each dimension score (0-20) is calculated from 5 sub-dimension rubrics (0-5 each):

Dimension Score = (Sum of 5 rubrics ÷ 25) × 20

Example: If rubrics are [2, 1, 4, 3, 2], sum is 12.
Score = (12 ÷ 25) × 20 = 9.6 → rounds to 10/20

Why normalize? The 0-25 rubric range (5 rubrics × 5 max) is scaled to 0-20 to make all 5 dimensions equal weight in the 100-point CSF Total.
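As a minimal sketch (not part of the CSF tool itself), the normalization can be expressed in a few lines of Python; the function name and nearest-integer rounding are assumptions inferred from the worked example above:

```python
def dimension_score(rubrics):
    """Scale five 0-5 rubric scores (sum 0-25) to a 0-20 dimension score.

    Rounding to the nearest integer is assumed, matching the worked
    example in the text (12/25 -> 9.6 -> 10/20).
    """
    if len(rubrics) != 5 or not all(0 <= r <= 5 for r in rubrics):
        raise ValueError("expected five rubric scores in the 0-5 range")
    return round(sum(rubrics) / 25 * 20)

# Worked example from the text: rubrics [2, 1, 4, 3, 2] sum to 12
print(dimension_score([2, 1, 4, 3, 2]))  # 10
```

Applied to the Voice rubrics listed later (sum 21/25), this yields 17/20, consistent with the dimension scores in the breakdown.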

13/20
Specificity

Named entities rubric score is 2/5—only 'Anthropic' and 'Python' appear, no study names, no library names, no team examples, no real debugging scenarios

11/20
Experience Depth

Zero personal anecdotes or first-hand stories—writer analyzes a study but never shares 'I saw this happen when...'

13/20
Originality

Novelty rubric score is 3/5—relies on study for credibility rather than generating new insights beyond the research

14/20
Nuance

Systems thinking rubric score is 3/5—identifies speed-learning tradeoff but doesn't explore organizational incentives or how shallow understanding compounds

14/20
Integrity

Actionability rubric score is 3/5 and hedge avoidance is 4/5—the piece concludes with the vague 'protect skill formation' and the rhetorical 'How are you setting expectations?' instead of specific practices

Rubric Score Breakdown

🎤 Voice

Cliché Density 5/5
Structural Variety 4/5
Human Markers 4/5
Hedge Avoidance 4/5
Conversational Authenticity 4/5
Sum: 21/25 → 17/20

🎯 Specificity

Concrete Examples 4/5
Quantitative Data 3/5
Named Entities 2/5
Actionability 3/5
Precision 4/5
Sum: 16/25 → 13/20

🧠 Depth

Reasoning Depth 4/5
Evidence Quality 3/5
Nuance 4/5
Insight Originality 4/5
Systems Thinking 3/5
Sum: 18/25 → 14/20

💡 Originality

Novelty 3/5
Contrarian Courage 4/5
Synthesis 3/5
Unexplored Angles 3/5
Thought Leadership 3/5
Sum: 16/25 → 13/20

Priority Fixes

Impact: 9/10
Integrity
⛔ Stop: Stop with vague guidance like 'protect skill formation' and rhetorical questions ('How are you setting expectations?'). Your conclusion asks readers to solve the problem you just identified. Actionability rubric score is 3/5—this is why.
✅ Start: Replace the final three paragraphs with specific practices: 'Require developers to explain AI-generated code in code review,' 'Ban AI during first 48 hours with new libraries,' 'Track incident resolution time as a proxy for true understanding.' Give them something to implement Monday.
💡 Why: The Integrity dimension sits at 14/20, dragged down by weak actionability. Leaders reading this will nod along but do nothing because you didn't tell them what to actually do. Vague advice kills thought leadership—it signals you don't know the implementation details.
⚡ Quick Win: Rewrite your last paragraph. Instead of 'How are you setting expectations around AI use, so people move faster without forgetting how the system works?'—write: 'Three practices that balance speed and learning: 1) Pair AI use with mandatory explanation sessions, 2) Rotate people through incident response even on AI-built features, 3) Measure understanding in retros, not just velocity in standups.'
Impact: 8/10
Experience Depth
⛔ Stop: Stop analyzing from a distance. 'If you work with AI tools, this pattern should sound familiar' tells me you're guessing at familiarity rather than sharing your own experience. Zero personal stories = 11/20 on Experience Depth.
✅ Start: Add one 60-word story about witnessing this exact failure mode. 'Last quarter, a senior dev used Claude to build our rate limiter. Shipped in three days, celebrated the velocity. Two weeks later, a customer hit an edge case. The dev couldn't explain why the semaphore logic failed—he'd never struggled through async patterns himself. Took us 6 hours to debug what would've taken 2 if he'd understood the foundations.'
💡 Why: Personal stories do three things studies can't: prove you've been in the room, make abstract concepts concrete, and give readers pattern-matching ammunition. Your analysis is solid but bloodless. Stories are the difference between 'interesting' and 'I need to share this with my team.'
⚡ Quick Win: Insert a story right after 'The learning result is the real headline.' Ground the 17% score drop in something you witnessed. Start with 'I've seen this exact gap...' or 'Three months ago, one of my developers...'
Impact: 7/10
Specificity
⛔ Stop: Stop saying 'a new asynchronous Python library.' Named entities rubric score is 2/5—your specificity is crippled by this vagueness. If it's in a study you're citing, you can name it. If you can't name it because you don't know, that's a research gap you need to fill.
✅ Start: Name everything. If the study used httpx or aiohttp, say so. If it studied debugging of connection pooling vs. retry logic, specify. Add the study author names, the sample size, the confidence interval on that 17%. Replace 'a lot of time moved' with the study's actual time-allocation figures, e.g. 'time spent writing code dropped X%, while prompting and verification jumped Y%.'
💡 Why: Vagueness reads as either lazy research or deliberate obscuring. Named entities and quantitative data are your weakest rubric scores (2/5 and 3/5). Every generic phrase ('a bit faster,' 'relatively new') erodes trust that you've actually read the study closely or used these tools yourself.
⚡ Quick Win: In your next 15 minutes, go back to the Anthropic study. Extract: library name, n=X participants, p-value for speed results, exact debugging question examples, time allocation percentages. Replace every vague phrase with these specifics. Just doing this will jump your specificity score 4 points.

Transformation Examples

🧠 Deepen Your Thinking
❌ Before

If AI shifts time away from struggle, it can also shift time away from learning. And in software, the cost of shallow understanding shows up later, usually during debugging, maintenance, and incidents.

✅ After

Struggle isn't punishment—it's encoding. When you fight with async patterns for two hours, your brain builds prediction models: 'This deadlock happens when...' AI short-circuits that encoding by giving you working code minus the failed attempts that teach you what breaks. The real danger isn't individual learning loss—it's systemic. When entire teams optimize for velocity over understanding, incident response becomes archaeology: nobody remembers why the system works, so nobody can predict what breaks it. I've watched 4-hour incidents that should've taken 30 minutes because three developers all used AI to build adjacent features and none could explain the interaction model.

How: Explore the mechanism: WHY does struggle produce learning that smooth execution doesn't? What cognitive science explains this? Then go third-order: How do organizational incentives (sprint velocity, story points) systematically reward the shallow path? What happens when an entire team develops shallow understanding of the same codebase—how do incident response patterns change?

🎤 Add Authentic Voice
❌ Before

If you are leading engineering teams, I would frame it like this. AI can help with execution. But you still need to protect skill formation. Especially the skills that only show up when things break. How are you setting expectations around AI use, so people move faster without forgetting how the system works?

✅ After

If you lead engineers, here's what to actually do. Track a new metric in retros: 'Could you fix this without the AI?' If the answer is no, you're building technical debt in human form. Three non-negotiable practices: First, ban AI for the first week with any new library—struggle time is learning time. Second, make people explain AI-generated code in review as if they wrote it. Third, rotate everyone through incident response on features they built with AI. You'll find out fast who owns their code versus who just shipped it.

  • Killed the hedge ('I would frame it')—you're the expert, just tell them
  • Replaced abstract 'protect skill formation' with trackable metric and specific practices
  • Deleted the rhetorical question—gave them answers instead of more questions
  • Added consequence ('technical debt in human form')—made the stakes concrete
💡 Originality Challenge
❌ Before

Derivative Area: The core speed-versus-understanding tradeoff and 'how you use AI matters' thesis are standard discourse in 2024 AI tooling debates. Novelty rubric score is 3/5 because you're primarily interpreting existing research rather than breaking new ground.

✅ After

Everyone says 'AI makes you faster.' You could argue the opposite: For senior engineers doing deep work, AI might slow you down by introducing verification overhead and shallow output that requires reconstruction. The speed gains are real for junior tasks, but maybe we've overcorrected—maybe the best engineers should use AI less, not more. Explore when NOT using AI is the faster path.

  • Measure the compounding effect: Get access to three teams, track them for 6 months—does shallow understanding create escalating incident severity, or do teams self-correct?
  • The intentionality training gap: If 'how you use it matters,' what specific prompting patterns correlate with retained understanding? Can you teach high-learning AI use?
  • Economic misalignment: Why do engineering orgs optimize for velocity metrics that hide understanding debt? What would a balanced scorecard actually measure?
  • The debugging skill premium: Will strong debuggers become 10x more valuable as AI floods codebases with surface-understood code? Is this creating a new tier of engineer?

30-Day Action Plan

Week 1: Fix the integrity gap with specific practices

Rewrite your conclusion with three specific, implementable practices (ban AI for first 48hrs with new tech, require explanation in code review, track 'could you fix this without AI' metric). Then find one company doing this well and interview their eng lead for 30 minutes.

Success: You've replaced 'protect skill formation' with actions a reader can implement Monday. The interview gives you a real example to anchor the practices.

Week 2: Add earned authority through personal stories

Write down three debugging incidents you've witnessed in the last year. Pick the one where lack of understanding (AI-assisted or not) extended the incident. Write 100 words on it: what broke, how long it took, what the gap in understanding was, what it cost. Insert this story into your next piece right after stating a problem.

Success: Your next post includes a specific story with names/roles changed, timeline, and cost. Readers should be able to visualize the incident room.

Week 3: Deepen beyond the study with original research

Pick one unexplored angle from the originality challenge. Spend 3 hours researching it: interview two engineering leads about how they measure understanding, or analyze 10 incident post-mortems for patterns of shallow-understanding root causes. Write 400 words synthesizing what you found.

Success: You've generated a data point or pattern that didn't exist in the Anthropic study. You can now make a claim no one else can make because you did original legwork.

Week 4: Integrate everything into a high-CSF piece

Write a new 600-word post on AI-assisted development that includes: your personal debugging story, three specific practices with one company example, and your original research finding from Week 3. Open with a contrarian claim (e.g., 'AI might be slowing down your senior engineers'). Close with a specific call to action.

Success: This piece scores 70+ on CSF: named entities throughout, personal story in first 200 words, original research cited, specific practices listed, no rhetorical questions. Send it to three engineering leads and ask if it changed how they think about AI tooling.

Before You Publish, Ask:

Can I name three companies, teams, or engineers who exemplify the patterns I'm describing?

Filters for: Specificity and earned authority—vague claims hide lack of real-world grounding

Have I shared a specific failure I witnessed or participated in?

Filters for: Experience depth—personal stakes prove you've been in the arena, not just the audience

If someone reads only my conclusion, can they implement something specific on Monday?

Filters for: Integrity via actionability—thought leadership without implementation guidance is just commentary

Does this piece contain a claim or data point that didn't exist before I researched it?

Filters for: Originality—are you advancing the conversation or just curating it?

What's the second-order consequence I'm exploring that others miss?

Filters for: Nuance and depth—surface-level observations don't move thinking forward

💪 Your Strengths

  • Cliché avoidance is exceptional (5/5)—no AI slop, no 'game-changer' or 'unlock potential' nonsense
  • Structural variety strong (4/5)—sentence rhythm is natural, paragraph breaks work well
  • Contrarian courage emerging (4/5)—willing to complicate the 'AI makes you faster' narrative
  • Reasoning depth solid (4/5)—recognizes non-obvious consequences like delayed cost in debugging
  • Clean, practitioner voice that avoids consultant-speak in the body of the piece (hedge avoidance 4/5)
Your Potential:

You're one revision away from exceptional work. Your thinking is clear, your analysis is sound, and you avoid the AI clichés that plague this space. What's missing is proof you've lived this—specific examples, personal stories, companies you can name, practices you've actually implemented. Add those, replace vague conclusions with specific actions, and chase one original research angle per piece. Do that and you'll move from 'thoughtful commentator' to 'practitioner other practitioners quote.' The foundation is strong; you just need to show more of your work and give readers something they can use tomorrow.

Detailed Analysis

Score: 17/20

Rubric Breakdown

Cliché Density 5/5 (Pervasive → None)
Structural Variety 4/5 (Repetitive → Varied)
Human Markers 4/5 (Generic → Strong Personality)
Hedge Avoidance 4/5 (Hedged → Confident)
Conversational Authenticity 4/5 (Stilted → Natural)

Overall Assessment

This piece demonstrates strong authentic voice with conversational directness and earned opinions. Avoids AI clichés entirely. The writer thinks like a practitioner, not a consultant. Minor opportunities exist to deepen personality through specific anecdotes or more provocative framing, but the foundation is genuinely human.

Strengths:
  • Clear point of view without hedging—states conclusions confidently ('the real headline,' 'most useful')
  • Practitioner's language—speaks from experience ('If you work with AI tools, this pattern should sound familiar') rather than theoretical distance
  • Strategic simplicity—short sentences for emphasis ('Output is not the same as ownership') that create rhythm without sounding staccato
Weaknesses:
  • Minimal use of specific examples—references the study but never names individuals, teams, or concrete debugging scenarios that broke
  • Absence of personal anecdote—no 'I once saw a team...' or 'I made this mistake' moments that ground expertise
  • Safe framing of recommendations—'cautious takeaway' and 'I would frame it like this' suggest author holding back slightly from stronger stance

Original Post

A new Anthropic study pokes a hole in the clean story many of us repeat about AI assisted coding. The setup was simple. Developers had to complete tasks using a relatively new asynchronous Python library. Some had AI assistance. Some did not. Then they measured two things. How fast people finished, and how well they actually understood what they just did.

The speed result is the part that will annoy both sides. On average, the AI group finished a bit faster, but the difference did not clear the bar for statistical significance. A lot of time moved from writing code to giving context, asking follow ups, and steering the assistant.

The learning result is the real headline. On a quiz that covered concepts participants had just used, the AI group scored lower. Anthropic describes it as 17% lower, roughly two letter grades, with the biggest gap showing up in debugging questions.

If you work with AI tools, this pattern should sound familiar. AI can produce output fast. But output is not the same as ownership. You can ship something and still not have the mental model needed to spot when it is wrong, fragile, or unsafe.

What I found most useful in the paper is that it is not saying AI automatically makes you worse. It shows that how you use it matters. People who used the assistant to build comprehension did better than people who delegated the thinking and treated it as a code vending machine.

My cautious takeaway is not that teams should ban these tools. It is that we should stop treating speed as the only metric. If AI shifts time away from struggle, it can also shift time away from learning. And in software, the cost of shallow understanding shows up later, usually during debugging, maintenance, and incidents.

If you are leading engineering teams, I would frame it like this. AI can help with execution. But you still need to protect skill formation. Especially the skills that only show up when things break.

How are you setting expectations around AI use, so people move faster without forgetting how the system works?

Source: LinkedIn (Chrome Extension)

Content ID: 3593bd17-a57e-4aa4-9497-6b633b8dd4c4

Processed: 2/20/2026, 5:18:55 PM