RealityCheck: the rejection email that became a resume roaster

I replied to a rejection email, the CTO of HackerRank replied back, and it turned into the least clever and most used thing I have ever shipped. A story about real users, a viral loop, and the bar that quietly moved.

I once replied to a rejection email. Not a polite "thank you for your time" one. A frustrated, slightly unhinged one, basically asking: what are you people actually looking for? Good ranks, deployed projects, a decent CGPA, I had the checkboxes. Why not me?

I fully expected to be ignored. Instead the co-founder and CTO of HackerRank replied. Personally. And he did not explain my rejection. He told me to run my own resume through their hiring agent, which they had quietly open-sourced on GitHub.

That rearranged something in my head. The thing screening me was not a recruiter's mood or a vibe. It was an AI agent with an actual rubric, sitting right there in public, and it was not impressed by a good CGPA and a todo app. The bar had moved, and most of us were still preparing for the old one.

So I built RealityCheck: upload your resume, see exactly how that AI scores you, before a recruiter does. I made the uncomfortable thing one click. Then I posted about it. And a lot of people showed up. More than 10,000 resumes got run through it, and at the peak a couple thousand people did it in a single day. This is the least clever code I have ever written and the most real users I have ever had. Here is the whole thing, and what it taught me.

1. What it actually does

The pitch is one line: drop in your resume, see how an AI hiring agent scores you, before a recruiter does.

flowchart TD
    UP([drop in your resume, a PDF, no sign-up]) --> RD["reading it: PDF → clean text"]
    RD --> EX["extracting: six AI passes over every section"]
    EX --> GH["github: repos, stars, real contributions"]
    GH --> SC["scoring: 4 dimensions, like a real hiring agent"]
    SC --> RES["a score out of 120 + the part that stings:<br/>what is sinking it · line-by-line fixes · a roast"]
    RES --> SH([one tap to share → /s/id → a friend uploads theirs])

You upload a PDF. No sign-up, no account, no email. A few stages tick by while it reads your resume, pulls apart every section, peeks at your GitHub, and scores you across the four dimensions a real hiring agent cares about: open-source work, your own projects, production experience, and raw technical skill.

Out comes a number out of 120, a tier (somewhere from "Early Stage" to "Exceptional"), and then the part that actually stings: exactly what is dragging your score down, line-by-line rewrites of your weakest bullet points, and a "roast my resume" button with a gentle, medium, and brutal dial. The roast is the one everybody screenshots.

2. There is real engineering under the wrapper

Let me be honest up top: the scoring brain is HackerRank's, not mine. It is their open-source agent, used under their license. What I built is everything around it that makes it fast, fair, and shippable to thousands of strangers at once.

flowchart TD
    PDF([resume.pdf]) --> MD["PyMuPDF → markdown<br/>(no LLM, basically free)"]
    MD --> PAR
    subgraph PAR["these two run at the same time"]
      SEC["6 section extractions in parallel<br/>basics · work · skills · projects · edu · awards"]
      GH["GitHub enrichment<br/>repos, stars, real OSS vs a solo repo"]
    end
    PAR --> EV["1 evaluation call → a quality score 0-100<br/>for each of 4 dimensions"]
    EV --> FIN([+ bonus - deductions = your score out of 120])

A naive version of this makes one giant, slow AI call and you stare at a spinner forever. Mine does not. The PDF gets turned into clean text with zero AI (free and instant). Then six separate sections get extracted in parallel, and at the very same moment it is hitting the GitHub API to check whether your "open source" is actually open source or just five solo repos with two stars between them. Only after all of that does one final call score you. The whole thing is 7 to 8 model calls, but it finishes in well under a minute because the slow parts run side by side instead of single file.
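A minimal sketch of that fan-out, assuming an asyncio-style backend. This is not the actual RealityCheck code: `extract_section`, `enrich_github`, and `evaluate` are hypothetical stand-ins for the real model and GitHub calls, and the section names are my own labels for the six extractions.

```python
import asyncio

# The six sections named in the diagram above (labels are illustrative).
SECTIONS = ["basics", "work", "skills", "projects", "education", "awards"]


async def extract_section(resume_text: str, section: str) -> dict:
    # Hypothetical stand-in for one model call that extracts one section.
    await asyncio.sleep(0.01)
    return {"section": section, "chars_seen": len(resume_text)}


async def enrich_github(username: str) -> dict:
    # Hypothetical stand-in for the GitHub API lookup.
    await asyncio.sleep(0.01)
    return {"username": username, "repos": []}


async def evaluate(sections: list, github: dict) -> dict:
    # Hypothetical stand-in for the single final evaluation call.
    await asyncio.sleep(0.01)
    return {
        "sections_scored": [s["section"] for s in sections],
        "github_user": github["username"],
    }


async def run_pipeline(resume_text: str, username: str) -> dict:
    # Six extractions and the GitHub lookup start together; gather returns
    # results in the order the awaitables were passed in.
    *sections, github = await asyncio.gather(
        *(extract_section(resume_text, s) for s in SECTIONS),
        enrich_github(username),
    )
    # Only after everything above has finished does the one scoring call run.
    return await evaluate(sections, github)


result = asyncio.run(run_pipeline("resume text", "octocat"))
```

The wait is roughly the slowest of the seven concurrent calls plus the final evaluation, rather than the sum of all of them.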

One more speed trick. The model I use can "think" before it answers, which sounds great and is actually a 25-second tax per call. Switching that off dropped each call to about 8 seconds. Across seven calls, that is the entire difference between "people wait" and "people leave."

3. The one idea I am actually proud of

Here is the design decision that holds the whole product up: the AI never sees the weights.

flowchart TD
    AI["the AI scores QUALITY only<br/>(it never sees the weights)"] --> Q["open source 72 · self projects 61<br/>production 40 · technical skills 80"]
    Q --> CALC["weights are just arithmetic, in code:<br/>score = sum(weight × quality)/100 + bonus - deductions"]
    CALC --> S1([drag a slider → instant rescore, no AI call])
    CALC --> S2([paste a job description → weights retune for the role])
    CALC --> S3([never scored on name, college, CGPA, or city])

The model only scores quality. How good is your open-source work, your projects, your production experience, your skills, each out of 100, judged in a vacuum. How much each of those matters (the weighting) is just arithmetic I do in code afterward, completely separate from the model.

That one split buys three things for free. First, you can drag a slider on the results page and your score updates instantly, right in your browser, with no new AI call, because it is literally just multiplication. Second, you can paste a job description and the app retunes the weights for that exact role (a senior backend job weights production way above open source). Third, and the one I care about most, fairness: because the model only ever judges your work, I could hard-code a rule that it must never score you on your name, gender, college, CGPA, or city. The stuff that should not matter, structurally, cannot.
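The arithmetic is small enough to sketch in full. The quality numbers below are the ones from the diagram; the two weight sets are made up for illustration, and I am assuming the weights sum to 100 so that the weighted part lands on a 0 to 100 scale before bonus and deductions. The function and key names are hypothetical.

```python
DIMENSIONS = ("open_source", "self_projects", "production", "technical_skills")


def final_score(quality: dict, weights: dict,
                bonus: float = 0.0, deductions: float = 0.0) -> float:
    """score = sum(weight * quality) / 100 + bonus - deductions."""
    # Assumption: weights are percentages that add up to 100.
    if sum(weights[d] for d in DIMENSIONS) != 100:
        raise ValueError("weights must sum to 100")
    weighted = sum(weights[d] * quality[d] for d in DIMENSIONS) / 100
    return weighted + bonus - deductions


# Quality scores as shown in the diagram; these come from the model, once.
quality = {"open_source": 72, "self_projects": 61,
           "production": 40, "technical_skills": 80}

# Illustrative weights only, not the app's real defaults.
equal_weights = {d: 25 for d in DIMENSIONS}
backend_weights = {"open_source": 10, "self_projects": 20,
                   "production": 50, "technical_skills": 20}

balanced = final_score(quality, equal_weights)    # 63.25
backend = final_score(quality, backend_weights)   # 55.4
```

Moving a slider or pasting a job description only changes `weights`, so a rescore is one more call to `final_score` with the same `quality` and no model involved.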

4. Why it actually spread

This is the real story, because the code is the boring part. RealityCheck got users because of a few deliberate, slightly ruthless product choices.

The roast. Nobody shares a number. Everybody shares a roast. The brutal mode is a single AI call cranked up that finds the most embarrassing true thing about your resume, says it out loud, and then leaves you exactly one real fix. It is funny, it is a little too accurate, and it is extremely screenshot-able. It carries the whole thing.

Zero friction. No sign-up, no login, no email gate. You land, you drop a PDF, you get judged. Every extra field would have quietly killed a chunk of people at the door, so there are none.

A share that recruits. Every result has a one-tap share that gives you a permanent little page at /s/your-id. Under your scorecard or your roast sits one button: "Check my resume." So every share is also an ad, aimed at the single person most likely to want one: your equally anxious friend.
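One way a share page like that could work, as a rough sketch rather than the real implementation: generate a short random id, store only the result (not the resume), and hand back the `/s/...` path. The in-memory dict, the function names, and the id length are all assumptions for illustration.

```python
import secrets

# Hypothetical in-memory store; a real deployment would use a database.
_SHARES = {}


def create_share(result: dict) -> str:
    """Save a scorecard or roast and return its permanent share path."""
    share_id = secrets.token_urlsafe(6)  # short, URL-safe, hard to guess
    while share_id in _SHARES:           # regenerate on the rare collision
        share_id = secrets.token_urlsafe(6)
    _SHARES[share_id] = dict(result)     # copy, so later edits do not leak in
    return f"/s/{share_id}"


def load_share(path: str):
    """Look up a shared result by its /s/<id> path; None if unknown."""
    prefix = "/s/"
    if not path.startswith(prefix):
        return None
    return _SHARES.get(path[len(prefix):])


share_path = create_share({"kind": "roast", "score": 63})
shared = load_share(share_path)
```

The page rendered from `load_share` is where the "Check my resume" button would sit.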

flowchart TD
    R([you get roasted]) --> SS["you screenshot it<br/>(funny, and a little too accurate)"]
    SS --> SH[you share /s/id]
    SH --> FR["a friend sees it and thinks:<br/>oof, what would mine say?"]
    FR --> UP[the friend uploads their resume]
    UP -->|no ads, just this loop| R

That is the whole growth engine. You get roasted, you screenshot it, you send it to a friend, the friend spirals about their own resume, uploads it, gets roasted, shares it. I did not run a single ad. The product did the distribution because the output was worth passing on.

5. What thousands of resumes taught me

The quiet best part of all this is that I ended up with a dataset basically nobody else has: thousands of real resumes, scored, with each candidate's exact wording attached.

So I mined it. A separate batch pipeline went back over every run, pulled out more than 5,000 real projects from a few thousand candidates, classified each one (research, applied AI, backend, frontend, devops), ranked them by how well they actually scored, and produced a set of "this is how strong candidates describe their projects" lists. Not what to build. How the people who scored well actually wrote about what they built.
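The ranking step of that pipeline can be sketched as a group-and-sort. The record fields (`category`, `score`, `description`) and the sample rows are invented for illustration, and I am assuming each project carries a single numeric score; the real pipeline's schema is not shown here.

```python
from collections import defaultdict


def top_projects_by_category(projects: list, top_n: int = 3) -> dict:
    """Group projects by category, then keep the highest scoring ones."""
    by_category = defaultdict(list)
    for project in projects:
        by_category[project["category"]].append(project)
    return {
        category: sorted(items, key=lambda p: p["score"], reverse=True)[:top_n]
        for category, items in by_category.items()
    }


# Made-up sample rows, not real data from the dataset.
sample = [
    {"category": "backend", "score": 81, "description": "rate limiter in prod"},
    {"category": "backend", "score": 44, "description": "todo app API"},
    {"category": "frontend", "score": 67, "description": "design system"},
    {"category": "backend", "score": 73, "description": "queue worker"},
]

ranked = top_projects_by_category(sample, top_n=2)
```

What comes out per category is a short list of the best-scoring projects, whose descriptions are the raw material for the "how strong candidates describe their projects" lists.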

And the recurring lesson, the one the CTO was right about, is brutal and simple: the agent is not impressed by checkboxes. A todo app is a todo app no matter how you phrase it. What scored was depth, real contributions other people use, things actually shipped to production, proof you can build and think. Most resumes, very much including an older version of mine, were polished to clear a bar that does not exist anymore.

What I learned shipping it

This is, by a wide margin, the least technically impressive thing I have built. It wraps someone else's model. The "algorithm" is multiplication. And it is also, by a wide margin, the thing the most real humans have ever touched. That gap is the whole lesson.

Shipping something people actually use is a completely different skill from building something technically deep, and it turns out to be the rarer one. The hard parts here were never the model. They were stripping out every ounce of friction, making the output worth sharing, and turning one honest LinkedIn post into a loop. I have built things with ten times the engineering and a tenth of the impact.

A rejection email got me a reply from a CTO, which got me a product, which got thousands of people their own uncomfortable little reality check. I will take that trade every single time. 🫡