| The comfort of numbers
Founders like scorecards because they feel rational.
The numbers look clean. They’re comparable, and they’re defensible.
They also create the impression that hiring decisions are evidence-based.
That impression is misleading, and often simply false.
| What interview scores actually do
Most scorecards fail in the same way.
They compress complex judgment into a single number.
Usually a 6, 7, or 8 out of 10.
If everyone gives a candidate a similar score, it feels like alignment.
In reality, it usually means everyone interpreted the interview differently.
The score ends up disguising that disagreement.
| False precision is worse than no precision
A numeric score suggests accuracy that does not exist.
Think about it this way: two interviewers can both give a candidate a 7 for entirely different reasons:
One is worried about pace
Another is worried about judgment
Neither concern is visible in the number
So the team debates the score instead of the risk.
That is how clarity disappears.
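The arithmetic makes this worse. Two panels can produce the identical headline number while seeing completely different candidates. A toy illustration in Python (the panels and scores are invented):

```python
panel_a = [7, 7, 7, 7]    # four interviewers who genuinely agree
panel_b = [10, 10, 4, 4]  # two strong yeses, two serious doubts

def mean(scores):
    return sum(scores) / len(scores)

print(mean(panel_a))  # 7.0
print(mean(panel_b))  # 7.0 -- identical average, opposite realities
```

Anyone reading the scorecard sees the same 7.0 in both cases. The split never reaches the debrief.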
| Scores encourage safety, not conviction
There is another problem: scorecards reward moderation.
Extreme scores feel risky, and mid-range scores feel safe.
Over time, teams tend to drift toward:
“Solid 7s”
“No major red flags”
“Good enough”
This is how founders end up hiring people they feel relieved about rather than excited by.
That relief, however, is risk avoidance dressed up as conviction.
| What high-signal teams do instead
High-performing teams still capture feedback.
They just do not “average” it.
They force trade-offs.
Instead of asking “What score would you give them?”, they ask:
Would I hire this person tomorrow for this role?
What would break if we hired them?
What would break if we did not?
Those questions surface judgment, but judgment still needs structure.
Without it, teams fall back into numbers or debate.
That is where anchored decisions come in.
| Replace numbers with anchored decisions
If interview scores are lying, the fix is not better calibration sessions or longer debriefs.
It is removing unanchored numbers altogether.
Replace 1–10 scales with four decision buckets:
“Strong Yes” = Clear evidence with low risk
“Leaning Yes” = Evidence present. Risk is understood and manageable
“Leaning No” = Gaps, inconsistencies, or unanswered questions
“Strong No” = Clear risk. Clearly below our bar
Then, importantly, add one mandatory field:
Evidence: {Add a quote or a specific observed behaviour}
If an interviewer cannot point to evidence, the judgment is a guess.
If the evidence is weak, the decision should reflect that.
Anchored decisions improve signal, and they make any disagreement visible early.
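If your scorecard lives in a form or a spreadsheet, you can enforce the evidence rule in the template itself. Here is a minimal sketch in Python; the `AnchoredDecision` type and its field names are illustrative, not a reference to any particular hiring tool:

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    STRONG_YES = "Strong Yes"    # clear evidence, low risk
    LEANING_YES = "Leaning Yes"  # evidence present, risk understood and manageable
    LEANING_NO = "Leaning No"    # gaps, inconsistencies, or unanswered questions
    STRONG_NO = "Strong No"      # clear risk, clearly below the bar


@dataclass
class AnchoredDecision:
    interviewer: str
    decision: Decision
    evidence: str  # a quote or a specific observed behaviour

    def __post_init__(self) -> None:
        # Evidence is mandatory: a judgment without evidence is a guess,
        # so an empty field is rejected at submission time.
        if not self.evidence.strip():
            raise ValueError("Evidence required: add a quote or observed behaviour.")
```

A submission like `AnchoredDecision("Priya", Decision.LEANING_NO, "")` fails immediately, which is the point: no evidence, no judgment.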
| Force a final decision
Better signal alone, however, does not guarantee a decision.
That is why you need to add one final constraint.
After each interview loop, require one written sentence:
“Based on what I observed, I would or would not hire this person for this role, because…”
No averages. No hedging.
No hiding behind frameworks.
If the answer is unclear, the signal was weak.
If the answers conflict, the disagreement is visible.
Most importantly, ownership is clear.
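If you collect those sentences in a tool, surfacing the conflict is trivial to automate. A rough sketch, assuming each written sentence has been reduced to a hire / no-hire call (the function and its inputs are mine, not a prescribed process):

```python
def loop_outcome(final_calls: dict[str, bool]) -> str:
    """final_calls maps each interviewer's name to their written call:
    True = would hire, False = would not."""
    if len(set(final_calls.values())) > 1:
        # Conflicting calls: name the dissenters instead of averaging them away.
        dissenters = [name for name, call in final_calls.items() if not call]
        return f"Conflict: {', '.join(dissenters)} would not hire. Debrief on the specifics."
    return "Aligned: everyone made the same call, and ownership is clear."
```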
| When you remove unanchored scores
Disagreement surfaces faster
Decision quality improves
Weak signal is obvious
Conviction becomes explicit
Hiring stops feeling vague because the judgment is no longer hidden.
In the next issue, we will cover the final failure point: why founders still make bad hires even with clean signal and fast decisions.
Cheers
Neil
