
We Built AI That Catches Candidates Cheating (Here's How)

Candidates are using ChatGPT to game AI assessments. Here's how Miki's patent-pending Active Integrity Probing detects AI-assisted answers, coached responses, and cheating in real time.

integrity · AI cheating · sales assessment · Active Integrity Probing

In late 2025, a candidate took an AI sales assessment. Their responses were articulate, well-structured, and hit every evaluation criterion perfectly. Textbook objection handling. Flawless discovery questions. Persuasive value articulation.

Too perfect.

Their response times averaged 2.3 seconds for 80-word answers. A human types at roughly 40 words per minute. Producing 80 polished words in 2.3 seconds requires either superhuman typing speed or a second source generating the text.

Something didn't add up. And this wasn't an isolated case.

The Cheating Problem Is Real and Growing

AI-powered assessment tools have a fundamental vulnerability: the same AI technology that powers the assessment can be used to game it.

It's not theoretical. It's happening now:

  • ChatGPT copy-paste. The most common method. Candidate opens ChatGPT in a second tab, feeds it the AI buyer's question, copies the generated response, and pastes it into the assessment. Time from question to "answer": under 5 seconds. Quality: suspiciously polished.

  • Browser extensions. AI writing assistants that suggest responses in real time. Some are designed specifically for interview and assessment scenarios. The candidate types a rough draft; the extension rewrites it into a polished response.

  • Live coaching. Someone else is in the room — or on a call — feeding the candidate responses. Harder to detect in text, but patterns emerge: inconsistent expertise levels, responses that shift in style mid-conversation, or references to information the coaching partner has but the candidate wouldn't.

  • Pre-prepared scripts. Candidate researches common assessment scenarios beforehand, prepares ideal responses, and displays them on a second screen. The "conversation" is actually a performance with rehearsed lines.

  • AI rewriting tools. The candidate types a rough draft, then runs it through a tool that polishes the language, adds structure, and improves persuasion. The thinking is theirs; the expression isn't.

The irony is bitter: AI-powered assessments are vulnerable to AI-powered cheating. If you don't address this, your assessment isn't measuring who can sell — it's measuring who has the best AI assistant.

Why This Matters More Than You Think

Assessment validity is the foundation of hiring accuracy. If the assessment can be gamed, every score is suspect. And if every score is suspect, you're back to gut feel — except now you have expensive gut feel with a dashboard.

Consider the downstream impact:

The cheating candidate gets hired. They scored 92 on the assessment. Their real ability is closer to 65. You discover this at month 3 when their pipeline is empty and their discovery calls are rambling. The cost of that bad hire: $250K+.

The honest candidate loses the job. They scored 78 — an honest 78 that reflects real skill. But the cheating candidate scored 92. The honest candidate doesn't get the interview. Your best actual prospect is eliminated by someone who's better at cheating than selling.

Your team loses trust in the tool. Hiring managers see high-scoring assessment candidates underperform. They conclude the assessment doesn't work. The real problem isn't the assessment — it's that the input data was compromised. But the outcome is the same: the team stops using it.

The assessment becomes a cost, not an investment. You're paying for a tool that provides false confidence. That's worse than no tool at all — at least with no tool, you know you're guessing.

This is why we built Active Integrity Probing. Not because it's a cool feature. Because without it, AI-powered sales assessment is a $250K coin flip.

How Active Integrity Probing Works

We'll share enough to build trust in the system. We won't share enough to help people circumvent it. That balance is deliberate.

Response Timing Analysis

Humans have consistent, measurable patterns when they read, think, and type.

Reading a 50-word question takes 5–8 seconds. Processing the question and formulating a response takes 3–15 seconds depending on complexity. Typing an 80-word response at normal speed takes 60–120 seconds.

AI-assisted responses break these patterns. Common signatures:

  • Impossibly fast responses. An 80-word, well-structured answer arriving 5 seconds after a complex question is a signal. Not proof — some people think fast — but a signal that's tracked across the full conversation.
  • Uniform timing regardless of complexity. A simple "What's your name?" gets a 3-second response. "How would you handle a VP who says they're evaluating your competitor and need 40% off to stay in the deal?" also gets a 3-second response. Humans show variable latency that correlates with question difficulty. AI assistants don't.
  • Burst patterns. Long pause (30+ seconds — the candidate is prompting their AI assistant) followed by a perfectly formed response that arrives all at once (paste).

Timing analysis alone doesn't prove cheating. It flags anomalies that trigger deeper analysis.
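The two simplest signatures above can be sketched in a few lines. This is a minimal illustration, not Miki's implementation: the thresholds (`MAX_HUMAN_WPS`, `MIN_LATENCY_STDEV`) are assumed values chosen for the example, and the real system tracks these signals alongside many others across the full conversation.

```python
from statistics import pstdev

# Illustrative thresholds only, not Miki's actual values.
MAX_HUMAN_WPS = 2.0      # ~120 wpm of sustained typing is already very fast
MIN_LATENCY_STDEV = 2.0  # human latency varies; near-zero variance is suspicious

def timing_flags(turns):
    """turns: list of (latency_seconds, word_count) per candidate response.

    Returns a list of anomaly labels; an empty list means no timing anomalies.
    """
    flags = []
    # 1. Impossibly fast responses: words produced per second of latency.
    for i, (latency, words) in enumerate(turns):
        if latency > 0 and words / latency > MAX_HUMAN_WPS:
            flags.append(f"turn {i}: {words} words in {latency:.1f}s")
    # 2. Uniform timing regardless of question complexity.
    latencies = [t[0] for t in turns]
    if len(latencies) >= 5 and pstdev(latencies) < MIN_LATENCY_STDEV:
        flags.append("near-constant response latency across turns")
    return flags
```

The opening anecdote falls out directly: 80 words in 2.3 seconds is roughly 35 words per second, against the ~0.7 words per second a 40 wpm typist produces, so `timing_flags([(2.3, 80), ...])` flags that turn immediately.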

Consistency Probing

Real sales conversations build on themselves. What you said in turn 3 shapes what you discuss in turn 8. Genuine participants maintain deep consistency because they're experiencing a single continuous conversation.

AI-generated responses struggle with this. Each response is generated in relative isolation. The model may produce a brilliant discovery question in turn 4 that contradicts something the "candidate" said in turn 2 — because the AI assistant doesn't have the same conversational memory.

Active Integrity Probing exploits this. Mid-conversation, the AI buyer asks follow-up questions that specifically reference earlier exchanges:

"Earlier you mentioned the budget concern — you said your approach would be to connect it to pipeline impact. Can you walk me through exactly how you'd calculate that for a team of 15 reps?"

A candidate who genuinely made that point can elaborate naturally. A candidate who pasted a ChatGPT response about "connecting to pipeline impact" often can't substantiate it because they didn't generate the idea — they copied it.

Consistency probing gets progressively more specific as the conversation advances. By turn 12, the assessment is testing whether the candidate can maintain a coherent narrative across 15+ exchanges about the same deal scenario. AI-assisted responses can't maintain this depth of consistency without the candidate understanding every prior response well enough to extend it.
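Mechanically, a consistency probe is built from a claim extracted several turns back. The sketch below is hypothetical: claim extraction is not shown, the template is a stand-in for the AI buyer's generated phrasing, and `min_gap` is an assumed parameter for how far back a probe should reach.

```python
import random

# Hypothetical probe template; the real follow-up is generated by the AI
# buyer in context, not filled from a fixed string.
PROBE_TEMPLATE = (
    "Earlier you mentioned {topic}: you said {claim}. "
    "Can you walk me through exactly how you'd apply that here?"
)

def make_consistency_probe(claims, current_turn, min_gap=3, seed=None):
    """Pick a claim made at least `min_gap` turns ago and turn it into a
    follow-up that only someone tracking the conversation can answer.

    claims: list of dicts like {"turn": 2, "topic": "the budget concern",
                                "claim": "you'd connect it to pipeline impact"}
    """
    rng = random.Random(seed)
    eligible = [c for c in claims if current_turn - c["turn"] >= min_gap]
    if not eligible:
        return None  # nothing old enough to probe yet
    target = rng.choice(eligible)
    return PROBE_TEMPLATE.format(topic=target["topic"], claim=target["claim"])
```

The `min_gap` constraint is what makes the probe progressive: the deeper into the conversation, the older the material it can reach back to, and the harder it is for a candidate who never generated the original idea to extend it.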

Behavioral Pattern Detection

Real sales conversations have a natural rhythm. Thinking pauses. Partial restarts. Varying quality across responses — some sharp, some mediocre, some that get better as the candidate finds their footing.

Artificial uniformity is itself a signal. When every response is equally polished, equally structured, and equally strong, it often means a tool is smoothing out the natural variance of human performance.

The system looks for:

  • Suspiciously uniform quality. Real candidates have strong turns and weak turns. If every turn reads like it was crafted by the same optimization function, that's notable.
  • Style shifts. The candidate's first 3 responses are casual and direct. Response 4 suddenly uses complex sentence structures, formal vocabulary, and perfect paragraph breaks. Something changed — possibly a tool was activated.
  • Vocabulary jumps. A candidate who uses straightforward language suddenly produces responses with "synergistic value proposition" and "operationalize the implementation framework." The vocabulary jump suggests an external source.
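Two of these patterns, uniform quality and vocabulary jumps, reduce to simple statistics once you have per-turn measurements. In this sketch the quality scores are assumed to come from an external scorer and vocabulary richness from a lexical measure such as type/token ratio; the `min_variance` and `jump_factor` thresholds are illustrative assumptions.

```python
from statistics import pstdev

def uniformity_flags(quality_scores, vocab_richness,
                     min_variance=0.05, jump_factor=1.5):
    """quality_scores: per-turn quality in [0, 1], from an assumed external scorer.
    vocab_richness: per-turn lexical richness (e.g. type/token ratio).

    Returns anomaly labels; an empty list means no behavioral anomalies.
    """
    flags = []
    # Suspiciously uniform quality: real candidates have strong and weak turns.
    if len(quality_scores) >= 5 and pstdev(quality_scores) < min_variance:
        flags.append("uniform quality across turns")
    # Vocabulary jump: a turn far richer than anything that came before it.
    for i in range(1, len(vocab_richness)):
        baseline = max(vocab_richness[:i])
        if vocab_richness[i] > jump_factor * baseline:
            flags.append(f"vocabulary jump at turn {i}")
    return flags
```

Note the asymmetry: a weak turn never triggers a flag. Only the absence of weak turns, or a sudden jump above the candidate's own established baseline, does.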

Active Challenges

This is where "probing" gets its name.

At strategic points in the conversation, the AI buyer introduces unexpected scenario pivots that only someone genuinely tracking the conversation can handle naturally:

"Actually, I just got a text from your competitor's rep. They're offering 30% less. Before you respond — what did I tell you my main concern was, back when we first started talking?"

The challenge tests two things simultaneously: (1) can the candidate recall a specific detail from earlier in the conversation, and (2) can they handle an unexpected pivot without breaking composure?

Genuine candidates adapt. They recall the detail, address the competitive threat, and reconnect to the buyer's original concern. AI-assisted candidates often fumble the recall — because their AI assistant wasn't tracking the cumulative conversation context the same way a genuinely engaged participant would.

What Active Integrity Probing Is NOT

Transparency matters. Here's what we don't do:

We don't monitor the candidate's screen. No screen recording, no browser lockdown, no webcam surveillance (beyond the standard video call if using video assessment). The candidate's device is their business.

We don't require a lockdown browser. Tools like ProctorU or Examity force candidates into surveillance environments. That's appropriate for standardized academic exams. It's hostile and inappropriate for a professional sales assessment. We want candidates to feel challenged, not surveilled.

We don't penalize articulate candidates. Being well-spoken isn't a red flag. Integrity probing analyzes patterns across the entire conversation — timing, consistency, behavioral variance, challenge responses — not the quality of individual answers. A legitimately brilliant candidate scores high on both performance AND integrity.

We don't make the final call. Integrity probing produces a verification status: verified (no anomalies), flagged (anomalies detected — review recommended), or inconclusive. The hiring decision is always human. We provide the data. You decide what to do with it.

Why We Patented It

Assessment integrity isn't a feature. It's the foundation. If the assessment can be gamed, every score is worthless. Every hiring decision built on those scores is compromised. Every dollar spent on the tool is wasted.

We believe Active Integrity Probing is important enough to protect. The patent-pending technology covers the methods described above — not the general concept of "detecting cheating," but the specific technical approach of in-conversation probing, timing analysis, and behavioral pattern detection in AI-mediated sales simulations.

We're not the only ones who'll need to solve this problem. But we intend to be the ones who solved it first and solved it best.

The Uncomfortable Question No One's Asking

Every company using AI assessments should be asking: "What percentage of our assessment results are compromised by AI-assisted cheating?"

Most companies aren't asking because the answer is uncomfortable. If 10% of candidates are cheating — a conservative estimate given the accessibility of ChatGPT and AI writing tools — then 10% of your "high-scoring" candidates aren't who they appear to be.

For a team that assesses 100 candidates per quarter, that's 10 potentially compromised assessments. If just one of those candidates gets hired based on inflated scores, the cost isn't the assessment subscription — it's the $250K bad hire that follows.

Every Miki assessment — on every plan, for every candidate — includes Active Integrity Probing. Because validity shouldn't be a premium feature.


Every assessment on Miki is integrity-verified. Patent-pending technology, included on every plan.

Learn about Active Integrity Probing

Next step

See how Miki turns editorial thinking into a live hiring workflow.

Browse more essays or jump straight into a product demo if you want to see the assessment layer behind the ideas.

Book a demo · See Miki in action