AI product prototype
Medical Coding Copilot
I prototyped an AI assistant for recording and processing billable medical services using an agentic workflow over case context, reports, tariff data, and deterministic validation.
Key metrics
- Workflow: human in the loop
- Output: structured suggestions
- Validation: deterministic boundary
Problem
Tariff-relevant information is rarely stored in one neat place. It can be spread across reports, progress notes, diagnoses, appointments, and earlier cases for the same patient.
Today, much of the work of finding, cross-reading, and interpreting that context happens in the head of the person recording services.
The existing validator already checks whether recorded services are technically valid. That matters, but it only solves the downstream part of the workflow: once a service has been entered, it can be validated.
The harder upstream question is whether the system can detect tariff-relevant evidence in the patient journey before the user has manually connected all the dots.
- It takes time to collect the relevant context before a service can be recorded.
- Plausible billable services can be missed or noticed late.
- Users need support for discovery and interpretation without turning billing decisions into opaque automation.
Product thesis
The most useful AI experience here is not a generic chatbot. It is a controlled proposal workflow.
The assistant should gather relevant case context, translate it into the right tariff and search space, propose candidate services, show its evidence, and let the user accept or reject the result.
Validation then remains deterministic and auditable instead of being hidden inside a model response. In short: use AI for discovery and reasoning support, not as an unreviewed billing authority.
Technical shape
The prototype was designed around a simple boundary: probabilistic AI can discover and explain candidates, but deterministic application logic stays responsible for validation and persistence.
I treated structured output as a product requirement, not just an implementation detail. A suggestion needs to be machine-readable enough for the UI and validator, but clear enough for a domain expert to review.
```json
{
  "candidateService": {
    "code": "TARIFF_CODE",
    "label": "Human-readable service name"
  },
  "confidence": 0.82,
  "evidence": [
    {
      "sourceType": "report | note | appointment | diagnosis | service-session",
      "summary": "Why this source supports the suggestion"
    }
  ],
  "assumptions": ["Context that should be checked by the user"],
  "missingInformation": ["Information needed before safe acceptance"],
  "recommendedAction": "accept | review | reject"
}
```

| Structured field | Purpose |
|---|---|
| candidateService | Code and human-readable label for the proposed service |
| confidence | Plausibility signal for review and prioritization |
| evidence | Source summaries explaining why the candidate is supported |
| assumptions | Context the user should explicitly check |
| missingInformation | Open information needed before safe acceptance |
| recommendedAction | Accept, review, or reject recommendation for the UI |
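Because both the UI and the validator consume this structure, a thin parsing layer can reject malformed model output before it ever reaches review. A minimal sketch, assuming the field names from the schema above; the specific checks are illustrative, not the production rules:

```python
# Sketch: guard that a model response matches the suggestion schema
# before it enters the review UI. Field names follow the JSON example
# above; the individual checks are illustrative assumptions.
import json

REQUIRED_FIELDS = {
    "candidateService", "confidence", "evidence",
    "assumptions", "missingInformation", "recommendedAction",
}
ALLOWED_ACTIONS = {"accept", "review", "reject"}

def parse_suggestion(raw: str) -> dict:
    """Parse a model response and fail fast on schema violations."""
    suggestion = json.loads(raw)
    missing = REQUIRED_FIELDS - suggestion.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 0.0 <= suggestion["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if suggestion["recommendedAction"] not in ALLOWED_ACTIONS:
        raise ValueError("unknown recommendedAction")
    if not suggestion["evidence"]:
        raise ValueError("a suggestion needs at least one evidence item")
    return suggestion
```

Failing fast here keeps the downstream contract simple: anything the review UI displays is guaranteed to be structurally complete.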
- Context assembly collects relevant information from case documentation, appointments, diagnoses, and existing service records.
- The skill layer narrows the task into the right tariff, language, and domain-specific search space.
- The agentic loop searches, proposes, checks evidence, refines weak candidates, and returns structured output.
- The review UI shows the suggestion, confidence, evidence, and assumptions before a user takes action.
- The validation boundary checks accepted suggestions with deterministic business and tariff rules before recording a service.
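The stages above can be sketched as a single loop. Everything in this sketch (the function names, the refinement threshold, the skill interface) is a hypothetical shape for illustration, not the production implementation:

```python
# Sketch of the agentic proposal loop described above. The skill
# interface, function names, and the 0.6 confidence threshold are
# illustrative assumptions.

def propose_services(case_context, skill, max_refinements=2):
    """Search, propose, check evidence, refine weak candidates,
    and return structured suggestions for human review."""
    candidates = skill.search(case_context)  # narrowed tariff search space
    suggestions = []
    for candidate in candidates:
        suggestion = skill.draft_suggestion(candidate, case_context)
        for _ in range(max_refinements):
            # Strong enough to show a reviewer? Then stop refining.
            if suggestion["confidence"] >= 0.6 and suggestion["evidence"]:
                break
            # Weak candidate: gather more context and redraft.
            suggestion = skill.refine(suggestion, case_context)
        suggestions.append(suggestion)
    # Structured output only; persistence happens after the user
    # accepts and deterministic validation passes.
    return suggestions
```

The loop never records anything itself; it only returns reviewable objects, which is what keeps the probabilistic and deterministic sides separate.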
What I built
I built a prototype for a Medical Coding Copilot that combines two interaction modes: free-text questions about a case, tariff logic, or possible services, and an agentic proposal loop that actively searches for billable-service candidates.
The assistant works over a shared multi-case context that can include reports, progress notes, diagnoses, appointments, and existing service sessions. On top of that context, skills can be enabled for specific tariff or knowledge domains, such as TARDOC or future language- and tenant-specific rule spaces.
Each suggestion is treated as a reviewable object rather than a loose text answer, built to answer:
- What service might be relevant?
- Which clinical hints support it?
- Where does the evidence come from?
- What assumptions were made?
- How confident is the assistant?
- What validation or human review is still required?
Why validation matters
Medical coding is not a place for opaque automation. A plausible suggestion still needs to survive domain rules, tariff constraints, and professional review.
Separating probabilistic discovery from deterministic validation makes the assistant safer, easier to debug, and easier to explain to non-technical stakeholders.
- The assistant finds and explains possible services.
- The user decides whether the suggestion makes sense.
- The validator checks technical and tariff-level correctness.
- The product keeps the evidence chain visible.
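The split between user acceptance and deterministic checking can be made explicit in code. A sketch with two invented example rules (a real system would reuse the existing tariff validator mentioned above):

```python
# Sketch: accepted suggestions pass through deterministic rules before
# a service is recorded. Both rules shown are invented examples; real
# tariff rules would come from the existing validator.

def validate_accepted(suggestion, case):
    """Return a list of rule violations; an empty list means recordable."""
    errors = []
    code = suggestion["candidateService"]["code"]
    if code not in case["allowed_tariff_codes"]:
        errors.append(f"{code} is not billable under this tariff")
    if code in case["recorded_codes"]:
        errors.append(f"{code} was already recorded for this session")
    return errors

def record_service(suggestion, case):
    """Persist the service only if every deterministic rule passes."""
    errors = validate_accepted(suggestion, case)
    if errors:
        return {"recorded": False, "errors": errors}
    case["recorded_codes"].append(suggestion["candidateService"]["code"])
    return {"recorded": True, "errors": []}
```

Because the rules live outside the model, every rejection is reproducible and explainable, which is what keeps the evidence chain auditable.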
Potential impact
The prototype targets both efficiency and completeness.
On the efficiency side, it reduces the manual effort of searching, cross-reading, and translating clinical context into billing logic. On the completeness side, it can help teams identify plausible billable services earlier and more consistently, which in turn supports revenue.
A future production version could run suggestions proactively, but only as auditable drafts with visible evidence and deterministic validation. Useful metrics for such a version would include:
- Time to record a service
- Suggestion acceptance rate
- Rejection and correction rate
- Services identified after initial documentation
- Validation errors after one-click creation
- Gap between documented work and recorded services
Scalability
The core workflow is reusable across clinical environments because documentation is distributed, coding expertise is specialized, and service recording has to stay correct and explainable. The skill-based architecture lets tariff logic, language, tenant configuration, and domain-specific search vary without changing the basic assistant workflow.
The main limitation is documentation quality: incomplete or vague clinical notes reduce the reliability of suggestions. That is also a useful product signal, because weak suggestions can expose documentation gaps that teams may want to fix upstream.
What this shows about my work
This project was interesting because the value was not in making an AI write text. The value was in shaping a workflow around a high-friction decision: what should be recorded, why, and how can the system make that decision easier to review?
- I turn ambiguous domain problems into concrete product flows.
- I design AI features around context, evidence, validation, and user control.
- I connect technical feasibility with UX and business value.
- I build prototypes that are useful beyond a demo and can inform a production implementation.
Result
The Medical Coding Copilot prototype showed a practical path for connecting clinical documentation with billable service recording. Instead of pushing blind automation into a sensitive workflow, it used AI to surface evidence, propose actions, and keep validation deterministic.
The key idea is simple: if relevant evidence already exists in the documentation, the product should help users find it, evaluate it, and turn it into a validated service without making them manually reconstruct the whole patient journey every time.