AI product prototype
Medical Coding Copilot
I prototyped an AI assistant for recording and processing billable medical services using an agentic workflow over case context, reports, tariff data, and deterministic validation.
Key metrics
- Workflow: human in the loop
- Output: structured suggestions
- Validation: deterministic boundary
Problem
Tariff-relevant information is rarely stored in one neat place. It can be spread across reports, progress notes, diagnoses, appointments, and earlier cases for the same patient.
Today, much of the work of finding, cross-reading, and interpreting that context happens in the head of the person recording services.
The existing validator already checks whether recorded services are technically valid. That matters, but it only solves the downstream part of the workflow: once a service has been entered, it can be validated.
The harder upstream question is whether the system can detect tariff-relevant evidence in the patient journey before the user has manually connected all the dots.
- It takes time to collect the relevant context before a service can be recorded.
- Plausible billable services can be missed or noticed late.
- Users need support for discovery and interpretation without turning billing decisions into opaque automation.
Product thesis
The most useful AI experience here is not a generic chatbot. It is a controlled proposal workflow.
The assistant should gather relevant case context, translate it into the right tariff and search space, propose candidate services, show its evidence, and let the user accept or reject the result.
Validation then remains deterministic and auditable instead of being hidden inside a model response. In short: use AI for discovery and reasoning support, not as an unreviewed billing authority.
Technical shape
The prototype was designed around a simple boundary: probabilistic AI can discover and explain candidates, but deterministic application logic stays responsible for validation and persistence.
I treated structured output as a product requirement, not just an implementation detail. A suggestion needs to be machine-readable enough for the UI and validator, but clear enough for a domain expert to review.
```json
{
  "candidateService": {
    "code": "TARIFF_CODE",
    "label": "Human-readable service name"
  },
  "confidence": 0.82,
  "evidence": [
    {
      "sourceType": "report | note | appointment | diagnosis | service-session",
      "summary": "Why this source supports the suggestion"
    }
  ],
  "assumptions": ["Context that should be checked by the user"],
  "missingInformation": ["Information needed before safe acceptance"],
  "recommendedAction": "accept | review | reject"
}
```

| Structured field | Purpose |
|---|---|
| candidateService | Code and human-readable label for the proposed service |
| confidence | Plausibility signal for review and prioritization |
| evidence | Source summaries explaining why the candidate is supported |
| assumptions | Context the user should explicitly check |
| missingInformation | Open information needed before safe acceptance |
| recommendedAction | Accept, review, or reject recommendation for the UI |
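Because both the UI and the validator consume this structure, a thin parsing layer can reject malformed model output before it ever reaches review. A minimal sketch, assuming the field names from the schema above; the specific checks are illustrative, not the production rules:

```python
# Sketch: guard that a model response matches the suggestion schema
# before it enters the review UI. Field names follow the JSON example
# above; the individual checks are illustrative assumptions.
import json

REQUIRED_FIELDS = {
    "candidateService", "confidence", "evidence",
    "assumptions", "missingInformation", "recommendedAction",
}
ALLOWED_ACTIONS = {"accept", "review", "reject"}

def parse_suggestion(raw: str) -> dict:
    """Parse a model response and fail fast on schema violations."""
    suggestion = json.loads(raw)
    missing = REQUIRED_FIELDS - suggestion.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 0.0 <= suggestion["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if suggestion["recommendedAction"] not in ALLOWED_ACTIONS:
        raise ValueError("unknown recommendedAction")
    if not suggestion["evidence"]:
        raise ValueError("a suggestion needs at least one evidence item")
    return suggestion
```

Failing fast here keeps the downstream contract simple: anything the review UI displays is guaranteed to be structurally complete.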
- Context assembly collects relevant information from case documentation, appointments, diagnoses, and existing service records.
- The skill layer narrows the task into the right tariff, language, and domain-specific search space.
- The agentic loop searches, proposes, checks evidence, refines weak candidates, and returns structured output.
- The review UI shows the suggestion, confidence, evidence, and assumptions before a user takes action.
- The validation boundary checks accepted suggestions with deterministic business and tariff rules before recording a service.
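The stages above can be sketched as a single loop. Everything in this sketch (the function names, the refinement threshold, the skill interface) is a hypothetical shape for illustration, not the production implementation:

```python
# Sketch of the agentic proposal loop described above. The skill
# interface, function names, and the 0.6 confidence threshold are
# illustrative assumptions.

def propose_services(case_context, skill, max_refinements=2):
    """Search, propose, check evidence, refine weak candidates,
    and return structured suggestions for human review."""
    candidates = skill.search(case_context)  # narrowed tariff search space
    suggestions = []
    for candidate in candidates:
        suggestion = skill.draft_suggestion(candidate, case_context)
        for _ in range(max_refinements):
            # Strong enough to show a reviewer? Then stop refining.
            if suggestion["confidence"] >= 0.6 and suggestion["evidence"]:
                break
            # Weak candidate: gather more context and redraft.
            suggestion = skill.refine(suggestion, case_context)
        suggestions.append(suggestion)
    # Structured output only; persistence happens after the user
    # accepts and deterministic validation passes.
    return suggestions
```

The loop never records anything itself; it only returns reviewable objects, which is what keeps the probabilistic and deterministic sides separate.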
What I built
I built a prototype for a Medical Coding Copilot that combines two interaction modes: free-text questions about a case, tariff logic, or possible services, and an agentic proposal loop that actively searches for billable-service candidates.
The assistant works over a shared multi-case context that can include reports, progress notes, diagnoses, appointments, and existing service sessions. On top of that context, skills can be enabled for specific tariff or knowledge domains, such as TARDOC or future language- and tenant-specific rule spaces.
Each suggestion is treated as a reviewable object rather than a loose text answer, built to answer:
- What service might be relevant?
- Which clinical hints support it?
- Where does the evidence come from?
- What assumptions were made?
- How confident is the assistant?
- What validation or human review is still required?
Why validation matters
Medical coding is not a place for opaque automation. A plausible suggestion still needs to survive domain rules, tariff constraints, and professional review.
Separating probabilistic discovery from deterministic validation makes the assistant safer, easier to debug, and easier to explain to non-technical stakeholders.
- The assistant finds and explains possible services.
- The user decides whether the suggestion makes sense.
- The validator checks technical and tariff-level correctness.
- The product keeps the evidence chain visible.
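The split between user acceptance and deterministic checking can be made explicit in code. A sketch with two invented example rules (a real system would reuse the existing tariff validator mentioned above):

```python
# Sketch: accepted suggestions pass through deterministic rules before
# a service is recorded. Both rules shown are invented examples; real
# tariff rules would come from the existing validator.

def validate_accepted(suggestion, case):
    """Return a list of rule violations; an empty list means recordable."""
    errors = []
    code = suggestion["candidateService"]["code"]
    if code not in case["allowed_tariff_codes"]:
        errors.append(f"{code} is not billable under this tariff")
    if code in case["recorded_codes"]:
        errors.append(f"{code} was already recorded for this session")
    return errors

def record_service(suggestion, case):
    """Persist the service only if every deterministic rule passes."""
    errors = validate_accepted(suggestion, case)
    if errors:
        return {"recorded": False, "errors": errors}
    case["recorded_codes"].append(suggestion["candidateService"]["code"])
    return {"recorded": True, "errors": []}
```

Because the rules live outside the model, every rejection is reproducible and explainable, which is what keeps the evidence chain auditable.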
Potential impact
The prototype targets both efficiency and completeness.
On the efficiency side, it reduces the manual effort of searching, cross-reading, and translating clinical context into billing logic. On the completeness side, it can help teams identify plausible billable services earlier and more consistently, which in turn supports revenue.
A future production version could run suggestions proactively, but only as auditable drafts with visible evidence and deterministic validation. Useful metrics for such a version would include:
- Time to record a service
- Suggestion acceptance rate
- Rejection and correction rate
- Services identified after initial documentation
- Validation errors after one-click creation
- Gap between documented work and recorded services
Scalability
The core workflow is reusable across clinical environments because documentation is distributed, coding expertise is specialized, and service recording has to stay correct and explainable. The skill-based architecture lets tariff logic, language, tenant configuration, and domain-specific search vary without changing the basic assistant workflow.
The main limitation is documentation quality: incomplete or vague clinical notes reduce the reliability of suggestions. That is also a useful product signal, because weak suggestions can expose documentation gaps that teams may want to fix upstream.
What this shows about my work
This project was interesting because the value was not in making an AI write text. The value was in shaping a workflow around a high-friction decision: what should be recorded, why, and how can the system make that decision easier to review?
- I turn ambiguous domain problems into concrete product flows.
- I design AI features around context, evidence, validation, and user control.
- I connect technical feasibility with UX and business value.
- I build prototypes that are useful beyond a demo and can inform a production implementation.
Result
The Medical Coding Copilot prototype showed a practical path for connecting clinical documentation with billable service recording. Instead of pushing blind automation into a sensitive workflow, it used AI to surface evidence, propose actions, and keep validation deterministic.
The key idea is simple: if relevant evidence already exists in the documentation, the product should help users find it, evaluate it, and turn it into a validated service without making them manually reconstruct the whole patient journey every time.