AI Consultant · POC complete · progressing to production

InfraInspect

AI voice inspection pipeline for water infrastructure — from 3D desk-based damage planning to hands-free field capture, turning spoken observations into structured, validated damage records.

LangGraph · Voice AI · Whisper STT · GPT-4o · 3D Model Integration · Async UX · GDPR / DPIA · EU AI Act · Govtech
InfraInspect 3D workspace — damage markers and audio capture panel
Client HydroMapper / InfraCloud · Germany
Type SaaS · Digital inspection of water infrastructure
Role AI Consultant · End-to-end ownership
Timeline 2026

A platform that worked. A workflow that didn't.

HydroMapper's InfraCloud platform centralises inspection data, 3D context, and structured damage records for water infrastructure: quay walls, harbours, bridges, hydraulic structures, and underwater assets. The platform worked. The field documentation workflow didn't match the realities of high-volume on-site inspection.

InfraCloud is web-based, not optimised for mobile, and requires substantial manual input for every damage record. In practice: one person inspects, another enters findings manually. Under real field conditions — bad weather, generator noise, poor connectivity, underwater work — the process becomes slow, error-prone, and fragile.

The core challenge was not identifying damages. It was capturing structured inspection data faster, more reliably, and with less manual effort — across two very different working contexts: the office, where damage suspicions are planned and reviewed against 3D models, and the field, where the same person is standing in front of a quay wall in bad weather with no free hand. Bridging those two contexts without disrupting either shaped every decision in this project.

For inspectors

Zero waiting. Record, move, record, move. Processing happens later. The AI never blocks the inspection flow.

For the product

An AI pipeline that assists without blocking — and that can be evaluated honestly from the first production submission.


Five scoping decisions shaped everything that followed.

01

Phase 2 only — on-site capture, not pre-inspection office work

The most important decision was what not to build. The inspection lifecycle has two phases: desk-based damage planning and review in the office, then on-site capture in the field. InfraInspect targets only Phase 2. That kept the POC deliverable in weeks rather than months and proved value on the highest-impact part of the workflow before adding complexity.

02

Async-first over real-time field assistance

Inspectors work in sequences under time pressure. Any system that introduces waiting destroys the workflow. Capture happens in the field; processing happens when connectivity allows. The only real-time exception is the audio quality check, because a bad recording made in the field can't be fixed later from the office.
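
To make the split concrete, here's a minimal TypeScript sketch (the project's stack) of the capture rule. The names (`checkAudioQuality`, `CaptureQueue`), thresholds, and endpoint are illustrative, not the production code; the point is that the local quality check is the only synchronous step.

```typescript
interface QualityResult {
  ok: boolean;
  reason?: "too_quiet" | "clipped" | "too_short";
}

// The one synchronous step in the field: a cheap local check on the raw
// samples, so a bad recording is caught while the inspector can still redo it.
// Thresholds are illustrative.
function checkAudioQuality(samples: Float32Array, sampleRate: number): QualityResult {
  if (samples.length < sampleRate * 2) return { ok: false, reason: "too_short" };

  let sumSquares = 0;
  let clipped = 0;
  for (const s of samples) {
    sumSquares += s * s;
    if (Math.abs(s) > 0.99) clipped++;
  }
  const rms = Math.sqrt(sumSquares / samples.length);

  if (rms < 0.01) return { ok: false, reason: "too_quiet" }; // mic too far, wind only
  if (clipped / samples.length > 0.02) return { ok: false, reason: "clipped" }; // e.g. generator overload
  return { ok: true };
}

// Everything else is deferred: recordings are queued locally and uploaded
// whenever connectivity allows. The inspector never waits on this.
// Error handling and retry backoff are elided.
class CaptureQueue {
  private pending: Blob[] = [];

  enqueue(recording: Blob) {
    this.pending.push(recording);
    void this.flushWhenOnline();
  }

  private async flushWhenOnline() {
    if (!navigator.onLine) return; // retried later, e.g. on the "online" event
    while (this.pending.length > 0) {
      await fetch("/api/recordings", { method: "POST", body: this.pending[0] }); // hypothetical endpoint
      this.pending.shift();
    }
  }
}
```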

03

Confidence is information, not a gate

Submit is never blocked by low confidence — only by missing required fields or invalid cross-field values. A low-confidence suggestion the inspector can see and correct is better than a blocked workflow that creates a different kind of rework. This was a deliberate product decision, not a UX default.
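
What that gate looks like in code, as a minimal sketch over a simplified record shape; the field names and the cross-field rule are illustrative, not the actual 11 WSV fields:

```typescript
interface ExtractedField<T> {
  value: T | null;
  confidence: number; // 0..1, shown to the reviewer, never used to block
}

interface DamageRecord {
  damageType: ExtractedField<string>;
  severity: ExtractedField<number>; // e.g. 1..4
  extentCm: ExtractedField<number>;
}

function submitBlockers(record: DamageRecord): string[] {
  const blockers: string[] = [];

  // 1. Missing required fields block submit.
  if (record.damageType.value === null) blockers.push("damageType is required");
  if (record.severity.value === null) blockers.push("severity is required");

  // 2. Invalid cross-field combinations block submit (illustrative rule).
  if (record.severity.value === 4 && (record.extentCm.value ?? 0) === 0) {
    blockers.push("severity 4 requires a non-zero extent");
  }

  // 3. Confidence is deliberately absent from this function:
  //    a low-confidence value is surfaced in the review UI, not gated here.
  return blockers;
}
```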

04

Success targets defined before the architecture

Three measurable outcomes were committed to before any pipeline was drawn: 30% reduction in documentation time, 85% extraction accuracy on core fields, and fewer than 2 corrections per record on average. The evaluation schema was designed alongside the data model, not added after.
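
In practice, "designed alongside the data model" means every reviewer action lands in a structured log, so the corrections target is a simple query rather than a retrofitted metric. A sketch, with illustrative shapes and names:

```typescript
// One row per extracted field per record; `corrected` is true whenever the
// reviewer changed what the model proposed.
interface ExtractionLogEntry {
  recordId: string;
  field: string;
  aiValue: unknown;          // what the model proposed
  aiConfidence: number;      // 0..1
  finalValue: unknown;       // what the reviewer submitted
  corrected: boolean;
  transcriptExcerpt: string; // the source span the suggestion was drawn from
}

// The "<2 corrections per record" target falls straight out of the log.
function correctionsPerRecord(log: ExtractionLogEntry[]): number {
  const byRecord = new Map<string, number>();
  for (const e of log) {
    byRecord.set(e.recordId, (byRecord.get(e.recordId) ?? 0) + (e.corrected ? 1 : 0));
  }
  const counts = [...byRecord.values()];
  return counts.reduce((a, b) => a + b, 0) / Math.max(counts.length, 1);
}
```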

05

What I cut

Direct InfraCloud API integration in iteration one. Dual operating modes. Offline-first storage. Underwater workflows. Multilingual support. Real-time field assistance. Each exclusion was a product decision backed by a reason, not a casualty of time pressure. The out-of-scope list was as deliberate as the in-scope list.

Compliance as a design input

I authored the DPIA and mapped the system against EU AI Act risk categories before the architecture was finalised. Transcript retention for auditability is confirmed. The final GDPR-compliant approach to audio retention and continuous learning is defined and under review with the client — treated as an open design decision with a known resolution path, not a risk to be addressed post-launch.

Built first
  • Async audio capture flow
  • 7-node LangGraph extraction pipeline
  • Per-field confidence + delta view
  • 3D model with phase-coded markers
  • Audit log + corrected_fields metric
  • DPIA + EU AI Act mapping
Cut
  • InfraCloud API integration (v1)
  • Offline-first storage
  • Underwater workflows
  • Multilingual support
  • Real-time field assistance
Deferred
  • Direct API integration (v2)
  • Dual operating modes
  • Continuous learning loop
  • Real audio calibration
  • Mobile-native capture app

Six phases. Four surfaces. One coherent state.

Every damage record moves through six pipeline phases. The complexity was making that state legible and consistent across four surfaces simultaneously: the 3D model markers, the progress UI in the dashboard, the review banner on the damage record, and the back-end job queue.

Pipeline at a glance:
  • Field Capture: inspector speaks + photo evidence · German field audio · WAV / M4A · offline-ready
  • Speech-to-Text: Whisper large-v3 (German) · audio → transcript · punctuation-aware · regional accent support · field vs. background noise
  • Intent + Extract: GPT-4o in a LangGraph multi-node graph · classify intent first · extract 11 WSV fields · confidence score per field · contradiction detection
  • Catalog Validate: VV-WSV 2101 / BAW catalog, offline · 246 WSV damage types · hierarchy enforcement · self-correction pass · no internet required
  • Office Review: React · Node.js · Express · inspector confirms AI-extracted field values · hover shows previous value + transcript source · corrections auto-logged
  • Confirmed Record: inspector submits + signs off (VV-WSV 2101 §6 unchanged) · damage.status updated · extraction_log written · corrected_fields diff · WSVPrüf API queued · NeonDB (EU)

Fig. 01 — Field capture → AI pipeline → human review → confirmed record

In the field, the inspector's flow is deliberately simple: pick a damage location, record a voice note, attach photos, move on. No waiting, no form-filling, no connectivity required. Processing happens in the background once signal allows.

Behind the scenes, the recording goes through a sequence of checks before anything reaches a reviewer. The audio quality is assessed first, since a bad recording in the field can't be fixed from the office; then the speech is transcribed, the intent is classified, the relevant fields are extracted, and everything is validated against the official German water infrastructure damage catalogue before a human ever sees it.
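
In outline, the sequence looks like this. The production pipeline is a seven-node LangGraph graph; this plain-TypeScript sketch shows only the ordering and the early exit, with every model call and catalogue check stubbed out:

```typescript
type PipelineState = {
  audio: Blob;
  transcript?: string;
  intent?: "CONFIRM" | "REJECT" | "CREATE";
  fields?: Record<string, { value: unknown; confidence: number }>;
};

// Stubs standing in for the real Whisper / GPT-4o calls and catalogue checks.
const audioQualityOk = async (_a: Blob) => true;
const transcribe = async (_a: Blob) => "Riss in der Kaimauer, ca. 30 cm";
const classifyIntent = async (_t: string): Promise<"CONFIRM" | "REJECT" | "CREATE"> => "CREATE";
const extractFields = async (_t: string, _i: string) => ({
  damageType: { value: "Riss", confidence: 0.91 },
});
const validateAgainstCatalog = async (_f: unknown): Promise<string[]> => [];
const selfCorrect = async (f: PipelineState["fields"], _errs: string[]) => f;

async function runPipeline(state: PipelineState): Promise<PipelineState> {
  // 1. Audio quality gate: the one check that also runs in the field.
  if (!(await audioQualityOk(state.audio))) throw new Error("re-record required");

  // 2. Speech-to-text (Whisper large-v3, German, in the real system).
  state.transcript = await transcribe(state.audio);

  // 3. Intent first: CONFIRM / REJECT / CREATE decides the route.
  state.intent = await classifyIntent(state.transcript);

  // 4. Field extraction with one confidence score per field.
  state.fields = await extractFields(state.transcript, state.intent);

  // 5. Offline catalogue validation, with a self-correction pass on failure.
  const errors = await validateAgainstCatalog(state.fields);
  if (errors.length > 0) state.fields = await selfCorrect(state.fields, errors);

  return state; // handed to human review; nothing is written to the DB yet
}
```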

The intent step matters more than it sounds. The system has to decide whether the inspector is confirming a suspected damage, rejecting one, or logging a new one entirely. Getting that wrong isn't a minor inconvenience: a misclassified rejection could silently dismiss a real damage, or a dismissed suspicion could appear as confirmed in the record. So that classification, and the REJECT branch in particular, is held to a higher standard than any other step in the pipeline.
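
One way to express that asymmetry at runtime is a higher auto-accept bar for REJECT, routing anything below it to the reviewer instead of silently dismissing a suspected damage. The thresholds below are illustrative; the ≥98% figure in the spec is a precision target measured offline, not a literal runtime constant:

```typescript
type Intent = "CONFIRM" | "REJECT" | "CREATE";

// A REJECT is only accepted automatically at a much higher confidence than
// the other intents; everything below the bar goes to the reviewer as
// "needs review" rather than being applied.
function gateIntent(intent: Intent, confidence: number): Intent | "NEEDS_REVIEW" {
  const bar = intent === "REJECT" ? 0.98 : 0.8; // illustrative values
  return confidence >= bar ? intent : "NEEDS_REVIEW";
}
```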

When the office reviewer opens a record, they see the values the AI extracted from the field recording, each carrying a confidence signal: green for high confidence, amber for uncertain, red for a guess. Hovering over any field shows both the previous value that was already in the system and the exact transcript excerpt the AI drew its suggestion from, on a single hover, without navigating away. The reviewer can accept, edit, or revert any field individually. Nothing gets written to the database until a human signs off.
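
The hover interaction implies a per-field review payload roughly like this; the property names are illustrative, and the submit rule shown is one plausible gating, not necessarily the production logic:

```typescript
interface ReviewField {
  name: string;                          // e.g. "damageType"
  suggested: string;                     // AI-extracted value
  previous: string | null;               // value already in the system, if any
  confidence: "high" | "medium" | "low"; // rendered green / amber / red
  transcriptExcerpt: string;             // exact source span shown on hover
  status: "pending" | "accepted" | "edited" | "reverted";
}

// Accept / edit / revert act on one field at a time; an illustrative rule
// is that the record can only be signed off once no field is still pending.
function canSubmit(fields: ReviewField[]): boolean {
  return fields.every((f) => f.status !== "pending");
}
```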

The 3D model isn't decorative. Every damage marker on the structure changes colour as the record moves through the pipeline — so at a glance, a reviewer knows what's been processed, what's still in progress, and what needs attention, without hunting through a list. Inspectors can tap directly on the model to select a location or log a new damage at the exact point on the structure where they found it.
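
The marker mapping itself is small; the six phases below are the project's, while the colours are purely illustrative:

```typescript
type Phase =
  | "open" | "evidence_added" | "extracting"
  | "ai_processed" | "reviewed" | "submitted";

// One phase, one colour, on every marker in the 3D scene.
const markerColor: Record<Phase, string> = {
  open: "#9ca3af",           // grey: planned, not yet captured
  evidence_added: "#3b82f6", // blue: audio + photos attached
  extracting: "#f59e0b",     // amber: pipeline running
  ai_processed: "#8b5cf6",   // purple: awaiting human review
  reviewed: "#22c55e",       // green: human-confirmed
  submitted: "#166534",      // dark green: written to the system of record
};
```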


POC complete. Production path confirmed.

30% · Target reduction in documentation time per damage record
85% · Extraction accuracy target on core fields pre-review
<2 · Target field corrections per record post-extraction
≥98% · Precision required on REJECT classification before production

Built & verified

  • Full capture → AI extraction → delta review → audit log loop built and verified across all test scenarios
  • Seven-node LangGraph extraction pipeline with intent classification, WSV catalogue validation, and structured audit log
  • Phase-tracked 3D model with real-time marker state across all six pipeline phases
  • Per-field confidence UI with colour-coded delta view and revert controls
  • Solution proposal accepted internally; InfraCloud API integration confirmed as next step
  • DPIA authored and EU AI Act risk mapping completed before architecture finalised

Deliberately not claimed

  • Success targets are from the spec, not production data. The evaluation framework exists and is ready. Real inspector recordings under field conditions are the constraint — not the architecture.
  • The POC used synthetic TTS-generated German audio. Real field performance under generator noise, wind, and distance from microphone is unknown until real data is collected.
  • InfraCloud API integration is confirmed, not shipped. The endpoint isn't available yet; integration is scoped and sequenced, not delivered.

"Elsa took a workflow that required two people and a lot of manual entry and designed and built the AI pipeline to replace it — complete with compliance documentation and a system that an inspector can actually use in the field."

Tobias Grün · Product Owner, HydroMapper / InfraCloud

What I'd do differently. What I underestimated. What transfers.

i.

Earlier real audio data. Iteration one deliberately chose the async office review model over real-time field assistance — the right call for scope and speed of value. But the pipeline is calibrated on synthetic TTS-generated German audio. Real field performance under actual inspection conditions is unknown. I would have pushed earlier for a small set of real audio samples before the architecture was finalised. The evaluation framework exists and is ready; data collection is the constraint I'd sequence differently.

ii.

Where I underestimated — state coherence across async boundaries. Six pipeline phases (open → evidence_added → extracting → ai_processed → reviewed → submitted) had to stay consistent across the 3D markers, the progress UI, the review banner, and the back-end job queue. Keeping that coherent was more product work than model work — the hidden cost of building real async AI UX. Getting the phase machine right was the hardest engineering and design problem in the project, and I'd have allocated it more time upfront if I could replay the scoping.
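
The lesson, in code form: one transition table as the single source of truth, which every surface derives from instead of tracking its own copy. The linear table below is my reading of the phase list; the real machine may allow more, such as re-extraction after new evidence:

```typescript
type Phase =
  | "open" | "evidence_added" | "extracting"
  | "ai_processed" | "reviewed" | "submitted";

// The only place a phase transition is defined. 3D markers, progress UI,
// review banner, and job queue all read from this, never from local state.
const NEXT: Record<Phase, Phase | null> = {
  open: "evidence_added",
  evidence_added: "extracting",
  extracting: "ai_processed",
  ai_processed: "reviewed",
  reviewed: "submitted",
  submitted: null, // terminal
};

function advance(current: Phase): Phase {
  const next = NEXT[current];
  if (next === null) throw new Error(`"${current}" is terminal`);
  return next;
}
```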

iii.

What transfers to the next project. The pattern — capture → AI draft → diff review → gated submission → structured audit, with phase-tracked UX and a spatial anchor where it fits — is reusable for any high-stakes workflow where a model drafts on behalf of an expert. Clinical notes, claims triage, KYC review, legal drafting. Same shape, different vocabulary. The structural decisions that matter are the same: what the AI proposes vs. what currently exists, who reviews and when, what the confidence signal means, and what the audit trail needs to contain. The domain changes; the pattern doesn't.