DodoForm Joins NVIDIA Inception to Accelerate AI-Native Form Intelligence
DodoForm has been accepted into the NVIDIA Inception Program, gaining access to cutting-edge GPU infrastructure, AI tooling, and technical expertise to push the boundaries of voice-to-form, image extraction, and conversational BI.

We're in.
DodoForm has been accepted into the NVIDIA Inception Program — NVIDIA's exclusive accelerator for AI startups transforming industries with GPU-accelerated computing and deep learning.
This is a milestone for our team and our users. Here's what it means and why it matters for the future of AI-native form building.
What is the NVIDIA Inception Program?
NVIDIA Inception is the world's premier AI startup program. It provides early-stage and growth-stage AI companies with:
Inception is not open to everyone. Startups go through a rigorous vetting process in which NVIDIA evaluates their AI architecture, product-market fit, technical depth, and long-term vision. DodoForm was accepted because our AI pipeline (voice transcription, image OCR, NLP entity extraction, and constrained SQL generation) is genuinely GPU-intensive and pushes the frontier of what's possible with form data.
Why this matters for DodoForm
Faster voice-to-form
Voice input is DodoForm's flagship feature. Respondents speak their answers, and our AI extracts structured data in real time — names, dates, budgets, locations, urgency signals. This requires:
- Large-vocabulary speech recognition — Whisper-large, fine-tuned for form contexts
- Real-time NLP entity extraction — BERT-based NER with custom schema awareness
- Confidence scoring — Bayesian model that flags low-certainty extractions for human review
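The confidence-scoring step can be sketched as a per-field reliability estimate. This is a minimal illustration, not DodoForm's actual model. It assumes each field type keeps a Beta(α, β) posterior over how often its extractions survive human review, and it multiplies that posterior mean by the extractor's raw confidence. That product is a heuristic, not a full Bayesian treatment. The names `Extraction`, `FieldReliability`, and `needs_review`, and the 0.8 threshold, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Extraction:
    field: str
    value: str
    model_confidence: float  # raw extractor score in [0, 1] (assumed calibrated)


@dataclass
class FieldReliability:
    """Beta(alpha, beta) posterior over how often a field's extractions are correct."""

    alpha: float = 1.0  # Beta(1, 1) is a uniform prior
    beta: float = 1.0

    def update(self, was_correct: bool) -> None:
        # Conjugate Beta-Bernoulli update from one human-review outcome.
        if was_correct:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def needs_review(
    ex: Extraction,
    reliability: Dict[str, FieldReliability],
    threshold: float = 0.8,  # hypothetical cut-off
) -> bool:
    """Flag an extraction for human review when its blended score is low."""
    # Heuristic blend: scale model confidence by the field's historical accuracy.
    posterior = reliability.get(ex.field, FieldReliability())
    return ex.model_confidence * posterior.mean < threshold
```

Fields with no review history start at a posterior mean of 0.5, so they are routed to review until enough outcomes accumulate.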
With NVIDIA Inception, we gain access to TensorRT-optimized inference and NVIDIA NIM microservices, which means:
The result: respondents speak, and fields populate before they finish their sentence.
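To populate fields before the sentence ends, extraction has to run incrementally on a streaming transcript. The sketch below assumes the speech recognizer emits append-only text chunks. Real partial hypotheses can also revise earlier words. Two regular expressions stand in for the NER model as hypothetical placeholders. A chunk can end mid-word, so the sketch scans text only up to the last space and flushes the remainder when the stream ends.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical stand-ins for the NER model: one pattern per form field.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "budget": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?[kKmM]?\b"),
}


def _extract_new(text: str, filled: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (field, value) for fields that first become extractable from `text`."""
    for name, pattern in PATTERNS.items():
        if name in filled:
            continue  # each field is populated once
        match = pattern.search(text)
        if match:
            filled[name] = match.group(0)
            yield name, match.group(0)


def stream_fields(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Populate fields while append-only transcript chunks are still arriving."""
    transcript = ""
    filled: Dict[str, str] = {}
    for chunk in chunks:
        transcript += chunk
        # A chunk may end mid-word, so only scan up to the last space.
        stable = transcript[: transcript.rfind(" ") + 1]
        yield from _extract_new(stable, filled)
    # End of speech: the trailing word is now complete.
    yield from _extract_new(transcript, filled)
```

In this sketch the budget field fills as soon as "$450k" is followed by a space, while the speaker is still dictating the email address.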
Better image extraction
DodoForm reads photos, screenshots, and handwritten notes — extracting text, numbers, dates, and structured context. This combines:
- OCR — Printed and handwritten text extraction
- Document understanding — Layout-aware parsing that knows a receipt has different zones than a business card
- Vision-language models — Understanding what an image means, not just what it says
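Layout-aware parsing can be illustrated with a deliberately simple rule: decide the document type first, then bucket OCR tokens into zones by their vertical position on the page. The zone names and boundaries in `ZONES`, and the `(text, y_center)` token format, are hypothetical. A real layout model learns these regions instead of hard-coding them.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts: (zone name, fraction of page height where the zone ends), top to bottom.
ZONES: Dict[str, List[Tuple[str, float]]] = {
    "receipt": [("header", 0.2), ("line_items", 0.8), ("totals", 1.0)],
    "business_card": [("name_block", 0.5), ("contact_block", 1.0)],
}


def zone_for(y_center: float, page_height: float, doc_type: str) -> str:
    """Map a token's vertical center to the zone it falls in for this document type."""
    ratio = y_center / page_height
    layout = ZONES[doc_type]
    for name, upper in layout:
        if ratio < upper:
            return name
    return layout[-1][0]  # a token on the bottom edge belongs to the last zone


def group_by_zone(
    tokens: List[Tuple[str, float]], page_height: float, doc_type: str
) -> Dict[str, List[str]]:
    """Group OCR tokens, given as (text, y_center) pairs, by layout zone."""
    grouped: Dict[str, List[str]] = {name: [] for name, _ in ZONES[doc_type]}
    for text, y_center in tokens:
        grouped[zone_for(y_center, page_height, doc_type)].append(text)
    return grouped
```

The same token position means different things per document type. A line 30% of the way down is a line item on a receipt, but it is part of the name block on a business card.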
NVIDIA's GPU infrastructure lets us:
A real estate agent photographs a hand-filled buyer intake sheet. DodoForm reads the handwriting, extracts the name, budget, timeline, and property type, and populates a structured CRM record — in under 2 seconds.
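The final step of that example is turning extracted entities into a structured CRM record. It can be sketched as a schema-aware mapping with type normalization. The `BuyerIntake` schema, the entity labels, and the budget-parsing rules are hypothetical illustrations. The sketch assumes the NER stage has already produced `(label, text)` pairs from the handwriting.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class BuyerIntake:  # hypothetical CRM schema
    name: Optional[str] = None
    budget_usd: Optional[int] = None
    timeline: Optional[str] = None
    property_type: Optional[str] = None


_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_budget(text: str) -> int:
    """Normalize strings like '$450k', '450,000', or '1.2M' to whole dollars."""
    m = re.fullmatch(r"\$?\s*([\d,]+(?:\.\d+)?)\s*([kKmM])?", text.strip())
    if not m:
        raise ValueError(f"unrecognized budget: {text!r}")
    amount = float(m.group(1).replace(",", ""))
    suffix = (m.group(2) or "").lower()
    return round(amount * _MULTIPLIERS.get(suffix, 1))


# Hypothetical mapping from NER labels to schema fields and their normalizers.
FIELD_MAP = {
    "PERSON": ("name", str.strip),
    "MONEY": ("budget_usd", parse_budget),
    "DATE": ("timeline", str.strip),
    "PROPERTY_TYPE": ("property_type", lambda s: s.strip().lower()),
}


def to_record(entities: Iterable[Tuple[str, str]]) -> BuyerIntake:
    """Fill the schema from (label, text) pairs; unknown labels are ignored."""
    record = BuyerIntake()
    for label, text in entities:
        if label in FIELD_MAP:
            field_name, normalize = FIELD_MAP[label]
            setattr(record, field_name, normalize(text))
    return record
```

Unknown labels are skipped rather than guessed into a field, and missing fields stay `None`, so downstream checks can tell "not captured" apart from "captured as empty".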
Conversational BI at production scale
DodoForm's "Ask Anything" feature lets you type a question in plain English — "Which enterprise leads mentioned pricing as a concern?" — and get an exact answer backed by constrained SQL.
This requires:
- Question-to-SQL generation — LLM-powered query planning with schema awareness
- Read-only Postgres execution — Security-hardened, role-constrained SQL execution
- Result-to-narrative generation — LLM turns query results into human-readable answers with charts
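Here is a minimal sketch of the read-only execution guard. It uses Python's built-in `sqlite3` as a stand-in so the example is self-contained. Against Postgres, the equivalent protections would be a role granted only `SELECT` and queries run inside a `SET TRANSACTION READ ONLY` transaction. The keyword filter is a conservative, hypothetical pre-check meant as defense in depth on top of database permissions, not instead of them. It will reject some harmless queries, such as any that call `replace()`.

```python
import re
import sqlite3
from typing import List

# Conservative deny-list; a pre-check only, not a substitute for DB permissions.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|grant|revoke"
    r"|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def validate_generated_sql(sql: str) -> str:
    """Return the cleaned statement, or raise ValueError if it is not a lone read query."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(select|with)\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT or WITH queries are allowed")
    if _FORBIDDEN.search(stmt):
        raise ValueError("write or DDL keyword found in query")
    return stmt


def is_allowed(sql: str) -> bool:
    try:
        validate_generated_sql(sql)
        return True
    except ValueError:
        return False


def run_read_only(conn: sqlite3.Connection, sql: str, params=()) -> List[tuple]:
    """Validate LLM-generated SQL, then execute it on a connection that refuses writes."""
    stmt = validate_generated_sql(sql)
    # SQLite's query_only pragma makes this connection reject any write.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(stmt, params).fetchall()
```

The two layers fail independently. If the keyword check misses something, the connection-level read-only setting still blocks the write.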
With NVIDIA's inference optimization:
Multimodal form intelligence
The biggest opportunity is multimodal — combining voice, images, and text in a single AI pipeline:
A field inspector takes a photo of a damaged pipe, records a voice note describing the location and severity, and the form auto-fills:
This kind of multimodal AI fusion is exactly what NVIDIA's accelerated computing platform is designed for. Inception gives us the tools to build it at production scale.
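One simple way to combine modalities is late fusion. Each modality proposes values for form fields with a confidence, and the form keeps the best-supported value per field. The sketch below assumes that scheme and uses hypothetical field names from the pipe-inspection example. A vision-language model that reads the photo and the transcript jointly would be a different, early-fusion design.

```python
from typing import Dict, Tuple

Candidate = Tuple[str, float]  # (value, confidence)


def fuse_fields(*sources: Dict[str, Candidate]) -> Dict[str, Candidate]:
    """Merge per-modality field candidates, keeping the highest-confidence value per field."""
    merged: Dict[str, Candidate] = {}
    for source in sources:
        for field_name, (value, conf) in source.items():
            # Strict '>' means ties keep the value from the earlier source.
            if field_name not in merged or conf > merged[field_name][1]:
                merged[field_name] = (value, conf)
    return merged
```

For example, the photo might supply `asset` and `damage_type` while the voice note supplies `location` and `severity`. When both propose a `damage_type`, the more confident one wins.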
What changes for DodoForm users
Today — nothing breaks. Your forms, submissions, analytics, and integrations work exactly as they do now. Inception is an infrastructure and R&D investment, not a product change.
Next 3 months — faster and more accurate:
Next 6 months — new capabilities:
Our AI infrastructure, before and after
| Component | Before Inception | With Inception (target) |
|---|---|---|
| Voice transcription | CPU Whisper-medium | TensorRT Whisper-large on GPU |
| Image OCR | CPU Tesseract + small VLM | TensorRT NVLM on GPU |
| Entity extraction | CPU BERT-base | TensorRT BERT-large on GPU |
| Conversational BI | CPU LLM (quantized) | TensorRT-LLM (full precision) |
| Inference latency | 2-5 seconds | Under 1 second |
| Max concurrent requests | ~50 | 500+ |
| Model retraining | Weekly (limited data) | Daily (full dataset, multi-GPU) |
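The "Inference latency" and "Max concurrent requests" rows compound. By Little's law, a service holding L requests in flight with average latency W sustains a throughput of λ = L/W. The calculation below treats the table's figures as steady-state values, which is an assumption: they are targets, not measured throughput. Under that assumption, the implied ceilings are:

```latex
\lambda = \frac{L}{W}, \qquad
\lambda_{\text{before}} \approx \frac{50}{5\,\text{s}} \text{ to } \frac{50}{2\,\text{s}} = 10 \text{ to } 25\ \text{req/s}, \qquad
\lambda_{\text{after}} \gtrsim \frac{500}{1\,\text{s}} = 500\ \text{req/s}
```

That is roughly a 20x to 50x increase in sustainable request rate, because the tenfold concurrency gain and the latency reduction multiply.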
Why we applied (and why NVIDIA said yes)
We applied to Inception because DodoForm's AI workloads (speech recognition, vision-language understanding, and constrained LLM generation) are fundamentally GPU workloads, yet we were running them on CPU infrastructure. We were leaving performance on the table.
NVIDIA accepted us because:
What we're building toward
The endgame is an autonomous form engine — a system that doesn't just collect and structure data, but acts on it:
- Collect — Voice, photos, text, files, payments
- Structure — AI extracts entities, classifies intent, scores urgency
- Analyze — Conversational BI surfaces insights without SQL
- Act — AI drafts follow-up emails, updates CRMs, schedules meetings, triggers workflows
Collect, Structure, and Analyze are live today. Act is the Inception-accelerated roadmap. With NVIDIA's GPU infrastructure and AI expertise, we'll get there faster than we could alone.
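The four stages compose naturally as a chain of functions over a submission record. In this hypothetical sketch, keyword matching stands in for the urgency classifier, and a fixed rule stands in for AI-drafted actions. The Analyze stage works across many submissions as the conversational BI layer, so it is left out of this per-submission chain.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

# Hypothetical stand-in for an urgency classifier.
URGENT_WORDS = ("asap", "urgent", "immediately", "today")


def collect(raw: Record) -> Record:
    """Collect: merge typed answers, voice transcripts, and OCR text into one field."""
    parts = [raw.get(key, "") for key in ("typed", "voice_transcript", "ocr_text")]
    return {**raw, "text": " ".join(p for p in parts if p)}


def structure(rec: Record) -> Record:
    """Structure: score urgency (a real system would also extract entities here)."""
    lowered = rec["text"].lower()
    urgency = "high" if any(word in lowered for word in URGENT_WORDS) else "normal"
    return {**rec, "urgency": urgency}


def act(rec: Record) -> Record:
    """Act: choose follow-up actions; a fixed rule stands in for AI-drafted actions."""
    actions = ["draft_follow_up_email"]
    if rec["urgency"] == "high":
        actions.append("schedule_call")
    return {**rec, "actions": actions}


def run_pipeline(raw: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order, passing the enriched record along."""
    rec = raw
    for stage in stages:
        rec = stage(rec)
    return rec
```

Each stage returns a new dict rather than mutating its input. This keeps earlier stages' outputs intact for auditing when a later stage, such as Act, is swapped for a model-driven version.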
Thank you
To our users — thank you for trusting DodoForm with your data capture. Your feedback, feature requests, and real-world usage data directly shaped the AI pipeline that NVIDIA recognized.
To the NVIDIA Inception team — thank you for the vote of confidence. We're excited to push form intelligence into territory that wasn't possible without GPU-accelerated AI.
And to everyone filling out a DodoForm on their phone right now, speaking their answer instead of typing it: you're the reason we exist. We're going to make that experience even faster and even smarter.