DodoForm Joins NVIDIA Inception to Accelerate AI-Native Form Intelligence

We're in.

DodoForm has been accepted into NVIDIA Inception, NVIDIA's program for startups building products on AI and GPU-accelerated computing.

This is a milestone for our team and our users. Here's what it means and why it matters for the future of AI-native form building.

What is the NVIDIA Inception Program?

NVIDIA Inception supports AI startups worldwide. It provides early-stage and growth-stage AI companies with:

GPU hardware pricing — Access to NVIDIA's latest datacenter GPUs (H100, B200, and beyond) at startup-friendly pricing

Deep learning expertise — Direct technical support from NVIDIA's AI research and engineering teams

NVIDIA software stack — Access to and support for CUDA, TensorRT, Triton Inference Server, NVIDIA NIM microservices, and the NVIDIA AI Enterprise toolkit

Go-to-market support — Co-marketing opportunities, VC introductions, and NVIDIA Deepstack partner network access

Community — A global network of 15,000+ AI startups sharing breakthroughs and best practices

Inception is not open to everyone. Startups go through a rigorous vetting process. NVIDIA evaluates your AI architecture, product-market fit, technical depth, and long-term vision. DodoForm was accepted because our AI pipeline — voice transcription, image OCR, NLP entity extraction, and constrained SQL generation — is genuinely GPU-intensive and pushing the frontier of what's possible with form data.

Why this matters for DodoForm

Faster voice-to-form

Voice input is DodoForm's flagship feature. Respondents speak their answers, and our AI extracts structured data in real time — names, dates, budgets, locations, urgency signals. This requires:

Large-vocabulary speech recognition — Whisper-large, fine-tuned for form contexts
Real-time NLP entity extraction — BERT-based NER with custom schema awareness
Confidence scoring — Bayesian model that flags low-certainty extractions for human review
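
To make the confidence-scoring step concrete, here is a minimal sketch of routing low-certainty extractions to human review. It is an illustration, not the production implementation: the Extraction type, the route_extractions function, the 0.80 threshold, and the sample values are all hypothetical, and the scores are assumed to be probabilities supplied by an upstream model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One field value pulled from a transcript (hypothetical shape)."""
    field: str
    value: str
    confidence: float  # assumed to be a probability in [0, 1]


def route_extractions(extractions, threshold=0.80):
    """Split extractions into auto-accepted and flagged-for-review lists.

    The 0.80 default is an illustrative assumption, not a DodoForm setting.
    """
    accepted, needs_review = [], []
    for item in extractions:
        if not 0.0 <= item.confidence <= 1.0:
            raise ValueError(f"confidence out of range for {item.field!r}")
        if item.confidence >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Made-up sample data for illustration only.
sample = [
    Extraction("name", "Dana Smith", 0.97),
    Extraction("budget", "450000", 0.62),
    Extraction("move_in_date", "2025-03-01", 0.88),
]
accepted, needs_review = route_extractions(sample)
print([e.field for e in accepted])      # ['name', 'move_in_date']
print([e.field for e in needs_review])  # ['budget']
```

A single global threshold is the simplest possible policy. A per-field threshold (stricter for budgets than for free-text notes, say) would be a natural refinement.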

With NVIDIA Inception, we gain access to TensorRT-optimized inference and NVIDIA NIM microservices, which means:

2-3x faster transcription — GPU-accelerated Whisper models process voice input in near real time, including audio submitted from mobile devices

Lower latency for live forms — Real-time entity extraction as the respondent speaks, not after

Fine-tuning at scale — Retrain models on domain-specific form data faster with multi-GPU clusters

The result: respondents speak, and fields populate before they finish their sentence.

Better image extraction

DodoForm reads photos, screenshots, and handwritten notes — extracting text, numbers, dates, and structured context. This combines:

OCR — Printed and handwritten text extraction
Document understanding — Layout-aware parsing that knows a receipt has different zones than a business card
Vision-language models — Understanding what an image means, not just what it says

NVIDIA's GPU infrastructure lets us:

Run vision-language models (like NVLM and LLaVA) at production scale with TensorRT optimization

Process higher-resolution images without timeout constraints — every pixel matters for handwritten notes

Fine-tune on industry-specific document layouts — healthcare forms, legal contracts, real estate appraisals

A real estate agent photographs a hand-filled buyer intake sheet. DodoForm reads the handwriting, extracts the name, budget, timeline, and property type, and populates a structured CRM record — in under 2 seconds.

Conversational BI at production scale

DodoForm's "Ask Anything" feature lets you type a question in plain English — "Which enterprise leads mentioned pricing as a concern?" — and get an exact answer backed by constrained SQL.

This requires:

Question-to-SQL generation — LLM-powered query planning with schema awareness
Read-only Postgres execution — Security-hardened, role-constrained SQL execution
Result-to-narrative generation — LLM turns query results into human-readable answers with charts
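
As an illustration of the "constrained SQL" idea, the sketch below shows one way to pre-check a generated query before it reaches the database. It is an assumption-laden example rather than a description of DodoForm's actual guard: the is_read_only_query function, the keyword list, and the submissions table name are hypothetical. A text check like this is only a first layer; the read-only database role is what actually prevents writes.

```python
import re

# Keywords that should never appear in a read-only analytics query.
# The list is illustrative, not exhaustive.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)
_ALLOWED_START = re.compile(r"(select|with)\b", re.IGNORECASE)


def is_read_only_query(sql: str) -> bool:
    """Return True if `sql` looks like a single SELECT (or WITH ... SELECT).

    This is a coarse pre-check. It is deliberately conservative: a query
    with a forbidden word inside a string literal is rejected too.
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not _ALLOWED_START.match(statement):
        return False  # must begin with SELECT or WITH
    return _FORBIDDEN.search(statement) is None


print(is_read_only_query("SELECT count(*) FROM submissions"))      # True
print(is_read_only_query("SELECT 1; DROP TABLE submissions"))      # False
print(is_read_only_query("DELETE FROM submissions WHERE id = 1"))  # False
```

Word boundaries in the pattern mean column names such as created_at or updated_at do not trip the filter.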

With NVIDIA's inference optimization:

Sub-second answers — Even on datasets with hundreds of thousands of submissions

Concurrent query handling — Multiple team members asking questions simultaneously without queue delays

More complex queries — Multi-table joins, aggregations, and time-series analysis that previously timed out

Multimodal form intelligence

The biggest opportunity is multimodal: combining voice, images, and text in a single AI pipeline.

For example, a field inspector takes a photo of a damaged pipe and records a voice note describing the location and severity, and the form auto-fills:

Damage type: Corrosion (from image analysis)

Location: Basement, pipe junction B-47 (from voice)

Severity: High (from image + voice sentiment)

Recommended action: Schedule replacement within 30 days

This kind of multimodal AI fusion is exactly what NVIDIA's accelerated computing platform is designed for. Inception gives us the tools to build it at production scale.
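
One simple way to picture the fusion step in the inspector example is "late fusion": each modality proposes field values with a confidence score, and the highest-scoring proposal per field wins. The sketch below assumes exactly that and nothing more. The Candidate type, the fuse function, and every score are hypothetical, and a real system could combine signals in richer ways (the severity field above, for instance, draws on image and voice together rather than picking one).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A proposed field value from one modality (hypothetical shape)."""
    field: str
    value: str
    source: str        # e.g. "image" or "voice"
    confidence: float  # assumed to be a probability in [0, 1]


def fuse(candidates):
    """Keep the highest-confidence candidate for each field.

    Ties keep the candidate that appeared first in the input.
    """
    best = {}
    for cand in candidates:
        current = best.get(cand.field)
        if current is None or cand.confidence > current.confidence:
            best[cand.field] = cand
    return best


# Made-up scores for the inspector example.
candidates = [
    Candidate("damage_type", "Corrosion", "image", 0.91),
    Candidate("location", "Basement, pipe junction B-47", "voice", 0.95),
    Candidate("severity", "Medium", "image", 0.55),
    Candidate("severity", "High", "voice", 0.83),
]
record = fuse(candidates)
for name, cand in record.items():
    print(f"{name}: {cand.value} (from {cand.source})")
```

Keeping the source alongside each value preserves the "(from image)" and "(from voice)" provenance shown in the example.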

What changes for DodoForm users

Today — nothing breaks. Your forms, submissions, analytics, and integrations work exactly as they do now. Inception is an infrastructure and R&D investment, not a product change.

Next 3 months — faster and more accurate:

Voice transcription gets faster (lower latency on live forms)

Image extraction handles more complex documents

Conversational BI answers more complex questions

New language support for voice input (expanding beyond English)

Next 6 months — new capabilities:

Real-time voice transcription with live field population (speak and watch fields fill)

Multi-image extraction (upload 5 photos, AI combines them into one structured record)

Video question support (AI reads a short video clip and extracts structured data)

On-premises GPU deployment for enterprise security requirements

Our AI infrastructure, before and after

Component	Before Inception	With Inception
Voice transcription	CPU Whisper-medium	TensorRT Whisper-large on GPU
Image OCR	CPU Tesseract + small VLM	TensorRT NVLM on GPU
Entity extraction	CPU BERT-base	TensorRT BERT-large on GPU
Conversational BI	CPU LLM (quantized)	TensorRT LLM (full precision)
Inference latency	2-5 seconds	Under 1 second
Max concurrent requests	~50	~500+
Model retraining	Weekly (limited data)	Daily (full dataset, multi-GPU)

Why we applied (and why NVIDIA said yes)

We applied to Inception because DodoForm's AI workloads (speech recognition, vision-language understanding, and constrained LLM generation) are well suited to GPUs, yet we were running them on CPU infrastructure. We were leaving performance on the table.

NVIDIA accepted us because:

Our AI pipeline is real, not marketing — Voice transcription, image OCR, and constrained SQL generation are live in production, used by thousands of respondents daily

The architecture is GPU-ready — Our models are transformer-based and stand to benefit directly from TensorRT optimization

The market is underserved — Form data capture is a $5B+ market where AI penetration is under 5%. DodoForm is the only player combining voice, vision, and NLP in a single form platform

Security is architecturally sound — Row-level security, read-only SQL roles, and HMAC-signed webhooks meet enterprise requirements
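
For readers integrating with signed webhooks, the sketch below shows the general shape of HMAC verification on the receiving side. The details are assumptions for illustration: the text above says only that webhooks are HMAC-signed, so the choice of SHA-256, the hex encoding, the function names, and the placeholder secret are hypothetical and should be replaced with whatever the webhook documentation specifies.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return the hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, received: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, received)


secret = b"example-shared-secret"           # placeholder, not a real key
body = b'{"event": "submission.created"}'   # hypothetical payload
signature = sign_payload(secret, body)

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, body + b" ", signature))  # False
```

The signature should be computed over the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace or key order and invalidate the comparison.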

What we're building toward

The endgame is an autonomous form engine — a system that doesn't just collect and structure data, but acts on it:

Collect — Voice, photos, text, files, payments
Structure — AI extracts entities, classifies intent, scores urgency
Analyze — Conversational BI surfaces insights without SQL
Act — AI drafts follow-up emails, updates CRMs, schedules meetings, triggers workflows

Collect, Structure, and Analyze are live today. Act is the Inception-accelerated roadmap. With NVIDIA's GPU infrastructure and AI expertise, we'll get there faster than we could alone.

Thank you

To our users — thank you for trusting DodoForm with your data capture. Your feedback, feature requests, and real-world usage data directly shaped the AI pipeline that NVIDIA recognized.

To the NVIDIA Inception team — thank you for the vote of confidence. We're excited to push form intelligence into territory that wasn't possible without GPU-accelerated AI.

And to everyone filling out a DodoForm on their phone right now, speaking their answer instead of typing it: you're the reason we exist. We're going to make that experience even faster and even smarter.