June 19, 2026 · 6 min read · NVIDIA, AI, Inception Program, Company News

DodoForm Joins NVIDIA Inception to Accelerate AI-Native Form Intelligence

DodoForm has been accepted into the NVIDIA Inception Program, gaining access to cutting-edge GPU infrastructure, AI tooling, and technical expertise to push the boundaries of voice-to-form, image extraction, and conversational BI.

We're in.

DodoForm has been accepted into NVIDIA Inception, NVIDIA's program for AI startups that are transforming industries with GPU-accelerated computing and deep learning.

This is a milestone for our team and our users. Here's what it means and why it matters for the future of AI-native form building.

What is the NVIDIA Inception Program?

NVIDIA Inception is the world's premier AI startup program. It provides early-stage and growth-stage AI companies with:

  • GPU hardware grants and discounts — Access to NVIDIA's latest datacenter GPUs (H100, B200, and beyond) at startup-friendly pricing
  • Deep learning expertise — Direct technical support from NVIDIA's AI research and engineering teams
  • NVIDIA software stack: Access to CUDA, TensorRT, Triton Inference Server, NVIDIA NIM microservices, and the wider NVIDIA AI Enterprise toolkit
  • Go-to-market support — Co-marketing opportunities, VC introductions, and NVIDIA Deepstack partner network access
  • Community — A global network of 15,000+ AI startups sharing breakthroughs and best practices

Inception is not open to everyone. Startups go through a rigorous vetting process. NVIDIA evaluates your AI architecture, product-market fit, technical depth, and long-term vision. DodoForm was accepted because our AI pipeline (voice transcription, image OCR, NLP entity extraction, and constrained SQL generation) is genuinely GPU-intensive and pushes the frontier of what's possible with form data.

    Why this matters for DodoForm

    Faster voice-to-form

    Voice input is DodoForm's flagship feature. Respondents speak their answers, and our AI extracts structured data in real time — names, dates, budgets, locations, urgency signals. This requires:

    1. Large-vocabulary speech recognition — Whisper-large, fine-tuned for form contexts
    2. Real-time NLP entity extraction — BERT-based NER with custom schema awareness
    3. Confidence scoring — Bayesian model that flags low-certainty extractions for human review
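
The confidence-scoring step can be pictured with a minimal Python sketch. It does not reproduce DodoForm's Bayesian model. It assumes each extraction already carries a confidence between 0 and 1, and the `Extraction` class, the `route_extractions` helper, the sample values, and the 0.80 threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str         # form field the value belongs to
    value: str         # text pulled from the transcript
    confidence: float  # assumed model confidence in [0, 1]


def route_extractions(extractions, threshold=0.80):
    """Split extractions into auto-accepted values and ones needing human review."""
    accepted, needs_review = [], []
    for item in extractions:
        if item.confidence >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Illustrative data only, not real DodoForm output.
sample = [
    Extraction("name", "Priya Shah", 0.97),
    Extraction("budget", "450000", 0.91),
    Extraction("move_in_date", "next spring", 0.52),
]
accepted, needs_review = route_extractions(sample)
print([e.field for e in accepted])      # ['name', 'budget']
print([e.field for e in needs_review])  # ['move_in_date']
```

A single global threshold is the simplest possible policy. A production system would more plausibly tune thresholds per field, since a wrong budget is costlier than a wrong salutation.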

    With NVIDIA Inception, we gain access to TensorRT-optimized inference and NVIDIA NIM microservices, which means:

  • 2-3x faster transcription: GPU-accelerated Whisper models process voice input in near real time, even when it is recorded on a mobile device
  • Lower latency for live forms — Real-time entity extraction as the respondent speaks, not after
  • Fine-tuning at scale — Retrain models on domain-specific form data faster with multi-GPU clusters

    The result: respondents speak, and fields populate before they finish their sentence.

    Better image extraction

    DodoForm reads photos, screenshots, and handwritten notes — extracting text, numbers, dates, and structured context. This combines:

    1. OCR — Printed and handwritten text extraction
    2. Document understanding — Layout-aware parsing that knows a receipt has different zones than a business card
    3. Vision-language models — Understanding what an image means, not just what it says

    NVIDIA's GPU infrastructure lets us:

  • Run vision-language models (like NVLM and LLaVA) at production scale with TensorRT optimization
  • Process higher-resolution images without timeout constraints — every pixel matters for handwritten notes
  • Fine-tune on industry-specific document layouts — healthcare forms, legal contracts, real estate appraisals

    For example, a real estate agent photographs a hand-filled buyer intake sheet. DodoForm reads the handwriting, extracts the name, budget, timeline, and property type, and populates a structured CRM record in under 2 seconds.

    Conversational BI at production scale

    DodoForm's "Ask Anything" feature lets you type a question in plain English — "Which enterprise leads mentioned pricing as a concern?" — and get an exact answer backed by constrained SQL.

    This requires:

    1. Question-to-SQL generation — LLM-powered query planning with schema awareness
    2. Read-only Postgres execution — Security-hardened, role-constrained SQL execution
    3. Result-to-narrative generation — LLM turns query results into human-readable answers with charts
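
One way to picture the "constrained" part of step 2 is a guard that rejects anything other than a single read-only statement before it reaches the database. The sketch below is an assumption about how such a check could look, not DodoForm's implementation. The `is_read_only` helper and the `submissions` table name are hypothetical, and a keyword filter like this complements, rather than replaces, a Postgres role that has only SELECT privileges.

```python
import re

# Keywords that change data or schema; any of them rejects the query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT) statement."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False  # must start as a query
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT id FROM submissions WHERE plan = 'enterprise';"))  # True
print(is_read_only("SELECT 1; DROP TABLE submissions"))                       # False
```

The filter is deliberately conservative: it will also reject a harmless query whose string literal happens to contain a word such as "update". For a guard that sits in front of LLM-generated SQL, a false rejection is the safer failure mode.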

    With NVIDIA's inference optimization:

  • Sub-second answers — Even on datasets with hundreds of thousands of submissions
  • Concurrent query handling — Multiple team members asking questions simultaneously without queue delays
  • More complex queries — Multi-table joins, aggregations, and time-series analysis that previously timed out

    Multimodal form intelligence

    The biggest opportunity is multimodal — combining voice, images, and text in a single AI pipeline:

    A field inspector takes a photo of a damaged pipe, records a voice note describing the location and severity, and the form auto-fills:

  • Damage type: Corrosion (from image analysis)
  • Location: Basement, pipe junction B-47 (from voice)
  • Severity: High (from image + voice sentiment)
  • Recommended action: Schedule replacement within 30 days

    This kind of multimodal AI fusion is exactly what NVIDIA's accelerated computing platform is designed for. Inception gives us the tools to build it at production scale.
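
The inspector example can be sketched as a small merge step. This is an illustration under stated assumptions, not DodoForm's fusion logic: it assumes each modality emits field candidates with a confidence score, and the `Candidate` class, the `fuse` function, and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    field: str
    value: str
    source: str        # modality that produced the value: "image" or "voice"
    confidence: float  # assumed score in [0, 1]


def fuse(candidates):
    """Keep the highest-confidence value per field and list every source that agrees with it."""
    best = {}
    for c in candidates:
        current = best.get(c.field)
        if current is None or c.confidence > current.confidence:
            best[c.field] = c
    record = {}
    for field, winner in best.items():
        sources = sorted(
            {c.source for c in candidates if c.field == field and c.value == winner.value}
        )
        record[field] = {"value": winner.value, "sources": sources}
    return record


# Illustrative candidates modeled on the pipe-inspection example.
candidates = [
    Candidate("damage_type", "Corrosion", "image", 0.93),
    Candidate("location", "Basement, pipe junction B-47", "voice", 0.88),
    Candidate("severity", "High", "image", 0.74),
    Candidate("severity", "High", "voice", 0.81),
]
record = fuse(candidates)
print(record["severity"])  # {'value': 'High', 'sources': ['image', 'voice']}
```

Recording which modalities agreed on a value keeps the provenance shown in the example ("from image + voice") available for later review.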

    What changes for DodoForm users

    Today, nothing breaks. Your forms, submissions, analytics, and integrations work exactly as they do now. Inception is an infrastructure and R&D investment, not a product change.

    Over the next 3 months, expect faster and more accurate results:

  • Voice transcription gets faster (lower latency on live forms)
  • Image extraction handles more complex documents
  • Conversational BI answers more complex questions
  • New language support for voice input (expanding beyond English)

    Over the next 6 months, expect new capabilities:

  • Real-time voice transcription with live field population (speak and watch fields fill)
  • Multi-image extraction (upload 5 photos, AI combines them into one structured record)
  • Video question support (AI reads a short video clip and extracts structured data)
  • On-premises GPU deployment for enterprise security requirements

    Our AI infrastructure, before and after

    | Component | Before Inception | With Inception |
    | --- | --- | --- |
    | Voice transcription | CPU Whisper-medium | TensorRT Whisper-large on GPU |
    | Image OCR | CPU Tesseract + small VLM | TensorRT NVLM on GPU |
    | Entity extraction | CPU BERT-base | TensorRT BERT-large on GPU |
    | Conversational BI | CPU LLM (quantized) | TensorRT LLM (full precision) |
    | Inference latency | 2-5 seconds | Under 1 second |
    | Max concurrent requests | ~50 | ~500+ |
    | Model retraining | Weekly (limited data) | Daily (full dataset, multi-GPU) |

    Why we applied (and why NVIDIA said yes)

    We applied to Inception because DodoForm's AI workloads (speech recognition, vision-language understanding, and constrained LLM generation) are fundamentally suited to GPUs, yet we had been running them on CPU infrastructure. We were leaving performance on the table.

    NVIDIA accepted us because:

  • Our AI pipeline is real, not marketing — Voice transcription, image OCR, and constrained SQL generation are live in production, used by thousands of respondents daily
  • The architecture is GPU-native — Our models are transformer-based and will benefit immediately from TensorRT optimization
  • The market is underserved — Form data capture is a $5B+ market where AI penetration is under 5%. DodoForm is the only player combining voice, vision, and NLP in a single form platform
  • Security is architecturally sound — Row-level security, read-only SQL roles, and HMAC-signed webhooks meet enterprise requirements
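
Of the security measures listed above, HMAC-signed webhooks are the easiest to illustrate. The sketch below uses Python's standard `hmac` and `hashlib` modules. The choice of SHA-256, the hex encoding, and the `sign` and `verify` helper names are assumptions for illustration, since DodoForm's actual signing scheme and header format are not described here.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Sender side: compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    expected = sign(secret, payload)
    return hmac.compare_digest(expected.encode(), signature.encode())


secret = b"example-shared-secret"            # hypothetical secret
body = b'{"form_id": 42, "status": "new"}'   # hypothetical webhook payload
signature = sign(secret, body)

print(verify(secret, body, signature))                  # True
print(verify(secret, b'{"form_id": 43}', signature))    # False
```

The signature is computed over the exact bytes that were sent, so the receiver has to verify it before parsing or re-serializing the JSON. `hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters matched.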

    What we're building toward

    The endgame is an autonomous form engine — a system that doesn't just collect and structure data, but acts on it:

    1. Collect — Voice, photos, text, files, payments
    2. Structure — AI extracts entities, classifies intent, scores urgency
    3. Analyze — Conversational BI surfaces insights without SQL
    4. Act — AI drafts follow-up emails, updates CRMs, schedules meetings, triggers workflows

    Steps 1-3 are live today. Step 4 is the Inception-accelerated roadmap. With NVIDIA's GPU infrastructure and AI expertise, we'll get there faster than we could alone.

    Thank you

    To our users — thank you for trusting DodoForm with your data capture. Your feedback, feature requests, and real-world usage data directly shaped the AI pipeline that NVIDIA recognized.

    To the NVIDIA Inception team — thank you for the vote of confidence. We're excited to push form intelligence into territory that wasn't possible without GPU-accelerated AI.

    And to everyone filling a DodoForm on their phone right now, speaking their answer instead of typing it — you're the reason we exist. We're going to make that experience even faster and even smarter.
