Case Studies

From 70% to 95%: The Accuracy Growth Journey

Real data from beta users showing how accuracy improves with knowledge base growth. The science behind calibration.

Emily Park

Lead Engineer

Jan 30, 2026 · 6 min read

When a new Celestix user runs their first contract analysis, they get reasonable results — but not great ones. That's by design. Here's the journey from 70% to 95% accuracy, backed by real data from our 50-user beta program.

The Cold Start Problem

A fresh Celestix installation has no historical data specific to your business. It knows general market rates, RSMeans data, Davis-Bacon tables, and federal contracting norms. But it doesn't know YOUR overhead rate, YOUR preferred subcontractors, YOUR regional market dynamics, or YOUR competitive position.

On the first 10-20 contracts, expect accuracy around 65-70%. This isn't failure; it's the baseline. Every analysis teaches the system more about your specific context.

The 50-Contract Milestone: ~70% Accuracy

By 50 contracts, the system has enough data to start making meaningful calibrations. The Calibrator agent (L4) begins adjusting each agent's tendency to over- or under-estimate. The Similarity Engine can find comparable past contracts in your knowledge base.

At this stage, the system knows basic patterns: your typical overhead rate, your common geographic areas, your frequent contract types. MAPE (mean absolute percentage error) drops to around 12-15%, meaning estimates miss actual award prices by 12-15% on average.
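
For the curious, MAPE is just the average of |actual - estimate| / actual across your bids. A quick sketch in Python; the function and numbers are illustrative, not Celestix code:

```python
def mape(estimates, actuals):
    """Mean absolute percentage error: average of |actual - estimate| / actual."""
    errors = [abs(a - e) / a for e, a in zip(estimates, actuals)]
    return 100 * sum(errors) / len(errors)

# Three bids with known award prices
estimates = [1_050_000, 480_000, 2_200_000]
actuals = [1_000_000, 500_000, 2_000_000]
print(f"MAPE: {mape(estimates, actuals):.1f}%")  # MAPE: 6.3%
```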

The 100-Contract Milestone: ~80% Accuracy

This is where things get interesting. With 100 contracts of feedback data, the Learning Overseer agent can identify systematic biases. Maybe your Cost Estimator agent consistently underprices electrical work by 8%. Maybe your Temporal agent overestimates winter premiums in your region.

The system adjusts automatically, applying agent-specific bias corrections. The hit rate (the share of estimates within ±5% of the actual award price) jumps to around 60-65%. MAPE drops to 8-10%.
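
Conceptually, a bias correction can be as simple as measuring an agent's mean signed error and scaling it away. A minimal sketch, assuming you have per-agent estimate/actual pairs; the names and numbers below are hypothetical:

```python
from statistics import mean

def bias_correction(estimates, actuals):
    """Multiplicative factor that cancels an agent's mean signed bias."""
    bias = mean((e - a) / a for e, a in zip(estimates, actuals))
    return 1 / (1 + bias)

def hit_rate(estimates, actuals, tolerance=0.05):
    """Share of estimates within ±tolerance of the actual award price."""
    hits = [abs(e - a) / a <= tolerance for e, a in zip(estimates, actuals)]
    return sum(hits) / len(hits)

# An agent that underprices electrical work by ~8% gets scaled up ~1.087x.
elec_estimates = [92_000, 184_000, 460_000]
elec_actuals = [100_000, 200_000, 500_000]
print(round(bias_correction(elec_estimates, elec_actuals), 3))  # 1.087
```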

The 200-Contract Milestone: ~90% Accuracy

At 200 contracts, you enter the precision zone. The Similarity Engine now has a rich library of comparable projects. When a new contract comes in, it can find 5-10 past contracts with similar scope, location, and agency — and use their outcomes to anchor the estimate.
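
Conceptually, that anchoring is a similarity-weighted average over the closest past contracts. A toy sketch, assuming contracts reduce to simple feature dicts; the real Similarity Engine scores much richer features:

```python
def similarity(a, b):
    """Toy similarity: fraction of matching fields (scope, location, agency)."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

def anchor_estimate(new_contract, history, top_k=5):
    """Similarity-weighted average of award prices from the closest past contracts."""
    ranked = sorted(history, key=lambda h: similarity(new_contract, h[0]), reverse=True)
    top = [(similarity(new_contract, f), price) for f, price in ranked[:top_k]]
    total = sum(w for w, _ in top)
    return sum(w * p for w, p in top) / total

history = [
    ({"scope": "roofing", "location": "TX", "agency": "GSA"}, 1_200_000),
    ({"scope": "roofing", "location": "TX", "agency": "VA"}, 1_150_000),
    ({"scope": "paving", "location": "TX", "agency": "GSA"}, 800_000),
]
new = {"scope": "roofing", "location": "TX", "agency": "GSA"}
print(round(anchor_estimate(new, history, top_k=2)))  # 1180000
```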

The Evolution Engine starts suggesting agent improvements. Perhaps merging your top-performing Cost Estimator with your Geospatial agent creates a hybrid that outperforms both parents. Agent evolution is data-driven, not random.
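
One way such merging can be data-driven: compare the parents' error profiles per work category and route each category to whichever parent handles it better. A hypothetical illustration, not the actual Evolution Engine:

```python
def hybrid_routing(errors_a, errors_b, name_a="cost_estimator", name_b="geospatial"):
    """Route each work category to whichever parent agent has lower mean error.

    errors_a / errors_b map category -> list of absolute percentage errors.
    """
    routing = {}
    for category in sorted(errors_a.keys() & errors_b.keys()):
        mean_a = sum(errors_a[category]) / len(errors_a[category])
        mean_b = sum(errors_b[category]) / len(errors_b[category])
        routing[category] = name_a if mean_a <= mean_b else name_b
    return routing

cost_errs = {"electrical": [0.08, 0.09], "site_work": [0.03, 0.04]}
geo_errs = {"electrical": [0.05, 0.06], "site_work": [0.07, 0.05]}
print(hybrid_routing(cost_errs, geo_errs))
# {'electrical': 'geospatial', 'site_work': 'cost_estimator'}
```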

Hit rate reaches 80-85%. MAPE drops below 6%. At this point, your AI team is competitive with senior human estimators who have decades of experience.

The 500-Contract Target: 95% Accuracy

Our target of 95% accuracy (MAPE < 3%, hit rate > 90%) becomes achievable with 500+ contracts in your knowledge base. At this scale, the system has seen enough variety to handle edge cases. Unusual contract types, rare geographic locations, new agencies: all become tractable because similar-enough precedents exist.

The Key Insight: Feedback Is Everything

The single most important thing you can do is enter actual award prices after bid results come in. Won or lost, the outcome data is what drives calibration. A system that analyzes 500 contracts but never receives feedback will plateau at ~75%. A system with 200 contracts and complete feedback will outperform it.
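
Mechanically, you can think of each award price as nudging a running bias estimate that future estimates are corrected against. A toy sketch using an exponential moving average; the real Calibrator is more sophisticated:

```python
class Calibration:
    """Running bias estimate, nudged each time an award price comes in."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # how quickly new feedback outweighs old
        self.bias = 0.0     # mean signed error; -0.05 means 5% underpricing

    def record_award(self, estimate, actual_award):
        error = (estimate - actual_award) / actual_award
        self.bias = (1 - self.alpha) * self.bias + self.alpha * error

    def adjust(self, raw_estimate):
        """Counteract the learned bias on future estimates."""
        return raw_estimate / (1 + self.bias)

cal = Calibration()
cal.record_award(estimate=950_000, actual_award=1_000_000)  # bid came in low
print(round(cal.adjust(500_000)))  # 502513: future estimates nudged up
```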

Every award price you enter makes every future estimate more accurate. That's the compound interest of AI-powered estimation.