Retrieval Technology

Induced-Fit Retrieval

Retrieval that thinks while it searches.

+2.9%

R@5 vs best RAG

5.2M

articles tested

9.7ms

beam traversal

$0

per query

p = 0.0002 · Zero training · Any embedding model

The Problem

RAG treats every query as a single-shot lookup

Retrieval-Augmented Generation embeds a query, finds the nearest vectors, and returns results. This works when the answer lives in one place.

It fails when the answer requires connecting information across multiple documents — the kind of reasoning humans do effortlessly. Multi-hop questions are not edge cases.

In enterprise knowledge bases, legal discovery, medical research, and technical estimation, the most valuable answers connect disparate facts. RAG cannot follow these chains because it commits to a fixed query before it knows what it needs to find.

[Diagram: a single-shot query against a vector store fails, while IFR discovers a 4-hop path: City → Actress → Film → Award]

How It Works

The Induced Fit Principle

In 1958, Daniel Koshland proposed that enzymes change shape upon contact with their substrate. IFR applies the same principle to information retrieval.

Lock and Key

Traditional RAG

The query is a fixed key. The system finds the best static match. If the answer is semantically distant, retrieval fails.

Induced Fit

IFR by Celestix

The query adapts as it encounters information. After visiting each node, the query shifts toward discovered concepts while retaining its original intent through an anchoring mechanism.
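As a rough illustration (this is a sketch, not Celestix's actual update rule, and the `alpha`/`beta` parameters are invented for the example), the anchored adaptation can be modeled as a convex blend of the original query, the live query, and the visited node's embedding:

```python
import math

def induced_fit_update(query, node, anchor, alpha=0.3, beta=0.7):
    """Illustrative induced-fit step: shift the live query toward a
    visited node's embedding, then pull it back toward the original
    (anchor) query so drift can never erase the original intent.
    alpha = anchoring strength, beta = adaptation rate (both made up)."""
    # Step 1: move the live query toward the visited node.
    moved = [q + beta * (n - q) for q, n in zip(query, node)]
    # Step 2: blend back toward the original query vector.
    mixed = [alpha * a + (1 - alpha) * m for a, m in zip(anchor, moved)]
    # Step 3: renormalize so cosine comparisons stay meaningful.
    norm = math.sqrt(sum(x * x for x in mixed)) or 1.0
    return [x / norm for x in mixed]
```

After one step toward an orthogonal node, the query picks up the new direction while the anchor term keeps the original component dominant.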

Architecture

Three Stages of Retrieval

01

Entry

The system receives a query, generates its embedding, and identifies an entry point using an HNSW index. Entry-point lookup completes in microseconds, even on multi-million-node graphs.
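In miniature, entry-point selection is a nearest-neighbor lookup. The brute-force sketch below (assuming pre-normalized vectors, so a dot product equals cosine similarity) shows the idea; a production system would use an HNSW index such as hnswlib instead of this O(N) scan:

```python
def entry_point(query, embeddings):
    """Pick the traversal entry node: the nearest neighbor of the
    query embedding. `embeddings` maps node id -> unit-norm vector.
    Illustrative only; HNSW replaces this linear scan in practice."""
    best, best_sim = None, float("-inf")
    for node_id, vec in embeddings.items():
        sim = sum(q * v for q, v in zip(query, vec))  # cosine (unit vectors)
        if sim > best_sim:
            best, best_sim = node_id, sim
    return best
```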

02

Traversal with Induced Fit

CORE INNOVATION

A traversal agent walks the knowledge graph via beam search. At each node, the query representation mutates — adapting to discovered context while anchoring to original intent. An energy budget governs exploration: the agent gains energy on relevant nodes and loses it otherwise, naturally terminating when returns diminish.
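A minimal sketch of the energy mechanic (parameter values and the `sim` scoring hook are illustrative, and the query-mutation step is omitted here for brevity): relevant nodes replenish a beam's energy, weak nodes drain it, and a beam dies when its budget is exhausted.

```python
def energy_beam_search(graph, sim, start, beam_width=3,
                       energy=5.0, gain=1.0, cost=0.5, threshold=0.5):
    """Energy-bounded beam search over a knowledge graph (sketch).
    graph: node -> list of neighbor nodes; sim(node) -> relevance
    score against the query. Returns {node: score} for kept nodes."""
    beams = [([start], energy)]          # (path, remaining energy)
    visited = {start}
    scores = {start: sim(start)}
    while beams:
        candidates = []
        for path, e in beams:
            for nbr in graph.get(path[-1], []):
                if nbr in visited:
                    continue
                visited.add(nbr)
                s = sim(nbr)
                # Relevant nodes recharge the budget; others drain it.
                new_e = e + (gain if s >= threshold else -cost)
                if new_e > 0:            # beam survives only with energy left
                    candidates.append((s, path + [nbr], new_e))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for s, path, new_e in candidates[:beam_width]:
            beams.append((path, new_e))
            scores[path[-1]] = s
    return scores
```

Each entry in `scores` is reachable by a recorded path, which is what makes the traversal auditable.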

03

Fusion and Ranking

Graph traversal candidates merge with standard nearest-neighbor results via reciprocal rank fusion. An optional cross-encoder re-ranking stage produces the final result set.
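Reciprocal rank fusion itself is compact: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 being the constant from the original RRF formulation (the candidate lists below are made up for illustration):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists with reciprocal rank fusion.
    rankings: list of ranked doc-id lists (best first).
    Returns doc ids sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, not raw scores, it can merge the graph-traversal list and the k-NN list even though their scoring scales differ.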

Benchmarks

Validated on 5.2 Million Wikipedia Articles

HotpotQA fullwiki · 500 queries · Bootstrap B=10,000

| Method                        | R@5   | R@10  | R@20  | MRR   | Latency |
|-------------------------------|-------|-------|-------|-------|---------|
| RAG top-5 (cosine)            | 29.0% | 29.0% | 29.0% | 0.443 | 0.3ms   |
| RAG-k20 (cosine)              | 29.0% | 32.0% | 34.5% | —     | 0.3ms   |
| RAG + Cross-Encoder           | 33.7% | 33.7% | 33.7% | 0.548 | 16.9ms  |
| IFR Beam 100h (Celestix)      | 30.9% | 35.5% | 37.6% | 0.475 | 9.7ms   |
| IFR Hybrid+CE 100h (Celestix) | 36.6% | 36.6% | 36.6% | 0.554 | 24.1ms  |

+2.9%

Overall R@5

36.6% vs 33.7%, p=0.0002

+4.5%

Multi-hop R@5

51.0% vs 46.5%, p<0.001

+6.5%

Traversal Reach R@10

35.5% vs 29.0%, p<0.0001

HotpotQA fullwiki, full English Wikipedia (5,233,329 articles). 500 questions (400 bridge + 100 comparison). 21.2M edges. Bootstrap B=10,000.

Scaling

IFR's Advantage Grows with Data

| Corpus          | Documents | IFR-beam vs RAG-k5 | Hybrid+CE vs RAG-rerank | p-value |
|-----------------|-----------|--------------------|-------------------------|---------|
| FCIS            | 722       | −3.7%              | +1.4%                   | n/s     |
| MuSiQue         | 21,000    | −6.3%              | −0.4%                   | n/s     |
| HotpotQA        | 66,000    | +1.1%              | +3.0%                   | <0.001  |
| HotpotQA        | 508,000   | +2.2%              | +4.5%                   | <0.001  |
| HotpotQA (full) | 5,233,329 | +1.9%              | +2.9%                   | =0.0002 |

Hybrid+CE Advantage by Corpus Size

722 docs: +1.4% · 21K: −0.4% · 66K: +3.0% · 508K: +4.5% · 5.2M: +2.9%

IFR offers no statistically significant advantage below roughly 20K documents. From 66K documents upward, the gain is significant (p < 0.001). The larger your knowledge base, the more IFR matters.

Resilience

17.5M Noisy Edges. Zero Degradation.

Without Hyperlinks

3.7M edges

R@5 36.5% · p = 0.0006

With Hyperlinks

Improved

21.2M edges (+17.5M noisy links)

R@5 36.6% · p = 0.0002

Pour in millions of messy documents with dirty links. IFR doesn't break. The induced-fit mechanism naturally filters noise during traversal.

Landscape

How IFR Compares

| Category       | Players                            | Method                             | Multi-hop                    | Cost                   |
|----------------|------------------------------------|------------------------------------|------------------------------|------------------------|
| Tech Giants    | Google, Meta, Microsoft, Apple     | RAG + cross-encoder + long context | No (expand window to 2M)     | $$$$ GPU clusters      |
| Vector DB      | Pinecone, Weaviate, Qdrant, Milvus | HNSW / ScaNN / DiskANN             | No (single-shot k-NN)        | $50–500/mo             |
| RAG Frameworks | LangChain, LlamaIndex, Haystack    | RAG orchestration + rerank         | Iterative (LLM per step)     | $$ API per query       |
| Agent Memory   | Zep, Mem0, Letta                   | Temporal knowledge graph           | BFS (1–2 hops, no mutation)  | $$–$$$ SaaS            |
| Graph RAG      | Microsoft GraphRAG, LightRAG       | KG + community detection + BFS     | Partial (no query mutation)  | $$ LLM for graph build |
| LLM Rewriting  | Search-o1, ICR, R1-Searcher        | LLM rewrites query → search        | Partial (500ms–2s/query)     | $$$ LLM call/query     |
| IFR            | Celestix AI                        | Graph traversal + induced fit      | Yes · 100 hops · 0 LLM calls | $0 locally             |

IFR is the only system that combines mathematical multi-hop query adaptation with zero LLM calls at sub-10ms latency.

Properties

What Makes IFR Different

Sub-linear Scaling

O(log N) query latency. 52,000× data growth = 3.9× latency growth. Effectively constant for practical corpus sizes.

Zero-shot

No fine-tuning, no training data, no domain adaptation. Point IFR at any pre-embedded corpus and query immediately.

Model-agnostic

Works with any embedding model. Operates on geometric structure, not specific model internals. Swap providers without rebuilding.

Complementary

IFR augments RAG, not replaces it. Fusion architecture combines graph-traversal discoveries with standard k-NN, outperforming either alone.

Deterministic

Every retrieval produces a traversal path — a sequence of nodes and edges explaining how each result was found. Full auditability.

Noise-resilient

+17.5M noisy edges caused zero degradation. Induced fit + beam search naturally filter irrelevant connections during traversal.

Specifications

Technical Details

| Parameter                  | Value                                                 |
|----------------------------|-------------------------------------------------------|
| Embedding dimensions       | 128D (PCA-compressed from 384D)                       |
| Graph type                 | Hierarchical, multi-edge (semantic + cross-reference) |
| Traversal method           | Energy-bounded beam search with induced fit           |
| Query latency (5.2M nodes) | ~10ms beam, ~31ms with cross-encoder                  |
| Index build time           | Linear in corpus size                                 |
| RAM (Level 0, all nodes)   | 48 bytes per atom (~250 MB for 5M atoms)              |
| Supported corpus sizes     | Tested to 5.2M; designed for 1B+                      |
| Cross-encoder              | Optional; any BERT-class re-ranker                    |
| Embedding model            | Any (model-agnostic)                                  |

Capacity Model

| RAM Budget | Searchable Atoms | Warm Cache | Active (Full Content) |
|------------|------------------|------------|-----------------------|
| 1 GB       | 3.9M             | 500K       | 10–50K                |
| 8 GB       | 31M              | 4M         | 80–400K               |
| 64 GB      | 250M             | 32M        | 640K–3.2M             |

Use Cases

Where IFR Applies

Enterprise Knowledge Management

Policies · Procedures · Specifications · Standards

Corporate knowledge is inherently multi-hop.

Legal Discovery

Statutes · Interpretations · Rulings · Amendments

IFR follows citation chains the way researchers think.

Medical Research

Drugs · Pathways · Interactions · Outcomes

Connecting papers, databases, and clinical records.

Construction Estimating

Labor rates · Materials · Geographic factors · Regulations

Where IFR was born. Connecting RSMeans, Davis-Bacon, and contracts.

Customer Support

Documentation · Known issues · Config guides · Past resolutions

Surface the complete picture, not just the closest FAQ.

Intellectual Property

Provisional patent application filed covering:

· Adaptive query mutation during graph traversal

· Energy-bounded beam search with relevance and novelty criteria

· Hierarchical 4-level compressed knowledge graph storage

· Multi-phase edge decay for temporal knowledge management

· Trail persistence for learning from traversal history

The IFR architecture, algorithms, and implementation are proprietary technology of Celestix AI.

Built by Celestix AI

Retrieval that doesn't just search — it reasons.

celestix.ai · Patent pending