DataBridge uses a two-stage ranking process for optimal search results:
1. Vector Similarity (First Stage)
   - Query is converted to an embedding vector
   - Cosine similarity is computed with document chunks
   - Initial ranking based on vector similarity
   - Fast but may miss some semantic nuances
2. Neural Reranking (Second Stage, Optional)
   - Top results from vector search are reranked
   - Uses a specialized neural model for scoring
   - More accurate but computationally intensive
   - Can be enabled with use_reranking=True
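
The two stages above can be sketched in plain Python. This is a minimal illustration of the general technique, not DataBridge's actual implementation: the function names (`cosine_similarity`, `two_stage_rank`), the toy three-dimensional embeddings, and the keyword-based `toy_reranker` are all assumptions made for the example. A real reranker would be a neural model that scores the query and chunk text together.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_stage_rank(
    query_text: str,
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
    rerank_fn: Optional[Callable[[str, str], float]] = None,
) -> List[Tuple[str, float]]:
    """Rank (text, embedding) chunks; returns (text, score) pairs, best first."""
    # Stage 1: score every chunk by cosine similarity and keep the top_k.
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = scored[:top_k]

    # Stage 2 (optional): rescore only the shortlisted candidates.
    if rerank_fn is not None:
        candidates = [(text, rerank_fn(query_text, text)) for text, _ in candidates]
        candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy data: hypothetical 3-dimensional embeddings, for illustration only.
chunks = [
    ("qubits and superposition", [0.9, 0.1, 0.0]),
    ("classical sorting algorithms", [0.1, 0.9, 0.0]),
    ("quantum error correction", [0.8, 0.2, 0.1]),
]
query_text = "quantum error handling"
query_vec = [1.0, 0.0, 0.0]


def toy_reranker(query: str, text: str) -> float:
    """Stand-in for a neural reranker: fraction of query words found in the chunk."""
    words = query.lower().split()
    return sum(word in text.lower() for word in words) / len(words)


first_stage = two_stage_rank(query_text, query_vec, chunks, top_k=2)
reranked = two_stage_rank(query_text, query_vec, chunks, top_k=2, rerank_fn=toy_reranker)
```

Only the shortlisted candidates reach the second stage, which is why reranking stays affordable: the expensive scorer runs `top_k` times rather than once per chunk in the collection.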
Similarity Scores
Similarity scores indicate how well a chunk or document matches your query:
- Score Range: 0.0 to 1.0
  - 1.0: Perfect match
  - 0.0: No similarity
- Typical Score Ranges:
  - 0.9 - 1.0: Near-exact semantic match
  - 0.8 - 0.9: Very strong semantic similarity
  - 0.7 - 0.8: Strong semantic similarity
  - 0.6 - 0.7: Moderate semantic similarity
  - < 0.6: Weak semantic similarity
Example of using similarity scores:

    # High-confidence results only
    # (db is assumed to be an already-initialized DataBridge client)
    chunks = db.retrieve_chunks(
        query="quantum computing applications",
        min_score=0.8,       # Only very strong matches
        use_reranking=True   # Enable reranking for accuracy
    )

    # Group results by confidence
    def group_by_confidence(chunks):
        groups = {
            "very_high": [],  # 0.9 - 1.0
            "high": [],       # 0.8 - 0.9
            "medium": [],     # 0.7 - 0.8
            "low": []         # < 0.7
        }
        for chunk in chunks:
            if chunk.score >= 0.9:
                groups["very_high"].append(chunk)
            elif chunk.score >= 0.8:
                groups["high"].append(chunk)
            elif chunk.score >= 0.7:
                groups["medium"].append(chunk)
            else:
                groups["low"].append(chunk)
        return groups

    # With min_score=0.8 above, only "very_high" and "high" can be populated;
    # lower min_score to see results in the other groups.
    results = group_by_confidence(chunks)

Document Content Types
The DocumentContent type represents either a URL or direct content string:

    from databridge.models import DocumentContent

    # URL content type (for large documents)
    url_content = DocumentContent(
        type="url",
        value="https://example.com/document.pdf",
        filename="document.pdf"  # Required for URLs
    )

    # String content type (for small documents)
    text_content = DocumentContent(
        type="string",
        value="Document text content..."
        # filename not allowed for string type
    )

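
The filename rule noted in the comments above (required for URLs, not allowed for strings) could be enforced along the following lines. This is a sketch only: `DocumentContentSketch` is a hypothetical stand-in written for this example, not the `databridge.models.DocumentContent` class, and the error messages and the rejection of unknown types are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentContentSketch:
    """Hypothetical stand-in that mirrors the documented filename rule."""

    type: str
    value: str
    filename: Optional[str] = None

    def __post_init__(self) -> None:
        if self.type == "url":
            # URL content must carry a filename.
            if not self.filename:
                raise ValueError("filename is required when type is 'url'")
        elif self.type == "string":
            # Inline string content must not carry a filename.
            if self.filename is not None:
                raise ValueError("filename is not allowed when type is 'string'")
        else:
            # Assumption: any other type value is rejected.
            raise ValueError(f"unknown content type: {self.type!r}")


ok_url = DocumentContentSketch(
    type="url",
    value="https://example.com/document.pdf",
    filename="document.pdf",
)
ok_text = DocumentContentSketch(type="string", value="Document text content...")

# A string-type value with a filename violates the rule and is rejected.
try:
    DocumentContentSketch(type="string", value="text", filename="notes.txt")
    rejected = False
except ValueError:
    rejected = True
```

Validating at construction time means an invalid combination fails at the point where it is created, before it can reach an ingestion call.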
When retrieving documents, the content field will be one of these types: