Document Ingestion
This guide explains how to ingest documents into DataBridge using the Python SDK. DataBridge supports ingesting both raw text content and files such as PDFs and Word documents.
Installation
First, install the DataBridge Python client:
pip install databridge-client
Basic Setup
Initialize the DataBridge client:
from databridge import DataBridge, AsyncDataBridge
# Synchronous client
db = DataBridge("your-uri")
# Asynchronous client
async_db = AsyncDataBridge("your-uri")
Text Ingestion
You can ingest text content directly:
# Synchronous ingestion
doc = db.ingest_text(
    content="Machine learning is fascinating...",
    metadata={
        "title": "ML Introduction",
        "category": "tech",
        "author": "John Doe"
    }
)
print(f"Document ID: {doc.external_id}")
# Asynchronous ingestion (await must run inside a coroutine)
import asyncio

async def main():
    async with AsyncDataBridge("your-uri") as db:
        doc = await db.ingest_text(
            content="Machine learning is fascinating...",
            metadata={
                "title": "ML Introduction",
                "category": "tech",
                "author": "John Doe"
            }
        )
        print(f"Document ID: {doc.external_id}")

asyncio.run(main())
What Happens During Text Ingestion?
The text content is processed and split into semantic chunks
Each chunk is embedded using state-of-the-art language models
The embeddings are stored in a vector database for efficient semantic search
Document metadata and content are stored for retrieval
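The four steps above can be pictured with a small, self-contained sketch. It is not DataBridge's implementation: the function names, the sentence-based chunking, and the hash-based `toy_embed` are all assumptions made for illustration, and a real deployment would use an actual embedding model and vector database.

```python
import hashlib
import math
import re


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of roughly max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(chunk: str, dims: int = 16) -> list[float]:
    """Hash words into a fixed-size, L2-normalised vector.

    A stand-in for a real embedding model, used only to keep the sketch runnable.
    """
    vec = [0.0] * dims
    for word in re.findall(r"\w+", chunk.lower()):
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest_sketch(text, metadata, vector_store, document_store, doc_id, max_words=40):
    """Chunk, embed, and store a text; returns the number of chunks written."""
    chunks = split_into_chunks(text, max_words)
    for i, chunk in enumerate(chunks):
        # Step 3: one vector-store entry per chunk (a plain list here).
        vector_store.append(
            {"doc_id": doc_id, "chunk": i, "text": chunk, "embedding": toy_embed(chunk)}
        )
    # Step 4: keep the full content and metadata for retrieval (a plain dict here).
    document_store[doc_id] = {"content": text, "metadata": metadata}
    return len(chunks)
```

Keeping chunk vectors separate from the document record mirrors the split described above: the vectors serve semantic search, while the document record serves retrieval of the original content and metadata.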
Document Ingestion
For files like PDFs, Word documents, or other supported formats:
Document Processing Pipeline
When you ingest a document:
The file is uploaded and processed based on its content type
For PDFs and other text-based documents:
    Text is extracted while preserving structure
    Content is split into meaningful chunks
    Each chunk is embedded for semantic search
Metadata and content are stored for retrieval
Document chunks are indexed for efficient searching
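To make "processed based on its content type" concrete, here is a minimal sketch of content-type dispatch using only the Python standard library. The `EXTRACTORS` registry and the function names are hypothetical and are not part of the DataBridge SDK; the sketch only shows how a pipeline might route a file to a format-specific text extractor before chunking.

```python
import mimetypes


def extract_plain_text(data: bytes) -> str:
    """Decode raw bytes as UTF-8, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


# Hypothetical registry mapping MIME types to extractor functions.
# A real pipeline would also register PDF and Word extractors here.
EXTRACTORS = {
    "text/plain": extract_plain_text,
}


def detect_content_type(filename: str) -> str:
    """Guess the MIME type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def extract_text(filename: str, data: bytes) -> str:
    """Route the file to the extractor registered for its content type."""
    content_type = detect_content_type(filename)
    extractor = EXTRACTORS.get(content_type)
    if extractor is None:
        raise ValueError(f"No extractor registered for {content_type}")
    return extractor(data)
```

Failing loudly on an unregistered type is a design choice for the sketch; it keeps unsupported formats from being silently ingested as empty text.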
Verifying Ingestion
You can verify your ingested documents:
Document Model
When you ingest a document, the response includes several important fields:
Processing Rules
You can apply processing rules during ingestion to transform content or extract metadata:
For detailed information about rules, see the Rules Guide.
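The Rules Guide documents the actual rule API. Purely to illustrate the idea of transforming content or extracting metadata during ingestion, a rule can be thought of as a function from (content, metadata) to (content, metadata). Everything in this sketch, including the `Rule` type and `apply_rules`, is hypothetical and not taken from the SDK.

```python
import re
from typing import Callable

# Hypothetical rule shape: takes content and metadata, returns updated copies.
Rule = Callable[[str, dict], tuple[str, dict]]


def redact_emails(content: str, metadata: dict) -> tuple[str, dict]:
    """Content transformation: replace email addresses with a placeholder."""
    redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED]", content)
    return redacted, metadata


def add_word_count(content: str, metadata: dict) -> tuple[str, dict]:
    """Metadata extraction: record the number of whitespace-separated words."""
    return content, {**metadata, "word_count": len(content.split())}


def apply_rules(content: str, metadata: dict, rules: list[Rule]) -> tuple[str, dict]:
    """Apply each rule in order, feeding its output to the next rule."""
    for rule in rules:
        content, metadata = rule(content, metadata)
    return content, metadata


content, metadata = apply_rules(
    "Contact jane@example.com today",
    {"title": "Note"},
    [redact_emails, add_word_count],
)
```

Order matters in a chain like this: the word count is computed on the redacted text because the redaction rule runs first.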
Next Steps
After ingesting documents, you can:
Use filters to organize and retrieve specific document sets
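As a rough picture of what filtering on metadata means, the sketch below selects documents whose metadata matches every key in a filter. It works on plain dictionaries and is an assumption about the general idea, not the SDK's filter syntax; the `matches` helper is hypothetical, and the sample metadata reuses the keys from the ingestion examples above.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    return all(metadata.get(key) == value for key, value in filters.items())


documents = [
    {"external_id": "doc-1", "metadata": {"category": "tech", "author": "John Doe"}},
    {"external_id": "doc-2", "metadata": {"category": "finance", "author": "John Doe"}},
]

# Keep only the documents tagged with category "tech".
tech_docs = [d for d in documents if matches(d["metadata"], {"category": "tech"})]
```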