This guide explains how to ingest documents into DataBridge using the Python SDK. DataBridge supports ingesting both text content and files such as PDFs and Word documents.
When you ingest a document, the response includes several important fields:
    # Assumes `db` is an already-initialized DataBridge client.
    doc = db.ingest_file("document.pdf", "document.pdf")

    # Document identifier
    print(f"ID: {doc.external_id}")

    # Storage information
    print(f"Storage Info: {doc.storage_info}")
    # Contains:
    # - storage_type: Where the document is stored (e.g., "s3", "local")
    # - bucket: Storage bucket name
    # - path: Path within storage
    # - size: Document size in bytes

    # System metadata
    print(f"System Metadata: {doc.system_metadata}")
    # Contains:
    # - created_at: Document creation timestamp
    # - updated_at: Last modification timestamp
    # - chunk_count: Number of chunks generated
    # - embedding_model: Model used for embeddings
    # - processing_status: Current status

    # Access control
    print(f"Access Control: {doc.access_control}")
    # Contains:
    # - readers: List of entities that can read the document
    # - writers: List of entities that can modify the document
    # - admins: List of entities that can manage the document
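The response fields can also be collapsed into a single summary for logging or a quick check after ingestion. The following is a minimal sketch, not part of the SDK: `summarize_document` is a hypothetical helper, it assumes `storage_info`, `system_metadata`, and `access_control` behave like dictionaries, and `sample_doc` is a stand-in object with made-up values so the example runs without a DataBridge server.

```python
from types import SimpleNamespace


def summarize_document(doc):
    """Flatten the response fields into one dictionary (hypothetical helper)."""
    # Assumption: these three fields are dict-like; fall back to empty dicts.
    storage = doc.storage_info or {}
    system = doc.system_metadata or {}
    access = doc.access_control or {}
    location = "{}://{}/{}".format(
        storage.get("storage_type", "?"),
        storage.get("bucket", "?"),
        storage.get("path", "?"),
    )
    return {
        "id": doc.external_id,
        "location": location,
        "size_bytes": storage.get("size"),
        "status": system.get("processing_status"),
        "chunk_count": system.get("chunk_count"),
        "readers": list(access.get("readers", [])),
    }


# Stand-in for a real ingestion response; every value here is illustrative.
sample_doc = SimpleNamespace(
    external_id="doc_123",
    storage_info={
        "storage_type": "s3",
        "bucket": "my-bucket",
        "path": "docs/document.pdf",
        "size": 2048,
    },
    system_metadata={"processing_status": "completed", "chunk_count": 12},
    access_control={"readers": ["user_1"], "writers": [], "admins": []},
)

summary = summarize_document(sample_doc)
print(summary)
```

With a real client, the object returned by `db.ingest_file(...)` would take the place of `sample_doc`, provided its fields match the shapes assumed here.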