DataBridge Docs

Ingest

POST Ingest Text Document

Ingests a text document with optional metadata. The content is chunked and indexed for semantic search.

Parameters:

  • content: (Required) Text content to ingest

  • metadata: (Optional) Dictionary of metadata

  • rules: (Optional) List of processing rules to apply. Each rule can be:

    • Metadata extraction rule with a JSON schema

    • Natural language rule with a transformation prompt
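
As a rough illustration of the two rule kinds, the sketch below builds the JSON form that the cURL example on this page sends in the "rules" array. The helper names metadata_extraction_rule and natural_language_rule are hypothetical and are not part of the DataBridge SDK; only the "type", "schema", and "prompt" keys are taken from this page.

```python
import json


def metadata_extraction_rule(schema: dict) -> dict:
    """Hypothetical helper: JSON form of a metadata extraction rule."""
    return {"type": "metadata_extraction", "schema": schema}


def natural_language_rule(prompt: str) -> dict:
    """Hypothetical helper: JSON form of a natural language rule."""
    return {"type": "natural_language", "prompt": prompt}


# Same rules as in the examples on this page.
rules = [
    metadata_extraction_rule({
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "topics": {"type": "array", "items": {"type": "string"}},
        },
    }),
    natural_language_rule(
        "Convert the text into a professional format with clear paragraphs"
    ),
]

# Request body for the text ingest endpoint, serialized as JSON.
body = json.dumps({
    "content": "Machine learning is transforming industries...",
    "rules": rules,
})
```

Rules are listed in the order they are passed; whether the server applies them in that order is not stated on this page.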

Returns: Document object with the following fields:

  • external_id: Unique document identifier

  • content_type: Content type (always "text/plain" for text)

  • filename: Always None for text documents (null in the JSON response)

  • metadata: Combined user-provided and rule-extracted metadata

  • storage_info: Empty for text documents

  • system_metadata: System-managed metadata (created_at, updated_at, version)

  • access_control: Access control lists (readers, writers, admins)

  • chunk_ids: List of chunk identifiers
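
For readers who call the HTTP endpoint directly, the fields above can be mirrored in a small local model. This is a minimal sketch: DocumentRecord is a hypothetical name, the field types are inferred from the example response on this page, and the SDK's own Document class may differ.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class DocumentRecord:
    """Hypothetical local mirror of the documented response fields."""
    external_id: str
    content_type: str
    filename: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    storage_info: dict[str, Any] = field(default_factory=dict)
    system_metadata: dict[str, Any] = field(default_factory=dict)
    access_control: dict[str, list[str]] = field(default_factory=dict)
    chunk_ids: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, payload: dict[str, Any]) -> "DocumentRecord":
        # Ignore keys this sketch does not know about.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})


# Abbreviated payload using values from the example response on this page.
doc = DocumentRecord.from_json({
    "external_id": "doc_abc123",
    "content_type": "text/plain",
    "filename": None,
    "metadata": {"title": "ML Overview"},
    "storage_info": {},
    "chunk_ids": ["chunk_1", "chunk_2"],
})
print(doc.external_id, len(doc.chunk_ids))
```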

Python:

from databridge import DataBridge, MetadataExtractionRule, NaturalLanguageRule

# Create client instance
db = DataBridge(uri="your-databridge-uri")

# Create processing rules (optional)
metadata_rule = MetadataExtractionRule(schema={
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "topics": {"type": "array", "items": {"type": "string"}}
    }
})

format_rule = NaturalLanguageRule(
    prompt="Convert the text into a professional format with clear paragraphs"
)

# Ingest text document with rules
doc = db.ingest_text(
    content="Machine learning is transforming industries...",
    metadata={
        "title": "ML Overview",
        "category": "tech",
        "tags": ["ml", "ai"]
    },
    rules=[metadata_rule, format_rule]  # Optional processing rules
)
print(f"Document ID: {doc.external_id}")

cURL:

curl -X POST "http://localhost:8000/ingest/text" \
  -H "Authorization: Bearer your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Machine learning is transforming industries...",
    "metadata": {
        "title": "ML Overview",
        "category": "tech",
        "tags": ["ml", "ai"]
    },
    "rules": [
        {
            "type": "metadata_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "topics": {"type": "array", "items": {"type": "string"}}
                }
            }
        },
        {
            "type": "natural_language",
            "prompt": "Convert the text into a professional format with clear paragraphs"
        }
    ]
  }'

Response:

{
    "external_id": "doc_abc123",
    "content_type": "text/plain",
    "filename": null,
    "metadata": {
        "title": "ML Overview",
        "category": "tech",
        "tags": ["ml", "ai"]
    },
    "storage_info": {},
    "system_metadata": {
        "created_at": "2024-03-20T10:30:00Z",
        "updated_at": "2024-03-20T10:30:00Z",
        "version": 1
    },
    "access_control": {
        "readers": ["user_123"],
        "writers": ["user_123"],
        "admins": ["user_123"]
    },
    "chunk_ids": ["chunk_1", "chunk_2"]
}

Error response (authentication failure):

{
    "detail": "Invalid authentication credentials"
}

Error response (validation failure, missing content field):

{
    "detail": [
        {
            "loc": ["body", "content"],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
}
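
The error bodies above carry "detail" in two shapes: a plain string, or a list of objects with "loc", "msg", and "type". A client can normalize both into readable messages. The following is a sketch based only on those two shapes; describe_error is a hypothetical helper, and other error formats may exist.

```python
from typing import Any


def describe_error(payload: dict[str, Any]) -> list[str]:
    """Turn an error body into a list of human-readable messages."""
    detail = payload.get("detail")
    if isinstance(detail, str):
        # Simple errors, such as authentication failures.
        return [detail]
    if isinstance(detail, list):
        # Validation errors: one entry per offending field.
        messages = []
        for item in detail:
            location = ".".join(str(part) for part in item.get("loc", []))
            messages.append(f"{location}: {item.get('msg', 'unknown error')}")
        return messages
    return ["Unrecognized error payload"]


auth_error = {"detail": "Invalid authentication credentials"}
validation_error = {
    "detail": [
        {
            "loc": ["body", "content"],
            "msg": "field required",
            "type": "value_error.missing",
        }
    ]
}

print(describe_error(auth_error))        # ['Invalid authentication credentials']
print(describe_error(validation_error))  # ['body.content: field required']
```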

POST Ingest File Document

Uploads and ingests a file. Supported file types include PDFs, Word documents, and presentations, among others. The file is processed, chunked, and indexed for semantic search.

Parameters:

  • file: File to ingest (path string, bytes, file object, or Path)

  • filename: Name of the file

  • content_type: (Optional) MIME type; guessed if not provided

  • metadata: (Optional) Dictionary of metadata

  • rules: (Optional) List of processing rules to apply to extracted text
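
How DataBridge guesses a missing content_type is not described here. If you prefer to set it explicitly, one option is to derive it from the filename with Python's standard mimetypes module before calling the API. In this sketch, guess_content_type and its fallback value are assumptions for illustration, not SDK behavior.

```python
import mimetypes
from pathlib import Path
from typing import Union


def guess_content_type(
    filename: Union[str, Path],
    default: str = "application/octet-stream",
) -> str:
    """Guess a MIME type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(str(filename))
    return guessed or default


print(guess_content_type("Q4_Presentation.pdf"))  # application/pdf
```

The result can then be passed as the content_type parameter alongside file and filename.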

Returns: Document object with the same fields as a text document, with these differences:

  • storage_info: Contains bucket and key information for file storage

  • filename: Original filename

  • content_type: MIME type of the file

Python:

from databridge import DataBridge, MetadataExtractionRule, NaturalLanguageRule

# Create client instance
db = DataBridge(uri="your-databridge-uri")

# Create processing rules (optional)
pii_rule = NaturalLanguageRule(
    prompt="Remove all PII. Replace names with [NAME], emails with [EMAIL]"
)

classify_rule = MetadataExtractionRule(schema={
    "type": "object",
    "properties": {
        "document_type": {"type": "string"},
        "confidentiality": {"type": "string"}
    }
})

# From file path with rules
doc = db.ingest_file(
    file="presentation.pdf",
    filename="Q4_Presentation.pdf",
    content_type="application/pdf",
    metadata={
        "department": "Finance",
        "year": 2024,
        "quarter": 4
    },
    rules=[pii_rule, classify_rule]  # Optional processing rules
)
print(f"Document ID: {doc.external_id}")
print(f"Storage location: {doc.storage_info['bucket']}/{doc.storage_info['key']}")

# From file object
with open("presentation.pptx", "rb") as f:
    doc = db.ingest_file(
        file=f,
        filename="presentation.pptx",
        rules=[pii_rule]  # Rules work with file objects too
    )

cURL:

curl -X POST "http://localhost:8000/ingest/file" \
  -H "Authorization: Bearer your_token" \
  -F "file=@presentation.pdf;filename=Q4_Presentation.pdf;type=application/pdf" \
  -F 'metadata={"department":"Finance","year":2024,"quarter":4}' \
  -F 'rules=[
    {
        "type": "natural_language",
        "prompt": "Remove all PII. Replace names with [NAME], emails with [EMAIL]"
    },
    {
        "type": "metadata_extraction",
        "schema": {
            "type": "object",
            "properties": {
                "document_type": {"type": "string"},
                "confidentiality": {"type": "string"}
            }
        }
    }
  ]'
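
In the cURL request above, metadata and rules travel as JSON-encoded strings inside multipart form fields rather than as a JSON body. The sketch below builds those string fields in Python; build_form_fields is a hypothetical helper, and the field names are taken from the cURL example.

```python
import json
from typing import Any, Optional


def build_form_fields(
    metadata: Optional[dict[str, Any]] = None,
    rules: Optional[list[dict[str, Any]]] = None,
) -> dict[str, str]:
    """Encode optional metadata and rules as JSON strings for a multipart form."""
    form: dict[str, str] = {}
    if metadata is not None:
        form["metadata"] = json.dumps(metadata)
    if rules is not None:
        form["rules"] = json.dumps(rules)
    return form


form_fields = build_form_fields(
    metadata={"department": "Finance", "year": 2024, "quarter": 4},
    rules=[
        {
            "type": "natural_language",
            "prompt": "Remove all PII. Replace names with [NAME], emails with [EMAIL]",
        }
    ],
)

# With an HTTP client such as requests, these would be sent next to the file:
#   requests.post(url, headers=headers, data=form_fields, files={"file": fh})
print(form_fields["metadata"])
```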

Response:

{
    "external_id": "doc_xyz789",
    "content_type": "application/pdf",
    "filename": "Q4_Presentation.pdf",
    "metadata": {
        "department": "Finance",
        "year": 2024,
        "quarter": 4
    },
    "storage_info": {
        "bucket": "your-bucket-name",
        "key": "doc_xyz789/Q4_Presentation.pdf"
    },
    "system_metadata": {
        "created_at": "2024-03-20T10:30:00Z",
        "updated_at": "2024-03-20T10:30:00Z",
        "version": 1
    },
    "access_control": {
        "readers": ["user_123"],
        "writers": ["user_123"],
        "admins": ["user_123"]
    },
    "chunk_ids": ["chunk_1", "chunk_2", "chunk_3"]
}

Error response (authentication failure):

{
    "detail": "Invalid authentication credentials"
}

Error response (file too large):

{
    "detail": "File size exceeds maximum allowed size of 100MB"
}
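
Since the error above reports a 100MB limit, a client can check the file size locally before uploading. This is a sketch under assumptions: exceeds_upload_limit is a hypothetical helper, and reading "100MB" as 100 * 1024 * 1024 bytes is a guess, because the exact threshold is not stated on this page.

```python
import os

# Assumption: "100MB" is interpreted as 100 MiB; the server may count differently.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def exceeds_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

Calling exceeds_upload_limit("presentation.pdf") before ingest_file lets a client skip an upload that would likely be rejected.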

Last updated 3 months ago