Installation
This guide covers setting up a DataBridge server. If you just want to use an existing DataBridge server, see our Quick Start Guide instead.
There are two ways to set up DataBridge:
Docker Installation (Recommended)
Manual Installation
Docker Installation
The Docker setup is the recommended way to get started quickly with all components preconfigured.
Prerequisites
Docker and Docker Compose installed on your system
At least 10GB of free disk space (for models and data)
8GB+ RAM recommended
Quick Start
Clone the repository and navigate to the project directory:
git clone https://github.com/databridge-org/databridge-core.git
cd databridge-core
Start all services:
docker compose up --build
This command will:
Build all required containers
Download necessary AI models (nomic-embed-text and llama3.2)
Initialize the PostgreSQL database with pgvector
Start all services
The initial setup may take 5-10 minutes depending on your internet speed.
For subsequent runs:
docker compose up # Start all services
docker compose down # Stop all services
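If startup misbehaves, you can check container status and follow the server logs (databridge is the service name used in the bundled docker-compose.yml):
docker compose ps # List containers and their current status
docker compose logs -f databridge # Follow the DataBridge server logs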
For more details on Docker setup, configuration, and troubleshooting, see our Docker Guide.
Using Existing Services
If you already have Ollama or PostgreSQL running on your machine, you can configure DataBridge to use these existing instances instead of starting new containers.
Using Existing Ollama
Modify databridge.toml to point to your local Ollama instance:
[completion]
provider = "ollama"
model_name = "llama3.2"
base_url = "http://host.docker.internal:11434" # Points to host machine's Ollama
[embedding]
provider = "ollama"
model_name = "nomic-embed-text"
base_url = "http://host.docker.internal:11434" # Points to host machine's Ollama
Remove the Ollama service from docker-compose.yml:
Delete the ollama service section
Remove ollama from the depends_on section of the DataBridge service
Remove the ollama_data volume
Add host.docker.internal support (required for Linux):
services:
  databridge:
    extra_hosts:
      - "host.docker.internal:host-gateway"
Start only the required services:
docker compose up postgres databridge
Make sure your local Ollama instance:
Is running and accessible on port 11434
Has the required models installed (nomic-embed-text and llama3.2); a quick check is shown below
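A simple way to verify both points, assuming the ollama CLI is on your PATH:
ollama list # Should show nomic-embed-text and llama3.2
ollama pull nomic-embed-text # Pull whichever model is missing
ollama pull llama3.2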
Using Existing PostgreSQL
Modify the POSTGRES_URI in your environment or docker-compose.yml:
services:
  databridge:
    environment:
      - POSTGRES_URI=postgresql+asyncpg://your_user:your_password@host.docker.internal:5432/your_db
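To sanity-check the connection string from your host, you can connect with psql. Note that the +asyncpg suffix is SQLAlchemy-specific; psql expects the plain postgresql:// scheme (the credentials here are the placeholders from the example above):
psql "postgresql://your_user:your_password@localhost:5432/your_db" -c "SELECT 1;"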
Remove the PostgreSQL service from docker-compose.yml:
Delete the postgres service section
Remove the postgres_data volume
Update the depends_on section of the DataBridge service
Make sure your PostgreSQL instance:
Has the pgvector extension installed (a quick check is sketched below)
Is accessible from Docker containers
Has the necessary database and permissions set up
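One way to confirm the extension, assuming you can reach the database with psql (your_db is the placeholder from the example above):
psql -d your_db -c "CREATE EXTENSION IF NOT EXISTS vector;" # Needs sufficient privileges
psql -d your_db -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';"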
Start only the DataBridge service:
docker compose up databridge
Manual Installation
This section covers setting up DataBridge manually if you prefer more control over the installation.
1. Clone the Repository
git clone https://github.com/databridge-org/databridge-core.git
2. Setup Python Environment
Python 3.12 is supported, but other versions may work:
cd databridge-core
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
3. Install Dependencies
pip install -r requirements.txt
4. Configure Environment
Copy the example environment file and create your own .env:
cp .env.example .env
Then edit the .env file with your settings:
JWT_SECRET_KEY="..." # Required in production, optional in dev mode
POSTGRES_URI="postgresql+asyncpg://postgres:postgres@localhost:5432/databridge" # Required for PostgreSQL database
MONGODB_URI="..." # Optional: Only needed if using MongoDB
UNSTRUCTURED_API_KEY="..." # Optional: Needed for parsing via unstructured API
OPENAI_API_KEY="..." # Optional: Needed for OpenAI embeddings and completions
ASSEMBLYAI_API_KEY="..." # Optional: Needed for combined parser
ANTHROPIC_API_KEY="..." # Optional: Needed for contextual parser
AWS_ACCESS_KEY="..." # Optional: Needed for AWS S3 storage
AWS_SECRET_ACCESS_KEY="..." # Optional: Needed for AWS S3 storage
For local development, you can enable development mode in databridge.toml:
[auth]
dev_mode = true # Set to true to disable authentication for local development
Note: Development mode should only be used for local development and testing. Always configure proper authentication in production.
5. Setup PostgreSQL (Default Database)
If running PostgreSQL locally (the commands below assume macOS with Homebrew):
brew install postgresql@14
brew install pgvector
brew services start postgresql@14
createdb databridge
createuser -s postgres
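The pgvector extension also needs to be enabled inside the new database. quick_setup.py may take care of this for you, but it is harmless to do it manually first:
psql databridge -c "CREATE EXTENSION IF NOT EXISTS vector;"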
6. Run Quick Setup
python quick_setup.py
This script will automatically:
Configure your database
Set up your storage
Create the required vector index
7. Start the Server
python start_server.py
Accessing Your DataBridge Server
Once your server is running (either through Docker or manual installation), you can access it in several ways:
1. Server Access Points
API: http://localhost:8000
API Documentation: http://localhost:8000/docs
Health Check: http://localhost:8000/health
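A quick smoke test once the server is up (the exact response body may vary between versions):
curl http://localhost:8000/health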
2. Getting Your Access URI
Visit the API documentation at http://localhost:8000/docs
Find and use the /local/generate_uri endpoint to generate your admin URI
Save this URI - you'll need it to connect to your server
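If you prefer the command line to the interactive docs page, something like the following should work; the HTTP method is an assumption here, so check the /docs page for the exact signature:
curl -X POST http://localhost:8000/local/generate_uri # Method may differ; verify on /docs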
3. Ways to Use DataBridge
With your URI, you can interact with DataBridge in several ways:
Using the Shell
python shell.py <your_local_uri>
Using the Python SDK
from databridge import DataBridge

# Connect to a local server using the URI generated in step 2 above
db = DataBridge("your-databridge-uri", is_local=True)
Using the UI Component
The UI provides a visual interface for prototyping and testing. To set it up:
Navigate to the UI directory:
cd databridge-core/ui-component
Install dependencies and start:
npm install
npm run dev
The UI will be available at http://localhost:3000. Use your generated URI to connect.
Additional Configuration
MongoDB Setup
You need a MongoDB Atlas cluster with Vector Search enabled
Create a database whose name matches your DATABRIDGE_DB setting
The server will automatically create required collections and indexes
AWS S3 Setup
Create an S3 bucket for document storage
Create an IAM user with permissions for this bucket
Use the access keys in your .env file
API Keys
OpenAI API key: Required if using OpenAI for embeddings or completions
Unstructured API key: Required if parsing documents via the Unstructured API
Next Steps
See the Quick Start Guide to begin using your server