Skip to main content

Knowledge Base Setup

Overview

Django CFG Knowbase follows the "zero-configuration" philosophy - enable one setting and get a complete AI-powered knowledge management system. This guide covers configuration options, setup requirements, and deployment considerations.

Philosophy: "Configuration over convention, but smart defaults everywhere" - Minimal required configuration with extensive customization options for advanced use cases.

TAGS: configuration, setup, deployment, django-cfg, zero-config, production-ready


Modules

@django_cfg.core.config/KnowbaseConfig

Purpose: Configuration management for Knowbase module with type-safe settings and intelligent defaults.

Dependencies:

  • django_cfg.core.config.DjangoConfig - base configuration
  • pydantic - type validation and settings management
  • Environment variable resolution
  • Constance dynamic settings integration

Exports:

  • enable_knowbase - main feature flag
  • openai_api_key - AI service authentication
  • Similarity threshold settings
  • Processing configuration options
  • Cache and performance settings

Used in:

  • Django settings generation
  • Service initialization
  • Background task configuration
  • Admin interface setup

Tags: configuration, type-safety, environment-variables, dynamic-settings


Environment Variables

# .env file
OPENAI_API_KEY=sk-your-openai-api-key-here

# Optional: Custom database for knowledge data
KNOWLEDGE_DATABASE_URL=postgresql://user:pass@localhost/knowledge_db

# Optional: Redis for caching and tasks
REDIS_URL=redis://localhost:6379/0

# Optional: Custom similarity thresholds
KNOWBASE_DOCUMENT_THRESHOLD=0.7
KNOWBASE_ARCHIVE_THRESHOLD=0.6
KNOWBASE_EXTERNAL_THRESHOLD=0.5

Advanced Configuration

class ProductionConfig(DjangoConfig):
# Core Settings
enable_knowbase: bool = True
openai_api_key: str = env.openai_api_key # From YAML config

# AI Model Configuration
openai_model: str = "gpt-4" # Default: gpt-3.5-turbo
embedding_model: str = "text-embedding-ada-002" # Default

# Similarity Thresholds (0.0-1.0)
knowbase_document_threshold: float = 0.75 # Documents
knowbase_archive_threshold: float = 0.65 # Code archives
knowbase_external_threshold: float = 0.55 # External data

# Processing Configuration
knowbase_chunk_size: int = 1000 # Text chunk size
knowbase_overlap_size: int = 200 # Chunk overlap
knowbase_batch_size: int = 50 # Embedding batch size

# Performance Settings
knowbase_max_context_chunks: int = 10 # Max context per query
knowbase_max_tokens_per_query: int = 1000 # Response length limit
knowbase_cache_ttl: int = 3600 # Cache timeout (seconds)

# Security Settings
knowbase_require_auth: bool = True # Require authentication
knowbase_allow_public_search: bool = False # Public search access
knowbase_rate_limit_per_minute: int = 60 # API rate limiting

# Database Configuration
knowbase_use_separate_db: bool = True # Dedicated knowledge DB
knowbase_db_name: str = "knowledge" # Database name

# Background Processing
knowbase_worker_concurrency: int = 4 # Dramatiq workers
knowbase_task_timeout: int = 300 # Task timeout (seconds)
knowbase_retry_attempts: int = 3 # Failed task retries

Development Configuration

class DevelopmentConfig(DjangoConfig):
enable_knowbase: bool = True
openai_api_key: str = env.openai_api_key # From YAML config

# Development-friendly settings
knowbase_debug_mode: bool = True # Verbose logging
knowbase_cache_ttl: int = 60 # Short cache for testing
knowbase_require_auth: bool = False # Easy testing
knowbase_allow_public_search: bool = True # Open access

# Faster processing for development
knowbase_chunk_size: int = 500 # Smaller chunks
knowbase_batch_size: int = 10 # Smaller batches
knowbase_worker_concurrency: int = 1 # Single worker

Infrastructure Setup

PostgreSQL with pgvector

# Install pgvector extension
sudo apt-get install postgresql-14-pgvector

# Enable extension in your database
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Verify installation
psql -d your_database -c "SELECT * FROM pg_extension WHERE extname = 'vector';"

Redis Configuration

# Install Redis
sudo apt-get install redis-server

# Start Redis service
sudo systemctl start redis-server
sudo systemctl enable redis-server

# Test connection
redis-cli ping # Should return PONG

Background Workers

# Start ReArq workers for background processing
rearq

# Production: Use supervisor or systemd
# /etc/supervisor/conf.d/knowbase-workers.conf
[program:knowbase-workers]
command=/path/to/venv/bin/rearq
directory=/path/to/project
user=www-data
autostart=true
autorestart=true
numprocs=4

Database Migrations

# Apply Knowbase migrations
python manage.py migrate

# Create superuser for admin access
python manage.py createsuperuser

# Optional: Load sample data
python manage.py loaddata knowbase_sample_data.json

%%END%%


---

## Data Models (Pydantic 2 & TypeScript)

### Pydantic 2 Models (Backend)

```python
from pydantic import BaseModel, Field, field_validator, ValidationInfo
from typing import Optional, Dict, Any, List
from enum import Enum

class KnowbaseConfig(BaseModel):
"""Complete Knowbase configuration model"""

# Core Settings
enable_knowbase: bool = True
openai_api_key: str = Field(..., min_length=1)

# AI Model Configuration
openai_model: str = Field("gpt-3.5-turbo", pattern=r"^gpt-")
embedding_model: str = "text-embedding-ada-002"

# Similarity Thresholds
knowbase_document_threshold: float = Field(0.7, ge=0.0, le=1.0)
knowbase_archive_threshold: float = Field(0.6, ge=0.0, le=1.0)
knowbase_external_threshold: float = Field(0.5, ge=0.0, le=1.0)

# Processing Configuration
knowbase_chunk_size: int = Field(1000, ge=100, le=4000)
knowbase_overlap_size: int = Field(200, ge=0, le=1000)
knowbase_batch_size: int = Field(50, ge=1, le=200)

# Performance Settings
knowbase_max_context_chunks: int = Field(10, ge=1, le=50)
knowbase_max_tokens_per_query: int = Field(1000, ge=100, le=4000)
knowbase_cache_ttl: int = Field(3600, ge=60, le=86400)

# Security Settings
knowbase_require_auth: bool = True
knowbase_allow_public_search: bool = False
knowbase_rate_limit_per_minute: int = Field(60, ge=1, le=1000)

# Database Configuration
knowbase_use_separate_db: bool = False
knowbase_db_name: str = "knowledge"

# Background Processing
knowbase_worker_concurrency: int = Field(4, ge=1, le=20)
knowbase_task_timeout: int = Field(300, ge=30, le=3600)
knowbase_retry_attempts: int = Field(3, ge=1, le=10)

@field_validator('knowbase_overlap_size')
@classmethod
def overlap_must_be_less_than_chunk_size(cls, v, info: ValidationInfo):
if 'knowbase_chunk_size' in info.data and v >= info.data['knowbase_chunk_size']:
raise ValueError('overlap_size must be less than chunk_size')
return v

class DatabaseConfig(BaseModel):
"""Database configuration for Knowbase"""
host: str = "localhost"
port: int = Field(5432, ge=1, le=65535)
name: str = "knowledge"
user: str
password: str
options: Dict[str, Any] = Field(default_factory=dict)

@property
def url(self) -> str:
return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}"

class RedisConfig(BaseModel):
"""Redis configuration for caching and tasks"""
host: str = "localhost"
port: int = Field(6379, ge=1, le=65535)
db: int = Field(0, ge=0, le=15)
password: Optional[str] = None

@property
def url(self) -> str:
auth = f":{self.password}@" if self.password else ""
return f"redis://{auth}{self.host}:{self.port}/{self.db}"
```

### TypeScript Interfaces (Frontend)

```typescript
export interface KnowbaseConfig {
// Core Settings
enable_knowbase: boolean;
openai_api_key: string;

// AI Model Configuration
openai_model: string;
embedding_model: string;

// Similarity Thresholds
knowbase_document_threshold: number;
knowbase_archive_threshold: number;
knowbase_external_threshold: number;

// Processing Configuration
knowbase_chunk_size: number;
knowbase_overlap_size: number;
knowbase_batch_size: number;

// Performance Settings
knowbase_max_context_chunks: number;
knowbase_max_tokens_per_query: number;
knowbase_cache_ttl: number;

// Security Settings
knowbase_require_auth: boolean;
knowbase_allow_public_search: boolean;
knowbase_rate_limit_per_minute: number;

// Database Configuration
knowbase_use_separate_db: boolean;
knowbase_db_name: string;

// Background Processing
knowbase_worker_concurrency: number;
knowbase_task_timeout: number;
knowbase_retry_attempts: number;
}

export interface DatabaseConfig {
host: string;
port: number;
name: string;
user: string;
password: string;
options: Record<string, any>;
}

export interface RedisConfig {
host: string;
port: number;
db: number;
password?: string;
}

// Configuration validation
export interface ConfigValidationResult {
valid: boolean;
errors: string[];
warnings: string[];
recommendations: string[];
}
```

---

## 🔁 Flows

### Configuration Loading Flow

1. **Environment Loading** → Load environment variables and .env files
2. **Config Validation** → Validate configuration with Pydantic models
3. **Default Application** → Apply intelligent defaults for missing values
4. **Service Registration** → Register Knowbase services with Django
5. **Database Setup** → Configure database routing and connections
6. **Cache Configuration** → Set up Redis caching and sessions
7. **Background Tasks** → Initialize ReArq task queues
8. **URL Registration** → Register API endpoints and admin interfaces

**Modules**:
- `django_cfg.core.config` - configuration loading
- `django_cfg.apps.knowbase.apps` - service registration
- Django settings generation

---

### Production Deployment Flow

1. **Environment Preparation** → Set up production environment variables
2. **Infrastructure Setup** → Configure PostgreSQL, Redis, and workers
3. **Security Configuration** → Set authentication and rate limiting
4. **Performance Tuning** → Optimize thresholds and batch sizes
5. **Monitoring Setup** → Configure logging and error tracking
6. **Health Checks** → Implement service health monitoring
7. **Scaling Configuration** → Set up horizontal scaling options

**Modules**:
- Production configuration classes
- Infrastructure setup scripts
- Monitoring and logging configuration

---

## Configuration Best Practices

### Environment-Specific Settings

```python
# Base configuration
class BaseKnowbaseConfig(DjangoConfig):
enable_knowbase: bool = True
openai_api_key: str = env.openai_api_key # From YAML config

# Development
class DevelopmentConfig(BaseKnowbaseConfig):
knowbase_debug_mode: bool = True
knowbase_require_auth: bool = False
knowbase_cache_ttl: int = 60

# Staging
class StagingConfig(BaseKnowbaseConfig):
knowbase_debug_mode: bool = True
knowbase_require_auth: bool = True
knowbase_rate_limit_per_minute: int = 100

# Production
class ProductionConfig(BaseKnowbaseConfig):
knowbase_debug_mode: bool = False
knowbase_require_auth: bool = True
knowbase_rate_limit_per_minute: int = 60
knowbase_use_separate_db: bool = True
```

### Security Hardening

```python
class SecureKnowbaseConfig(DjangoConfig):
enable_knowbase: bool = True
openai_api_key: str = env.openai_api_key # From YAML config

# Security settings
knowbase_require_auth: bool = True
knowbase_allow_public_search: bool = False
knowbase_rate_limit_per_minute: int = 30

# API key rotation
knowbase_api_key_rotation_days: int = 90

# Content filtering
knowbase_enable_content_filter: bool = True
knowbase_blocked_file_types: List[str] = ['.exe', '.bat', '.sh']

# Audit logging
knowbase_enable_audit_log: bool = True
knowbase_log_all_queries: bool = True
```

### Performance Optimization

```python
class HighPerformanceConfig(DjangoConfig):
enable_knowbase: bool = True
openai_api_key: str = env.openai_api_key # From YAML config

# Optimized processing
knowbase_batch_size: int = 100
knowbase_worker_concurrency: int = 8
knowbase_chunk_size: int = 1500

# Aggressive caching
knowbase_cache_ttl: int = 7200
knowbase_enable_query_cache: bool = True
knowbase_enable_embedding_cache: bool = True

# Database optimization
knowbase_use_separate_db: bool = True
knowbase_db_pool_size: int = 20
knowbase_db_max_overflow: int = 30
```

---

## ⚠️ Anti-patterns to Avoid

### ❌ Hardcoded API Keys

**Don't do this**:
```python
class BadConfig(DjangoConfig):
openai_api_key: str = "sk-hardcoded-key-in-source-code" # Security risk!
```

**Do this instead**:
```python
class GoodConfig(DjangoConfig):
openai_api_key: str = env.openai_api_key # From YAML config # Environment variable
```

### ❌ Ignoring Resource Limits

**Don't do this**:
```python
class ResourceHungryConfig(DjangoConfig):
knowbase_batch_size: int = 1000 # Too large
knowbase_worker_concurrency: int = 50 # Too many workers
knowbase_max_context_chunks: int = 100 # Expensive queries
```

**Do this instead**:
```python
class OptimizedConfig(DjangoConfig):
knowbase_batch_size: int = 50 # Reasonable batch size
knowbase_worker_concurrency: int = 4 # Match CPU cores
knowbase_max_context_chunks: int = 10 # Cost-effective
```

### ❌ Same Settings for All Environments

**Don't do this**:
```python
# Using production settings in development
class OneConfigForAll(DjangoConfig):
knowbase_require_auth: bool = True # Slows development
knowbase_cache_ttl: int = 3600 # Hard to test changes
knowbase_debug_mode: bool = False # No debugging info
```

**Do this instead**:
```python
# Environment-specific configurations
class DevelopmentConfig(DjangoConfig):
knowbase_require_auth: bool = False # Easy testing
knowbase_cache_ttl: int = 60 # Quick cache refresh
knowbase_debug_mode: bool = True # Verbose logging
```

---

## Version Tracking

- `ADDED_IN: v1.0` - Basic configuration with enable_knowbase flag
- `ADDED_IN: v1.1` - Similarity threshold configuration
- `ADDED_IN: v1.2` - Performance and security settings
- `ADDED_IN: v1.3` - Environment-specific configuration classes
- `CHANGED_IN: v1.4` - Improved validation and type safety
- `ADDED_IN: v1.5` - Production deployment configuration

---

## Configuration Checklist

### Development Setup

- [ ] Set `enable_knowbase: bool = True`
- [ ] Configure `OPENAI_API_KEY` environment variable
- [ ] Install PostgreSQL with pgvector extension
- [ ] Set up Redis server
- [ ] Run `python manage.py migrate`
- [ ] Start background workers with `rearq`

### Production Deployment

- [ ] Use environment-specific configuration class
- [ ] Set up dedicated knowledge database
- [ ] Configure Redis clustering for high availability
- [ ] Set up multiple background workers
- [ ] Enable authentication and rate limiting
- [ ] Configure monitoring and logging
- [ ] Set up health checks and alerts
- [ ] Implement backup and recovery procedures

### Security Hardening

- [ ] Enable authentication (`knowbase_require_auth: bool = True`)
- [ ] Disable public search in production
- [ ] Set appropriate rate limits
- [ ] Use strong database passwords
- [ ] Enable audit logging
- [ ] Regular API key rotation
- [ ] Content filtering for uploads

### Performance Optimization

- [ ] Tune similarity thresholds for your content
- [ ] Optimize chunk sizes based on content type
- [ ] Configure appropriate batch sizes
- [ ] Set up caching with reasonable TTL
- [ ] Monitor token usage and costs
- [ ] Scale workers based on load
- [ ] Use database connection pooling

---

**DEPENDS_ON**: [django-cfg, PostgreSQL, pgvector, Redis, OpenAI API]
**USED_BY**: [All Knowbase components, Django settings, Background workers]
**TAGS**: `configuration, setup, deployment, type-safety, environment-variables`

---


## Quick Navigation

This directory contains comprehensive user documentation for the Django CFG Knowbase module, following the `@DOCS_MODULE.md` methodology for LLM-optimized documentation.

---

## Documentation Structure

### 🏠 [configuration](./knowbase-configuration)
**Main overview and philosophy**
- Complete module overview and philosophy
- Zero-configuration AI integration approach
- Quick start checklist and verification steps
- Core modules and their relationships
- Production-ready architecture insights

**Key Topics**: Overview, Philosophy, Quick Start, Architecture, Anti-patterns

---

### [data-integration](./knowbase-data-integration)
**Auto-AI integration for Django models**
- ExternalDataMixin usage and configuration
- Real-time vectorization and sync patterns
- Advanced integration patterns and examples
- Performance optimization for model integration

**Key Topics**: ExternalDataMixin, Auto-sync, Model Integration, Real-time Updates

---

### [chat-search](./knowbase-chat-search)
**Conversational AI and semantic search**
- ChatService and SearchService usage
- Context retrieval and conversation management
- Multi-content type semantic search
- Performance optimization and cost management

**Key Topics**: AI Chat, Semantic Search, Context Retrieval, RAG Pipeline

---

### Configuration Setup
**Zero-config setup and deployment**
- Configuration options and environment setup
- Production deployment best practices
- Security hardening and performance tuning
- Infrastructure requirements and scaling

**Key Topics**: Configuration, Setup, Deployment, Security, Performance

---

## Getting Started Path

### For New Users
1. **Start here**: [configuration](./knowbase-configuration) - Understand the philosophy and architecture
2. **Setup**: Basic Configuration - Get your system running
3. **Integration**: [data-integration](./knowbase-data-integration) - Add AI to your models
4. **Usage**: [chat-search](./knowbase-chat-search) - Use AI chat and search features

### For Experienced Users
- **Quick Reference**: Each file contains API examples and configuration options
- **Advanced Patterns**: Look for "Advanced Features" sections in each guide
- **Production**: Focus on "Production" and "Performance" sections
- **Troubleshooting**: Check "Anti-patterns to Avoid" sections

---

## Documentation Features

### LLM-Optimized Format
- **Token-efficient**: Concise but comprehensive content
- **Structured**: Consistent headings and organization
- **Searchable**: Tagged with relevant keywords
- **Executable**: All code examples are working and tested

### Type-Safe Examples
- **Pydantic 2**: Backend models with full validation
- **TypeScript**: Frontend interfaces and types
- **Django**: Production-ready Django patterns
- **API**: Complete API usage examples

### Production-Ready
- **Security**: Authentication, rate limiting, content filtering
- **Performance**: Caching, batching, optimization techniques
- **Monitoring**: Logging, metrics, health checks
- **Scaling**: Horizontal scaling and load balancing

---

## Content Statistics

| Guide | Lines | Topics | Code Examples | API Methods |
|-------|-------|--------|---------------|-------------|
| configuration | ~400 | 8 | 15+ | 10+ |
| data-integration | ~500 | 10 | 20+ | 15+ |
| chat-search | ~600 | 12 | 25+ | 20+ |
| Basic Configuration | ~550 | 11 | 18+ | 12+ |
| **Total** | **~2050** | **41** | **78+** | **57+** |

---

## 🏷️ Tags and Keywords

**Core Tags**: `django-cfg`, `knowbase`, `ai`, `rag`, `semantic-search`, `chat-ai`, `documentation`

**Technical Tags**: `pydantic2`, `typescript`, `postgresql`, `pgvector`, `redis`, `dramatiq`, `openai`

**Feature Tags**: `zero-config`, `auto-integration`, `real-time-sync`, `production-ready`, `type-safe`

**Use Case Tags**: `document-management`, `customer-support`, `knowledge-base`, `code-search`, `api-docs`

---

## Version Information

- **Documentation Version**: v1.0
- **Knowbase Module Version**: v1.5+
- **Django CFG Version**: v1.1.82+
- **Last Updated**: September 2024

---

## Contributing

This documentation follows the `@DOCS_MODULE.md` methodology:

### Guidelines
- **Maximum 1000 lines per file** (enforced)
- **Token-efficient content** - every line adds value
- **Working code examples** - all examples must be executable
- **Type safety** - 100% typed examples (Pydantic 2 / TypeScript)
- **No duplication** - DRY principle for documentation

### Structure Requirements
- **Overview** section with philosophy and tags
- **Modules** section with dependencies and exports
- **APIs** section with function documentation
- **Data Models** with Pydantic 2 and TypeScript
- **Flows** section with process descriptions
- **Anti-patterns** section with what to avoid
- **Version Tracking** with change history

---

## Quick Reference

### Essential Commands
```bash
# Enable Knowbase
enable_knowbase: bool = True

# Start system
python manage.py migrate
rearq

# Test integration
python manage.py shell
>>> from django_cfg.apps.knowbase.services import ChatService
>>> chat = ChatService(user=user)
>>> session = chat.create_session(title="Test")
>>> response = chat.process_query(session.id, "Hello AI!")
```

### Key URLs
- **Admin**: `/admin/` (Knowledge Base section)
- **API**: `/cfg/knowbase/api/`
- **Chat**: `/cfg/knowbase/chat/`
- **Docs**: `/cfg/knowbase/api/docs/`

### Support Resources
- **Technical Documentation**: `/src/django_cfg/features/built-in-apps/knowbase/@docs/`
- **Examples**: `/src/django_cfg/features/built-in-apps/knowbase/guides/`
- **Tests**: `/src/django_cfg/features/built-in-apps/knowbase/tests/`

---

**DEPENDS_ON**: [Django CFG, PostgreSQL, pgvector, Redis, OpenAI API]
**USED_BY**: [Django developers, AI integrators, Knowledge management systems]
**TAGS**: `documentation, user-guide, django-cfg, knowbase, ai-integration`