# AI Chat & Semantic Search - Conversational Knowledge Retrieval

## Overview

Knowbase provides enterprise-grade conversational AI with semantic search capabilities. Users can chat naturally with their knowledge base, and the system automatically retrieves relevant context from documents, external data, and code archives.

**Philosophy**: "Ask anything, get intelligent answers" - natural language queries that understand context, intent, and relationships across all your content types.

**TAGS**: `ai-chat, semantic-search, rag, conversational-ai, context-retrieval, openai`


## Modules

### `@knowbase/services/ChatService`

**Purpose**: Conversational AI service with automatic context retrieval via semantic search across all content types.

**Dependencies**:

- `SearchService` - semantic search across content
- `PromptBuilder` - context assembly and prompt engineering
- `LLMClient` - OpenAI API integration
- `ChatSession`, `ChatMessage` models

**Exports**:

- `ChatService` - main chat orchestration
- `create_session()` - start new conversations
- `process_query()` - handle user queries with context
- `get_conversation_history()` - retrieve chat history

**Used in**:

- Customer support interfaces
- Internal knowledge portals
- API documentation systems
- Code exploration tools

**Tags**: `chat, rag, context-retrieval, conversation-management`
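
A minimal quick-start sketch; the import path mirrors `SearchService` below, and the per-user constructor is an assumption rather than confirmed API:

```python
from django_cfg.apps.knowbase.services import ChatService  # assumed import path

chat_service = ChatService(user=request.user)  # assumed per-user constructor, as with SearchService

# Start a conversation and ask a question with default settings
session = chat_service.create_session(title="Getting Started")
response = chat_service.process_query(
    session_id=session.id,
    query="What content types does Knowbase index?",
)
print(response.content)
```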


### `@knowbase/services/SearchService`

**Purpose**: Unified semantic search across documents, external data, and code archives using pgvector cosine similarity.

**Dependencies**:

- `pgvector` - vector similarity search
- `Document`, `ExternalData`, `ArchiveItem` models
- `EmbeddingService` - query embedding generation
- Content-specific similarity thresholds

**Exports**:

- `SearchService` - unified search interface
- `search_all_content()` - cross-content search
- `search_documents()` - document-specific search
- `search_external_data()` - external data search
- `search_archives()` - code archive search

**Used in**:

- Chat context retrieval
- Search interfaces
- Content discovery
- Related content suggestions

**Tags**: `semantic-search, pgvector, cosine-similarity, unified-search`


## Advanced Chat Configuration

```python
# Custom system prompt and settings
session = chat_service.create_session(
    title="Technical Support",
    system_prompt="""You are a technical support expert.
Always provide step-by-step solutions and ask clarifying questions
when the user's problem is unclear.""",
    max_messages=50,  # Conversation length limit
    auto_title=True   # Auto-generate titles from first query
)

# Query with specific content type filtering
response = chat_service.process_query(
    session_id=session.id,
    query="How do I configure the API authentication?",
    max_context_chunks=3,
    content_types=['document', 'external'],  # Exclude archives
    similarity_threshold=0.8,  # Higher precision
    include_metadata=True      # Include source metadata
)

# Access detailed response information
for chunk in response.context_chunks:
    print(f"Source: {chunk.source_title}")
    print(f"Relevance: {chunk.similarity_score:.3f}")
    print(f"Content: {chunk.content_preview}")
```

## Conversation Management

```python
# Get conversation history
history = chat_service.get_conversation_history(
    session_id=session.id,
    limit=20,             # Recent messages
    include_context=True  # Include context chunks
)

# Continue conversation with context
response = chat_service.process_query(
    session_id=session.id,
    query="Can you elaborate on the second point?",
    use_conversation_context=True,  # Use previous messages as context
    max_context_chunks=3
)

# Update session settings
chat_service.update_session(
    session_id=session.id,
    title="Updated Session Title",
    system_prompt="Updated system instructions"
)
```

## Search Service Usage

```python
from django_cfg.apps.knowbase.services import SearchService

# Initialize search service
search_service = SearchService(user=request.user)

# Search across all content types
results = search_service.search_all_content(
    query="machine learning algorithms",
    limit=10,
    similarity_threshold=0.7,  # Auto-adjusted per content type
    include_metadata=True
)

# Process results
for result in results:
    print(f"Title: {result.title}")
    print(f"Type: {result.content_type}")
    print(f"Relevance: {result.similarity_score:.3f}")
    print(f"Preview: {result.content_preview}")
    print("---")

# Search only documents
doc_results = search_service.search_documents(
    query="user authentication",
    limit=5,
    similarity_threshold=0.8,
    category_filter="api-docs"  # Optional category filtering
)

# Search external data (e.g., product catalog)
external_results = search_service.search_external_data(
    query="wireless headphones under $100",
    limit=10,
    similarity_threshold=0.6,
    metadata_filter={'category': 'electronics'}
)

# Search code archives
code_results = search_service.search_archives(
    query="authentication middleware",
    limit=8,
    similarity_threshold=0.7,
    file_type_filter=['.py', '.js']  # Specific file types
)
```

## Advanced Search Features

```python
# Hybrid search combining keyword and semantic matching
results = search_service.hybrid_search(
    query="Django REST API authentication",
    semantic_weight=0.7,  # 70% semantic, 30% keyword
    limit=15
)

# Similar content discovery
similar_results = search_service.find_similar_content(
    content_id="doc_123",
    content_type="document",
    limit=5,
    exclude_self=True
)

# Search with date filtering
recent_results = search_service.search_all_content(
    query="new features",
    date_filter={
        'field': 'created_at',
        'after': '2024-01-01',
        'before': '2024-12-31'
    }
)
```


---

## Data Models (Pydantic 2 & TypeScript)

### Pydantic 2 Models (Backend)

```python
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime

class ChatQueryRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=1000)
    max_tokens: int = Field(500, ge=1, le=4000)
    max_context_chunks: int = Field(5, ge=1, le=20)
    temperature: float = Field(0.7, ge=0.0, le=2.0)
    content_types: List[str] = Field(default_factory=lambda: ["document", "external", "archive"])
    similarity_threshold: Optional[float] = Field(None, ge=0.0, le=1.0)
    use_conversation_context: bool = True
    include_metadata: bool = False

class ContextChunk(BaseModel):
    content: str
    source_title: str
    source_type: str
    similarity_score: float
    metadata: Dict[str, Any] = Field(default_factory=dict)
    content_preview: str = Field(..., max_length=200)

class TokenUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    estimated_cost: float

class ChatResponse(BaseModel):
    content: str
    context_chunks: List[ContextChunk]
    token_usage: TokenUsage
    processing_time: float
    session_id: str
    message_id: str

class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=500)
    limit: int = Field(10, ge=1, le=100)
    similarity_threshold: float = Field(0.7, ge=0.0, le=1.0)
    content_types: List[str] = Field(default_factory=lambda: ["document", "external", "archive"])
    include_metadata: bool = False
    metadata_filter: Optional[Dict[str, Any]] = None

class SearchResult(BaseModel):
    id: str
    title: str
    content_type: str
    content_preview: str = Field(..., max_length=300)
    similarity_score: float
    metadata: Dict[str, Any] = Field(default_factory=dict)
    created_at: datetime
    updated_at: datetime
```
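
As a quick illustration of standard Pydantic 2 usage (not Knowbase-specific API), requests validate on construction and serialize via the v2 helpers:

```python
# Out-of-range values (e.g. max_tokens=10_000) raise pydantic.ValidationError
req = ChatQueryRequest(query="How do I configure API authentication?", max_tokens=300)

payload = req.model_dump()       # plain dict, e.g. for passing into ChatService
as_json = req.model_dump_json()  # JSON string, e.g. for logging or an HTTP body
```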

### TypeScript Interfaces (Frontend)

```typescript
export interface ChatQueryRequest {
  query: string;
  max_tokens: number;
  max_context_chunks: number;
  temperature: number;
  content_types: string[];
  similarity_threshold?: number;
  use_conversation_context: boolean;
  include_metadata: boolean;
}

export interface ContextChunk {
  content: string;
  source_title: string;
  source_type: string;
  similarity_score: number;
  metadata: Record<string, any>;
  content_preview: string;
}

export interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  estimated_cost: number;
}

export interface ChatResponse {
  content: string;
  context_chunks: ContextChunk[];
  token_usage: TokenUsage;
  processing_time: number;
  session_id: string;
  message_id: string;
}

export interface SearchRequest {
  query: string;
  limit: number;
  similarity_threshold: number;
  content_types: string[];
  include_metadata: boolean;
  metadata_filter?: Record<string, any>;
}

export interface SearchResult {
  id: string;
  title: string;
  content_type: string;
  content_preview: string;
  similarity_score: number;
  metadata: Record<string, any>;
  created_at: string;
  updated_at: string;
}

// Chat session management
export interface ChatSession {
  id: string;
  title: string;
  system_prompt: string;
  message_count: number;
  total_tokens: number;
  created_at: string;
  updated_at: string;
}

export interface ChatMessage {
  id: string;
  session_id: string;
  role: 'user' | 'assistant';
  content: string;
  context_chunks?: ContextChunk[];
  token_usage?: TokenUsage;
  created_at: string;
}
```

---

## 🔁 Flows

### AI Chat with Context Retrieval Flow

1. **User Query** → User sends message to chat session
2. **Query Embedding** → Generate vector embedding for user query
3. **Semantic Search** → Search across all content types using cosine similarity
4. **Context Ranking** → Rank results by relevance and content type
5. **Context Assembly** → Combine top chunks into structured context
6. **Prompt Building** → Create LLM prompt with context and conversation history
7. **LLM Request** → Send to OpenAI with optimized prompt
8. **Response Processing** → Parse and validate AI response
9. **Storage** → Save message and context to database
10. **Token Tracking** → Update usage statistics and costs

**Modules**:
- `ChatService.process_query()` - orchestrates entire flow
- `SearchService.search_all_content()` - context retrieval
- `PromptBuilder.build_chat_prompt()` - prompt engineering
- `LLMClient.chat_completion()` - AI communication
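
The ten steps above can be pictured as glue code. A condensed sketch with injected services; the function and helper signatures here are assumptions for illustration, not Knowbase's actual internals:

```python
# Hypothetical orchestration of steps 2-10; names and signatures are assumed
def process_query_flow(chat, search, prompts, llm, session_id: str, query: str):
    # Steps 2-4: embed the query and retrieve ranked context chunks
    chunks = search.search_all_content(query=query, limit=5)
    # Steps 5-6: combine top chunks with recent history into an LLM prompt
    history = chat.get_conversation_history(session_id=session_id, limit=10)
    prompt = prompts.build_chat_prompt(query=query, context_chunks=chunks, history=history)
    # Step 7: send the optimized prompt to OpenAI
    completion = llm.chat_completion(prompt)
    # Steps 8-10: persist the exchange and update token statistics (assumed helper)
    chat.save_message(session_id, query, completion, chunks)
    return completion
```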

---

### Semantic Search Flow

1. **Query Input** → User enters search query
2. **Query Processing** → Clean and normalize query text
3. **Embedding Generation** → Create query vector embedding
4. **Multi-Content Search** → Search documents, external data, archives in parallel
5. **Similarity Calculation** → Calculate cosine similarity scores
6. **Threshold Filtering** → Apply content-type specific thresholds
7. **Result Ranking** → Combine and rank results across content types
8. **Metadata Enrichment** → Add source information and previews
9. **Response Assembly** → Format results for presentation

**Modules**:
- `SearchService.search_all_content()` - main orchestration
- `EmbeddingService.generate_query_embedding()` - vector generation
- Content-specific search methods for each type
- Result ranking and formatting utilities
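
At the database level, steps 5-7 map naturally onto pgvector's cosine distance operator. A minimal sketch using `pgvector.django`; the `DocumentChunk` model and its field names are assumptions for illustration:

```python
from pgvector.django import CosineDistance

def search_document_chunks(query_embedding, user, limit=10, threshold=0.7):
    # pgvector returns cosine *distance* (1 - similarity),
    # so a similarity threshold of 0.7 becomes distance < 0.3
    return (
        DocumentChunk.objects                    # assumed chunk model with a VectorField
        .filter(user=user)                       # per-user isolation
        .annotate(distance=CosineDistance("embedding", query_embedding))
        .filter(distance__lt=1 - threshold)      # step 6: threshold filtering
        .order_by("distance")[:limit]            # step 7: ranking
    )
```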

---

### Conversation Context Flow

1. **Context Request** → System needs conversation context for query
2. **History Retrieval** → Get recent messages from chat session
3. **Context Extraction** → Extract key information from previous messages
4. **Relevance Scoring** → Score historical context relevance to current query
5. **Context Integration** → Combine historical and semantic context
6. **Prompt Enhancement** → Include conversation context in LLM prompt

**Modules**:
- `ChatService.get_conversation_context()` - context extraction
- `PromptBuilder.integrate_conversation_context()` - context integration
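
A minimal sketch of steps 2, 3, and 6; the field names on `ChatMessage` are assumed, and the real implementation may also apply relevance scoring (step 4) before inclusion:

```python
def build_conversation_context(session_id: str, max_messages: int = 10) -> str:
    """Render recent messages oldest-first so the LLM reads them in order."""
    recent = (
        ChatMessage.objects                     # model from the ChatService dependencies
        .filter(session_id=session_id)          # assumed FK field name
        .order_by("-created_at")[:max_messages]
    )
    return "\n".join(f"{m.role}: {m.content}" for m in reversed(list(recent)))
```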

---

## Advanced Features

### Smart Context Selection

```python
# Automatic context optimization based on query type
response = chat_service.process_query(
    session_id=session.id,
    query="How do I implement OAuth2 authentication?",
    smart_context=True,  # Automatically optimize context selection
    # The system will:
    # - Prioritize technical documentation
    # - Include code examples
    # - Adjust similarity thresholds
    # - Select the optimal chunk count
)
```

### Multi-turn Conversation Awareness

```python
# System maintains conversation context automatically
session = chat_service.create_session(title="API Integration Help")

# First query
response1 = chat_service.process_query(
    session_id=session.id,
    query="How do I authenticate with your API?"
)

# Follow-up query - the system understands "it" refers to API authentication
response2 = chat_service.process_query(
    session_id=session.id,
    query="What if it fails with a 401 error?"
    # System automatically includes previous context
)
```

### Content Type Prioritization

```python
# Prioritize specific content types based on query analysis
response = chat_service.process_query(
    session_id=session.id,
    query="Show me the login function implementation",
    content_type_weights={
        'archive': 0.8,   # Prioritize code archives
        'document': 0.6,  # Include documentation
        'external': 0.3   # Lower priority for external data
    }
)
```

### Real-time Search Suggestions

```python
# Get search suggestions as user types
suggestions = search_service.get_search_suggestions(
    partial_query="machine learn",
    limit=5,
    min_query_length=3
)

# Returns: ["machine learning", "machine learning algorithms",
#           "machine learning models", "machine learning tutorial",
#           "machine learning best practices"]
```

---

## ⚠️ Anti-patterns to Avoid

### ❌ Excessive Context Retrieval

**Don't do this**:
```python
# Too much context overwhelms the LLM and increases costs
response = chat_service.process_query(
    session_id=session.id,
    query="What is authentication?",
    max_context_chunks=20,  # Overkill for a simple question
    max_tokens=4000         # Expensive and unnecessary
)
```

**Do this instead**:
```python
# Appropriate context for the query complexity
response = chat_service.process_query(
    session_id=session.id,
    query="What is authentication?",
    max_context_chunks=3,  # Sufficient for basic questions
    max_tokens=300         # Concise and cost-effective
)
```

### ❌ Ignoring Similarity Thresholds

**Don't do this**:
```python
# Using same threshold for all content types
results = search_service.search_all_content(
    query="user login",
    similarity_threshold=0.9  # Too strict for all content types
)
```

**Do this instead**:
```python
# Let the system use optimized thresholds per content type
results = search_service.search_all_content(
    query="user login"
    # The system automatically uses:
    # - 0.7 for documents (general content)
    # - 0.6 for archives (code similarity)
    # - 0.5 for external data (structured data)
)
```

### ❌ Not Managing Conversation Length

**Don't do this**:
```python
# Unlimited conversation history
session = chat_service.create_session(
    title="Long Conversation",
    max_messages=None  # Can grow indefinitely
)
```

**Do this instead**:
```python
# Reasonable conversation limits
session = chat_service.create_session(
    title="Focused Conversation",
    max_messages=50,     # Prevent context overflow
    auto_summarize=True  # Summarize old messages
)
```

---

## Version Tracking

- `ADDED_IN: v1.0` - Basic chat and search functionality
- `ADDED_IN: v1.1` - Multi-content type search integration
- `ADDED_IN: v1.2` - Conversation context awareness
- `ADDED_IN: v1.3` - Smart context selection and content type weighting
- `CHANGED_IN: v1.4` - Optimized similarity thresholds per content type
- `ADDED_IN: v1.5` - Real-time search suggestions and hybrid search

---

## Performance Optimization

### Search Performance

- **Vector Indexes**: Automatic pgvector HNSW indexes for fast similarity search (declaration sketched after this list)
- **Batch Processing**: Parallel search across content types
- **Caching**: Query embedding caching for repeated searches
- **Threshold Optimization**: Content-type specific similarity thresholds
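
The HNSW index declaration itself is standard `pgvector.django`; a sketch with illustrative model and field names (the `m` and `ef_construction` values are common defaults, not Knowbase's settings):

```python
from django.db import models
from pgvector.django import VectorField, HnswIndex

class DocumentChunk(models.Model):            # illustrative model
    content = models.TextField()
    embedding = VectorField(dimensions=1536)  # e.g. OpenAI embedding size

    class Meta:
        indexes = [
            HnswIndex(
                name="chunk_embedding_hnsw",
                fields=["embedding"],
                m=16,                             # graph connectivity
                ef_construction=64,               # build-time accuracy/speed trade-off
                opclasses=["vector_cosine_ops"],  # must match cosine-distance queries
            )
        ]
```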

### Chat Performance

- **Context Caching**: Cache assembled context for similar queries
- **Prompt Optimization**: Efficient prompt templates to minimize tokens
- **Response Streaming**: Stream responses for better user experience (see the streaming sketch after this list)
- **Background Processing**: Async context preparation
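
Streaming maps directly onto the OpenAI SDK's streaming mode. A generic sketch, not Knowbase's actual wiring; the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def stream_answer(prompt: str, model: str = "gpt-4o-mini"):
    """Yield text deltas as they arrive so the UI can render incrementally."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```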

### Cost Optimization

- **Token Counting**: Accurate token usage tracking and limits (see the cost-estimate sketch after this list)
- **Response Caching**: Cache responses for identical queries
- **Context Pruning**: Remove redundant context chunks
- **Batch Embeddings**: Process multiple queries efficiently
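
Prompt tokens can be counted locally with `tiktoken` before any request is sent; the per-token prices below are illustrative placeholders, not current OpenAI rates:

```python
import tiktoken

# Illustrative per-1K-token prices; check your provider's current pricing
PROMPT_PRICE_PER_1K = 0.00015
COMPLETION_PRICE_PER_1K = 0.0006

def estimate_cost(prompt: str, expected_completion_tokens: int) -> float:
    """Rough pre-flight cost estimate for a single chat completion."""
    enc = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(enc.encode(prompt))
    return (
        prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + expected_completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    )
```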

---

**DEPENDS_ON**: [SearchService, LLMClient, pgvector, OpenAI API, ChatSession models]
**USED_BY**: [Customer support, Knowledge portals, API documentation, Code exploration]
**TAGS**: `ai-chat, semantic-search, rag, conversational-ai, context-retrieval`