GitHub Actions committed
Commit 37de870 · Parent: 5c28dac

Deploy backend from GitHub Actions


🚀 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Files changed (7)
  1. README.md +1 -1
  2. README.md.bak +321 -0
  3. chatkit_server.py +1 -1
  4. main.py +1 -1
  5. rag/chat.py +1 -1
  6. rag/generation.py +1 -1
  7. test_chat.py +1 -1
README.md CHANGED
@@ -137,7 +137,7 @@ Environment variables:
 ```env
 # OpenAI
 OPENAI_API_KEY=your_api_key_here
-OPENAI_MODEL=gpt-4-turbo-preview
+OPENAI_MODEL=gpt-5-nano
 OPENAI_EMBEDDING_MODEL=text-embedding-3-small
 
 # Qdrant
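
This hunk only updates the documented default; at runtime the application reads `OPENAI_MODEL` from the environment first and falls back to the new default, as the `main.py` change below shows. A minimal sketch of that fallback pattern (the variable name comes from the diff; the rest is illustrative):

```python
import os

# The value set in .env or the process environment wins;
# "gpt-5-nano" is only the fallback default after this commit.
openai_model = os.getenv("OPENAI_MODEL", "gpt-5-nano")
print(f"Chat model in use: {openai_model}")
```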
README.md.bak ADDED
@@ -0,0 +1,321 @@
+---
+title: AI Book RAG API
+emoji: 🤖
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+---
+
+# RAG Backend for Physical AI & Humanoid Robotics Book
+
+A production-ready Retrieval-Augmented Generation (RAG) backend API for querying the "Physical AI & Humanoid Robotics" book content.
+
+## Features
+
+- **Document Ingestion**: Automatic processing of Markdown book content
+- **Semantic Search**: OpenAI embeddings for intelligent content retrieval
+- **Streaming Chat**: Server-Sent Events for real-time responses
+- **Citations**: Automatic source attribution in markdown format
+- **Conversation Context**: Short-term memory for follow-up questions
+- **Rate Limiting**: Token bucket pattern with API key support
+- **Health Monitoring**: Comprehensive system health checks
+- **Task Management**: Background ingestion with progress tracking
+
+## Quick Start
+
+### Prerequisites
+
+- Python 3.11+
+- OpenAI API key
+- Qdrant instance (local or cloud)
+
+### Installation
+
+1. **Install uv** (if not already installed):
+   ```bash
+   # Unix/macOS
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+
+   # Windows (PowerShell)
+   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+   ```
+
+2. **Clone and setup**:
+   ```bash
+   cd backend
+
+   # Install dependencies (10-100x faster than pip)
+   uv sync
+
+   # For development dependencies
+   uv sync --dev
+   ```
+
+3. **Configure environment**:
+   ```bash
+   cp .env.example .env
+   # Edit .env with your configuration
+   ```
+
+4. **Start Qdrant** (local):
+   ```bash
+   docker run -d --name qdrant -p 6333:6333 qdrant/qdrant:latest
+   ```
+
+5. **Run the API**:
+   ```bash
+   # Using uv (recommended)
+   uv run uvicorn main:app --host 0.0.0.0 --port 7860
+
+   # Or using the script command
+   uv run rag-server
+
+   # With auto-reload for development
+   uv run uvicorn main:app --host 0.0.0.0 --port 7860 --reload
+   ```
+
+### Ingest Book Content
+
+1. **Prepare content structure**:
+   ```
+   book_content/
+   ├── chapter1.md
+   ├── chapter2.md
+   └── chapter3/
+       ├── section1.md
+       └── section2.md
+   ```
+
+2. **Trigger ingestion**:
+   ```bash
+   # Using the API
+   curl -X POST "http://localhost:7860/ingest" \
+     -H "Content-Type: application/json" \
+     -d '{"content_path": "./book_content"}'
+
+   # Or using the script
+   python scripts/ingest.py --content-path ./book_content
+   ```
+
+### Chat with the Book
+
+```bash
+curl -X POST "http://localhost:7860/chat" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "question": "What is humanoid robotics?",
+    "stream": true
+  }'
+```
+
+## API Endpoints
+
+### Health Check
+- `GET /health` - System health status
+
+### Chat
+- `POST /chat` - Ask questions about the book
+  - Supports streaming responses (`stream: true`)
+  - Optional session ID for conversation context
+  - Configurable retrieval parameters
+
+### Ingestion
+- `POST /ingest` - Trigger document ingestion
+- `GET /ingest/status` - Get ingestion task status
+- `GET /ingest/stats` - Get ingestion statistics
+- `POST /ingest/{task_id}/cancel` - Cancel an ingestion task
+
+### Management
+- `GET /collections` - List Qdrant collections
+- `DELETE /collections/{collection_name}` - Delete a collection
+
+## Configuration
+
+Environment variables:
+
+```env
+# OpenAI
+OPENAI_API_KEY=your_api_key_here
+OPENAI_MODEL=gpt-4-turbo-preview
+OPENAI_EMBEDDING_MODEL=text-embedding-3-small
+
+# Qdrant
+QDRANT_URL=http://localhost:6333
+QDRANT_API_KEY=your_api_key
+
+# Content
+BOOK_CONTENT_PATH=./book_content
+CHUNK_SIZE=1000
+CHUNK_OVERLAP=200
+
+# API
+API_HOST=0.0.0.0
+API_PORT=7860
+RATE_LIMIT_REQUESTS=60
+RATE_LIMIT_WINDOW=60
+
+# Conversation
+MAX_CONTEXT_MESSAGES=3
+CONTEXT_WINDOW_SIZE=4000
+```
+
+## Architecture
+
+### Core Components
+
+1. **Ingestion Pipeline**:
+   - Markdown discovery and parsing
+   - Semantic chunking with overlap
+   - OpenAI embedding generation
+   - Qdrant vector storage
+
+2. **Chat System**:
+   - Query embedding generation
+   - Semantic document retrieval
+   - Context-aware response generation
+   - Server-Sent Events streaming
+
+3. **Management Layer**:
+   - Background task management
+   - Progress tracking
+   - Health monitoring
+   - Rate limiting
+
+### Design Principles
+
+- **Pure Python**: No LangChain dependency
+- **Async/Await**: Full async implementation
+- **Production Ready**: Error handling, logging, monitoring
+- **HF Spaces Compatible**: Docker configuration for deployment
+
+## Deployment
+
+### Hugging Face Spaces
+
+1. Create a new Space with the Docker template
+2. Add secrets:
+   - `OPENAI_API_KEY`
+   - `QDRANT_URL`
+   - `QDRANT_API_KEY` (if using cloud)
+3. Push code to the Space
+
+### Docker
+
+```bash
+# Build
+docker build -t rag-backend .
+
+# Run
+docker run -d \
+  --name rag-backend \
+  -p 7860:7860 \
+  -e OPENAI_API_KEY=$OPENAI_API_KEY \
+  -e QDRANT_URL=$QDRANT_URL \
+  rag-backend
+```
+
+## Development
+
+### Running Tests
+
+```bash
+# Install dev dependencies
+uv sync --dev
+
+# Run tests
+pytest
+
+# Run with coverage
+pytest --cov=rag tests/
+```
+
+### Code Style
+
+```bash
+# Format code
+black .
+
+# Lint
+ruff check .
+
+# Type check
+mypy .
+```
+
+## Monitoring
+
+### Health Checks
+
+The `/health` endpoint provides:
+- Service status
+- Connection health
+- System metrics
+- Active tasks
+
+### Logging
+
+Structured JSON logging with:
+- Request tracing
+- Error details
+- Performance metrics
+
+### Metrics
+
+Track:
+- Response times
+- Token usage
+- Error rates
+- Active conversations
+
+## Security
+
+- Rate limiting with slowapi
+- Optional API key authentication
+- Input validation
+- Error message sanitization
+
+## Performance
+
+- Connection pooling for Qdrant
+- Batch embedding generation
+- Efficient token counting
+- Configurable batch sizes
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Qdrant Connection Failed**:
+   - Check Qdrant is running
+   - Verify URL and API key
+   - Check network connectivity
+
+2. **OpenAI API Errors**:
+   - Verify API key is valid
+   - Check quota limits
+   - Implement retries
+
+3. **Memory Issues**:
+   - Reduce batch sizes
+   - Limit concurrent requests
+   - Monitor chunk sizes
+
+### Debug Mode
+
+Enable debug logging:
+```bash
+LOG_LEVEL=DEBUG uvicorn main:app --reload
+```
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Add tests
+5. Submit a pull request
+
+## License
+
+This project is licensed under the MIT License.
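
The restored README documents a streaming `POST /chat` endpoint using Server-Sent Events. A hedged client sketch, assuming the usual `data: <chunk>` SSE framing (the endpoint and request body come from the README above; the exact event payload shape is not confirmed by this diff):

```python
# Hypothetical SSE client for the /chat endpoint described in the README.
import httpx  # assumed HTTP client; any SSE-capable client works


def stream_chat(question: str, base_url: str = "http://localhost:7860") -> None:
    payload = {"question": question, "stream": True}
    with httpx.stream("POST", f"{base_url}/chat", json=payload, timeout=60.0) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # SSE frames typically arrive as "data: <chunk>" lines
            if line.startswith("data: "):
                print(line[len("data: "):], end="", flush=True)


if __name__ == "__main__":
    stream_chat("What is humanoid robotics?")
```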
chatkit_server.py CHANGED
@@ -32,7 +32,7 @@ class Settings(BaseSettings):
     openai_api_key: str
     qdrant_url: str
     qdrant_api_key: str
-    openai_model: str = "gpt-4-turbo-preview"
+    openai_model: str = "gpt-5-nano"
     openai_embedding_model: str = "text-embedding-3-small"
 
     class Config:
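
The snippet above suggests a pydantic `BaseSettings` class, where a field default is overridden by a matching environment variable. A minimal sketch of that interaction, assuming `pydantic-settings` with an `.env` file (only the `openai_model` field is taken from the diff; the override value is hypothetical):

```python
import os

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    openai_model: str = "gpt-5-nano"  # new default from this commit

    class Config:
        env_file = ".env"  # assumption: the real Config points at an env file


os.environ["OPENAI_MODEL"] = "some-other-model"  # hypothetical override
print(Settings().openai_model)  # prints the env value, not the default
```

So deployments that pin `OPENAI_MODEL` are unaffected by this commit; only installs relying on the default pick up `gpt-5-nano`.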
main.py CHANGED
@@ -58,7 +58,7 @@ class Settings(BaseSettings):
 
     # OpenAI Configuration
     openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
-    openai_model: str = os.getenv("OPENAI_MODEL", "gpt-4-turbo-preview")
+    openai_model: str = os.getenv("OPENAI_MODEL", "gpt-5-nano")
     openai_embedding_model: str = os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small")
 
     # Qdrant Configuration
rag/chat.py CHANGED
@@ -33,7 +33,7 @@ class ChatHandler:
         self,
         qdrant_manager: QdrantManager,
         openai_api_key: str,
-        model: str = "gpt-4-turbo-preview",
+        model: str = "gpt-5-nano",
         embedding_model: str = "text-embedding-3-small",
         max_context_messages: int = 3,
         context_window_size: int = 4000,
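
`ChatHandler` pairs a chat `model` with an `embedding_model`, matching the README's pipeline of query embedding followed by semantic retrieval. An illustrative sketch of that retrieval step (the collection name, client wiring, and synchronous clients are assumptions, not taken from this diff):

```python
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")

question = "What is humanoid robotics?"

# Embed the question with the same model the handler defaults to
vector = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

# Retrieve the closest chunks; "book_content" is a hypothetical collection
hits = qdrant.search(collection_name="book_content", query_vector=vector, limit=5)
for hit in hits:
    print(hit.score, hit.payload)
```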
rag/generation.py CHANGED
@@ -23,7 +23,7 @@ class ResponseGenerator:
     def __init__(
         self,
         openai_api_key: str,
-        model: str = "gpt-4-turbo-preview",
+        model: str = "gpt-5-nano",
         max_tokens: int = 1000,
         temperature: float = 0.7
     ):
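
`ResponseGenerator`'s defaults (`max_tokens=1000`, `temperature=0.7`) feed a chat-completion call. A hedged sketch of that call with the new model default, assuming the standard `openai` async client (the prompt wiring is illustrative, not the class's actual implementation):

```python
import asyncio

from openai import AsyncOpenAI


async def generate(question: str, context: str) -> str:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    resp = await client.chat.completions.create(
        model="gpt-5-nano",  # new default from this commit
        max_tokens=1000,     # constructor default from the diff
        temperature=0.7,     # constructor default from the diff
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content or ""


if __name__ == "__main__":
    print(asyncio.run(generate("What is humanoid robotics?", "...")))
```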
test_chat.py CHANGED
@@ -46,7 +46,7 @@ async def test_chat():
     chat_handler = ChatHandler(
         qdrant_manager=qdrant_manager,
         openai_api_key=os.getenv("OPENAI_API_KEY"),
-        model=os.getenv("OPENAI_MODEL", "gpt-4-turbo-preview"),
+        model=os.getenv("OPENAI_MODEL", "gpt-5-nano"),
         embedding_model=os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small")
     )
 