Update README.md
README.md
CHANGED
|
@@ -1,20 +1,96 @@
|
|
| 1 |
---
|
| 2 |
-
title: PDF
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: docker
|
| 7 |
-
app_port:
|
| 8 |
-
tags:
|
| 9 |
-
- streamlit
|
| 10 |
pinned: false
|
| 11 |
-
short_description: Streamlit template space
|
| 12 |
license: mit
|
| 13 |
---
|
| 14 |
|
| 15 |
-
#
|
| 16 |
|
| 17 |
-
|
| 18 |
|
| 19 |
-
|
| 20 |
-
|
| 1 |
---
|
| 2 |
+
title: PDF RAG Chatbot (Groq)
|
| 3 |
+
emoji: π
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
sdk: docker
|
| 7 |
+
app_port: 7860
|
| 8 |
pinned: false
|
| 9 |
license: mit
|
| 10 |
---
|
| 11 |
|
| 12 |
+
# PDF RAG Chatbot (Groq + LangChain)
|
| 13 |
|
| 14 |
+
A **Retrieval-Augmented Generation (RAG)** application that allows users to:
|
| 15 |
|
| 16 |
+
- Upload a **PDF**
|
| 17 |
+
- Ask questions based **only on the PDF content**
|
| 18 |
+
- Get accurate answers powered by **Groq LLMs**
|
| 19 |
+
- Run everything on **CPU (Hugging Face Free Tier)**
|
| 20 |
+
|
| 21 |
+
---
|
| 22 |
+
|
| 23 |
+
## Features
|
| 24 |
+
|
| 25 |
+
- PDF upload & processing
|
| 26 |
+
- Intelligent text chunking
|
| 27 |
+
- Semantic search using embeddings
|
| 28 |
+
- 🧠 Context-aware LLM responses
|
| 29 |
+
- 🧹 Memory clear & health endpoints
|
| 30 |
+
- ⚡ Fast inference via **Groq API**
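The chunking feature listed above can be illustrated with a minimal sketch. This is not the Space's actual splitter (the README does not show it); it assumes fixed-size character windows with a small overlap, and the name `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Assumption: character-based windows; a real splitter may instead
    break on paragraphs or sentences.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, which tends to help retrieval.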
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## 🧱 Tech Stack
|
| 35 |
+
|
| 36 |
+
- **Frontend**: Streamlit
|
| 37 |
+
- **Backend**: FastAPI
|
| 38 |
+
- **LLM**: Groq (`llama-3.1-8b-instant`)
|
| 39 |
+
- **Embeddings**: `all-MiniLM-L6-v2`
|
| 40 |
+
- **Vector DB**: Chroma (in-memory)
|
| 41 |
+
- **Frameworks**: LangChain
|
| 42 |
+
- **Deployment**: Docker + Hugging Face Spaces
|
| 43 |
+
|
| 44 |
+
---
|
| 45 |
+
|
| 46 |
+
## 🧪 How It Works (RAG Pipeline)
|
| 47 |
+
|
| 48 |
+
1. Upload PDF
|
| 49 |
+
2. Split text into chunks
|
| 50 |
+
3. Generate embeddings
|
| 51 |
+
4. Store in vector database
|
| 52 |
+
5. Retrieve relevant chunks
|
| 53 |
+
6. Generate answer using Groq LLM
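Steps 3 to 5 of the pipeline above can be sketched in plain Python. This is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for the `all-MiniLM-L6-v2` model, and `InMemoryStore` is a hypothetical stand-in for Chroma, not the project's actual code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical in-memory vector store holding (chunk, vector) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return up to k chunks most similar to the query, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Drop chunks that share nothing with the query.
        return [chunk for score, chunk in scored[:k] if score > 0]
```

The retrieved chunks would then be passed to the LLM as context for step 6.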
|
| 54 |
+
|
| 55 |
+
---
|
| 56 |
+
|
| 57 |
+
## 🖥️ Usage
|
| 58 |
+
|
| 59 |
+
1. Upload a PDF file
|
| 60 |
+
2. Ask questions related to the document
|
| 61 |
+
3. If the answer is not in the PDF, the assistant will reply:
|
| 62 |
+
> **"I cannot find this in the PDF."**
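One way to get the fixed fallback reply described above is to put it in the prompt and to short-circuit when retrieval returns nothing. The sketch below assumes that approach; the prompt wording, `build_prompt`, `answer`, and the `llm` callable are all hypothetical, and the README does not confirm how the Space actually does it.

```python
from typing import Callable

FALLBACK_REPLY = "I cannot find this in the PDF."


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f'If the answer is not in the context, reply exactly: "{FALLBACK_REPLY}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question: str, context_chunks: list[str], llm: Callable[[str], str]) -> str:
    """Call the model, or return the fallback when nothing was retrieved."""
    if not context_chunks:
        return FALLBACK_REPLY  # no context, so no model call is needed
    return llm(build_prompt(question, context_chunks)).strip()
```

Here `llm` is any function that maps a prompt string to a completion string, so a Groq client call could be wrapped to fit it.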
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
+
## Environment Variables
|
| 67 |
+
|
| 68 |
+
The following secret **must** be added in Hugging Face Spaces:
|
| 69 |
+
|
| 70 |
+
| Variable | Description |
|
| 71 |
+
|--------|------------|
|
| 72 |
+
| `GROQ_API_KEY` | Groq API key |
|
| 73 |
+
|
| 74 |
+
> ⚠️ Do NOT commit `.env` files to the repository.
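Secrets added in the Space settings are exposed to the running app as environment variables, so the key can be read at startup. A minimal sketch, assuming the app should fail fast when the key is missing; the helper name `load_groq_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def load_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read GROQ_API_KEY from the environment and fail fast if it is missing.

    The `env` parameter exists only to make the function easy to test;
    by default it reads os.environ.
    """
    source = os.environ if env is None else env
    key = source.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it as a secret in the Space settings."
        )
    return key
```

Failing at startup gives a clear error in the Space logs instead of a confusing failure on the first question.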
|
| 75 |
+
|
| 76 |
+
---
|
| 77 |
+
|
| 78 |
+
## ❤️ Notes
|
| 79 |
+
|
| 80 |
+
- Runs on **CPU only** (no GPU required)
|
| 81 |
+
- Free-tier friendly
|
| 82 |
+
- First load may take a few minutes
|
| 83 |
+
- Space may sleep when idle
|
| 84 |
+
|
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
## 👨‍💻 Author
|
| 88 |
+
|
| 89 |
+
**Abhishek Saxena**
|
| 90 |
+
M.Tech Data Science, IIT Roorkee
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
## ⭐ If you like this project
|
| 95 |
+
|
| 96 |
+
Give it a ⭐ on Hugging Face and feel free to fork!
|