absiitr committed (verified)
Commit 9370738 · Parent(s): 58e21db

Update README.md

Files changed (1): README.md (+88, −12)
README.md CHANGED
@@ -1,20 +1,96 @@
 ---
-title: PDF Assistant
-emoji: 🚀
-colorFrom: red
-colorTo: red
 sdk: docker
-app_port: 8501
-tags:
-- streamlit
 pinned: false
-short_description: Streamlit template space
 license: mit
 ---
 
-# Welcome to Streamlit!
 
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
 
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).
---
title: PDF RAG Chatbot (Groq)
emoji: 📘
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# 📘 PDF RAG Chatbot (Groq + LangChain)

A **Retrieval-Augmented Generation (RAG)** application that lets users:

- Upload a **PDF**
- Ask questions based **only on the PDF content**
- Get accurate answers powered by **Groq LLMs**

It runs fully on **CPU (Hugging Face Free Tier)**.

---

## 🚀 Features

- 📄 PDF upload & processing
- ✂️ Intelligent text chunking
- 🔍 Semantic search using embeddings
- 🧠 Context-aware LLM responses
- 🧹 Memory-clear & health endpoints
- ⚡ Fast inference via the **Groq API**

---

## 🧱 Tech Stack

- **Frontend**: Streamlit
- **Backend**: FastAPI
- **LLM**: Groq (`llama-3.1-8b-instant`)
- **Embeddings**: `all-MiniLM-L6-v2`
- **Vector DB**: Chroma (in-memory)
- **Framework**: LangChain
- **Deployment**: Docker + Hugging Face Spaces
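
Groq exposes an OpenAI-compatible chat-completions API, so a request for the model listed above can be assembled as below. This is a sketch only: the payload shape follows the OpenAI convention, the endpoint URL and helper function are illustrative, and nothing here is sent over the network.

```python
import json

# Groq's OpenAI-compatible chat-completions endpoint (verify against
# Groq's current docs before relying on this).
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(question: str, context: str,
                  model: str = "llama-3.1-8b-instant") -> dict:
    """Assemble a chat request that grounds the model in PDF context.

    Illustrative helper, not part of this repository.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer only from the provided PDF context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        # Low temperature keeps answers close to the retrieved text.
        "temperature": 0.0,
    }

payload = build_request("What is the refund window?",
                        "Returns are accepted within 30 days.")
print(json.dumps(payload, indent=2))
```

A real deployment would POST this payload to `GROQ_URL` with an `Authorization: Bearer <GROQ_API_KEY>` header.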

---

## 🧪 How It Works (RAG Pipeline)

1. Upload a PDF
2. Split the text into chunks
3. Generate embeddings for each chunk
4. Store them in the vector database
5. Retrieve the chunks most relevant to the question
6. Generate the answer using the Groq LLM
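
The six steps above can be sketched end to end in plain Python. This is an illustrative toy, not the Space's code: a bag-of-words similarity stands in for the `all-MiniLM-L6-v2` embeddings, an in-memory list stands in for Chroma, and the final Groq call is reduced to building the prompt.

```python
import math
import re
from collections import Counter

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Step 2: split text into overlapping character chunks."""
    return [text[start:start + size]
            for start in range(0, len(text), size - overlap)]

def embed(text: str) -> Counter:
    """Step 3 (toy): bag-of-words vector standing in for a real embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(store: list, query: str, k: int = 2) -> list[str]:
    """Step 5: rank stored chunks by similarity to the question."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Steps 1-4: "upload" some text, chunk it, embed it, store it in memory.
pdf_text = ("The warranty lasts two years. "
            "Returns are accepted within 30 days of purchase.")
store = [(c, embed(c)) for c in chunk_text(pdf_text, size=40, overlap=10)]

# Steps 5-6: retrieve context; a real app would send this prompt to the LLM.
context = retrieve(store, "How long is the warranty?")
prompt = f"Answer only from this context:\n{' '.join(context)}"
```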

---

## 🖥️ Usage

1. Upload a PDF file
2. Ask questions about the document
3. If the answer is not in the PDF, the assistant replies:

   > **"I cannot find this in the PDF."**
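
This fallback behaviour is typically enforced through the prompt rather than the model. The Space's actual prompt is not shown in this README; the sketch below is one common way to get the exact reply above.

```python
# The exact fallback string the assistant should emit.
FALLBACK = "I cannot find this in the PDF."

def build_prompt(context: str, question: str) -> str:
    """Illustrative grounding prompt; not necessarily the one this Space uses."""
    return (
        "Answer the question using ONLY the context below.\n"
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("The warranty lasts two years.", "Who is the CEO?"))
```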

---

## 🔐 Environment Variables

The following secret **must** be added in the Space settings on Hugging Face:

| Variable | Description |
|----------|-------------|
| `GROQ_API_KEY` | Your Groq API key |

> ⚠️ Do NOT commit `.env` files to the repository.
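
In the backend, it is worth failing fast when the secret is missing, rather than surfacing a confusing error on the first API call. The variable name is the one documented above; the helper itself is illustrative.

```python
import os

def get_groq_api_key() -> str:
    """Read the Groq key from the environment, failing with a clear message."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it as a secret in the Space settings."
        )
    return key
```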

---

## ❤️ Notes

- Runs on **CPU only** (no GPU required)
- Free-tier friendly
- First load may take a few minutes
- The Space may sleep when idle

---

## 👨‍💻 Author

**Abhishek Saxena**
M.Tech Data Science, IIT Roorkee

---

## ⭐ If you like this project

Give it a ⭐ on Hugging Face and feel free to fork!