OliverPerrin committed
Commit c51e8ce · 1 Parent(s): 9095ecc

Add LexiMind project files and models

Files changed (2)
  1. .gitattributes +33 -2
  2. README.md +14 -67
.gitattributes CHANGED
@@ -1,4 +1,35 @@
- *.pt filter=lfs diff=lfs merge=lfs -text
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
- models/**/*.pt filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
  *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
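Aside (not part of the commit): `.gitattributes` entries like the ones added above are normally generated by `git lfs track`. The snippet below is a minimal sketch of reproducing a few of these patterns programmatically; it assumes the `git` and `git-lfs` CLIs are installed and that the script runs at the repository root.

```python
import subprocess

# Hypothetical helper, not part of this commit: register LFS patterns
# equivalent to a few of the .gitattributes lines added above.
def track_lfs_patterns(patterns):
    for pattern in patterns:
        # `git lfs track <pattern>` appends a matching
        # "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes.
        subprocess.run(["git", "lfs", "track", pattern], check=True)

if __name__ == "__main__":
    track_lfs_patterns(["*.pt", "*.safetensors", "*.ckpt"])
    # Show the resulting .gitattributes entries.
    print(open(".gitattributes").read())
```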
README.md CHANGED
@@ -1,67 +1,14 @@
- # LexiMind (Inference Edition)
-
- LexiMind now ships as a focused inference sandbox for the custom multitask Transformer found in
- `src/models`. Training, dataset downloaders, and legacy scripts have been removed so it is easy to
- load a checkpoint, run the Streamlit demo, and experiment with summarization, emotion
- classification, and topic cues on your own text.
-
- ## What Stays
- - Transformer encoder/decoder and task heads under `src/models`
- - Unit tests for the model stack (`tests/test_models`)
- - Streamlit UI (`src/ui/streamlit_app.py`) wired to the inference helpers in `src/api/inference`
-
- ## What Changed
- - Hugging Face tokenizers provide all tokenization (see `TextPreprocessor`)
- - Training, dataset downloaders, and CLI scripts have been removed
- - Scikit-learn powers light text normalization (stop-word removal optional)
- - Requirements trimmed to inference-only dependencies
-
- ## Quick Start
- ```bash
- git clone https://github.com/OliverPerrin/LexiMind.git
- cd LexiMind
- pip install -r requirements.txt
-
- # Optional extras via setup.py packaging metadata
- pip install .[web] # installs streamlit + plotly
- pip install .[api] # installs fastapi
- pip install .[all] # installs both groups
-
- streamlit run src/ui/streamlit_app.py
- ```
-
- Configure the Streamlit app via the sidebar to point at your tokenizer directory and model
- checkpoint (defaults assume `artifacts/hf_tokenizer` and `checkpoints/best.pt`).
-
- ## Minimal Project Map
- ```
- src/
- ├── api/ # load_models + helpers
- ├── data/ # TextPreprocessor using Hugging Face + sklearn
- ├── inference/ # thin summarizer facade
- ├── models/ # core Transformer architecture (untouched)
- └── ui/ # Streamlit interface
- ```
-
- Everything outside `src/` now holds optional assets such as checkpoints, tokenizer exports, and
- documentation stubs.
-
- ## Loading a Checkpoint Programmatically
- ```python
- from src.api.inference import load_models, summarize_text
-
- models = load_models({
-     "checkpoint_path": "checkpoints/best.pt",
-     "tokenizer_path": "artifacts/hf_tokenizer",
-     "hf_tokenizer_name": "facebook/bart-base",
- })
-
- summary, _ = summarize_text("Paste any article here.", models=models)
- print(summary)
- ```
-
- ## License
- GPL-3.0
-
- ## Author
- Oliver Perrin · [email protected]
+ ---
+ title: LexiMind
+ emoji: 📊
+ colorFrom: pink
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: Multi-Task Transformer for Document Analysis
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
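The new README is Hugging Face Spaces front matter: it declares a Gradio Space (SDK 5.49.1) whose entry point is `app.py`. That file is not shown in this commit, so the sketch below is only an illustration of the kind of minimal Gradio entry point such a configuration expects; the function name and UI labels are hypothetical.

```python
# Hypothetical sketch of the app.py entry point this Space config expects.
# Nothing here is taken from the actual LexiMind repository.
import gradio as gr

def analyze(text: str) -> str:
    # Placeholder: a real app would run the multitask Transformer here.
    return f"Received {len(text.split())} words."

demo = gr.Interface(
    fn=analyze,
    inputs=gr.Textbox(lines=8, label="Document"),
    outputs=gr.Textbox(label="Analysis"),
    title="LexiMind",
)

if __name__ == "__main__":
    demo.launch()
```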