Instructions to use darcar0/quotebound-27b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use darcar0/quotebound-27b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="darcar0/quotebound-27b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
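The widget example sends a free-form question, but the model card says the adapter is trained for an evidence-first prompt over a closed packet of source text. The sketch below builds such a message list for `pipe`. Treat it as an assumption throughout: the training prompt is partly elided on this page, so it restates only the rules that are visible there, and both the `[ID] text` packet layout and the `build_packet_messages` name are hypothetical.

```python
def build_packet_messages(question: str, units: dict[str, str]) -> list[dict[str, str]]:
    """Build chat messages for one closed-packet question (hypothetical helper).

    `units` maps an evidence-unit ID (format assumed, e.g. "E1") to its text.
    Only rules visible in the model card are restated; the exact training
    prompt may differ.
    """
    rules = "\n".join([
        "You are answering from a bounded evidence packet only.",
        "Rules:",
        "- Return valid JSON only.",
        "- Every quote must be a verbatim substring of the cited unit.",
        "- Do not paraphrase, ellipsize, or stitch quotes.",
        '- If the packet is insufficient, the answer field must be exactly "Insufficient evidence."',
    ])
    # One line per unit so each quote can be traced back to a single ID.
    packet = "\n".join(f"[{uid}] {text}" for uid, text in units.items())
    content = f"{rules}\n\nEvidence packet:\n{packet}\n\nQuestion: {question}"
    return [{"role": "user", "content": content}]
```

For example: `pipe(build_packet_messages("What is the capital of France?", {"E1": "Paris is the capital of France."}))`.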

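# Load model directly (with `peft` installed, Transformers reads the adapter
# config, loads the base model, and attaches the adapter)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("darcar0/quotebound-27b", dtype="auto")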
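Because the model card credits an answer only when it is also grounded, it helps to check each generated output mechanically. A minimal sketch, assuming the output is a single JSON object: only the `answer` field name and the exact `Insufficient evidence.` string come from the card, while the `evidence` and `quotes` field names and the quote-object layout are hypothetical stand-ins for the output schema elided on this page. It checks structure and grounding only, not answer correctness.

```python
import json

ABSTAIN = "Insufficient evidence."


def check_contract(raw_output: str, units: dict[str, str]) -> list[str]:
    """Return contract violations for one model output; [] means it passed.

    `units` maps each packet unit ID to its text. Field names other than
    "answer" are hypothetical: "evidence" is a list of cited unit IDs and
    "quotes" is a list of {"unit": <ID>, "text": <quote>} objects.
    """
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(obj, dict):
        return ["output is not a JSON object"]

    answer = obj.get("answer")
    if not isinstance(answer, str):
        return ["missing string 'answer' field"]
    if answer == ABSTAIN:
        return []  # a clean abstention has nothing left to ground
    if answer.strip().rstrip(".").lower() == "insufficient evidence":
        return [f"abstention must be exactly {ABSTAIN!r}"]

    problems = []
    cited = obj.get("evidence", [])
    for uid in cited:
        if uid not in units:
            problems.append(f"cited unit {uid!r} is not in the packet")
    for quote in obj.get("quotes", []):
        uid, text = quote.get("unit"), quote.get("text", "")
        if uid not in cited:
            problems.append(f"quote points at uncited unit {uid!r}")
        elif uid in units and (not text or text not in units[uid]):
            # Verbatim means an exact substring: no paraphrase, no stitching.
            problems.append(f"quote is not a verbatim substring of {uid!r}")
    return problems
```

The function returns a list instead of raising, so a batch run can count violations by kind.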

PEFT
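How to use darcar0/quotebound-27b with PEFT: the auto-generated snippet is unavailable for this repo ("Task type is invalid."). Load the base model and attach the adapter with `PeftModel.from_pretrained`, as in the Quick start section of the README diff below.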
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use darcar0/quotebound-27b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
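# This repo holds a LoRA adapter only, so serve the base model and register
# the adapter under a name that requests can select:
vllm serve "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2" \
	--enable-lora \
	--lora-modules quotebound=darcar0/quotebound-27b
# (If the adapter's LoRA rank exceeds vLLM's default limit, also pass --max-lora-rank.)
# Call the server using curl (OpenAI-compatible API), naming the adapter as the model:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "quotebound",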
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
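The curl call returns a standard OpenAI-style chat completion, and the model's answer is a JSON object inside `choices[0].message.content`. A sketch for pulling that object out, assuming the reply text may contain other characters before it (whether this adapter emits a reasoning preamble is not stated on this page); `extract_answer_json` is a hypothetical helper that keeps the last top-level JSON object it can decode.

```python
import json


def extract_answer_json(response: dict) -> dict:
    """Return the last top-level JSON object in an OpenAI-style chat reply."""
    content = response["choices"][0]["message"]["content"]
    decoder = json.JSONDecoder()
    pos, found = 0, None
    while True:
        start = content.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index just past its end.
            obj, end = decoder.raw_decode(content, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON at this brace; try the next one
            continue
        found, pos = obj, end  # jump past it so nested braces are skipped
    if found is None:
        raise ValueError("no JSON object found in the model reply")
    return found
```

Skipping to the end of each decoded object is what keeps nested quote objects from being mistaken for the top-level answer.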

Use Docker

docker model run hf.co/darcar0/quotebound-27b

SGLang

How to use darcar0/quotebound-27b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "darcar0/quotebound-27b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "darcar0/quotebound-27b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
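The same request can be sent from Python with only the standard library. This sketch mirrors the curl body above (model name, messages, JSON content type) and separates building the request from sending it; `build_chat_request` and `send` are hypothetical names, and `send` needs the server to be running.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    body = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (server must be up)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example: `send(build_chat_request("http://localhost:30000", "darcar0/quotebound-27b", [{"role": "user", "content": "What is the capital of France?"}]))`.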

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "darcar0/quotebound-27b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "darcar0/quotebound-27b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use darcar0/quotebound-27b with Docker Model Runner:
```
docker model run hf.co/darcar0/quotebound-27b
```

darcar0 committed on Apr 26

Commit 3a33e72 (verified) · 1 parent: b314f67

Polish model card structure and copy

Files changed (1): README.md +132 -113

@@ -4,6 +4,7 @@ language:
 license: apache-2.0
 base_model:
   - Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2
 datasets:
   - fever/fever
   - hotpotqa/hotpot_qa
@@ -12,6 +13,7 @@ pipeline_tag: text-generation
 tags:
   - reasoning
   - evidence-grounding
   - attribution
   - fever
   - hotpotqa
@@ -23,47 +25,57 @@ tags:
 # Quotebound 27B
-*The standalone model release from Evidence-Faithful Reasoning, built on the
-Qwen 3.5 Opus Distilled 27B base.*
-Quotebound 27B is the downloadable model release for
-Evidence-Faithful Reasoning: a LoRA adapter that turns its
-reasoning-distilled 27B base model into an evidence-first reader for
-closed packets of source text. Every answer has to land on the right
-evidence units, quote them verbatim, and stop with
-`Insufficient evidence.` when the packet does not justify a claim.
-![Fresh public holdout: Quotebound 27B versus the prior bridge model](./standalone_holdout_comparison.svg)
 *On a fresh 36-task public holdout, Quotebound 27B improves task accuracy,
-evidence F1, and quote F1 over the prior bridge model. The packet-local
-quote normalizer carries the full stack to `0.9093` quote F1.*
-## At a glance
-- **What it is.** A LoRA adapter on top of
-  [`Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2),
-  trained to answer from closed packets of source text under a strict
-  answer–evidence–quote–abstain contract.
-- **The headline number.** Raw quote F1 on a fresh public holdout roughly
-  doubles over the prior bridge model (`0.3343` → `0.6815`), meaning much
-  more of the grounding behavior now lives inside the model itself instead
-  of in a post-processing layer.
-- **Other deltas on the same holdout.** Raw task: `0.8611` → `0.8889`.
-  Raw strict: `0.2222` → `0.4444`. Raw evidence F1: `0.8815` → `0.9093`.
-  Zero invalid outputs across every reported evaluation surface.
-- **What it isn't.** Not a general chatbot. Not a replacement for the
-  benchmark-winning hybrid system, which is described below as a separate
-  result.
-## Read next
-- [Technical note](./technical_note_evidence_faithful_reasoning.md) — full method, results, and discussion.
-- [Frozen benchmark progression chart](./benchmark_progression.svg)
 ## Quick start
-Load the 27B base model and attach the adapter:
 ```python
 from peft import PeftModel
@@ -73,36 +85,37 @@ base_id = "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2"
 adapter_id = "darcar0/quotebound-27b"
 tokenizer = AutoTokenizer.from_pretrained(base_id)
-base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
 model = PeftModel.from_pretrained(base, adapter_id)
 ```
-The base is a 27B-parameter model, so load it in whichever quantization
-your hardware supports (4-bit `bitsandbytes` works for inference).
-## The contract
-Each task arrives with a closed packet of source text. To count as a
-success, the model has to clear four conditions on the same answer:
-1. **Answer correctly** — return the right answer or label for the task.
-2. **Pick the right evidence** — the cited units must be the packet
-   locations that actually support the answer.
-3. **Quote exact support** — every quote is a verbatim substring of its
-   cited unit. No paraphrase, no stitching, no ellipsis.
-4. **Abstain when blocked** — if the packet does not justify a claim,
-   the answer must be exactly `Insufficient evidence.`
-Correctness alone is not credited. The model has been trained to fail
-closed when the packet runs out, and to ground every answer it does
-return.
-## Prompt format
 The model is trained for an evidence-first prompt that makes the answer
 subordinate to the cited text. A minimal version:
-```
 You are answering from a bounded evidence packet only.
 Work in this order:
@@ -115,11 +128,11 @@ Rules:
 - Return valid JSON only.
 - Every quote must be a verbatim substring of the cited unit.
 - Do not paraphrase, ellipsize, or stitch quotes.
-- If the packet is insufficient, the `answer` field must be exactly
-  `Insufficient evidence.`
 ```
-The model then writes a JSON object with this shape:
 ```json
 {
@@ -138,9 +151,9 @@ The model then writes a JSON object with this shape:
 ### Fresh 36-task mixed public holdout
-A held-out slice of 18 FEVER verify-claim tasks plus 18 HotpotQA
-grounded-QA tasks, drawn from public sources and de-duplicated against
-every training, dev, and held-out probe row.
 | Stack | Task | Strict | Evidence F1 | Quote F1 |
 |---|---:|---:|---:|---:|
@@ -149,11 +162,17 @@ every training, dev, and held-out probe row.
 | Bridge + `deterministic_v3` | 0.8611 | 0.5833 | 0.8815 | 0.8815 |
 | **Quotebound + `deterministic_v3`** | **0.8889** | **0.5833** | **0.9093** | **0.9093** |
-Quotebound 27B beats the prior bridge model on task accuracy, evidence F1,
-and quote F1 in both raw and normalized form, ties normalized strict, and
-roughly doubles raw quote F1 at the model level.
-### Fixed dev triage slice (21 tasks)
 | Stack | Task | Strict | Evidence F1 | Quote F1 |
 |---|---:|---:|---:|---:|
@@ -161,69 +180,59 @@ roughly doubles raw quote F1 at the model level.
 ### Untouched 104-task HotpotQA shadow slice
-On a 104-task HotpotQA shadow slice that was never touched during
-selection, Quotebound raw improved quote-faithful behavior over the prior
-bridge model, and Quotebound plus `deterministic_v3` matched bridge +
-`deterministic_v3` at the system level. The surface is reported as a
-narrative parity result because the freeze memo does not publish
-per-metric cells for it.
 ## Release architecture
-The project ends in two finished results that are reported separately on
-purpose. One is the strongest full system on the held-out benchmark; the
-other is the strongest standalone model — and the artifact you can
-actually download.
-1. **Quotebound 27B — this page.** The adapter above is the strongest
-   version of the project's evidence-faithful behavior that moved into the
-   model itself, evaluated across multiple surfaces beyond the held-out
-   probe.
-2. **The benchmark-winning hybrid system.** A trained bridge checkpoint
-   plus the `deterministic_v3` packet-local quote normalizer. That stack
-   is the only configuration that clears every gate of the strict
-   contract on the frozen held-out probe (`probe_v0`).
-The two results do not collapse into one. The hybrid system is the
-benchmark winner. Quotebound 27B is the downloadable model. Perfect
-`probe_v0` belongs to the hybrid system, not to the adapter on this page
-alone.
 ## Intended use
-Use this release for work that has to stay inside a fixed body of text:
 - bounded document QA with explicit evidence requirements,
-- claim verification and grounded QA from closed packets of source text,
-- policy, compliance, contract, and internal-document workflows where
-  each answer has to be justified from the provided text,
-- research on evidence-faithful reasoning and abstention behavior.
 ## Limitations
-- The download is the LoRA adapter only — the 27B base model is required.
-- The `deterministic_v3` packet-local quote normalizer is *not* shipped
-  here. It lives in the project repository as a separate post-processing
-  step. Quotebound 27B alone reproduces the raw standalone gains above;
-  normalized system-level rows require adapter + normalizer.
-- Perfect `probe_v0` belongs to the benchmark-winning hybrid system, not
-  to this adapter alone.
-- Specialized for closed-packet reasoning. Behavior outside that setting
-  — open chat, open-domain QA, free-form generation — is not
-  characterized.
-- Raw item-level contents of the held-out probe are intentionally not
-  published with the release; the held-out gate has to stay closed to
-  remain meaningful.
-## Citation and references
-- Base model:
-  [Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2)
-- Datasets:
-  [fever/fever](https://huggingface.co/datasets/fever/fever),
-  [hotpotqa/hotpot_qa](https://huggingface.co/datasets/hotpotqa/hotpot_qa)
-- Technical note:
-  [technical_note_evidence_faithful_reasoning.md](./technical_note_evidence_faithful_reasoning.md)
 ```bibtex
 @misc{quotebound_27b_2026,
@@ -234,3 +243,13 @@ Use this release for work that has to stay inside a fixed body of text:
   url          = {https://huggingface.co/darcar0/quotebound-27b}
 }
 ```

 license: apache-2.0
 base_model:
   - Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2
+base_model_relation: adapter
 datasets:
   - fever/fever
   - hotpotqa/hotpot_qa
 tags:
   - reasoning
   - evidence-grounding
+  - grounded-qa
   - attribution
   - fever
   - hotpotqa
 # Quotebound 27B
+**A 27B LoRA adapter for evidence-faithful reasoning over closed packets of
+source text.**
+Quotebound 27B is the standalone model release from the
+Evidence-Faithful Reasoning project. It is trained to read a bounded evidence
+packet, identify the supporting units, copy exact quotes, and abstain with
+`Insufficient evidence.` when the packet does not justify an answer.
+The project asks a stricter question than "did the model get the answer right?"
+It asks whether the answer is recoverably grounded in the supplied text.
+![Fresh public holdout: Quotebound 27B versus the prior bridge model](./assets/standalone_holdout_comparison.svg)
 *On a fresh 36-task public holdout, Quotebound 27B improves task accuracy,
+evidence F1, and quote F1 over the prior bridge model. The largest raw gain is
+quote faithfulness: `0.3343` -> `0.6815`.*
+## Result snapshot
+| Question | Answer |
+|---|---|
+| What ships here? | A PEFT/LoRA adapter for `Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`. |
+| What changed inside the model? | Raw quote F1 roughly doubled on the fresh public holdout: `0.3343` -> `0.6815`. |
+| Best standalone-system row on that holdout | Quotebound + `deterministic_v3`: task `0.8889`, strict `0.5833`, evidence F1 `0.9093`, quote F1 `0.9093`. |
+| Output reliability | Zero invalid outputs across every reported evaluation surface. |
+| Important boundary | Perfect `probe_v0` belongs to the benchmark-winning hybrid stack, not to this adapter alone. |
+## Why this model exists
+Reasoning-tuned models can sound structured while grounding badly: they may
+answer correctly but cite the wrong evidence, corrupt a quote, or keep going
+when the packet is actually insufficient.
+Quotebound 27B is trained for a narrower, auditable behavior:
+1. choose the smallest sufficient evidence units,
+2. quote those units verbatim,
+3. answer only from those units,
+4. refuse cleanly when the packet runs out.
+Correctness alone is not credited. The model is meant for settings where a user
+needs the answer and the support to survive inspection together.
 ## Quick start
+Install the usual Transformers + PEFT stack, then load the base model and
+attach the adapter:
+```bash
+pip install -U transformers peft accelerate bitsandbytes
+```
 ```python
 from peft import PeftModel
 adapter_id = "darcar0/quotebound-27b"
 tokenizer = AutoTokenizer.from_pretrained(base_id)
+base = AutoModelForCausalLM.from_pretrained(
+    base_id,
+    device_map="auto",
+    torch_dtype="auto",
+)
 model = PeftModel.from_pretrained(base, adapter_id)
+model.eval()
 ```
+The base is a 27B-parameter model. Use the quantization and serving setup your
+hardware requires; 4-bit loading with `bitsandbytes` is a practical inference
+path on constrained GPUs.
+## Model details
+| Field | Value |
+|---|---|
+| Adapter | `darcar0/quotebound-27b` |
+| Base model | [`Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2) |
+| Artifact type | LoRA / PEFT adapter |
+| Primary behavior | Closed-packet grounded QA, claim verification, exact quote attribution, and abstention |
+| Output style | JSON with answer, evidence IDs, verbatim quotes, and short justification |
+| Training sources | Public FEVER-style verify-claim data, public HotpotQA-style grounded-QA data, and project-local packet scaffolding derived from those sources |
+| License | Apache 2.0 |
+## Prompt contract
 The model is trained for an evidence-first prompt that makes the answer
 subordinate to the cited text. A minimal version:
+```text
 You are answering from a bounded evidence packet only.
 Work in this order:
 - Return valid JSON only.
 - Every quote must be a verbatim substring of the cited unit.
 - Do not paraphrase, ellipsize, or stitch quotes.
+- If the packet is insufficient, the answer field must be exactly
+  "Insufficient evidence."
 ```
+Expected output shape:
 ```json
 {
 ### Fresh 36-task mixed public holdout
+The main standalone comparison uses a fresh 36-task public holdout: 18 FEVER
+verify-claim tasks and 18 HotpotQA grounded-QA tasks. Source rows were
+de-duplicated against training, dev, and `probe_v0` rows.
 | Stack | Task | Strict | Evidence F1 | Quote F1 |
 |---|---:|---:|---:|---:|
 | Bridge + `deterministic_v3` | 0.8611 | 0.5833 | 0.8815 | 0.8815 |
 | **Quotebound + `deterministic_v3`** | **0.8889** | **0.5833** | **0.9093** | **0.9093** |
+How to read this table:
+- **Raw rows** measure the model outputs before quote repair.
+- **`deterministic_v3` rows** add the packet-local quote normalizer from the
+  project repository.
+- Quotebound improves task accuracy, evidence F1, and quote F1 in both raw and
+  normalized form; it also ties normalized strict success.
+- The largest model-side gain is raw quote faithfulness, from `0.3343` to
+  `0.6815`.
+### Fixed dev triage slice
 | Stack | Task | Strict | Evidence F1 | Quote F1 |
 |---|---:|---:|---:|---:|
 ### Untouched 104-task HotpotQA shadow slice
+On a 104-task HotpotQA shadow slice that was never touched during selection,
+Quotebound raw improved quote-faithful behavior over the prior bridge model.
+Quotebound plus `deterministic_v3` matched bridge plus `deterministic_v3` at
+the system level. This surface is reported as a narrative parity result because
+the freeze memo does not publish per-metric cells for it.
 ## Release architecture
+The project ends in two finished results that are intentionally reported
+separately:
+| Result | What it is | What it proves |
+|---|---|---|
+| **Quotebound 27B** | The downloadable LoRA adapter on this page. | More of the evidence-faithful behavior moved into the model itself, with gains across non-`probe_v0` surfaces. |
+| **Benchmark-winning hybrid stack** | A trained bridge checkpoint plus the `deterministic_v3` packet-local quote normalizer. | The full system clears every gate of the strict contract on frozen held-out `probe_v0`. |
+These are connected, but they are not the same claim. Quotebound 27B is the
+standalone model release. The hybrid stack is the benchmark-facing winner.
+Perfect `probe_v0` belongs to the hybrid stack, not to this adapter alone.
 ## Intended use
+Use this release when answers must stay inside a fixed body of supplied text:
 - bounded document QA with explicit evidence requirements,
+- claim verification over closed packets of source text,
+- policy, compliance, contract, and internal-document review where answers
+  need source-text support,
+- research on evidence-faithful reasoning, quote fidelity, and abstention.
 ## Limitations
+- This is not a general chatbot. Open-domain QA, open chat, and free-form
+  generation outside the closed-packet setup are not characterized.
+- The downloadable artifact is the LoRA adapter only; the 27B base model is
+  required.
+- `deterministic_v3` is not shipped as part of this model repo. It is a
+  separate packet-local post-processing step in the project repository.
+- Perfect `probe_v0` belongs to the benchmark-winning hybrid stack, not to this
+  adapter alone.
+- Raw item-level contents of the frozen held-out probe are intentionally not
+  published; the held-out gate has to stay closed to remain meaningful.
+- For high-stakes use, treat the model as an evidence-grounding component that
+  still requires human review and application-specific validation.
+## Read next
+- [Technical note](./technical_note_evidence_faithful_reasoning.md) - full
+  method, release boundary, and result discussion.
+- [Frozen benchmark progression chart](./assets/benchmark_progression.svg)
+- [Fresh holdout comparison chart](./assets/standalone_holdout_comparison.svg)
+## Citation
 ```bibtex
 @misc{quotebound_27b_2026,
   url          = {https://huggingface.co/darcar0/quotebound-27b}
 }
 ```
+## References
+- Base model:
+  [Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2)
+- Datasets:
+  [fever/fever](https://huggingface.co/datasets/fever/fever),
+  [hotpotqa/hotpot_qa](https://huggingface.co/datasets/hotpotqa/hotpot_qa)
+- Technical note:
+  [technical_note_evidence_faithful_reasoning.md](./technical_note_evidence_faithful_reasoning.md)
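The holdout tables report evidence F1 and quote F1, but this page does not define how they are computed. As an illustration only, assuming evidence F1 is a set-level F1 between cited and gold unit IDs, averaged over tasks, the arithmetic looks like this. The project's actual scorer, described in the technical note, may differ in matching and averaging, and `set_f1` and `macro_f1` are hypothetical names.

```python
def set_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-level F1 between predicted and gold items (e.g. unit IDs)."""
    if not predicted and not gold:
        return 1.0  # assumed convention: nothing to cite and nothing cited
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Average per-task F1 over (predicted, gold) pairs."""
    if not pairs:
        raise ValueError("need at least one task")
    return sum(set_f1(p, g) for p, g in pairs) / len(pairs)
```

Citing one extra unit alongside the correct one, for instance, gives precision 0.5 and recall 1.0, so that task scores 2/3.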