judgment_partition_infer
Standalone inference bundle for partitioning a Chinese judgment document (or a truncated excerpt) into 7 zones (Z1..Z7) by predicting 6 boundaries.
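In other words, the 6 predicted boundaries are character offsets that cut the full text into 7 contiguous zones. A minimal sketch of that slicing (the exact offset semantics, each boundary starting the next zone, are an assumption; the bundle's own outputs are authoritative):

```python
def split_into_zones(text, boundaries):
    """Split text into 7 zones using 6 boundary offsets.

    Assumes each boundary is the start index of the next zone,
    so zones are text[0:b0], text[b0:b1], ..., text[b5:].
    """
    if len(boundaries) != 6:
        raise ValueError("expected exactly 6 boundaries")
    cuts = [0] + sorted(boundaries) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]

zones = split_into_zones("ABCDEFGHIJ", [1, 3, 5, 6, 8, 9])
# always yields exactly 7 pieces that concatenate back to the input
```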
Install
pip install -r requirements.txt
If you want to download this bundle from Hugging Face Hub:
pip install huggingface_hub
python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="<USER_OR_ORG>/<REPO_NAME>",
    repo_type="model",
    local_dir="judgment_partition_infer_bundle",
    local_dir_use_symlinks=False,
)
print("Downloaded -> judgment_partition_infer_bundle")
PY
Input format (JSONL)
One JSON object per line. Required field: text (or full_text).
Example:
{"sample_id":"demo_1","text":"...全文..."}
Optional fields case_no and case_name are passed through to outputs.
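As a concrete toy example, such a file can be written with the standard library; the field values below are placeholders:

```python
import json

records = [
    {"sample_id": "demo_1", "text": "...全文..."},
    # optional metadata fields are passed through to the output unchanged
    {"sample_id": "demo_2", "text": "...全文...", "case_no": "...案号..."},
]

with open("input.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        # ensure_ascii=False keeps Chinese text human-readable in the file
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```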
Run inference (CLI)
From this folder:
python infer_cli.py --input examples/input.jsonl
Outputs are written under output/<YYYYMMDD_HHMMSS>/ by default:
- predictions.jsonl: the final predictions (boundary offsets and per-zone texts)
- run_meta.json: run metadata and statistics for the inference job
You can also write to an explicit path:
python infer_cli.py \
--input examples/input.jsonl \
--output examples/output.example.jsonl \
--device cpu
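Downstream code can then consume predictions.jsonl line by line. A sketch (the exact output schema is not documented here; the sample_id and boundaries field names are assumptions based on the input spec and the Python API section):

```python
import json

def load_predictions(path):
    """Yield one prediction dict per JSONL line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# demo with a synthetic file; real runs live under output/<YYYYMMDD_HHMMSS>/
with open("predictions.demo.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"sample_id": "demo_1",
                        "boundaries": [12, 40, 88, 120, 300, 420]}) + "\n")

for pred in load_predictions("predictions.demo.jsonl"):
    print(pred["sample_id"], pred["boundaries"])
```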
Anchor behavior
Default: --anchor auto
- If anchors are detected, enforce:
- boundary[0] = Z1 anchor (the last "号" within the first 100 chars)
- boundary[3] = Z4 anchor ("判决如下"/"如下判决")
- If anchors are missing/invalid, keep the model's boundaries and set
  anchor_status accordingly.
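The anchor rules above can be sketched as plain string searches. This is an illustration of the stated rules, not the bundle's actual implementation; in particular, whether the boundary sits before or after the anchor character is an assumption here:

```python
def find_z1_anchor(text, window=100):
    """Z1 anchor: offset just after the last '号' in the first `window` chars.

    Returns None when no '号' appears in the window.
    """
    idx = text[:window].rfind("号")
    return idx + 1 if idx != -1 else None

def find_z4_anchor(text):
    """Z4 anchor: start of '判决如下' or '如下判决', or None if absent."""
    for marker in ("判决如下", "如下判决"):
        idx = text.find(marker)
        if idx != -1:
            return idx
    return None

doc = "（2024）某民初1号 原告…… 本院依法判决如下：……"
print(find_z1_anchor(doc), find_z4_anchor(doc))
```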
Python API
from judgment_partition_infer import Predictor
pred = Predictor() # loads ./assets/best_model.pt + ./assets/vocab.json (or env override)
out = pred.predict_text("...全文/片段...")
print(out["boundaries"])
Note: for Hub compatibility, the model may be stored as
assets/best_model.pt.b64.part-* (text shards).
Predictor() will automatically reassemble/decode and load the model.
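The shard scheme amounts to concatenating the part files in sorted order and base64-decoding the result. A sketch of that general technique; Predictor() already performs the equivalent step, and the exact shard naming/encoding details are assumptions:

```python
import base64
import glob

def reassemble_b64_shards(pattern, out_path):
    """Join base64 text shards in lexicographic order, then decode to binary.

    Illustrates the assets/best_model.pt.b64.part-* layout described above.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no shards match {pattern!r}")
    chunks = []
    for p in parts:
        with open(p, encoding="ascii") as f:
            chunks.append(f.read().strip())
    with open(out_path, "wb") as f:
        f.write(base64.b64decode("".join(chunks)))
    return out_path
```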
If you pip install only the code (without assets), pass explicit paths:
from judgment_partition_infer import Predictor
pred = Predictor(
    model_path="path/to/best_model.pt",
    vocab_path="path/to/vocab.json",
    device="cpu",
)
Publish to Hugging Face Hub (maintainers)
- Install publishing dependency:
pip install -r requirements-publish.txt
- Login (recommended) or set a token via environment variable:
huggingface-cli login
or
export HF_TOKEN=... / export HUGGINGFACE_HUB_TOKEN=...
- Create + upload (model repo):
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --public
If you already created the repo on the website:
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --skip-create
If HTTPS to huggingface.co is blocked
You can push via SSH (host hf.co) instead:
chmod +x push_to_hf_ssh.sh
./push_to_hf_ssh.sh <USER_OR_ORG>/<REPO_NAME>
(The script prints an SSH public key; add it at https://huggingface.co/settings/keys, then rerun.)