judgment_partition_infer
Standalone inference bundle for partitioning a Chinese judgment document (or a truncated excerpt) into 7 zones (Z1..Z7) by predicting 6 boundaries.
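In other words, the 6 predicted boundaries are character offsets that cut the full text into 7 contiguous zones. A minimal sketch of that slicing (the exact offset semantics, each boundary starting the next zone, are an assumption; the bundle's own outputs are authoritative):

```python
def split_into_zones(text, boundaries):
    """Split text into 7 zones using 6 boundary offsets.

    Assumes each boundary is the start index of the next zone,
    so zones are text[0:b0], text[b0:b1], ..., text[b5:].
    """
    if len(boundaries) != 6:
        raise ValueError("expected exactly 6 boundaries")
    cuts = [0] + sorted(boundaries) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]

zones = split_into_zones("ABCDEFGHIJ", [1, 3, 5, 6, 8, 9])
# always yields exactly 7 pieces that concatenate back to the input
```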
Install
pip install -r requirements.txt
If you want to download this bundle from Hugging Face Hub:
pip install huggingface_hub
python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="<USER_OR_ORG>/<REPO_NAME>",
    repo_type="model",
    local_dir="judgment_partition_infer_bundle",
    local_dir_use_symlinks=False,
)
print("Downloaded -> judgment_partition_infer_bundle")
PY
Input format (JSONL)
One JSON object per line. Required field: text (or full_text).
Example:
{"sample_id":"demo_1","text":"...全文..."}
Optional fields case_no and case_name are passed through to outputs.
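As a concrete toy example, such a file can be written with the standard library; the field values below are placeholders:

```python
import json

records = [
    {"sample_id": "demo_1", "text": "...全文..."},
    # optional metadata fields are passed through to the output unchanged
    {"sample_id": "demo_2", "text": "...全文...", "case_no": "...案号..."},
]

with open("input.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        # ensure_ascii=False keeps Chinese text human-readable in the file
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```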
Run inference (CLI)
From this folder:
python infer_cli.py --input examples/input.jsonl
Outputs are written under output/<YYYYMMDD_HHMMSS>/ by default:
- predictions.jsonl: the final predictions (boundary offsets and per-zone texts)
- run_meta.json: run metadata and statistics for the inference job
You can also write to an explicit path:
python infer_cli.py \
--input examples/input.jsonl \
--output examples/output.example.jsonl \
--device cpu
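Downstream code can then consume predictions.jsonl line by line. A sketch (the exact output schema is not documented here; the sample_id and boundaries field names are assumptions based on the input spec and the Python API section):

```python
import json

def load_predictions(path):
    """Yield one prediction dict per JSONL line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# demo with a synthetic file; real runs live under output/<YYYYMMDD_HHMMSS>/
with open("predictions.demo.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"sample_id": "demo_1",
                        "boundaries": [12, 40, 88, 120, 300, 420]}) + "\n")

for pred in load_predictions("predictions.demo.jsonl"):
    print(pred["sample_id"], pred["boundaries"])
```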
Anchor behavior
Default: --anchor auto
- If anchors are detected, enforce:
- boundary[0] = Z1 anchor (the last "号" within the first 100 chars)
- boundary[3] = Z4 anchor ("判决如下"/"如下判决")
- If anchors are missing/invalid, keep the model's boundaries and set
  anchor_status accordingly.
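The anchor rules above can be sketched as plain string searches. This is an illustration of the stated rules, not the bundle's actual implementation; in particular, whether the boundary sits before or after the anchor character is an assumption here:

```python
def find_z1_anchor(text, window=100):
    """Z1 anchor: offset just after the last '号' in the first `window` chars.

    Returns None when no '号' appears in the window.
    """
    idx = text[:window].rfind("号")
    return idx + 1 if idx != -1 else None

def find_z4_anchor(text):
    """Z4 anchor: start of '判决如下' or '如下判决', or None if absent."""
    for marker in ("判决如下", "如下判决"):
        idx = text.find(marker)
        if idx != -1:
            return idx
    return None

doc = "（2024）某民初1号 原告…… 本院依法判决如下：……"
print(find_z1_anchor(doc), find_z4_anchor(doc))
```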
Python API
from judgment_partition_infer import Predictor
pred = Predictor() # loads ./assets/best_model.pt + ./assets/vocab.json (or env override)
out = pred.predict_text("...全文/片段...")
print(out["boundaries"])
Note: for Hub compatibility, the model may be stored as
assets/best_model.pt.b64.part-* (text shards).
Predictor() will automatically reassemble/decode and load the model.
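The shard scheme amounts to concatenating the part files in sorted order and base64-decoding the result. A sketch of that general technique; Predictor() already performs the equivalent step, and the exact shard naming/encoding details are assumptions:

```python
import base64
import glob

def reassemble_b64_shards(pattern, out_path):
    """Join base64 text shards in lexicographic order, then decode to binary.

    Illustrates the assets/best_model.pt.b64.part-* layout described above.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no shards match {pattern!r}")
    chunks = []
    for p in parts:
        with open(p, encoding="ascii") as f:
            chunks.append(f.read().strip())
    with open(out_path, "wb") as f:
        f.write(base64.b64decode("".join(chunks)))
    return out_path
```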
If you pip install only the code (without assets), pass explicit paths:
from judgment_partition_infer import Predictor
pred = Predictor(
    model_path="path/to/best_model.pt",
    vocab_path="path/to/vocab.json",
    device="cpu",
)
Publish to Hugging Face Hub (maintainers)
- Install publishing dependency:
pip install -r requirements-publish.txt
- Login (recommended) or set a token via environment variable:
huggingface-cli login
or
export HF_TOKEN=... / export HUGGINGFACE_HUB_TOKEN=...
- Create + upload (model repo):
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --public
If you already created the repo on the website:
python publish_to_hf.py --repo-id <USER_OR_ORG>/<REPO_NAME> --skip-create
If HTTPS to huggingface.co is blocked
You can push via SSH (host hf.co) instead:
chmod +x push_to_hf_ssh.sh
./push_to_hf_ssh.sh <USER_OR_ORG>/<REPO_NAME>
(The script prints an SSH public key; add it at https://huggingface.co/settings/keys, then rerun.)