Instructions to use YeungNLP/bloom-396m-zh with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use YeungNLP/bloom-396m-zh with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="YeungNLP/bloom-396m-zh")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("YeungNLP/bloom-396m-zh") model = AutoModelForCausalLM.from_pretrained("YeungNLP/bloom-396m-zh") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use YeungNLP/bloom-396m-zh with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "YeungNLP/bloom-396m-zh" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "YeungNLP/bloom-396m-zh", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/YeungNLP/bloom-396m-zh
- SGLang
How to use YeungNLP/bloom-396m-zh with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "YeungNLP/bloom-396m-zh" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "YeungNLP/bloom-396m-zh", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "YeungNLP/bloom-396m-zh" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "YeungNLP/bloom-396m-zh", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use YeungNLP/bloom-396m-zh with Docker Model Runner:
docker model run hf.co/YeungNLP/bloom-396m-zh
YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
项目地址:LLMPruner:大语言模型裁剪工具
LLMPruner是一个大语言模型裁剪工具,通过对大语言模型的冗余词表进行裁剪,减少模型参数量,降低显存占用,提升训练速度,并且能够保留预训练中学习到的知识。
本项目对Bloom进行词表裁剪,保留中文token和常用的英文token,词表由250880将至46145,缩减为原来的18.39%。裁剪得到的Bloom模型如下表:
使用方法:
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer = BloomTokenizerFast.from_pretrained('YeungNLP/bloom-1b4-zh')
model = BloomForCausalLM.from_pretrained('YeungNLP/bloom-1b4-zh')
print(tokenizer.batch_decode(model.generate(tokenizer.encode('长风破浪会有时', return_tensors='pt'))))
- Downloads last month
- 9