Source: llm/tgi
TGI: Hugging Face Text Generation Inference#
Text Generation Inference (TGI) is Hugging Face's toolkit for deploying and serving large language models (LLMs) for text generation tasks.
Launch a single-instance TGI service#
We can host the model on a single instance using the serving YAML file:
sky launch -c tgi serve.yaml
Once the cluster is up, users can access the model with the following commands:
ENDPOINT=$(sky status --endpoint 8080 tgi)
curl $ENDPOINT/generate \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "What is Deep Learning?",
        "parameters": {
            "max_new_tokens": 20
        }
    }'
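The endpoint can also be queried programmatically. Below is a minimal Python sketch, assuming the requests library is installed and ENDPOINT is set in the environment as above; the http:// scheme and the variable names are illustrative assumptions:
import os

import requests

# Endpoint printed by `sky status --endpoint 8080 tgi`, assumed to be "IP:port".
endpoint = os.environ["ENDPOINT"]
resp = requests.post(
    f"http://{endpoint}/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 20},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])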
The output should look similar to the following:
{
    "generated_text": "What is Deep Learning? Deep Learning is a subfield of machine learning that is concerned with algorithms inspired by the structure and function of the brain called artificial neural networks."
}
Scale the service with SkyPilot Serve#
With the same YAML file, we can easily scale the model serving across multiple instances, regions, and clouds with SkyServe:
sky serve up -n tgi serve.yaml
After the service is launched, we can access the model with the following commands:
ENDPOINT=$(sky serve status --endpoint tgi)
curl $ENDPOINT/generate \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "What is Deep Learning?",
        "parameters": {
            "max_new_tokens": 20
        }
    }'
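Since TGI exposes the standard text-generation API, higher-level clients also work against the load-balanced SkyServe endpoint. A sketch using huggingface_hub's InferenceClient, assuming the package is installed and ENDPOINT is set as above:
import os

from huggingface_hub import InferenceClient

# Point the client at the SkyServe endpoint; requests are load-balanced
# across replicas.
client = InferenceClient(model=f"http://{os.environ['ENDPOINT']}")
# Mirrors the curl request above.
print(client.text_generation("What is Deep Learning?", max_new_tokens=20))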
Included files#
serve.yaml
# SkyServe YAML to run HuggingFace TGI
#
# Usage:
#   sky serve up -n tgi serve.yaml \
#     [--env MODEL_ID=<model-id-on-huggingface>]
# Then visit the endpoint printed in the console. You could also
# check the endpoint by running:
#   sky serve status --endpoint tgi
envs:
  MODEL_ID: lmsys/vicuna-13b-v1.5

service:
  readiness_probe: /health
  replicas: 2

resources:
  ports: 8080
  accelerators: A100:1
run: |
  docker run --gpus all --shm-size 1g -p 8080:80 \
    -v ~/data:/data ghcr.io/huggingface/text-generation-inference \
    --model-id $MODEL_ID
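As the usage comment notes, MODEL_ID can be overridden at launch time to serve a different model, for example (any valid Hugging Face model id works here):
sky serve up -n tgi serve.yaml --env MODEL_ID=mistralai/Mistral-7B-v0.1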