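Run and Serve DeepSeek-R1 Distilled Models with SkyPilot and vLLM#
SkyPilot is a framework for running AI and batch workloads on any infrastructure, offering unified execution, high cost savings, and high GPU availability.
On January 20, 2025, DeepSeek AI released the DeepSeek-R1 model family, which includes models with up to 671B parameters.
DeepSeek-R1 exhibits many powerful and interesting reasoning behaviors that emerged naturally. It outperforms state-of-the-art proprietary models such as OpenAI-o1-mini and is the first open-source large language model (LLM) to rival OpenAI-o1.
This guide walks through how to run and host DeepSeek-R1 models on any infrastructure, including local GPU workstations, Kubernetes clusters, and public clouds (15+ clouds supported).
SkyPilot supports a variety of LLM frameworks and models. In this guide, we use vLLM, an open-source library for fast LLM inference and serving, as the example.
New: We added a new SkyPilot YAML for running the DeepSeek-R1 671B model with SGLang.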
Step 0: Bring any infrastructure#
Install SkyPilot on your local machine:
pip install 'skypilot-nightly[all]'
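Pick one of the following, depending on the infrastructure you want to run DeepSeek-R1 on.
If your local machine or cluster has GPUs, you can run SkyPilot directly on the existing machines with: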
sky local up
If you want to use cloud services (15+ clouds supported), run:
sky check
See the documentation for details.
Step 1: Run it with SkyPilot#
Now it's time to run DeepSeek with SkyPilot. The exact command depends on the hardware you have available.
8B
sky launch deepseek-r1-vllm.yaml \
-c deepseek \
--env MODEL_NAME=deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
--gpus L4:1
70B
sky launch deepseek-r1-vllm.yaml \
-c deepseek \
--env MODEL_NAME=deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
--gpus A100-80GB:2
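The two commands above differ only in the model name and the GPU request. As a minimal sketch, the helper below assembles the same argument list in Python so that it could be handed to `subprocess.run`. The name `build_launch_command` is hypothetical and not part of SkyPilot; it only mirrors the flags that appear in the commands above (`-c`, `--env`, `--gpus`).

```python
import shlex


def build_launch_command(model_name: str, gpus: str,
                         cluster: str = "deepseek",
                         yaml_path: str = "deepseek-r1-vllm.yaml") -> list[str]:
    """Assemble the `sky launch` argument list used in this guide.

    Hypothetical helper: it only builds the argument list and does not
    call SkyPilot itself.
    """
    return [
        "sky", "launch", yaml_path,
        "-c", cluster,
        "--env", f"MODEL_NAME={model_name}",
        "--gpus", gpus,
    ]


cmd_8b = build_launch_command("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "L4:1")
cmd_70b = build_launch_command("deepseek-ai/DeepSeek-R1-Distill-Llama-70B", "A100-80GB:2")

# Print shell-quoted commands for inspection before running them.
print(shlex.join(cmd_8b))
print(shlex.join(cmd_70b))
```

Passing `cmd_8b` to `subprocess.run(cmd_8b, check=True)` would be one way to execute it from a script, assuming the `sky` CLI is on the `PATH`.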
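Replace the model and GPU in the command with the ones you want to use. Run sky show-gpus to see which GPUs you have access to. For reference, here is the model-GPU compatibility matrix: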
GPU | DeepSeek-R1-Distill-Qwen-7B | DeepSeek-R1-Distill-Llama-70B | DeepSeek-R1 |
---|---|---|---|
L4:1 | ✅, with | ❌ | ❌ |
L4:8 | ✅ | ❌ | ❌ |
A100:8 | ✅ | ✅ | ❌ |
A100-80GB:12 | ✅ | ✅ | ✅, with |
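One quantity behind a matrix like this is the memory taken by the model weights alone. The sketch below is back-of-the-envelope arithmetic, not SkyPilot or vLLM logic. It assumes 2 bytes per parameter (BF16) and the commonly quoted per-GPU memory sizes of 24 GB for L4, 40 GB for A100, and 80 GB for A100-80GB; the function names are hypothetical. Weights are only a lower bound, because the KV cache, activations, and runtime overhead also need memory, so this arithmetic does not by itself reproduce the verdicts in the matrix.

```python
# Approximate memory per GPU in GB (assumed values from public spec sheets).
GPU_MEMORY_GB = {"L4": 24, "A100": 40, "A100-80GB": 80}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone: (1e9 * params) * bytes / 1e9 bytes per GB."""
    return params_billions * bytes_per_param


def total_gpu_memory_gb(gpu_spec: str) -> int:
    """Parse a spec such as 'A100-80GB:2' into total memory across all GPUs."""
    name, count = gpu_spec.rsplit(":", 1)
    return GPU_MEMORY_GB[name] * int(count)


# 8B parameters at 2 bytes each (BF16) versus one L4.
print(weight_memory_gb(8, 2), total_gpu_memory_gb("L4:1"))
# 70B parameters at 2 bytes each (BF16) versus two A100-80GB.
print(weight_memory_gb(70, 2), total_gpu_memory_gb("A100-80GB:2"))
```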
Step 2: Get results#
Get the IP address of the cluster's head node to use as the endpoint:
ENDPOINT=$(sky status --ip deepseek)
Query the endpoint in a terminal:
8B
curl http://$ENDPOINT:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}
]
}' | jq .
70B
curl http://$ENDPOINT:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "how many rs are in strawberry"
}
]
}' | jq .
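The same request can be issued from Python with only the standard library. The sketch below builds the POST request without sending it, so it can be inspected first. The helper name `build_chat_request` and the address `203.0.113.10` are placeholders for illustration; substitute the value of `$ENDPOINT` obtained above.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, model: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        url=f"http://{endpoint}:8000/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "203.0.113.10" is a placeholder; use the value of $ENDPOINT instead.
request = build_chat_request(
    "203.0.113.10",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "Who are you?",
)
print(request.full_url)

# To actually send it once the server is up:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```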
You will get both the chain of thought, wrapped in <think> tags, and the final result.
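Because the reasoning and the final answer arrive in a single `content` string, a client usually wants to separate them. The following is a minimal sketch of one way to do that with a regular expression; `split_reasoning` is a hypothetical helper and the sample string is made up for illustration.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a message's content string.

    If no <think>...</think> block is found, reasoning is empty and the
    whole content is treated as the answer.
    """
    match = THINK_PATTERN.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = content[match.end():].strip()
    return reasoning, answer


# Illustrative input, shaped like the "content" field of a response.
sample = "<think>\nCount the letters one by one.\n</think>\n\nThere are 3 Rs."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

An empty `<think>` block, as the model may produce for simple questions, yields an empty reasoning string and leaves the answer intact.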
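Who are you? I'm DeepSeek-R1.
Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.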
{
"id": "chatcmpl-507f467863344f31b98d8bf36b9a3c1c",
"object": "chat.completion",
"created": 1737503962,
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<think>\n\n</think>\n\nGreetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 13,
"total_tokens": 57,
"completion_tokens": 44,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
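How many Rs are in strawberry? There are 3 Rs in strawberry.
<think>
Okay, so I need to figure out how many times the letter 'R' appears in the word "strawberry." Hmm, let me think about this step by step. First, I should probably write out the word to visualize it better. The word is S-T-R-A-W-B-E-R-R-Y. Wait, is that right? Let me double-check. S-T-R-A-W-B-E-R-R-Y, yes, that's how it's spelled.
Now, I need to go through each letter one by one and count the Rs. Starting with the first letter, it's an S. Not an R, so move on. The second letter is T, still not an R. The third letter is R. Okay, that's the first R. I'll note that down.
Next letters: A, W, B, E. None of those are Rs. So far, only one R. Then comes R again after E, right? So that's the second R. But wait, I think there's another R after that. Let me make sure. After the second R, there's another R, making it the third R. Wait, no, let me check the spelling again. It's S-T-R-A-W-B-E-R-R-Y. So after E, it's R, then another R, so that's two Rs there. So total, how many?
Let me recount: first R is the third letter, then after E, there's another R, making it two in total, or three? Wait, no. Let me look at each position:
1. S
2. T
3. R (1st R)
4. A
5. W
6. B
7. E
8. R (2nd R)
9. R (3rd R)
10. Y
Wait, so after E, there are two Rs in a row, which would make it the 8th and 9th letters. So that's two more Rs after the first one. So total, it's three Rs? Or is that correct? Let me make sure I'm not overcounting. Let's write it out:
S T R A W B E R R Y
So, positions:
1: S
2: T
3: R (1)
4: A
5: W
6: B
7: E
8: R (2)
9: R (3)
10: Y
So that's three Rs. Wait, but when I think about the word "strawberry," I thought it had two Rs, but maybe it's three. Wait, maybe I'm wrong. Let me check a dictionary or something, but since I can't do that, I'll have to rely on my memory. Hmm, maybe I was mistaken earlier. Let me think again. Strawberries have a double R, I believe. But in the spelling, is it R-A-W-B-E-R-R-Y? So after the E, it's R-R-Y. So that's two Rs at the end. Plus the one after the T, so that's three Rs total.
Wait, no. Let me think about how the word is pronounced. It's "straw" plus "berry," right? So "straw" has one R, and "berry" has two Rs? No, "berry" only has one R. Wait, no, "berry" is B-E-R-R-Y, so there are two Rs there. So when you put it together, "strawberry" would have the R from "straw" and two Rs from "berry," making three Rs. Hmm, but I'm not sure. Some people might think it's only two Rs, but based on the spelling, it's three.
Wait, no, actually, let me break it down. The word is S-T-R-A-W-B-E-R-R-Y. So after the T, there's an R, then later after the B and E, there's another R, and then another R before Y. So that's three Rs. So the answer should be three. But I'm a bit confused because sometimes people might miscount, thinking it's two. But according to the spelling, it's three. I think that's correct.
</think>
The word "strawberry" contains three Rs.
Step-by-step breakdown:
The first R is the third letter.
The second R is the eighth letter.
The third R is the ninth letter.
Counting each occurrence: 1 (position 3), 2 (position 8), and 3 (position 9).
Answer: There are 3 Rs in "strawberry."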
{
"id": "chatcmpl-d532bd1c1738493ab9c8c906550044bf",
"object": "chat.completion",
"created": 1737507945,
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<think>\nOkay, so I need to figure out how many times the letter 'R' appears in the word \"strawberry.\" Hmm, let me think about this step by step. First, I should probably write out the word to visualize it better. The word is S-T-R-A-W-B-E-R-R-Y. Wait, is that right? Let me double-check. S-T-R-A-W-B-E-R-R-Y, yes, that's how it's spelled.\n\nNow, I need to go through each letter one by one and count the Rs. Starting with the first letter, it's an S. Not an R, so move on. The second letter is T, still not an R. The third letter is R. Okay, that's the first R. I'll note that down.\n\nNext letters: A, W, B, E. None of those are Rs. So far, only one R. Then comes R again after E, right? So that's the second R. But wait, I think there's another R after that. Let me make sure. After the second R, there's another R, making it the third R. Wait, no, let me check the spelling again. It's S-T-R-A-W-B-E-R-R-Y. So after E, it's R, then another R, so that's two Rs there. So total, how many?\n\nLet me recount: first R is the third letter, then after E, there's another R, making it two in total, or three? Wait, no. Let me look at each position:\n\n1. S\n2. T\n3. R (1st R)\n4. A\n5. W\n6. B\n7. E\n8. R (2nd R)\n9. R (3rd R)\n10. Y\n\nWait, so after E, there are two Rs in a row, which would make it the 8th and 9th letters. So that's two more Rs after the first one. So total, it's three Rs? Or is that correct? Let me make sure I'm not overcounting. Let's write it out:\n\nS T R A W B E R R Y\n\nSo, positions:\n\n1: S\n\n2: T\n\n3: R (1)\n\n4: A\n\n5: W\n\n6: B\n\n7: E\n\n8: R (2)\n\n9: R (3)\n\n10: Y\n\nSo that's three Rs. Wait, but when I think about the word \"strawberry,\" I thought it had two Rs, but maybe it's three. Wait, maybe I'm wrong. Let me check a dictionary or something, but since I can't do that, I'll have to rely on my memory. Hmm, maybe I was mistaken earlier. Let me think again. Strawberries have a double R, I believe. But in the spelling, is it R-A-W-B-E-R-R-Y? So after the E, it's R-R-Y. So that's two Rs at the end. Plus the one after the T, so that's three Rs total.\n\nWait, no. Let me think about how the word is pronounced. It's \"straw\" plus \"berry,\" right? So \"straw\" has one R, and \"berry\" has two Rs? No, \"berry\" only has one R. Wait, no, \"berry\" is B-E-R-R-Y, so there are two Rs there. So when you put it together, \"strawberry\" would have the R from \"straw\" and two Rs from \"berry,\" making three Rs. Hmm, but I'm not sure. Some people might think it's only two Rs, but based on the spelling, it's three.\n\nWait, no, actually, let me break it down. The word is S-T-R-A-W-B-E-R-R-Y. So after the T, there's an R, then later after the B and E, there's another R, and then another R before Y. So that's three Rs. So the answer should be three. But I'm a bit confused because sometimes people might miscount, thinking it's two. But according to the spelling, it's three. I think that's correct.\n</think>\n\nThe word \"strawberry\" contains three Rs. \n\nStep-by-step breakdown:\n- The first R is the third letter.\n- The second R is the eighth letter.\n- The third R is the ninth letter.\n\nCounting each occurrence: 1 (position 3), 2 (position 8), and 3 (position 9).\n\nAnswer: There are 3 Rs in \"strawberry.\"",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 15,
"total_tokens": 985,
"completion_tokens": 970,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
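The `usage` object shows how much of the response was reasoning output. As a small worked example, the sketch below copies the three token counts from the response above, checks that `total_tokens` is the sum of the other two, and computes the share of tokens that were generated. Variable names are illustrative.

```python
import json

# The "usage" counts copied from the response above.
usage_json = '{"prompt_tokens": 15, "total_tokens": 985, "completion_tokens": 970}'
usage = json.loads(usage_json)

prompt = usage["prompt_tokens"]
completion = usage["completion_tokens"]

# total_tokens is the sum of prompt and completion tokens.
assert usage["total_tokens"] == prompt + completion

# Share of the tokens that were generated rather than supplied.
generated_share = completion / usage["total_tokens"]
print(f"{completion} of {usage['total_tokens']} tokens were generated "
      f"({generated_share:.1%})")
```

Nearly all of the tokens here come from the completion, most of them inside the `<think>` block, which is worth keeping in mind when estimating latency and cost for reasoning models.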
Shutdown#
To shut down the cluster, run:
sky down deepseek
Included files#
deepseek-r1-vllm.yaml
envs:
  MODEL_NAME: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  MAX_MODEL_LEN: 4096

resources:
  accelerators: {L4:1, A10G:1, A10:1, A100:1, A100-80GB:1}
  ports:
    - 8000
  disk_tier: best

setup: |
  uv pip install transformers==4.48.1
  uv pip install vllm==0.6.6.post1

run: |
  echo 'Starting vllm openai api server...'
  python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --model $MODEL_NAME \
    --max-model-len $MAX_MODEL_LEN
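In vLLM, `--max-model-len` sets the context length, which covers the prompt and the generated tokens together. With the default `MAX_MODEL_LEN: 4096` from the YAML above, a long reasoning trace can therefore run out of room. The sketch below works through that arithmetic using the token counts from the strawberry response; `remaining_completion_budget` is a hypothetical helper, not a vLLM function.

```python
def remaining_completion_budget(max_model_len: int, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt is counted."""
    if prompt_tokens > max_model_len:
        raise ValueError("prompt alone exceeds the context length")
    return max_model_len - prompt_tokens


MAX_MODEL_LEN = 4096  # default from the YAML above

# The strawberry request used 15 prompt tokens and 970 completion tokens.
budget = remaining_completion_budget(MAX_MODEL_LEN, prompt_tokens=15)
print(budget)
print(970 <= budget)
```

If responses are being cut off, one option is to pass a larger value at launch time, for example `--env MAX_MODEL_LEN=8192`, provided the chosen GPU has enough memory for the larger KV cache.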