Using Spot Instances for Serving#
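SkyServe supports serving models on a mixture of spot and on-demand replicas, with two options: base_ondemand_fallback_replicas and dynamic_ondemand_fallback. Currently, when a spot instance is preempted, SkyServe relies on the user side to retry.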
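Because retries are left to the user side, a client might wrap its requests in a small retry loop. The following is a minimal sketch and not part of SkyServe: the helper name `call_with_retries`, its parameters, and the exponential backoff schedule are all hypothetical choices made for illustration.

```python
import time
from typing import Callable


def call_with_retries(
    send: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it succeeds or max_attempts is exhausted.

    Hypothetical client-side helper. It retries only on OSError, which
    covers connection resets, timeouts, and urllib.error.URLError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except OSError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Back off exponentially: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Here `send` could be, for example, a function that calls `urllib.request.urlopen` on the service endpoint reported by `sky serve status`. Retrying is only safe for requests that are idempotent from the application's point of view.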
Base on-demand fallback#
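base_ondemand_fallback_replicas sets the number of on-demand replicas to keep running at all times. This is useful for service availability: some capacity always remains available even when spot replicas are not. use_spot should be set to true to enable spot replicas.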
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Ensures that one of the replicas is run on on-demand instances
    base_ondemand_fallback_replicas: 1

resources:
  ports: 8081
  cpus: 2+
  use_spot: true

workdir: examples/serve/http_server

run: python3 server.py
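Tip
Kubernetes instances are considered on-demand instances. You can use the base_ondemand_fallback_replicas option to have some replicas run on Kubernetes while the others run on cloud spot instances.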
Dynamic on-demand fallback#
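SkyServe supports dynamically falling back to on-demand replicas when spot replicas are not available. Enable this by setting dynamic_ondemand_fallback to true. This is useful for maintaining the required replica capacity during spot interruptions. When spot replicas become available again, SkyServe automatically switches back to them to maximize cost savings.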
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Allows replicas to be run on on-demand instances if spot instances are not available
    dynamic_ondemand_fallback: true

resources:
  ports: 8081
  cpus: 2+
  use_spot: true

workdir: examples/serve/http_server

run: python3 server.py
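Tip
SkyServe supports specifying both base_ondemand_fallback_replicas and dynamic_ondemand_fallback. Specifying both sets a base number of on-demand replicas and also falls back dynamically to on-demand replicas when spot replicas are not available.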
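One way to reason about how the two options combine is the small model below. It is a sketch for building intuition, not SkyServe's actual autoscaler code: the function name and arguments are hypothetical, and it assumes that the base on-demand replicas count toward the target while the remaining slots are intended for spot replicas.

```python
def ondemand_replicas_to_run(
    target_replicas: int,
    ready_spot_replicas: int,
    base_ondemand_fallback_replicas: int = 0,
    dynamic_ondemand_fallback: bool = False,
) -> int:
    """Estimate how many on-demand replicas should be running.

    Simplified mental model (an assumption, not SkyServe internals):
    the base on-demand replicas always run, and with dynamic fallback
    enabled, each spot slot that has no ready spot replica is
    temporarily covered by an on-demand replica.
    """
    base = min(base_ondemand_fallback_replicas, target_replicas)
    spot_slots = target_replicas - base
    missing_spot = max(spot_slots - ready_spot_replicas, 0)
    return base + (missing_spot if dynamic_ondemand_fallback else 0)
```

Under this model, a target of 2 with dynamic fallback enabled and no ready spot replicas gives 2 on-demand replicas, and the count drops to 0 once both spot replicas are ready.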
Example#
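The following example demonstrates how to use spot replicas with dynamic fallback in SkyServe. The example is a simple HTTP server that listens on port 8081 and sets dynamic_ondemand_fallback: true. To run: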
$ sky serve up examples/serve/spot_policy/dynamic_on_demand_fallback.yaml -n http-server
When the service is up, we can check the status of the service and its replicas with the following command. Initially, we will see:
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        1m 17s  NO_REPLICA  0/4       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT  LAUNCHED    RESOURCES             STATUS        REGION
http-server   1   1        -         1 min ago   1x GCP([Spot]vCPU=2)  PROVISIONING  us-east1
http-server   2   1        -         1 min ago   1x GCP([Spot]vCPU=2)  PROVISIONING  us-central1
http-server   3   1        -         1 mins ago  1x GCP(vCPU=2)        PROVISIONING  us-east1
http-server   4   1        -         1 min ago   1x GCP(vCPU=2)        PROVISIONING  us-central1
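When the required number of spot replicas is not available, SkyServe provisions on-demand replicas to meet the target number of replicas. For example, when the target number is 2 and no spot replicas are ready, SkyServe provisions 2 on-demand replicas to meet the target.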
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        1m 17s  READY   2/4       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS         REGION
http-server   1   1        http://34.23.22.160:8081   3 min ago   1x GCP([Spot]vCPU=2)  READY          us-east1
http-server   2   1        http://34.68.226.193:8081  3 min ago   1x GCP([Spot]vCPU=2)  READY          us-central1
http-server   3   1        -                          3 mins ago  1x GCP(vCPU=2)        SHUTTING_DOWN  us-east1
http-server   4   1        -                          3 min ago   1x GCP(vCPU=2)        SHUTTING_DOWN  us-central1
When the spot replicas are ready, SkyServe automatically scales down the on-demand replicas to maximize cost savings.
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        3m 59s  READY   2/2       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS  REGION
http-server   1   1        http://34.23.22.160:8081   4 mins ago  1x GCP([Spot]vCPU=2)  READY   us-east1
http-server   2   1        http://34.68.226.193:8081  4 mins ago  1x GCP([Spot]vCPU=2)  READY   us-central1
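If a spot interruption occurs (e.g., on replica 1), SkyServe automatically falls back to on-demand replicas (e.g., by launching one on-demand replica) to meet the required replica capacity. SkyServe also keeps trying to provision a spot replica in case spot availability recovers. Note that SkyServe tries different regions and cloud providers to maximize the chance of successfully provisioning a spot instance.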
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        7m 2s   READY   1/3       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED     RESOURCES             STATUS        REGION
http-server   2   1        http://34.68.226.193:8081  7 mins ago   1x GCP([Spot]vCPU=2)  READY         us-central1
http-server   5   1        -                          13 secs ago  1x GCP([Spot]vCPU=2)  PROVISIONING  us-central1
http-server   6   1        -                          13 secs ago  1x GCP(vCPU=2)        PROVISIONING  us-central1
Eventually, when spot availability recovers, SkyServe automatically scales down the on-demand replicas.
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        10m 5s  READY   2/3       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED     RESOURCES             STATUS  REGION
http-server   2   1        http://34.68.226.193:8081  10 mins ago  1x GCP([Spot]vCPU=2)  READY   us-central1
http-server   5   1        http://34.121.49.94:8081   1 min ago    1x GCP([Spot]vCPU=2)  READY   us-central1
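To follow the spot and on-demand mix over time without reading each table by eye, the replica rows can be tallied with a short script. This is a rough sketch that assumes the text layout shown in the outputs above (a "Service Replicas" section, a `[Spot]` marker in RESOURCES, and STATUS as the second-to-last column); the layout may differ between versions, and the helper name `count_replicas` is hypothetical.

```python
from collections import Counter


def count_replicas(status_text: str) -> Counter:
    """Count replicas by (kind, status) from `sky serve status` text.

    Assumes the layout shown in this section; this is plain text
    parsing, not a SkyServe API.
    """
    counts: Counter = Counter()
    in_replicas = False
    for line in status_text.splitlines():
        fields = line.split()
        if line.strip() == "Service Replicas":
            in_replicas = True  # Replica rows start after this title.
            continue
        if not in_replicas or not fields or fields[0] == "SERVICE_NAME":
            continue  # Skip the Services table, blank lines, and the header.
        kind = "spot" if "[Spot]" in line else "on-demand"
        status = fields[-2]  # STATUS precedes the final REGION column.
        counts[(kind, status)] += 1
    return counts
```

The text passed in could be captured from the command's output, for example with `subprocess.run(["sky", "serve", "status", "http-server"], capture_output=True, text=True).stdout`.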