Using Spot Instances for Serving

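SkyServe supports serving models on a mixture of spot and on-demand replicas, with two options: base_ondemand_fallback_replicas and dynamic_ondemand_fallback. Currently, when a spot instance is preempted, SkyServe relies on the client side to retry failed requests.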
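Because retries on preemption are left to the client, a caller may want a small retry wrapper around its requests. The following is a minimal sketch using only the Python standard library, not part of SkyServe; the names get_with_retry, fetch, fetch_fn and sleep_fn, along with the backoff values, are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def fetch(url, timeout=5.0):
    """Issue one GET request and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def get_with_retry(url, attempts=5, base_delay=0.5,
                   fetch_fn=fetch, sleep_fn=time.sleep):
    """Retry a GET with exponential backoff (hypothetical helper).

    Retries on URLError (which also covers HTTPError), ConnectionError
    and TimeoutError, the kinds of failures a client may see while a
    preempted replica is being replaced.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch_fn(url)
        except (urllib.error.URLError, ConnectionError, TimeoutError) as err:
            last_error = err
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                sleep_fn(base_delay * 2 ** attempt)
    raise last_error
```

A caller would pass the service endpoint reported by sky serve status, for example get_with_retry("http://<endpoint>/health"). Whether retrying is safe depends on the request being idempotent, which this sketch assumes.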

Base On-Demand Fallback

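base_ondemand_fallback_replicas sets the number of on-demand replicas to keep running at all times. This is useful for service availability: some capacity remains even when spot replicas are unavailable. use_spot must be set to true to enable spot replicas.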

service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Ensures that one of the replicas is run on on-demand instances
    base_ondemand_fallback_replicas: 1

resources:
  ports: 8081
  cpus: 2+
  use_spot: true

workdir: examples/serve/http_server

run: python3 server.py

Tip

Kubernetes instances are considered on-demand instances. You can use the base_ondemand_fallback_replicas option to run some replicas on Kubernetes while the others run on cloud spot instances.

Dynamic On-Demand Fallback

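SkyServe supports dynamically falling back to on-demand replicas when spot replicas are not available. Enable this by setting dynamic_ondemand_fallback to true. This is useful for maintaining the required replica capacity during spot interruptions. When spot replicas become available again, SkyServe automatically switches back to them to maximize cost savings.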

service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Allows replicas to be run on on-demand instances if spot instances are not available
    dynamic_ondemand_fallback: true

resources:
  ports: 8081
  cpus: 2+
  use_spot: true

workdir: examples/serve/http_server

run: python3 server.py

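Tip

SkyServe supports specifying base_ondemand_fallback_replicas and dynamic_ondemand_fallback together. Doing so sets a base number of on-demand replicas and also falls back dynamically to on-demand replicas when spot replicas are not available.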
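To make the interaction of the two options concrete, here is a simplified model of how many on-demand replicas would be wanted at a given moment. It is a sketch consistent with the behavior described on this page, not SkyServe's actual autoscaler code; the function name and parameters are assumptions chosen for illustration.

```python
def on_demand_replicas_needed(target, ready_spot,
                              base_fallback=0, dynamic_fallback=False):
    """Simplified model (hypothetical) of the on-demand replica count.

    target:           desired total number of replicas
    ready_spot:       spot replicas currently ready
    base_fallback:    value of base_ondemand_fallback_replicas
    dynamic_fallback: value of dynamic_ondemand_fallback
    """
    if target < 0 or ready_spot < 0 or base_fallback < 0:
        raise ValueError("counts must be non-negative")
    # The base on-demand replicas are always kept running.
    base = min(base_fallback, target)
    # The remaining replicas are meant to run on spot instances.
    spot_target = target - base
    missing_spot = max(0, spot_target - ready_spot)
    # With dynamic fallback, missing spot capacity is covered on demand.
    return base + (missing_spot if dynamic_fallback else 0)


# Target of 2 and no spot replica ready: 2 on-demand replicas.
print(on_demand_replicas_needed(2, 0, dynamic_fallback=True))
# Both spot replicas ready: the on-demand replicas can be scaled down.
print(on_demand_replicas_needed(2, 2, dynamic_fallback=True))
# Both options together, target 3, one spot replica ready.
print(on_demand_replicas_needed(3, 1, base_fallback=1, dynamic_fallback=True))
```

Under this model, the base count is a floor that never goes away, while the dynamic part shrinks to zero as spot replicas become ready.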

Example

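The following example demonstrates how to use spot replicas with dynamic fallback in SkyServe. The example is a simple HTTP server that listens on port 8081 and sets dynamic_ondemand_fallback: true. To run it: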

$ sky serve up examples/serve/spot_policy/dynamic_on_demand_fallback.yaml -n http-server

Once the service is up, we can check the status of the service and its replicas with the following command. Initially, we will see:

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        1m 17s  NO_REPLICA  0/4       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS         REGION
http-server   1   1        -                          1 min ago   1x GCP([Spot]vCPU=2)  PROVISIONING  us-east1
http-server   2   1        -                          1 min ago   1x GCP([Spot]vCPU=2)  PROVISIONING  us-central1
http-server   3   1        -                          1 mins ago  1x GCP(vCPU=2)        PROVISIONING  us-east1
http-server   4   1        -                          1 min ago   1x GCP(vCPU=2)        PROVISIONING  us-central1

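When the required number of spot replicas is not available, SkyServe provisions on-demand replicas to meet the target number of replicas. For example, when the target is 2 and no spot replicas are ready, SkyServe provisions 2 on-demand replicas to cover the gap.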

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        1m 17s  READY   2/4       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS         REGION
http-server   1   1        http://34.23.22.160:8081   3 min ago   1x GCP([Spot]vCPU=2)  READY          us-east1
http-server   2   1        http://34.68.226.193:8081  3 min ago   1x GCP([Spot]vCPU=2)  READY          us-central1
http-server   3   1        -                          3 mins ago  1x GCP(vCPU=2)        SHUTTING_DOWN  us-east1
http-server   4   1        -                          3 min ago   1x GCP(vCPU=2)        SHUTTING_DOWN  us-central1

When the spot replicas are ready, SkyServe automatically scales down the on-demand replicas to maximize cost savings.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        3m 59s  READY   2/2       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS  REGION
http-server   1   1        http://34.23.22.160:8081   4 mins ago  1x GCP([Spot]vCPU=2)  READY   us-east1
http-server   2   1        http://34.68.226.193:8081  4 mins ago  1x GCP([Spot]vCPU=2)  READY   us-central1

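If a spot replica is interrupted (for example, replica 1), SkyServe automatically falls back to on-demand capacity (here, by launching one on-demand replica) to meet the required replica count. It also keeps trying to provision a spot replica in case spot availability returns. Note that SkyServe tries different regions and cloud providers to maximize the chance of successfully provisioning spot instances.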

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        7m 2s   READY   1/3       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED     RESOURCES             STATUS        REGION
http-server   2   1        http://34.68.226.193:8081  7 mins ago   1x GCP([Spot]vCPU=2)  READY         us-central1
http-server   5   1        -                          13 secs ago  1x GCP([Spot]vCPU=2)  PROVISIONING  us-central1
http-server   6   1        -                          13 secs ago  1x GCP(vCPU=2)        PROVISIONING  us-central1

Eventually, when spot availability is back, SkyServe automatically scales down the on-demand replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        10m 5s  READY   2/3       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED     RESOURCES             STATUS         REGION
http-server   2   1        http://34.68.226.193:8081  10 mins ago  1x GCP([Spot]vCPU=2)  READY          us-central1
http-server   5   1        http://34.121.49.94:8081   1 min ago    1x GCP([Spot]vCPU=2)  READY          us-central1