Updating a Service

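SkyServe supports updating a deployed service, which can be used to change:

  • Replica code (for example, run/setup; useful for debugging)

  • The replica resource spec in resources (for example, accelerators or instance types)

  • The service spec in service (for example, the number of replicas or the autoscaling spec)

During an update, the service remains accessible with no downtime, and its endpoint stays the same. By default, a rolling update is applied; you can also specify a blue-green update.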

Rolling Update

To update an existing service, use sky serve update:

$ sky serve update service-name new_service.yaml

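SkyServe will launch new replicas as described in new_service.yaml, with the following behavior:

  • An update is initiated, and traffic continues to be redirected to the existing (old) replicas.

  • New replicas (with the new settings) are launched in the background.

  • Whenever the total number of old and new replicas exceeds the expected number of replicas (based on the autoscaler's decision), the extra old replicas are scaled down.

  • Traffic is redirected to both old and new replicas until all new replicas are ready.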
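The scale-down rule in the list above can be sketched as a small simulation. This is an illustrative model only, not SkyServe's implementation: the function names are hypothetical, and it assumes that only READY replicas count toward the total, which is what the example output later in this section suggests.

```python
def old_replicas_to_scale_down(ready_old: int, ready_new: int, target: int) -> int:
    """Return how many old replicas are surplus under the rolling-update rule.

    Assumption: only READY replicas are counted toward the total.
    """
    excess = ready_old + ready_new - target
    # Never remove more old replicas than exist, and never a negative number.
    return max(0, min(ready_old, excess))


def simulate_rolling_update(ready_old: int, target: int) -> list:
    """Bring new replicas up one at a time.

    Returns the (old READY, new READY) counts after each new replica is ready.
    """
    history = []
    ready_new = 0
    while ready_new < target:
        ready_new += 1  # one more new replica becomes READY
        ready_old -= old_replicas_to_scale_down(ready_old, ready_new, target)
        history.append((ready_old, ready_new))
    return history


if __name__ == "__main__":
    # Two old replicas, updated to a service that wants three replicas.
    for old, new in simulate_rolling_update(ready_old=2, target=3):
        print(f"old READY={old}, new READY={new}")
```

In this model, old replicas are removed only as new ones become ready, so the number of replicas serving traffic does not drop below the smaller of the old count and the target.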

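Tip

When only the service field is updated and no workdir or file_mounts is specified in the service task, SkyServe reuses the old replicas by applying the new service spec and bumping their version (see sky serve status for version information). This significantly reduces the time needed to update the service and avoids potential quota issues.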
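The condition in this tip can be expressed as a simple check over two task definitions. The sketch below is a hypothetical helper, not part of SkyServe: it treats each task as a plain dict loaded from YAML and, as a conservative assumption, requires that neither the old nor the new task sets workdir or file_mounts.

```python
def _without_service(task: dict) -> dict:
    """Return a copy of the task with the `service` section removed."""
    return {key: value for key, value in task.items() if key != "service"}


def can_reuse_replicas(old_task: dict, new_task: dict) -> bool:
    """Hypothetical check: True if only `service` differs between the tasks
    and neither task specifies workdir or file_mounts."""
    for task in (old_task, new_task):
        if task.get("workdir") or task.get("file_mounts"):
            return False
    return _without_service(old_task) == _without_service(new_task)


if __name__ == "__main__":
    old = {"service": {"replicas": 2}, "resources": {"cpus": "2+"}, "run": "python3 server.py"}
    new = {"service": {"replicas": 3}, "resources": {"cpus": "2+"}, "run": "python3 server.py"}
    print(can_reuse_replicas(old, new))  # only `service` changed

    with_workdir = dict(new, workdir="examples/serve/http_server")
    print(can_reuse_replicas(old, with_workdir))  # workdir is set
```

Under this model, a task that sets workdir does not qualify for reuse, so updating it launches fresh replicas.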

Example

We first launch a simple HTTP service:

$ sky serve up examples/serve/http_server/task.yaml -n http-server

We can use sky serve status http-server to check the status of the service:

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        1m 41s  READY   2/2       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED    RESOURCES       STATUS  REGION
http-server   1   1        54.173.203.169  2 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   2   1        52.87.241.103   2 mins ago  1x AWS(vCPU=2)  READY   us-east-1

The initial version of the http-server service is 1.

Suppose we want to update the service from 2 replicas to 3. We can do so by changing the replicas field in the task YAML file examples/serve/http_server/task.yaml:

# examples/serve/http_server/task.yaml
service:
  readiness_probe:
    path: /health
    initial_delay_seconds: 20
  replicas: 3

resources:
  ports: 8081
  cpus: 2+

workdir: examples/serve/http_server

run: python3 server.py

We can then use sky serve update to update the service:

$ sky serve update http-server examples/serve/http_server/task.yaml

SkyServe will trigger the launch of three new replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        6m 15s  READY   2/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS        REGION
http-server   1   1        54.173.203.169  6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   2   1        52.87.241.103   6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   3   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   4   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   5   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1

Whenever a new replica becomes ready, traffic is redirected to both old and new replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1,2      10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=2)  READY          us-east-1
http-server   4   2        -               1 min ago    1x AWS(vCPU=2)  PROVISIONING   us-east-1
http-server   5   2        -               1 min ago    1x AWS(vCPU=2)  PROVISIONING   us-east-1

Once the total number of old and new replicas exceeds the requested number, old replicas are scaled down.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1,2      10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=2)  READY          us-east-1
http-server   4   2        18.206.226.82   1 min ago    1x AWS(vCPU=2)  READY          us-east-1
http-server   5   2        -               1 min ago    1x AWS(vCPU=2)  PROVISIONING   us-east-1

Eventually, only the new replicas remain, ready to serve user requests.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
http-server  2        11m 42s  READY   3/3       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   5   2        3.26.232.31    1 min ago   1x AWS(vCPU=2)  READY   us-east-1
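When scripting around an update, the Service Replicas table above can be checked programmatically. The sketch below is an illustration only: it assumes the text layout shown in the sample outputs (columns separated by two or more spaces), which may differ across SkyPilot versions, and the helper names are hypothetical.

```python
import re

# Final status from the walkthrough above, used as sample input.
FINAL_STATUS = """\
Service Replicas
SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   5   2        3.26.232.31    1 min ago   1x AWS(vCPU=2)  READY   us-east-1
"""


def parse_replica_rows(status_text: str) -> list:
    """Parse the replica table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in status_text.splitlines():
        # Cells are separated by runs of two or more spaces (assumed layout).
        cells = re.split(r"\s{2,}", line.strip())
        if cells[0] == "SERVICE_NAME":
            header = cells
        elif header is not None and len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def update_finished(rows: list, version: str) -> bool:
    """True if every listed replica runs `version` and is READY."""
    return bool(rows) and all(
        row["VERSION"] == version and row["STATUS"] == "READY" for row in rows
    )


if __name__ == "__main__":
    replicas = parse_replica_rows(FINAL_STATUS)
    print(update_finished(replicas, version="2"))
```

A script could run sky serve status periodically and feed its text to these helpers until update_finished returns True.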

Blue-Green Update

SkyServe also supports blue-green updates, using the following command:

$ sky serve update --mode blue_green service-name new_service.yaml

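In this update mode, SkyServe will launch new replicas as described in new_service.yaml, with the following behavior:

  • An update is initiated, and traffic continues to be redirected to the existing (old) replicas.

  • New replicas (with the new settings) are launched in the background.

  • Traffic is redirected to the new replicas only after all new replicas are ready.

  • Old replicas are scaled down after all new replicas are ready.

During the update, traffic is served entirely by replicas of either the old version or the new version. sky serve status shows the latest service version and the version of each replica.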
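The difference between the two modes comes down to which replicas may receive traffic at a given moment. The following sketch models that choice; it is not SkyServe's load balancer, and the Replica class and serving_replica_ids function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    replica_id: int
    version: int
    ready: bool


def serving_replica_ids(replicas, latest_version, target, mode="rolling"):
    """Return the IDs of replicas that would receive traffic in this model."""
    ready = [r for r in replicas if r.ready]
    new_ready = [r for r in ready if r.version == latest_version]
    old_ready = [r for r in ready if r.version != latest_version]
    if mode == "blue_green":
        # Switch over only once enough new replicas are READY; never mix versions.
        chosen = new_ready if len(new_ready) >= target else old_ready
    else:
        # Rolling: any READY replica, old or new, may serve traffic.
        chosen = ready
    return [r.replica_id for r in chosen]


if __name__ == "__main__":
    # Two old replicas READY, one of three new replicas READY so far.
    mid_update = [
        Replica(1, 1, True),
        Replica(2, 1, True),
        Replica(3, 2, True),
        Replica(4, 2, False),
        Replica(5, 2, False),
    ]
    print(serving_replica_ids(mid_update, latest_version=2, target=3, mode="rolling"))
    print(serving_replica_ids(mid_update, latest_version=2, target=3, mode="blue_green"))
```

With the same mid-update state, the rolling model already sends traffic to the first ready new replica, while the blue-green model keeps serving from the old version until all three new replicas are ready.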

Example

We use the same service http-server as an example. We can then use sky serve update --mode blue_green to update the service:

$ sky serve update http-server --mode blue_green examples/serve/http_server/task.yaml

SkyServe will trigger the launch of three new replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        6m 15s  READY   2/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS        REGION
http-server   1   1        54.173.203.169  6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   2   1        52.87.241.103   6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   3   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   4   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   5   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1

When a new replica becomes ready, traffic is still redirected to the old replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   4   2        -               1 min ago    1x AWS(vCPU=4)  PROVISIONING   us-east-1
http-server   5   2        -               1 min ago    1x AWS(vCPU=4)  PROVISIONING   us-east-1

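Once the number of new replicas meets the requirement, traffic is redirected to the new replicas and the old replicas are scaled down.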

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   4   2        18.206.226.82   1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   5   2        3.26.232.31     1 min ago    1x AWS(vCPU=4)  READY          us-east-1

Eventually, as with a rolling update, only the new replicas remain, ready to serve user requests.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
http-server  2        11m 42s  READY   3/3       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
http-server   5   2        3.26.232.31    1 min ago   1x AWS(vCPU=4)  READY   us-east-1