GPUs and Accelerators

SkyPilot supports a wide range of GPUs, TPUs, and other accelerators.

Supported accelerators

$ sky show-gpus -a
COMMON_GPU  AVAILABLE_QUANTITIES  
A10         1, 2, 4               
A10G        1, 4, 8               
A100        1, 2, 4, 8, 16        
A100-80GB   1, 2, 4, 8            
H100        1, 2, 4, 8, 12        
H200        1, 8                  
L4          1, 2, 4, 8            
L40S        1, 2, 4, 8            
T4          1, 2, 4, 8            
V100        1, 2, 4, 8            
V100-32GB   1, 2, 4, 8            

GOOGLE_TPU       AVAILABLE_QUANTITIES  
tpu-v2-8         1                     
tpu-v3-8         1                     
tpu-v4-8         1                     
tpu-v4-16        1                     
tpu-v4-32        1                     
tpu-v5litepod-1  1                     
tpu-v5litepod-4  1                     
tpu-v5litepod-8  1                     
tpu-v5p-8        1                     
tpu-v5p-16       1                     
tpu-v5p-32       1                     
tpu-v6e-1        1                     
tpu-v6e-4        1                     
tpu-v6e-8        1                     

OTHER_GPU          AVAILABLE_QUANTITIES  
A100-80GB-SXM      1, 2, 4, 8            
A40                1, 2, 4, 8            
A4000              1, 2, 4               
A6000              1, 2, 4               
GH200              1                     
Gaudi HL-205       8                     
H100-MEGA          8                     
H100-SXM           1, 2, 4, 8            
K80                1, 2, 4, 8, 16        
L40                1, 2, 4, 8            
M4000              1                     
M60                1, 2, 4               
P100               1, 2, 4               
P4                 1, 2, 4               
P4000              1, 2                  
RTX3060            1, 2                  
RTX3080            1                     
RTX3090            1, 2, 4, 8            
RTX4000-Ada        1, 2, 4, 8            
RTX4090            1, 2, 3, 4, 6, 8, 12  
RTX6000            1                     
RTX6000-Ada        1, 2, 4, 8            
RTXA4000           1, 2, 4, 8            
RTXA4500           1, 2, 4, 8            
RTXA5000           1, 2, 4, 8            
RTXA6000           1, 2, 4, 8            
Radeon MI25        1                     
Radeon Pro V520    1, 2, 4               
T4g                1, 2                  
... [omitted long outputs] ...

Behind the scenes, these details are encoded in the SkyPilot Catalog: skypilot-org/skypilot-catalog

Accelerators in Kubernetes

Your Kubernetes cluster may contain only specific accelerators.

You can query the accelerators available in your Kubernetes cluster with the following command:

$ sky show-gpus --cloud k8s
Kubernetes GPUs
GPU   REQUESTABLE_QTY_PER_NODE  UTILIZATION
L4    1, 2, 4                   12 of 12
H100  1, 2, 4, 8                16 of 16

Kubernetes per node GPU availability
NODE                       GPU       UTILIZATION
my-cluster-0               L4        4 of 4
my-cluster-1               L4        4 of 4
my-cluster-2               L4        2 of 2
my-cluster-3               L4        2 of 2
my-cluster-4               H100      8 of 8
my-cluster-5               H100      8 of 8
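
The summary table above can be cross-checked against the per-node table by summing the node rows for each GPU type. The following is a minimal sketch of that arithmetic, not part of SkyPilot: the `summarize` helper and the `NODES` list are hypothetical, the rows are copied by hand from the sample output, and it assumes "X of Y" in the UTILIZATION column means X GPUs out of Y in total.

```python
from collections import defaultdict

# Rows copied from the per-node sample output: (node, gpu, count, total).
# Assumption: "X of Y" in UTILIZATION maps to (count=X, total=Y).
NODES = [
    ("my-cluster-0", "L4", 4, 4),
    ("my-cluster-1", "L4", 4, 4),
    ("my-cluster-2", "L4", 2, 2),
    ("my-cluster-3", "L4", 2, 2),
    ("my-cluster-4", "H100", 8, 8),
    ("my-cluster-5", "H100", 8, 8),
]


def summarize(nodes):
    """Sum the per-node counts into {gpu: (count, total)} for the cluster."""
    totals = defaultdict(lambda: [0, 0])
    for _node, gpu, count, total in nodes:
        totals[gpu][0] += count
        totals[gpu][1] += total
    return {gpu: (c, t) for gpu, (c, t) in totals.items()}


summary = summarize(NODES)
for gpu, (count, total) in summary.items():
    print(f"{gpu}: {count} of {total}")
```

The per-GPU sums agree with the cluster-wide rows in the first table (12 for L4 and 16 for H100).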

Querying accelerator details

You can query the details of a supported accelerator configuration, written as accelerator:count:

$ sky show-gpus H100:8
GPU   QTY  CLOUD       INSTANCE_TYPE                     DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION             
H100  8    Vast        8x-H100_NVL-32-65536              749GB       32     64GB      $ 16.000      $ 16.000           Australia, AU, OC  
H100  8    Vast        8x-H100_SXM-32-65536              637GB       32     64GB      $ 21.000      $ 10.670           Iceland, IS, EU    
H100  8    nebius      gpu-h100-sxm_8gpu-128vcpu-1600gb  200GB       128    1600GB    $ 23.600      -                  eu-north1          
H100  8    Lambda      gpu_8x_h100_sxm5                  80GB        208    1800GB    $ 23.920      -                  europe-central-1   
H100  8    Fluidstack  H100_NVLINK_80GB::8               80GB        252    1440GB    $ 23.920      -                  FINLAND            
H100  8    RunPod      8x_H100_SECURE                    -           128    640GB     $ 35.920      -                  CA                 
H100  8    GCP         a3-highgpu-8g                     80GB        208    1872GB    $ 46.021      $ 20.199           us-central1        
H100  8    Paperspace  H100x8                            -           128    640GB     $ 47.600      -                  East Coast (NY2)   
H100  8    DO          gpu-h100x8-640gb                  80GB        160    1920GB    $ 47.600      -                  tor1               
H100  8    OCI         BM.GPU.H100.8                     80GB        224    2048GB    $ 80.000      -                  eu-amsterdam-1     
H100  8    AWS         p5.48xlarge                       80GB        192    2048GB    $ 98.320      $ 9.832            us-east-2          

GPU        QTY  CLOUD  INSTANCE_TYPE  DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION       
H100-MEGA  8    GCP    a3-megagpu-8g  80GB        208    1872GB    $ 92.214      $ 21.208           us-central1  

GPU       QTY  CLOUD   INSTANCE_TYPE       DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION  
H100-SXM  8    RunPod  8x_H100-SXM_SECURE  -           208    640GB     $ 37.520      -                  CA      
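
To compare offers like these, it can help to normalize each hourly price to a per-GPU figure. The sketch below does that for the H100:8 rows above; the `Offer` type and the `cheapest` helper are hypothetical names for illustration, the prices are copied by hand from the sample output, and none of this reflects how SkyPilot itself ranks offers.

```python
from typing import List, NamedTuple, Optional


class Offer(NamedTuple):
    cloud: str
    instance_type: str
    gpus: int
    hourly: float            # on-demand price in USD per hour
    spot: Optional[float]    # spot price in USD per hour, None if shown as "-"


# H100:8 rows copied from the sample output above.
OFFERS: List[Offer] = [
    Offer("Vast", "8x-H100_NVL-32-65536", 8, 16.000, 16.000),
    Offer("Vast", "8x-H100_SXM-32-65536", 8, 21.000, 10.670),
    Offer("nebius", "gpu-h100-sxm_8gpu-128vcpu-1600gb", 8, 23.600, None),
    Offer("Lambda", "gpu_8x_h100_sxm5", 8, 23.920, None),
    Offer("Fluidstack", "H100_NVLINK_80GB::8", 8, 23.920, None),
    Offer("RunPod", "8x_H100_SECURE", 8, 35.920, None),
    Offer("GCP", "a3-highgpu-8g", 8, 46.021, 20.199),
    Offer("Paperspace", "H100x8", 8, 47.600, None),
    Offer("DO", "gpu-h100x8-640gb", 8, 47.600, None),
    Offer("OCI", "BM.GPU.H100.8", 8, 80.000, None),
    Offer("AWS", "p5.48xlarge", 8, 98.320, 9.832),
]


def cheapest(offers: List[Offer], use_spot: bool = False) -> Offer:
    """Return the lowest-priced offer, skipping rows without a spot price."""
    if use_spot:
        with_spot = [o for o in offers if o.spot is not None]
        return min(with_spot, key=lambda o: o.spot)
    return min(offers, key=lambda o: o.hourly)


best = cheapest(OFFERS)
best_spot = cheapest(OFFERS, use_spot=True)
print(f"on-demand: {best.cloud} at ${best.hourly / best.gpus:.3f} per GPU-hour")
print(f"spot:      {best_spot.cloud} at ${best_spot.spot / best_spot.gpus:.3f} per GPU-hour")
```

Price alone ignores the differences in DEVICE_MEM, vCPUs, HOST_MEM, and REGION that the table also lists, so treat the result as a starting point rather than a recommendation.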

Requesting accelerators

You can use accelerator:count wherever an accelerator specification is accepted:

$ sky launch --gpus H100:8
$ sky launch --gpus H100  # If count is omitted, it defaults to 1.
$ sky exec my-h100-8-cluster --gpus H100:0.5 job.yaml
# In SkyPilot YAML:

resources:
  accelerators: H100:8

# Set: ask SkyPilot to auto-choose the cheapest available option.
resources:
  accelerators: {H100:8, A100:8}

# List: ask SkyPilot to try each one in order.
resources:
  accelerators: [L4:8, L40S:8, A10G:8, A10:8]

For more examples, see Provisioning Compute.
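
To make the accelerator:count notation and the ordered-list behavior concrete, here is a minimal Python sketch. It is not SkyPilot's parser or optimizer: `parse_accelerator`, `first_available`, and the `AVAILABLE` inventory are hypothetical and only mirror the rules stated above (a missing count means 1, fractional counts such as 0.5 are accepted, and a list is tried in order).

```python
from typing import Dict, List, Optional, Tuple


def parse_accelerator(spec: str) -> Tuple[str, float]:
    """Split 'NAME:COUNT' into (name, count); the count defaults to 1."""
    name, sep, count = spec.strip().partition(":")
    if not name:
        raise ValueError(f"missing accelerator name in {spec!r}")
    if not sep:
        return name, 1.0
    value = float(count)  # raises ValueError if the count is not a number
    if value <= 0:
        raise ValueError(f"count must be positive in {spec!r}")
    return name, value


def first_available(
    candidates: List[str], available: Dict[str, List[int]]
) -> Optional[Tuple[str, float]]:
    """List semantics: return the first candidate the inventory can satisfy."""
    for spec in candidates:
        name, count = parse_accelerator(spec)
        if count in available.get(name, []):
            return name, count
    return None


# Hypothetical inventory for illustration: only A10 and A10G are obtainable.
AVAILABLE = {"A10": [1, 2, 4], "A10G": [1, 4, 8]}

print(parse_accelerator("H100:8"))
print(parse_accelerator("H100"))
print(parse_accelerator("H100:0.5"))
print(first_available(["L4:8", "L40S:8", "A10G:8", "A10:8"], AVAILABLE))
```

With this inventory the list falls through L4:8 and L40S:8 and settles on A10G:8. A set would instead be resolved by price rather than by position.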

Google TPUs

See Cloud TPU.