
0. Prerequisites

Before getting started, please make sure your local environment meets the requirements below:

  • OS: Windows, GNU/Linux, or macOS.
  • Platform: For macOS, ScaleX supports arm64 and x86_64. For Windows and GNU/Linux, currently only x86_64 is supported.
  • Cloud: An AWS account with a valid Access Key ID and Secret Access Key. At minimum, the credentials must have full access to the resource types ScaleX manages, such as VPC, EC2, EBS, EventBridge, CloudWatch, and IAM. To ensure ScaleX runs properly, we strongly recommend using an account with administrator permissions.

1. Architecture

1.1 High-level

Local CLI --- AWS APIs (EC2, VPC, S3, IAM, Lambda ...)

1.2 Details

CLI-+---Cluster-1---+--VPC---------------------------------------+
    |               |  +-Pubnet---+   +-Subnet-1--------------+  | 
    |               |  | nat      |   | workers in groups     |  |
    |               |  | master   |   | route to nat          |  |
    |               |  +----------+   +-----------------------+  |
    |               |                 +-Subnet-2--------------+  |
    |               |                 | workers in groups     |  |
    |               |                 | route to nat          |  |
    |               |                 +-----------------------+  |
    |               |                 ...                        |
    |               +--------------------------------------------+
    |               +-Events----+---->+-Lambdas----+----+-Bucket-+
    |               | rules     |---->| delete     |    | info   |
    |               | schedules |---->| replace    |    | logs   |
    |               +-----------+---->+------------+    +--------+
    |
    +---Cluster-2---+---Stack-2
    |
    +---...
    |
    +---Cluster-N---+---Stack-N

2. How-To

2.1 Install

Download the software tarball from the links below:

Extract the tarball to a local directory. The directory layout will be:

ROOT_DIR/
        +---cloud_functions/    * Do not modify
        +---cluster_scripts/    * Do not modify
        +---examples/           * Do not modify
        +---README.md           * This file
        +---cluster_config.json * Modifiable: cluster config
        +---scalex              * Executable: the main program

ROOT_DIR/scalex (on POSIX) or ROOT_DIR\scalex.exe (on Windows) is the main program to run.
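
For reference, on POSIX systems the tarball can be extracted with commands like the ones below. The archive name scalex.tar.gz and the directory name are placeholders; use the file you actually downloaded:

tar -xzf scalex.tar.gz          # the archive name is a placeholder
cd scalex_root                  # the extracted directory becomes your ROOT_DIR; the name may differ
ls                              # should show scalex, cluster_config.json, examples/, ...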

2.2 Configure

As mentioned above, ScaleX needs your AWS credentials (access_key_id and secret_access_key) to run.

IMPORTANT:

  • ScaleX doesn't transfer any information, such as credentials, out of your local environment (except for the Cloud API calls it makes). The config command only reads your credentials and stores them in $HOME/.aws/config. Please protect your credentials carefully on your local machine.
  • ScaleX doesn't overwrite your existing AWS configuration. Instead, it appends a profile named scalex to your local AWS config.

Open a Terminal (on POSIX) or a Command Prompt Window (on Windows).

Change the directory to the ROOT_DIR of ScaleX. Suppose the ROOT_DIR is /home/my/scalex_root.

Run the commands:

  • my@ubuntu:~$ cd /home/my/scalex_root
  • my@ubuntu:~/scalex_root$ ./scalex config --ak your_access_key --sk your_secret_key

If everything goes well, your local AWS config file will be updated. The console output should look something like:

Scalex (Scale-X): Scale Your Cloud and Save Your Cost. [Version: 0.0.1]
Updated the config profile [profile scalex] to: /home/my/.aws/config.

Please keep the Terminal or CMD Window open for the next steps.
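
If you want to confirm that the profile was appended without touching your existing configuration, you can inspect the AWS config file on a POSIX system (the exact keys stored under the profile depend on ScaleX):

grep -F -A 3 '[profile scalex]' $HOME/.aws/config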

2.3 Run

ScaleX provides a Command Line Interface with several commands and corresponding options. Please check the details below.

Usage: ./scalex COMMAND ARGUMENTS ...

COMMAND         ARGUMENTS
config  Configure cloud credentials.
                --ak 'your_access_key'
                --sk 'your_access_secret'
                --force
create  Create a kubernetes cluster.
                --config /your/config/file.json
                --functions-root /your/functions/root/path
                --scripts-root /your/scripts/root/path
delete  Delete a kubernetes cluster.
                --stack-id stack_id_string
                --region region_name
                --stack-data /your/stack/data/file.json
                --delete-logs
list    List your stacks (clusters) or groups in a cluster.
                --type clusters | groups
                --regions region_name_a,region_name_b or --regions all
                --groups group_name_a,group_name_b or --groups all
status  Get the status of a cluster.
                --stack-id stack_id_string
                --region region_name
                --stack-data /your/stack/data/file.json
                --summary
start   Start node(s) of a cluster.
                --all/--groups all
                --groups group_name_a,group_name_b
                --stack-id stack_id_string
                --region region_name
                --stack-data /your/stack/data/file.json
stop    Stop node(s) of a cluster.
                --all
                --groups all or --groups group_name_a,group_name_b
                --stack-id stack_id_string
                --region region_name
                --stack-data /your/stack/data/file.json
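
For reference, here are a few typical invocations. The stack id, region, and group name are placeholders; replace them with the values from your own stack:

./scalex list --type clusters --regions all
./scalex status --stack-id k8s-xxxxxxxxxxxx --region us-east-2 --summary
./scalex stop --groups a10_group --stack-id k8s-xxxxxxxxxxxx --region us-east-2
./scalex start --groups a10_group --stack-id k8s-xxxxxxxxxxxx --region us-east-2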

3. Tutorials

3.1 Concepts

  • stack/cluster: a kubernetes cluster managed by ScaleX on your AWS account. Each stack/cluster has a unique stack-id.
  • worker group: a group of workers with the same specifications and instance types. Inside a stack/cluster, each worker group has a unique name.
  • worker node: an instance in a cluster that runs as a kubernetes worker. There are several types:
      • fixed_od: an On-Demand instance that is never switched to Spot during the lifetime of the stack
      • od: an On-Demand instance that is dynamic and might be switched to Spot automatically by ScaleX
      • spot: a Spot instance that might be interrupted. ScaleX handles the interruption automatically
  • nat: an instance in a stack that provides internet access for private subnets
  • master: an instance in a kubernetes stack that runs the control plane
  • autoscaling: a feature applied to worker group(s) to scale up or down according to the workload. There are several options:
      • full: the group can be scaled up (adding nodes) or scaled down (deleting nodes)
      • up: the group only allows scaling up (adding nodes)
      • down: the group only allows scaling down (deleting nodes)
      • off: autoscaling deactivated
  • schedule_priority: when a Spot instance fails, 'region' priority triggers cross-AZ migration more frequently, while 'az' priority triggers Spot-to-OD switching more frequently.

3.2 Create Your First Stack

3.2.1 Prepare an SSH Keypair

We recommend using OpenSSH utilities to generate SSH key pairs. OpenSSH is usually pre-installed on your Operating System (Windows, macOS, or GNU/Linux).

  • Open a Terminal (on POSIX) or a Command Prompt Window (on Windows).
  • Run the command: ssh-keygen -t ed25519 -N '' -f ./scalex_ssh_ed25519 -q. This command is a reference; adjust it as needed.
      • -t: the type (algorithm) of the key pair. You can specify -t rsa or -t ed25519
      • -N: the passphrase; an empty passphrase is usually fine
      • -f: the output path of the key pair
  • Two files will be generated. With the sample command above, they will be in your current path:
      • scalex_ssh_ed25519: the private key. Please manage this key very carefully and do not transfer it to any untrusted party.
      • scalex_ssh_ed25519.pub: the public key, which can be injected into your cluster later.
  • Change the permissions of your private key. SSH requires mode 600.
      • On Windows: right-click the private key file -> Properties -> Security -> remove other users from the permission list.
      • On POSIX (GNU/Linux or macOS): chmod 600 ./scalex_ssh_ed25519

After the steps above, your SSH key pair should be good to go.
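
On POSIX systems, the whole procedure above can be summarized as the short sequence below (the key path is only an example):

ssh-keygen -t ed25519 -N '' -f ./scalex_ssh_ed25519 -q
chmod 600 ./scalex_ssh_ed25519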

3.2.2 Define and Create a Stack

Each stack/cluster is pre-defined by a JSON-format file. Please refer to the sample cluster_config.json.

{
    "region": "us-east-2",          # A valid AWS region name
    "cpu_arch": "x86_64",           # The CPU architecture of management nodes
    "ha_flag": "False",             # Whether control node HA is enabled (to-be-developed)
    "nfs_init_gb": 100,             # The initial size of cluster NFS share (auto-scaling is in development)
    "master_ssh_public_key": {      # A public SSH key to be injected to the control node for SSH connection
        "type": "file",
        "content": "./id_rsa.pub"
    },
    "worker_groups": [              # Worker groups, it is a list[dict]
        {
            "group_name": "a10_group",  # A unique name of the group
            "instance_type": "",        # Optinal: a precise instance type
                                        # If instance type specified, the parameters such as vcpu_range
                                        # would be skipped
            "schedule_priority": "",    # AZ prioritized or region prioritized(default)
            "cpu_arch": "x86_64",       # CPU architecture of worker nodes
            "cpu_providers": [],        # CPU providers, by default it is intel
            "vcpu_range": {             
                "start": 1,
                "end": 16
            },
            "mem_range": {              # Unit: GiB
                "start": 32,
                "end": 10000
            },
            "mem_per_cpu_range": {      # Unit: GiB
                "start": 0,
                "end": 1000000
            },
            "acc_names": ["a10g"],      # Accelerator (GPU) names
            "acc_count_range": {        # Accelerator (GPU) count per node
                "start": 1,
                "end": 1
            },
            "total_workers": 1,         # Total initial workers
            "od_workers": 1,            # Initial On-Demand workers
            "fixed_workers": 0,         # Fixed OD workers
            "os_disk_gb": 16,           # OS volume/disk size
            "container_disk_gb": 100,   # Container volume/disk size
            "autoscaling_experimental": "full"  # Autoscaling policy
        },
        {
            "group_name": "cpu_group",
            "instance_type": "",
            "schedule_priority": "",
            "cpu_arch": "x86_64",
            "cpu_providers": ["amd"],
            "vcpu_range": {
                "start": 4,
                "end": 4
            },
            "mem_range": {
                "start": 0,
                "end": 10000
            },
            "mem_per_cpu_range": {
                "start": 2,
                "end": 2
            },
            "acc_names": [], 
            "acc_count_range": {
                "start": 1,
                "end": 1
            },
            "total_workers": 1,
            "od_workers": 1,
            "fixed_workers": 0,
            "os_disk_gb": 16,
            "container_disk_gb": 100,
            "autoscaling_experimental": "full"
        }
    ]
}

With the JSON file above (suppose its path is /home/my/cluster_config.json), you can run the command:

./scalex create --config /home/my/cluster_config.json

If everything goes well, an initial stack will be created. The console output should look something like:

Checking the ingested request attributes ...
The format of configs is checked and good to go.
Checking the contents of the configs.
Worker group: a10_group will use instance type: g5.xlarge.
Worker group: cpu_group will use instance type: c5a.xlarge.
[*] scalex (scale-x) kubernetes stack started creating ...
Stack id: k8s-lml6uq7wbk5e
...
...
...
Worker group: cpu_group provision summary: spot: 1 | od: 1 | total: 2
Worker group: cpu_group: All planned 2 node(s) provisioned.

Scalex k8s stack summary:
...

Please connect to the cluster: ssh ubuntu@XXX.XXX.XXX.XXX -i /your/private/key
[*] The IP address above is *elastic* and will keep unchanged during the lifecycle of this cluster.
Saved the stack to 'k8s-lml6uq7wbk5e.json'.
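
Once the stack is created, you can inspect it with the status and list commands. The stack data file name below matches the sample output above; your actual stack id and region will differ:

./scalex status --stack-data ./k8s-lml6uq7wbk5e.json --summary
./scalex list --type clusters --regions us-east-2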

3.3 Deploy Your First AI Model

NOTE: In this demo, we use Huggingface to get the demo model: DeepSeek-R1-Distill-Qwen-7B. Please sign up for Huggingface and get a valid token to follow this part of the tutorial.

Step 1: Connect To Your Cluster

Log in to your cluster using your private key (the public key has been injected into the master node, and the IP address was echoed during creation):

ssh ubuntu@10.10.10.10 -i /my/private/key

Please change the example IP address and the example private key path to actual ones.

Or, you can connect to your master node on the AWS Console.

Step 2: Create a User Namespace

Next, you can create a namespace:

kubectl create namespace deepseek

Step 3: Import a Huggingface Token

Import a huggingface token to the namespace deepseek:

kubectl create secret generic huggingface-token --from-literal=HF_TOKEN=hf_xxx.... -n deepseek

Please change the example token string hf_xxx.... to an actual valid token.
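
You can verify that the namespace and the secret exist with standard kubectl commands (output formats may vary):

kubectl get namespace deepseek
kubectl get secret huggingface-token -n deepseek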

Step 4: Deploy The Model

Next, please create a YAML file (deepseek.yaml) on the master node to deploy DeepSeek-R1-Distill-Qwen-7B. The YAML template below creates a 3-replica deployment in your cluster.

apiVersion: apps/v1  
kind: Deployment  
metadata:  
  name: deepseek-deployment  
  namespace: deepseek  
  labels:  
    app: deepseek  
spec:  
  replicas: 3 
  selector:  
    matchLabels:  
      app: deepseek  
  template:  
    metadata:  
      labels:  
        app: deepseek  
    spec:  
      tolerations:  
        - key: "nvidia.com/gpu"  
          operator: "Exists"  
          effect: "NoSchedule"  
      volumes:  
      - name: cache-volume  
        hostPath:  
          path: /tmp/deepseek  
          type: DirectoryOrCreate  
      - name: shm  
        emptyDir:  
          medium: Memory  
          sizeLimit: "2Gi"  
      containers:  
      - name: deepseek  
        image: vllm/vllm-openai:latest  
        command: ["/bin/sh", "-c"]  
        args: [  
          "vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B  --trust-remote-code --max_model_len 2048"  
        ]  
        env: 
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: huggingface-token
              key: HF_TOKEN
        ports:  
        - containerPort: 8000  
        resources:  
          requests:  
            nvidia.com/gpu: "1"  
          limits:  
            nvidia.com/gpu: "1"  
        volumeMounts:  
        - mountPath: /root/.cache/huggingface  
          name: cache-volume  
        - name: shm  
          mountPath: /dev/shm  
---  
apiVersion: v1  
kind: Service  
metadata:  
  name: deepseek-svc  
  namespace: deepseek  
spec:  
  ports:  
  - name: http  
    port: 80  
    protocol: TCP  
    targetPort: 8000
  selector:  
    app: deepseek  
  type: ClusterIP

Please adjust the contents according to your actual stack configurations.

Next, please run kubectl apply -f deepseek.yaml. The pod(s) will be scheduled to the worker node(s).
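
To check the rollout and try the model, you can use standard kubectl commands from the master node. The example below assumes the vLLM OpenAI-compatible server from the template above; the local port and the request body are only examples:

kubectl get pods -n deepseek                      # check that the pods reach Running
kubectl port-forward svc/deepseek-svc 8000:80 -n deepseek &
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "messages": [{"role": "user", "content": "Hello"}]}'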

3.4 Automated Management

ScaleX clusters are fully dynamic and fully managed. That means you don't need to manage the resources yourself when:

  • A spot instance gets interrupted at any time
  • A spot instance is available to replace a higher-priced On-Demand instance
  • New deployments are scheduled but sufficient resources are not currently available
  • Deployments get deleted and make some worker nodes idle

Therefore, you can focus on your applications, deployments, and other workloads; ScaleX aims to handle all the maintenance jobs for you automatically. To achieve this goal, ScaleX runs several low-cost AWS services under your stacks:

  • Lambda Functions
  • AWS EventBridge rules
  • CloudWatch Logs
  • AWS EventBridge Schedules
  • S3 buckets

Please note that these services may introduce extra minor costs. For details about the billing and costs, please refer to the AWS official website.
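
If you want to see these resources in your account, you can list them with the standard AWS CLI. The commands below are only a reference: the region is an example, the exact resource names depend on your stack id, and they assume the scalex profile written by the config command is usable by the AWS CLI:

aws lambda list-functions --region us-east-2 --profile scalex
aws events list-rules --region us-east-2 --profile scalex
aws s3 ls --profile scalex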

4. Bug Report, Issues, and Troubleshooting

Please report issues to us by:

  • Email: wangzr@cloudsway.com
  • Technical Support Groups