0. Prerequisites
Before getting started, please make sure your local environment meets the requirements below:
- OS: Windows, GNU/Linux, or macOS.
- Platform: For macOS, ScaleX supports arm64 and x86_64. For Windows and GNU/Linux, currently only x86_64 is supported.
- Cloud: An AWS account with a valid Access Key and Secret Access Key. The account should have admin permissions. At a minimum, the permissions must cover full access to resource types such as VPC, EC2, EBS, EventBridge, CloudWatch, and IAM. To ensure ScaleX runs properly, we strongly recommend using an admin account.
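If you already have the AWS CLI installed (it is not required by ScaleX itself), a quick, optional way to verify that your Access Key and Secret Access Key are valid is:
# Optional sanity check with the AWS CLI (not part of ScaleX).
# Prints the account id, user id, and ARN that the credentials belong to.
aws sts get-caller-identity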
1. Architecture
1.1 High-level
Local CLI --- AWS APIs (EC2, VPC, S3, IAM, Lambda ...)
1.2 Details
CLI-+---Cluster-1---+--VPC---------------------------------------+
| | +-Pubnet---+ +-Subnet-1--------------+ |
| | | nat | | workers in groups | |
| | | master | | route to nat | |
| | +----------+ +-----------------------+ |
| | +-Subnet-2--------------+ |
| | | workers in groups | |
| | | route to nat | |
| | +-----------------------+ |
| | ... |
| +--------------------------------------------+
| +-Events----+---->+-Lambdas----+----+-Bucket-+
| | rules |---->| delete | | info |
| | schedules |---->| replace | | logs |
| +-----------+---->+------------+ +--------+
|
+---Cluster-2---+---Stack-2
|
+---...
|
+---Cluster-N---+---Stack-N
2. How-To
2.1 Install
Download the software tarball from the links below:
- Windows-x64: Download Link
- GNU/Linux-x64: Download Link
- macOS-x64: Download Link
- macOS-arm64: Download Link
Unzip the tarball to your local directory. The directory layout will be:
ROOT_DIR/
+---cloud_functions/ * Do not modify
+---cluster_scripts/ * Do not modify
+---examples/ * Do not modify
+---README.md * This file
+---cluster_config.json * Modifiable: cluster config
+---scalex * Executable: the main program
The ROOT_DIR/scalex (on POSIX) or ROOT_DIR\scalex.exe (on Windows) is the main program to run.
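For example, on GNU/Linux the extraction could look like the following; the tarball filename below is only a placeholder, so use the name of the file you actually downloaded:
# Extract the downloaded archive into a local ROOT_DIR (filename is illustrative).
mkdir -p ~/scalex_root
tar -xzf scalex-linux-x64.tar.gz -C ~/scalex_root
# Verify the layout and make sure the main program is executable.
ls ~/scalex_root
chmod +x ~/scalex_root/scalex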
2.2 Configure
As mentioned above, ScaleX needs your AWS credentials (access_key_id and secret_access_key) to run.
IMPORTANT:
- ScaleX doesn't transfer any information such as credentials out of your local environment (except for Cloud API calls). The config command only reads your credentials and stores them in $HOME/.aws/config. Please protect your credentials carefully on your local machine.
- ScaleX doesn't override your current AWS configs. Instead, it appends a profile named scalex to your local AWS config.
Open a Terminal (on POSIX) or a Command Prompt Window (on Windows).
Change the directory to the ROOT_DIR of ScaleX. Suppose the ROOT_DIR is /home/my/scalex_root.
Run the commands:
my@ubuntu:~$
cd /home/my/scalex_root
my@ubuntu:~/scalex_root$
./scalex config --ak your_access_key --sk your_secret_key
If everything goes well, your local AWS config file will be updated. The console output will look something like:
Scalex (Scale-X): Scale Your Cloud and Save Your Cost. [Version: 0.0.1]
Updated the config profile [profile scalex] to: /home/my/.aws/config.
Please keep the Terminal or CMD Window open for next steps.
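If you want to double-check the result, the profile ScaleX appends follows the standard AWS shared-config format, so you can inspect the file directly; the command below assumes a POSIX shell:
# Show the appended profile (on Windows, open the file in a text editor instead).
grep -F -A 3 '[profile scalex]' "$HOME/.aws/config"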
2.3 Run
ScaleX provides a Command Line Interface with several commands and corresponding options. Please check the details below.
Usage: ./scalex COMMAND ARGUMENTS ...
COMMAND ARGUMENTS
config Configure cloud credentials.
--ak 'your_access_key'
--sk 'your_access_secret'
--force
create Create a kubernetes cluster.
--config /your/config/file.json
--functions-root /your/functions/root/path
--scripts-root /your/scripts/root/path
delete Delete a kubernetes cluster.
--stack-id stack_id_string
--region region_name
--stack-data /your/stack/data/file.json
--delete-logs
list List your stacks(clusters) or groups in a cluster.
--type clusters | groups
--regions region_name_a,region_name_b or --regions all
--groups group_name_a,group_name_b or --groups all
status Get the status of a cluster.
--stack-id stack_id_string
--region region_name
--stack-data /your/stack/data/file.json
--summary
start Start node(s) of a cluster.
--all/--groups all
--groups group_name_a,group_name_b
--stack-id stack_id_string
--region region_name
--stack-data /your/stack/data/file.json
stop Stop node(s) of a cluster.
--all
--groups all or --groups group_name_a,group_name_b
--stack-id stack_id_string
--region region_name
--stack-data /your/stack/data/file.json
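For example, the invocations below combine the options above; the stack data file name is a placeholder and the group names are taken from the sample configuration, so replace them with your actual values:
# List all ScaleX stacks (clusters) across all regions.
./scalex list --type clusters --regions all

# Stop every worker group of a stack, using the stack data file saved by 'create'.
./scalex stop --groups all --stack-data ./k8s-xxxxxxxxxxxx.json

# Later, start only two specific worker groups of the same stack.
./scalex start --groups a10_group,cpu_group --stack-data ./k8s-xxxxxxxxxxxx.json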
3. Tutorials
3.1 Concepts
- stack/cluster: a kubernetes cluster managed by ScaleX on your AWS account.
- *Each stack/cluster has a unique stack-id.
- worker group: a group of workers with the same specifications and instance types.
- *Inside a stack/cluster, each worker group has a unique name.
- worker node: an instance in a cluster to run as a kubernetes worker. There are several types:
- fixed_od: an On-Demand instance that is never switched to Spot during the lifetime of the stack
- od: an On-Demand instance that is dynamic and might be switched to Spot automatically by ScaleX
- spot: a Spot instance that might be interrupted. ScaleX handles the interruption automatically
- nat: an instance in a stack to provide internet access for private subnets
- master: an instance in a kubernetes stack that runs the control plane
- autoscaling: a feature that applies to worker group(s) to scale up or down according to the workload. There are several options:
- full: the group can be scaled up (adding nodes) or scaled down (deleting nodes)
- up: the group only allows scaling up (adding nodes)
- down: the group only allows scaling down (deleting nodes)
- off: autoscaling deactivated
- schedule_priority: when a Spot instance fails, 'region' priority triggers cross-AZ migration more frequently, while 'az' priority triggers the Spot-to-OD switch more frequently.
3.2 Create Your First Stack
3.2.1 Prepare an SSH Keypair
We recommend using OpenSSH utilities to generate SSH keypairs. Usually, OpenSSH is pre-installed on your operating system (Windows, macOS, or GNU/Linux).
- Open a Terminal (on POSIX) or a Command Prompt Window (on Windows).
- Type the command: ssh-keygen -t ed25519 -N '' -f ./scalex_ssh_ed25519 -q
  This command is for reference; you can adjust it accordingly.
  - -t: the type (algorithm) of the key pair. You can specify -t rsa or -t ed25519.
  - -N: the passphrase; an empty passphrase is usually fine.
  - -f: the output path of the key pair.
- Two files will be generated. If you use the sample command above, they will be in your current path:
  - scalex_ssh_ed25519: the private key. Please manage this key very carefully and do not transfer it to any untrusted party.
  - scalex_ssh_ed25519.pub: the public key, which can be injected into your cluster later.
- Change the permissions of your private key. SSH requires 600 mode.
  - On Windows: right-click the private key file -> Properties -> Security -> remove other users from the permission list.
  - On POSIX (GNU/Linux or macOS): chmod 600 ./scalex_ssh_ed25519
After the steps above, your SSH key pair should be good to go.
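Optionally, you can double-check the key pair with standard OpenSSH tooling:
# Print the fingerprint of the generated key to confirm it is readable and valid.
ssh-keygen -l -f ./scalex_ssh_ed25519.pub
# On POSIX, confirm the private key permissions are 600 (shown as -rw-------).
ls -l ./scalex_ssh_ed25519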
3.2.2 Define and Create a Stack
Each stack/cluster is defined by a JSON-format file. Please refer to the sample cluster_config.json.
{
"region": "us-east-2", # A valid AWS region name
"cpu_arch": "x86_64", # The CPU architecture of management nodes
"ha_flag": "False", # Whether control node HA is enabled (to-be-developed)
"nfs_init_gb": 100, # The initial size of cluster NFS share (auto-scaling is in development)
"master_ssh_public_key": { # A public SSH key to be injected to the control node for SSH connection
"type": "file",
"content": "./id_rsa.pub"
},
"worker_groups": [ # Worker groups, it is a list[dict]
{
"group_name": "a10_group", # A unique name of the group
"instance_type": "", # Optinal: a precise instance type
# If instance type specified, the parameters such as vcpu_range
# would be skipped
"schedule_priority": "", # AZ prioritized or region prioritized(default)
"cpu_arch": "x86_64", # CPU architecture of worker nodes
"cpu_providers": [], # CPU providers, by default it is intel
"vcpu_range": {
"start": 1,
"end": 16
},
"mem_range": { # Unit: GiB
"start": 32,
"end": 10000
},
"mem_per_cpu_range": { # Unit: GiB
"start": 0,
"end": 1000000
},
"acc_names": ["a10g"], # Accelerator (GPU) names
"acc_count_range": { # Accelerator (GPU) count per node
"start": 1,
"end": 1
},
"total_workers": 1, # Total initial workers
"od_workers": 1, # Initial On-Demand workers
"fixed_workers": 0, # Fixed OD workers
"os_disk_gb": 16, # OS volume/disk size
"container_disk_gb": 100, # Container volume/disk size
"autoscaling_experimental": "full" # Autoscaling policy
},
{
"group_name": "cpu_group",
"instance_type": "",
"schedule_priority": "",
"cpu_arch": "x86_64",
"cpu_providers": ["amd"],
"vcpu_range": {
"start": 4,
"end": 4
},
"mem_range": {
"start": 0,
"end": 10000
},
"mem_per_cpu_range": {
"start": 2,
"end": 2
},
"acc_names": [],
"acc_count_range": {
"start": 1,
"end": 1
},
"total_workers": 1,
"od_workers": 1,
"fixed_workers": 0,
"os_disk_gb": 16,
"container_disk_gb": 100,
"autoscaling_experimental": "full"
}
]
}
With the JSON file above (suppose its path is /home/my/cluster_config.json), you can run the command:
./scalex create --config /home/my/cluster_config.json
If everything goes well, an initial stack will be generated. The console output will look something like:
Checking the ingested request attributes ...
The format of configs is checked and good to go.
Checking the contents of the configs.
Worker group: a10_group will use instance type: g5.xlarge.
Worker group: cpu_group will use instance type: c5a.xlarge.
[*] scalex (scale-x) kubernetes stack started creating ...
Stack id: k8s-lml6uq7wbk5e
...
...
...
Worker group: cpu_group provision summary: spot: 1 | od: 1 | total: 2
Worker group: cpu_group: All planned 2 node(s) provisioned.
Scalex k8s stack summary:
...
Please connect to the cluster: ssh ubuntu@XXX.XXX.XXX.XXX -i /your/private/key
[*] The IP address above is *elastic* and will keep unchanged during the lifecycle of this cluster.
Saved the stack to 'k8s-lml6uq7wbk5e.json'.
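After creation, you can check the stack using the commands from section 2.3; the examples below reuse the stack id, region, and stack data file shown in the sample output above:
# Summarized status, using the stack data file saved by 'create'.
./scalex status --stack-data ./k8s-lml6uq7wbk5e.json --summary

# Equivalent check, addressing the stack by id and region (region from cluster_config.json).
./scalex status --stack-id k8s-lml6uq7wbk5e --region us-east-2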
3.3 Deploy Your First AI Model
NOTE: In this demo, we use Huggingface to get the demo model DeepSeek-R1-Distill-Qwen-7B. Please sign up to Huggingface and get a valid token to follow this part of the tutorial.
Step 1: Connect To Your Cluster
Log in to your cluster using your private key (the public key has been injected into the master node, and the IP address was echoed by the create command):
ssh ubuntu@10.10.10.10 -i /my/private/key
Please change the example IP address and the example private key path to actual ones.
Or, you can connect to your master node on the AWS Console.
Step 2: Create a User Namespace
Next, you can create a namespace:
kubectl create namespace deepseek
Step 3: Import a Huggingface Token
Import a huggingface token to the namespace deepseek:
kubectl create secret generic huggingface-token --from-literal=HF_TOKEN=hf_xxx.... -n deepseek
Please change the example token string hf_xxx.... to an actual valid token.
Step 4: Deploy The Model
Next, please create a YAML file (deepseek.yaml) on the master node to deploy DeepSeek-R1-Distill-Qwen-7B. The YAML template below creates a 3-replica deployment on your cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
name: deepseek-deployment
namespace: deepseek
labels:
app: deepseek
spec:
replicas: 3
selector:
matchLabels:
app: deepseek
template:
metadata:
labels:
app: deepseek
spec:
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
volumes:
- name: cache-volume
hostPath:
path: /tmp/deepseek
type: DirectoryOrCreate
- name: shm
emptyDir:
medium: Memory
sizeLimit: "2Gi"
containers:
- name: deepseek
image: vllm/vllm-openai:latest
command: ["/bin/sh", "-c"]
args: [
"vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --trust-remote-code --max_model_len 2048"
]
env:
- name: HUGGING_FACE_HUB_TOKEN
valueFrom:
secretKeyRef:
name: huggingface-token
key: HF_TOKEN
ports:
- containerPort: 8000
resources:
requests:
nvidia.com/gpu: "1"
limits:
nvidia.com/gpu: "1"
volumeMounts:
- mountPath: /root/.cache/huggingface
name: cache-volume
- name: shm
mountPath: /dev/shm
---
apiVersion: v1
kind: Service
metadata:
name: deepseek-svc
namespace: deepseek
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 8000
selector:
app: deepseek
type: ClusterIP
Please adjust the contents according to your actual stack configurations.
Next, please run kubectl apply -f deepseek.yaml. The pod(s) will be scheduled to the worker node(s).
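Once applied, you can watch the rollout and then send a test request to the OpenAI-compatible API that vLLM exposes through the deepseek-svc service; the local port and prompt below are only illustrative:
# Watch the deployment and pods in the deepseek namespace.
kubectl -n deepseek get deployments,pods -o wide

# Forward the ClusterIP service to a local port and send a test chat completion.
kubectl -n deepseek port-forward svc/deepseek-svc 8080:80 &
sleep 5   # give the port-forward a moment to establish
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 64}'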
3.4 Automated Management
ScaleX clusters are fully dynamic and fully managed. That means you don't need to manage the resources when:
- A spot instance gets interrupted at any time
- A spot instance becomes available to replace a high-priced On-Demand instance
- New deployments are scheduled but there are currently insufficient resources
- Deployments get deleted and make some worker nodes idle
Therefore, you can focus on your applications, deployments, and other workloads; ScaleX aims to handle all the maintenance jobs for you automatically. To achieve this goal, ScaleX runs several low-cost AWS services under your stacks:
- Lambda Functions
- AWS EventBridge rules
- CloudWatch Logs
- AWS EventBridge Schedules
- S3 buckets
Please note that these services may introduce extra minor costs. For details about the billing and costs, please refer to the AWS official website.
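When you no longer need a stack, the delete command from section 2.3 removes it together with these auxiliary resources; we assume here that --delete-logs also removes the stored log data. The stack data file name below is the one saved by 'create':
# Delete the stack identified by its saved stack data file, including its log data.
./scalex delete --stack-data ./k8s-lml6uq7wbk5e.json --delete-logs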
4. Bug Report, Issues, and Troubleshooting
Please report issues to us by:
- Email: wangzr@cloudsway.com
- Technical Support Groups