System Architecture
A single tokio-based async process replaces the current multi-process Python stack. The workspace contains 12 Rust crates, coordinated by 9 smart contracts on Arbitrum.
Process Architecture
Unlike pyaleph's multi-process model (main + workers + API coordinated via RabbitMQ), aleph-node runs as a single async process using tokio. Internal communication uses tokio channels (mpsc, broadcast, watch) instead of RabbitMQ.
aleph-node (single process)
|
+-- tokio runtime (multi-threaded)
|
+-- Chain watcher task (aleph-chain)
+-- P2P network task (aleph-network)
+-- API server task (aleph-api)
+-- Scheduler task (aleph-scheduler, if coordinator)
+-- Storage GC task (aleph-storage)
+-- Heartbeat/proof submitter (aleph-chain)
+-- Message processing pipeline (aleph-message)
| +-- Channel-based work queue (tokio::mpsc)
| +-- N worker tasks from queue
+-- VM executor tasks (aleph-executor, if compute)
+-- Per-VM supervision tasks
+-- Metering task
On-Chain vs Off-Chain Split
The system splits concerns between Arbitrum smart contracts (verifiable state) and the Rust node mesh (execution).
| Concern | On-Chain (Arbitrum) | Off-Chain (Rust Nodes) |
|---|---|---|
| Node identity | NodeRegistry contract | P2P discovery, metadata hosting |
| Staking | StakingManager contract | Reward calculation, distribution |
| Job lifecycle | JobManager (create, assign, heartbeat) | Scheduling, resource matching, execution |
| Storage proofs | StorageRegistry (Merkle roots, challenges) | Actual storage, replication, retrieval |
| Payments | PaymentManager (allowance-based settlement) | Usage metering, reporting |
| SLAs | SLAManager (definitions, penalties) | Uptime tracking, violation detection |
| Domains | DomainRegistry (ownership, mapping) | TLS provisioning, reverse proxy |
| Functions | Not on-chain | Coordinator routes to Compute Node |
| Data transfer | Not on-chain | P2P between nodes |
Cross-Contract Flows
Node Registration & Activation
User NodeRegistry StakingManager
| | |
|-- registerNode() -->| |
|<-- NodeRegistered ---| |
| | |
|-- stake() ---------------------------------->|
| |<-- activateNode() ----| (if minTotalStake met)
|<-- NodeActivated ----| |
|<-- Staked -----------------------------------|
Job Creation with Allowance-Based Payment
User JobManager PaymentManager ERC-20
| | | |
|-- approve() -------------------------------------------------->|
|<-- Approval ------------------------------------------------------|
| | | |
|-- createJob() --->| | |
|<-- JobCreated ----| | |
| | | |
Node | | |
|-- assignJob() --->| | |
|<-- JobAssigned ---| | |
| | | |
Settler | | |
|-- settleJob() ----------------->| |
| | |-- transferFrom() -->|
|<-- JobSettled ----| | |
| | | (if !ALEPH: swap |
| | | via Uniswap V3) |
Storage Challenge & Slashing
Challenger StorageRegistry StakingManager NodeRegistry
| | | |
|-- issueChallenge()>| | |
|<-- ChallengeIssued-| | |
| | | |
| (response deadline passes without response) |
| | | |
Anyone | | |
|-- resolveChallenge()>| | |
| |-- slash() ------->| |
|<-- ChallengeResolved| | |
Access Control Roles
Contracts use OpenZeppelin AccessControlUpgradeable with role-based permissions. All admin roles are held by the governance timelock.
| Role | Held By | Permissions |
|---|---|---|
| DEFAULT_ADMIN_ROLE | TimelockController (governance) | Grant/revoke roles, upgrade contracts |
| UPGRADER_ROLE | TimelockController | Upgrade UUPS proxy implementations |
| PARAMETER_ROLE | TimelockController | Adjust protocol parameters |
| PAUSER_ROLE | Emergency multisig | Pause contracts (no timelock needed) |
| SLASHER_ROLE | Governance + slashing committee | Execute slashing on StakingManager |
| REPORTER_ROLE | Reward calculator service | Submit reward Merkle roots |
| SCHEDULER_ROLE | Coordinator nodes | Assign jobs to compute nodes |
Scheduling Algorithm
The coordinator node scores candidate compute nodes for each job using a weighted formula:
score(node, job) =
0.4 * resource_fit(node, job) // How well free resources match request
+ 0.3 * stake_weight(node) // Higher stake = more trustworthy
+ 0.2 * locality_score(node, job) // Geographic proximity to user
+ 0.1 * load_balance(node) // Prefer less-loaded nodes
Client SDK Architecture
The core SDK is Rust-first, with TypeScript and Python wrapping it for consistent behavior across all languages.
aleph-sdk-rs (Rust, core)
|
+-- aleph-sdk-ts (TypeScript, via wasm-pack / napi-rs)
+-- aleph-sdk-py (Python, via PyO3)
+-- aleph-cli (Rust, built on aleph-sdk-rs)
// SDK usage example
let client = AlephClient::new(config)?;
let auth = client.with_account(EthAccount::from_private_key(key)?);
// Deploy an instance
let job = auth.create_instance(InstanceSpec {
rootfs: "Qm...abc".parse()?,
vcpus: 4,
memory_mb: 8192,
ssh_keys: vec!["ssh-ed25519 ...".into()],
..Default::default()
}).await?;
// Upload storage
let hash = auth.upload_file(path).await?;
// Stake on a node
auth.stake(node_id, amount).await?;
Security Considerations
Reentrancy
All payment functions are protected with OpenZeppelin's ReentrancyGuard, follow the checks-effects-interactions pattern, and use SafeERC20 for all token transfers.
Flash Loan Protection
minStakeDuration requires a stake to persist across multiple blocks before activation, and staking and activation are separate transactions, so a flash-loaned stake cannot register, activate, and exit within a single block.
Upgrade Safety
Upgrades require a governance proposal, a vote, and a 48-hour timelock delay. Storage layouts are validated with OpenZeppelin's upgrade tooling, and the contracts contain no selfdestruct or arbitrary delegatecall.
DoS Mitigation
Registration requires a minimum stake (an economic cost per identity), challenges require a bond, batch operations are capped, and node sets use EnumerableSet for O(1) add/remove.