System Architecture

A single tokio async process replaces the current multi-process Python stack. The implementation spans 12 Rust crates in one workspace, with protocol state coordinated by 9 smart contracts on Arbitrum.

Process Architecture

Unlike pyaleph's multi-process model (main + workers + API coordinated via RabbitMQ), aleph-node runs as a single async process using tokio. Internal communication uses tokio channels (mpsc, broadcast, watch) instead of RabbitMQ.

aleph-node (single process)
  |
  +-- tokio runtime (multi-threaded)
       |
       +-- Chain watcher task            (aleph-chain)
       +-- P2P network task              (aleph-network)
       +-- API server task               (aleph-api)
       +-- Scheduler task                (aleph-scheduler, if coordinator)
       +-- Storage GC task               (aleph-storage)
       +-- Heartbeat/proof submitter     (aleph-chain)
       +-- Message processing pipeline   (aleph-message)
       |     +-- Channel-based work queue  (tokio::mpsc)
       |     +-- N worker tasks from queue
       +-- VM executor tasks             (aleph-executor, if compute)
             +-- Per-VM supervision tasks
             +-- Metering task
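The message-processing pipeline above fans work out from a channel-based queue to N workers. A minimal sketch of that shape, using std::sync::mpsc and threads so it stands alone (the real node uses tokio tasks and tokio::sync::mpsc; `Message` and `run_pipeline` are illustrative names, not the aleph-message API):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical payload type standing in for a protocol message.
struct Message(u64);

// Feed all messages through a shared queue consumed by `workers` threads,
// mirroring the "channel-based work queue + N worker tasks" layout.
fn run_pipeline(messages: Vec<Message>, workers: usize) -> u64 {
    let (tx, rx) = mpsc::channel::<Message>();
    // std's mpsc receiver is single-consumer, so workers share it via a Mutex;
    // tokio's mpsc receiver is shared the same way in multi-worker setups.
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut processed = 0u64;
                loop {
                    // Lock only long enough to pull one message.
                    let msg = { rx.lock().unwrap().recv() };
                    match msg {
                        Ok(Message(n)) => processed += n,
                        Err(_) => break, // channel closed: all senders dropped
                    }
                }
                processed
            })
        })
        .collect();

    for m in messages {
        tx.send(m).unwrap();
    }
    drop(tx); // close the channel so workers exit

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Dropping the sender is what shuts the pipeline down cleanly: each worker's `recv()` returns `Err` once the queue drains, the same backpressure-and-shutdown pattern the tokio version gets for free.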

On-Chain vs Off-Chain Split

The system splits concerns between Arbitrum smart contracts (verifiable state) and the Rust node mesh (execution).

Concern        | On-Chain (Arbitrum)                          | Off-Chain (Rust Nodes)
---------------+----------------------------------------------+------------------------------------------
Node identity  | NodeRegistry contract                        | P2P discovery, metadata hosting
Staking        | StakingManager contract                      | Reward calculation, distribution
Job lifecycle  | JobManager (create, assign, heartbeat)       | Scheduling, resource matching, execution
Storage proofs | StorageRegistry (Merkle roots, challenges)   | Actual storage, replication, retrieval
Payments       | PaymentManager (allowance-based settlement)  | Usage metering, reporting
SLAs           | SLAManager (definitions, penalties)          | Uptime tracking, violation detection
Domains        | DomainRegistry (ownership, mapping)          | TLS provisioning, reverse proxy
Functions      | Not on-chain                                 | Coordinator routes to Compute Node
Data transfer  | Not on-chain                                 | P2P between nodes

Cross-Contract Flows

Node Registration & Activation

User                NodeRegistry         StakingManager
  |                       |                     |
  |-- registerNode() ---->|                     |
  |<-- NodeRegistered ----|                     |
  |                       |                     |
  |-- stake() --------------------------------->|
  |                       |<-- activateNode() --|  (if minTotalStake met)
  |<-- NodeActivated -----|                     |
  |<-- Staked ----------------------------------|
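The ordering matters: registration and staking are separate transactions, and activation fires only once total stake crosses the threshold. A hypothetical off-chain mirror of that state machine (names like `NodeStatus` and `min_total_stake` are illustrative, not the contract ABI):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeStatus {
    Registered, // registerNode() succeeded, stake below threshold
    Active,     // minTotalStake met; NodeActivated emitted on-chain
}

struct Node {
    status: NodeStatus,
    total_stake: u128,
}

impl Node {
    // Mirrors registerNode(): the node exists but cannot take work yet.
    fn register() -> Self {
        Node { status: NodeStatus::Registered, total_stake: 0 }
    }

    // Mirrors StakingManager.stake() triggering NodeRegistry.activateNode()
    // once the accumulated stake meets the minimum.
    fn stake(&mut self, amount: u128, min_total_stake: u128) {
        self.total_stake += amount;
        if self.status == NodeStatus::Registered && self.total_stake >= min_total_stake {
            self.status = NodeStatus::Active;
        }
    }
}
```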

Job Creation with Allowance-Based Payment

User              JobManager         PaymentManager        ERC-20
  |                   |                    |                   |
  |-- approve() ---------------------------------------------->|
  |<-- Approval ------------------------------------------------|
  |                   |                    |                   |
  |-- createJob() --->|                    |                   |
  |<-- JobCreated ----|                    |                   |
  |                   |                    |                   |
Node                  |                    |                   |
  |-- assignJob() --->|                    |                   |
  |<-- JobAssigned ---|                    |                   |
  |                   |                    |                   |
Settler               |                    |                   |
  |-- settleJob() ------------------------>|                   |
  |                   |                    |-- transferFrom()->|
  |<-- JobSettled -------------------------|                   |
  |                   |                    | (if !ALEPH: swap  |
  |                   |                    |  via Uniswap V3)  |
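The key invariant in this flow is that settlement can only pull what the user pre-approved: `transferFrom` is bounded by the ERC-20 allowance. A toy model of that allowance accounting (all names are hypothetical; the real logic lives in PaymentManager and the token contract):

```rust
use std::collections::HashMap;

// Minimal in-memory ERC-20 model: balances plus (owner, spender) allowances.
struct Erc20 {
    balances: HashMap<&'static str, u128>,
    allowances: HashMap<(&'static str, &'static str), u128>,
}

impl Erc20 {
    // approve(): the user authorizes a spender up to `amount`.
    fn approve(&mut self, owner: &'static str, spender: &'static str, amount: u128) {
        self.allowances.insert((owner, spender), amount);
    }

    // transferFrom(): fails unless allowance and balance both cover `amount`;
    // on success the allowance is drawn down, so repeated settlements cannot
    // exceed what the user approved.
    fn transfer_from(
        &mut self,
        spender: &'static str,
        from: &'static str,
        to: &'static str,
        amount: u128,
    ) -> Result<(), &'static str> {
        let key = (from, spender);
        let allowed = self.allowances.get(&key).copied().unwrap_or(0);
        if allowed < amount {
            return Err("insufficient allowance");
        }
        let bal = self.balances.get(from).copied().unwrap_or(0);
        if bal < amount {
            return Err("insufficient balance");
        }
        self.allowances.insert(key, allowed - amount);
        *self.balances.entry(from).or_insert(0) = bal - amount;
        *self.balances.entry(to).or_insert(0) += amount;
        Ok(())
    }
}
```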

Storage Challenge & Slashing

Challenger         StorageRegistry      StakingManager      NodeRegistry
  |                       |                    |                  |
  |-- issueChallenge() -->|                    |                  |
  |<-- ChallengeIssued ---|                    |                  |
  |                       |                    |                  |
  |  (response deadline passes without a response)                |
  |                       |                    |                  |
Anyone                    |                    |                  |
  |-- resolveChallenge()->|                    |                  |
  |                       |-- slash() -------->|                  |
  |<-- ChallengeResolved -|                    |                  |
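StorageRegistry anchors Merkle roots on-chain, so a valid challenge response is a Merkle proof that recomputes the committed root. A toy version of that check, using std's DefaultHasher as a stand-in for the real cryptographic hash (the actual proof format of StorageRegistry is not specified here, so everything below is illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hash function; a real implementation would use e.g. keccak256.
fn h(data: &[u64]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

// Recompute the root from a leaf and its sibling path. Each path element
// pairs a sibling hash with a flag saying whether that sibling sits on the
// right of the running hash.
fn verify_proof(leaf: u64, path: &[(u64, bool)], root: u64) -> bool {
    let mut acc = leaf;
    for &(sibling, sibling_right) in path {
        acc = if sibling_right {
            h(&[acc, sibling])
        } else {
            h(&[sibling, acc])
        };
    }
    acc == root
}
```

The challenged node proves possession of a chunk by hashing it and supplying the sibling path; anyone can recompute the root and compare it against the on-chain commitment, which is what makes `resolveChallenge()` permissionless.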

Access Control Roles

Contracts use OpenZeppelin AccessControlUpgradeable with role-based permissions. All admin roles are held by the governance timelock.

Role            | Held By                         | Permissions
----------------+---------------------------------+---------------------------------------
DEFAULT_ADMIN   | TimelockController (governance) | Grant/revoke roles, upgrade contracts
UPGRADER_ROLE   | TimelockController              | Upgrade UUPS proxy implementations
PARAMETER_ROLE  | TimelockController              | Adjust protocol parameters
PAUSER_ROLE     | Emergency multisig              | Pause contracts (no timelock needed)
SLASHER_ROLE    | Governance + slashing committee | Execute slashing on StakingManager
REPORTER_ROLE   | Reward calculator service       | Submit reward Merkle roots
SCHEDULER_ROLE  | Coordinator nodes               | Assign jobs to compute nodes
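The role model reduces to set membership: each role maps to a set of accounts, and privileged entry points check membership before doing anything. A minimal sketch of that OpenZeppelin-style guard (string addresses and these method names are purely illustrative):

```rust
use std::collections::{HashMap, HashSet};

// Each role name maps to the set of accounts holding it.
struct AccessControl {
    roles: HashMap<&'static str, HashSet<&'static str>>,
}

impl AccessControl {
    fn new() -> Self {
        AccessControl { roles: HashMap::new() }
    }

    fn grant_role(&mut self, role: &'static str, account: &'static str) {
        self.roles.entry(role).or_default().insert(account);
    }

    fn has_role(&self, role: &'static str, account: &'static str) -> bool {
        self.roles.get(role).map_or(false, |m| m.contains(account))
    }

    // The guard a call like JobManager.assignJob() would apply: only
    // holders of SCHEDULER_ROLE get past this check.
    fn only_role(&self, role: &'static str, caller: &'static str) -> Result<(), &'static str> {
        if self.has_role(role, caller) {
            Ok(())
        } else {
            Err("missing role")
        }
    }
}
```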

Scheduling Algorithm

The coordinator node scores candidate compute nodes for each job using a weighted formula:

score(node, job) =
    0.4 * resource_fit(node, job)      // How well free resources match request
  + 0.3 * stake_weight(node)           // Higher stake = more trustworthy
  + 0.2 * locality_score(node, job)    // Geographic proximity to user
  + 0.1 * load_balance(node)           // Prefer less-loaded nodes
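A direct transcription of the formula in Rust. The four component scores are assumed to be normalized to [0, 1] by the caller; how each component is actually computed (resource fit, stake weight, and so on) is left abstract here:

```rust
// Component scores for one (node, job) pair, each already normalized to [0, 1].
struct Scores {
    resource_fit: f64, // how well free resources match the request
    stake_weight: f64, // higher stake = more trustworthy
    locality: f64,     // geographic proximity to the user
    load_balance: f64, // prefer less-loaded nodes
}

// Weighted sum from the scheduling formula: 0.4/0.3/0.2/0.1.
fn score(s: &Scores) -> f64 {
    0.4 * s.resource_fit + 0.3 * s.stake_weight + 0.2 * s.locality + 0.1 * s.load_balance
}
```

The coordinator would evaluate this for every candidate node and assign the job to the highest scorer; because the weights sum to 1, the result stays in [0, 1] as well.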

Client SDK Architecture

The SDK core is implemented in Rust; the TypeScript and Python SDKs wrap that core, so all three languages share consistent behavior.

aleph-sdk-rs (Rust, core)
  |
  +-- aleph-sdk-ts  (TypeScript, via wasm-pack / napi-rs)
  +-- aleph-sdk-py  (Python, via PyO3)
  +-- aleph-cli     (Rust, built on aleph-sdk-rs)

// SDK usage example
let client = AlephClient::new(config)?;
let auth = client.with_account(EthAccount::from_private_key(key)?);

// Deploy an instance
let job = auth.create_instance(InstanceSpec {
    rootfs: "Qm...abc".parse()?,
    vcpus: 4,
    memory_mb: 8192,
    ssh_keys: vec!["ssh-ed25519 ...".into()],
    ..Default::default()
}).await?;

// Upload storage
let hash = auth.upload_file(path).await?;

// Stake on a node
auth.stake(node_id, amount).await?;

Security Considerations

Reentrancy

All payment functions use ReentrancyGuard, follow the checks-effects-interactions pattern, and perform token transfers through SafeERC20.

Flash Loan Protection

minStakeDuration requires stake to persist across multiple blocks before activation, and staking and activation are separate transactions, so a flash-loaned stake cannot register and activate a node atomically within one block.

Upgrade Safety

Upgrades require a governance proposal, a vote, and a 48h timelock delay. Storage layout compatibility is validated with OpenZeppelin tooling, and the contracts contain no selfdestruct or arbitrary delegatecall.

DoS Mitigation

Registration requires a minimum stake (imposing an economic cost on spam), challenges require a bond, batch operations are capped in size, and EnumerableSet keeps add/remove at O(1).
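The batch-size cap is the simplest of these mitigations to show concretely: any entry point that loops over caller-supplied input rejects oversized batches up front, so a caller cannot force unbounded work (on-chain, unbounded gas) in one call. A sketch with a hypothetical `MAX_BATCH` limit and `settle_batch` entry point:

```rust
// Illustrative cap; the real protocol parameter would be governance-set.
const MAX_BATCH: usize = 50;

// Reject oversized batches before doing any per-item work.
fn settle_batch(job_ids: &[u64]) -> Result<usize, &'static str> {
    if job_ids.len() > MAX_BATCH {
        return Err("batch too large");
    }
    // ... settle each job ...
    Ok(job_ids.len())
}
```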