Storage System

Content-addressed blob store with RocksDB indexing, Reed-Solomon erasure coding, Merkle proof verification, and on-chain challenge/response auditing.

Architecture

Upload → Content hash → Chunk (fixed-size splits) → Erasure code (Reed-Solomon) → Distribute (N nodes) → Verify (Merkle proof)

Module Overview

The aleph-storage crate exports:

| Module | Key Types | Purpose |
|--------|-----------|---------|
| engine | StorageEngine | Core blob store: put, get, delete, exists |
| index | StorageIndex, BlobMetadata | RocksDB index for content-addressed lookup |
| cache | ContentCache, EvictionPolicy | In-memory LRU/LFU cache layer |
| cached_engine | CachedStorageEngine | Engine + cache composition |
| chunking | ChunkingEngine, ChunkManifest | Fixed-size chunking with manifests |
| merkle | MerkleTree, MerkleProof | Merkle tree construction and proof generation |
| proofs | StorageProofGenerator, ChallengeResponder | On-chain storage proof generation |
| replication | ErasureEncoder, ReplicationManager | Reed-Solomon encoding and shard placement |
| ipfs | CidV0, CidV1, IpfsGateway | IPFS CID compatibility and gateway |
| gc | (none) | Garbage collection: cleanup of unreferenced blobs |

Content Addressing

All data is stored by its content hash (SHA-256). The StorageEngine provides the core interface:

pub trait StorageEngine {
    async fn put(&self, data: &[u8]) -> Result<ContentHash>;
    async fn get(&self, hash: &ContentHash) -> Result<Vec<u8>>;
    async fn exists(&self, hash: &ContentHash) -> bool;
    async fn delete(&self, hash: &ContentHash) -> Result<()>;
    async fn size(&self, hash: &ContentHash) -> Result<u64>;
}
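As a concrete (if simplified) illustration of this contract, the sketch below implements an in-memory engine. It is synchronous and uses std's DefaultHasher in place of SHA-256 so it stays dependency-free; MemEngine and content_hash are illustrative names, not part of aleph-storage.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for the crate's SHA-256 ContentHash. DefaultHasher keeps this
// sketch dependency-free but is NOT cryptographically secure.
type ContentHash = u64;

fn content_hash(data: &[u8]) -> ContentHash {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Minimal synchronous analogue of the StorageEngine contract.
#[derive(Default)]
struct MemEngine {
    blobs: HashMap<ContentHash, Vec<u8>>,
}

impl MemEngine {
    fn put(&mut self, data: &[u8]) -> ContentHash {
        let hash = content_hash(data);
        // Idempotent: identical bytes always map to the identical key.
        self.blobs.insert(hash, data.to_vec());
        hash
    }
    fn get(&self, hash: &ContentHash) -> Option<&[u8]> {
        self.blobs.get(hash).map(|v| v.as_slice())
    }
    fn exists(&self, hash: &ContentHash) -> bool {
        self.blobs.contains_key(hash)
    }
}

fn main() {
    let mut engine = MemEngine::default();
    let h = engine.put(b"hello");
    assert!(engine.exists(&h));
    assert_eq!(engine.get(&h), Some(&b"hello"[..]));
    // Re-putting the same content returns the same address.
    assert_eq!(h, engine.put(b"hello"));
}
```

Because the key is derived from the bytes themselves, storing the same content twice is a no-op, which is what makes deduplication free in a content-addressed store.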

Chunking

Large files are split into fixed-size chunks (default: 256 KiB, configurable between MIN_CHUNK_SIZE and MAX_CHUNK_SIZE). Each chunk is stored independently and tracked by a ChunkManifest:

pub struct ChunkManifest {
    pub content_hash: ContentHash,   // hash of original file
    pub chunks: Vec<ChunkInfo>,
    pub total_size: u64,
}

pub struct ChunkInfo {
    pub hash: ContentHash,
    pub offset: u64,
    pub size: u32,
}
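The splitting itself is straightforward. This sketch (with the per-chunk hash omitted, and a hypothetical chunk helper that is not part of the crate's API) shows how manifest offsets and sizes line up:

```rust
// Mirrors the default described above; MIN/MAX bounds are omitted here.
const CHUNK_SIZE: usize = 256 * 1024; // 256 KiB

// Simplified ChunkInfo: the per-chunk content hash is left out so the
// sketch stays self-contained.
struct ChunkInfo {
    offset: u64,
    size: u32,
}

// Split a byte slice into fixed-size chunks and record manifest entries.
fn chunk(data: &[u8], chunk_size: usize) -> Vec<ChunkInfo> {
    data.chunks(chunk_size)
        .enumerate()
        .map(|(i, c)| ChunkInfo {
            offset: (i * chunk_size) as u64,
            size: c.len() as u32,
        })
        .collect()
}

fn main() {
    let data = vec![0u8; 600 * 1024]; // a 600 KiB file
    let chunks = chunk(&data, CHUNK_SIZE);
    assert_eq!(chunks.len(), 3); // 256 + 256 + 88 KiB
    assert_eq!(chunks[1].offset, 256 * 1024);
    assert_eq!(chunks[2].size, 88 * 1024); // final chunk may be short
}
```

Only the last chunk may be shorter than CHUNK_SIZE, so offsets are always exact multiples of the chunk size.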

Merkle Trees & Proofs

Each storage commitment generates a Merkle tree from chunk hashes. The root is committed on-chain in the StorageRegistry contract.

// Build Merkle tree from chunks
let tree = MerkleTree::from_leaves(&chunk_hashes);
let root = tree.root();

// Generate proof for a specific chunk
let proof = tree.proof(chunk_index);

// Verify proof (done on-chain via StorageRegistry)
assert!(proof.verify(root, leaf_hash));
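To make the proof mechanics concrete, here is a toy Merkle tree over std's DefaultHasher; the crate's MerkleTree presumably hashes with SHA-256, and levels, proof, and verify are illustrative names rather than its API. Odd-width levels duplicate their last node.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type H = u64;

fn hash_pair(a: H, b: H) -> H {
    let mut s = DefaultHasher::new();
    (a, b).hash(&mut s);
    s.finish()
}

// Build all tree levels bottom-up, duplicating the last node when a
// level has odd width.
fn levels(leaves: &[H]) -> Vec<Vec<H>> {
    let mut lv = vec![leaves.to_vec()];
    while lv.last().unwrap().len() > 1 {
        let prev = lv.last().unwrap();
        let next: Vec<H> = prev
            .chunks(2)
            .map(|p| hash_pair(p[0], *p.get(1).unwrap_or(&p[0])))
            .collect();
        lv.push(next);
    }
    lv
}

// A proof is the sibling hash at each level plus a flag for whether the
// sibling sits on the right.
fn proof(leaves: &[H], mut idx: usize) -> Vec<(H, bool)> {
    let mut path = Vec::new();
    for level in levels(leaves).iter().take_while(|l| l.len() > 1) {
        let sib = idx ^ 1;
        let sib_hash = *level.get(sib).unwrap_or(&level[idx]);
        path.push((sib_hash, sib > idx));
        idx /= 2;
    }
    path
}

fn verify(root: H, mut acc: H, path: &[(H, bool)]) -> bool {
    for &(sib, sib_right) in path {
        acc = if sib_right { hash_pair(acc, sib) } else { hash_pair(sib, acc) };
    }
    acc == root
}

fn main() {
    // Toy leaf hashes standing in for chunk hashes.
    let leaves: Vec<H> = (0..5u64).map(|i| hash_pair(i, i)).collect();
    let root = levels(&leaves).last().unwrap()[0];
    let p = proof(&leaves, 3);
    assert!(verify(root, leaves[3], &p));
}
```

The proof size is logarithmic in the chunk count, which is what makes on-chain verification affordable: the contract re-hashes one sibling per level instead of re-reading the data.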

Erasure Coding

Reed-Solomon encoding provides data redundancy: the encoder produces data shards plus parity shards, and the ReplicationManager distributes them across nodes so the original data can be reconstructed even if up to parity_shards shards are lost.

// Encode with Reed-Solomon
let encoder = ErasureEncoder::new(
    data_shards,    // e.g., 4
    parity_shards,  // e.g., 2 (tolerates 2 failures)
);

let shards: Vec<Shard> = encoder.encode(&data)?;

// Place shards across nodes
let placements = ReplicationManager::place_shards(
    &shards,
    &available_nodes,
    replication_factor,
)?;
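Reed-Solomon arithmetic over GF(2^8) is beyond a short sketch, but the single-parity special case reduces to XOR and shows the recovery idea: any one lost shard can be rebuilt from the parity shard and the survivors. The functions below are illustrative, not the ErasureEncoder API.

```rust
// Simplified illustration: one XOR parity shard recovers any single lost
// data shard. Real Reed-Solomon (as in ErasureEncoder) works over
// GF(2^8) and supports multiple parity shards.
fn xor_parity(shards: &[Vec<u8>]) -> Vec<u8> {
    let mut parity = vec![0u8; shards[0].len()];
    for s in shards {
        for (p, b) in parity.iter_mut().zip(s) {
            *p ^= b;
        }
    }
    parity
}

// Rebuild a missing shard by XORing the parity with every survivor.
fn recover(surviving: &[Vec<u8>], parity: &[u8]) -> Vec<u8> {
    let mut out = parity.to_vec();
    for s in surviving {
        for (o, b) in out.iter_mut().zip(s) {
            *o ^= b;
        }
    }
    out
}

fn main() {
    let data = vec![vec![1u8, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    let parity = xor_parity(&data);
    // Lose shard 1; rebuild it from parity and the other two shards.
    let rebuilt = recover(&[data[0].clone(), data[2].clone()], &parity);
    assert_eq!(rebuilt, data[1]);
}
```

Reed-Solomon generalizes this: with parity_shards = 2, any two of the six shards in the example configuration above can be lost and the data still reconstructed.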

Challenge/Response

Storage providers must prove data possession via on-chain challenges:

  1. Challenger issues a challenge with a random seed via StorageRegistry.issueChallenge()
  2. Node computes the challenge response using the seed to select which chunks to prove
  3. Node submits Merkle proof via StorageRegistry.respondToChallenge()
  4. Contract verifies proof on-chain using MerkleProof.verify()
  5. Failed challenges trigger slashing via StakingManager.slash()

// Challenge response flow (node side)
let responder = ChallengeResponder::new(&storage_engine);

let response: ChallengeResponse = responder
    .respond(challenge_seed, commitment_id)
    .await?;

// Submit proof on-chain
storage_registry
    .respondToChallenge(
        challenge_id,
        response.proof,
        response.leaf,
    )
    .send().await?;
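Step 2's seed-driven chunk selection can be sketched as follows. The actual derivation inside ChallengeResponder is not shown in this document, so the xorshift-based selection below is purely illustrative; what matters is that node and contract derive the same indices from the same seed.

```rust
// Derive which chunk indices to prove from the challenge seed.
// Any deterministic map from (seed, chunk_count) to indices works, as
// long as both sides compute it identically.
fn challenged_indices(seed: u64, chunk_count: u64, samples: usize) -> Vec<u64> {
    let mut state = seed;
    (0..samples)
        .map(|_| {
            // xorshift64: a tiny deterministic PRNG; fine for index
            // selection because unpredictability comes from the seed.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            state % chunk_count
        })
        .collect()
}

fn main() {
    let a = challenged_indices(42, 1000, 3);
    let b = challenged_indices(42, 1000, 3);
    assert_eq!(a, b); // same seed, same chunks: the contract can re-derive them
    assert!(a.iter().all(|&i| i < 1000));
}
```

Because the seed is random and issued on-chain, a provider cannot predict which chunks will be challenged and so must retain all of them.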

Caching

The CachedStorageEngine wraps the base engine with an in-memory cache supporting LRU and LFU eviction policies:

let cache = ContentCache::new(CacheConfig {
    max_size_bytes: 512 * 1024 * 1024, // 512 MiB
    eviction_policy: EvictionPolicy::LRU,
});

let engine = CachedStorageEngine::new(base_engine, cache);
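For intuition about the eviction behavior, here is a minimal entry-counting LRU; the real ContentCache tracks bytes via max_size_bytes, and LruCache is an illustrative name rather than the crate's type.

```rust
use std::collections::{HashMap, VecDeque};

// Minimal LRU sketch: capacity is counted in entries, not bytes.
struct LruCache {
    cap: usize,
    map: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // front = least recently used
}

impl LruCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }
    fn get(&mut self, key: u64) -> Option<&Vec<u8>> {
        if self.map.contains_key(&key) {
            self.touch(key); // a hit makes the entry most recently used
        }
        self.map.get(&key)
    }
    fn put(&mut self, key: u64, value: Vec<u8>) {
        if self.map.insert(key, value).is_none() && self.map.len() > self.cap {
            if let Some(old) = self.order.pop_front() {
                self.map.remove(&old); // evict the least recently used entry
            }
        }
        self.touch(key);
    }
    fn touch(&mut self, key: u64) {
        self.order.retain(|&k| k != key);
        self.order.push_back(key);
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put(1, vec![1]);
    cache.put(2, vec![2]);
    cache.get(1);          // key 1 is now most recently used
    cache.put(3, vec![3]); // evicts key 2
    assert!(cache.get(2).is_none());
    assert!(cache.get(1).is_some());
    assert!(cache.get(3).is_some());
}
```

An LFU policy would instead track hit counts and evict the least frequently used entry, which favors stable hot content over recent one-off reads.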