Ferum Engineering Blog

How Ferum Improved Gas Efficiency by 4.5x

Learn the techniques we used to build the most efficient CLOB on Aptos

Relapse
Jan 25

Ferum is a fully decentralized orderbook built on top of the Aptos blockchain. As a core piece of DeFi infrastructure that will handle a large amount of TPS, it's critical for transactions on Ferum to be as cheap as possible. A single transaction costing $0.003 USD, for example, is not expensive on its own, but when that cost is incurred every second, it balloons to $94,608 USD per year.

The Ferum team's been hard at work building Ferum v3, a fully gas-optimized rewrite of Ferum. We've open-sourced a big chunk of our codebase, so give it a read!

Below, we'll dive deep into a few of the techniques we used to drive gas costs as low as possible, resulting in the most efficient central limit order book (CLOB) on Aptos.

Optimizing in a distributed environment has been challenging because it requires combining asymptotic analysis with real-world outcomes. Take maintaining elements in sorted order as a contrived example: from a big-O perspective, a tree-based data structure is better, but if the number of elements is small, the overhead of maintaining a tree on a blockchain nullifies any gains. For this reason, we'll reason a lot from first principles, as doing so can reveal paths that end up being much more efficient.

This post assumes decent familiarity with Aptos, Move, algorithms, and data structures.

Aptos Gas Basics

Aptos charges gas for both compute and storage. Each item in the linked schedules is the cost for the specified operation in terms of Internal Gas, which is simply the number of Gas Units multiplied by a scaling factor, currently set to 10,000.

Each Gas Unit currently costs 100 octas (1 octa = 10E-8 APT). So we have the following:

APT octas cost = internal_gas_cost / scaling_factor * gas_unit_price
               = internal_gas_cost / 10000 * 100
               = internal_gas_cost / 100

APT cost = APT octas cost * 10E-8
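
As a purely illustrative example, a transaction consuming 3,000,000 Internal Gas would work out to:

APT octas cost = 3,000,000 / 10000 * 100 = 30,000 octas
APT cost       = 30,000 * 10E-8 = 0.0003 APT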

Aptos also scales gas costs based on overall usage of the network. For the purposes of this post, we will just consider base costs to establish consistent baselines. You can learn more about Aptos gas from the docs.

Optimizations

1. B+ Tree

Orderbooks are actually quite simple and have few requirements. At a high level, they match bids and asks whose prices cross. So accessing, inserting, and removing prices in sorted order are the key operations that need to be efficient.

We can immediately eliminate just storing prices in unsorted order. Inserting is fast but removing and accessing elements become really costly - requiring an O(n) scan.

We evaluated and implemented many options for storing prices in sorted order when deciding which data structure to use: treaps, heaps, AVL trees, red-black trees (which we open sourced!), and even just a sorted list. It became clear that none of these options were gas efficient.

The main culprit was the gas cost of reading and writing existing items in global storage. Let's take AVL trees, for example, a data structure commonly used by other CLOBs. With just 100 prices, an AVL tree will have a height of 10 (ceil(1.44 * log2(100))). That means each update operation will traverse 10 tree nodes and, potentially, touch upwards of 19 tree nodes when rebalancing (each rotation touches 3 nodes in the most optimal implementation and, in the case of deletes, rotations can propagate all the way to the root). Reading and writing individual items each costs 3,000 octas. Ignoring any per-byte read/write costs for now, inserting into or removing from an AVL tree can cost a whopping 114,000 octas (19 nodes, each read and written for 6,000 octas)! For an orderbook, these operations need to be done multiple times per order, so the cost only compounds.

AVL tree worst case: an AVL tree with depth 4 has, in the worst case, 7 nodes to read/write when deleting.

It turns out that we can look to databases for help. The biggest cost for a database is reading and writing pages to disk, so databases use data structures that minimize disk access as much as possible. This is analogous to our situation: touching fewer items in global storage helps reduce gas costs tremendously.

Enter B+ trees, a structure commonly used to represent database indexes. A B+ tree stores many elements at each node, which greatly reduces the height of the tree. For example, a B+ tree holding the same 100 prices with 5 prices at each node (degree 5) will only have a height of 3! Even considering the worst case for rebalancing, B+ trees only require reading/writing h tree nodes, where h is the height of the tree.

A B+ tree can have multiple elements in each node, greatly reducing tree depth.

B+ trees do present a tradeoff. Aptos charges gas per byte when writing and reading - in particular, writing costs an additional 50 octas per byte. So our B+ tree nodes can't be too large. We found that a degree of 8 yielded the best results given the size of our node elements and how many prices we expect to be in the tree.

Taking per-byte read/write costs into consideration, an update on a B+ tree holding 100 prices costs only ~60,000 octas, helping us reduce gas costs by over 50% compared to the other data structures.
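
To make this concrete, here's a minimal sketch of what a price-keyed B+ tree node could look like in Move. It's illustrative only - the field names and layout are assumptions, not Ferum's actual node type - but it shows the key property: one global storage item holds several sorted prices, so an update only touches the handful of nodes along a root-to-leaf path.

struct TreeNode has store {
  isLeaf: bool,
  // Sorted prices stored in this node. Capped at the tree degree (8 in our
  // case) so the 50 octas/byte write cost stays manageable.
  keys: vector<u64>,
  // IDs of child nodes; empty for leaf nodes.
  children: vector<u16>,
  // Leaf-level link used to walk prices in sorted order.
  nextLeaf: u16,
}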

You can find our B+ tree implementation in the Ferum source code.

2. Price Cache

Orderbooks also have a unique usage pattern: prices closer to the spread see much more activity. We can leverage this “hot path” by storing those prices in a simple vector.

This might sound counterintuitive - we went through all that work to build a B+ tree but are still using vectors. The key is that vectors are only cheap if the number of elements in the vector is constrained, primarily because of the per-byte write cost of 50 octas. For example, a vector of just 1,000 bytes ends up costing >50,000 octas to write, and we pay this cost each time we update the vector, even if we only change a single element. Vector operations themselves each cost less than 100 octas, so while they're comparatively cheap, they still need to be taken into account.

Ferum Hot Path Optimized Architecture

We landed on 25 prices as the optimal cache size based on element size and price-level usage on other CLOBs like OpenBook. Taking into account all gas costs, operations on the cache cost only ~25,000 octas, a further 60% improvement over the B+ tree.
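
Here's a minimal sketch of the idea, with illustrative names (not Ferum's actual types): the cache is a single small, bounded vector living in one global storage item, so an update touches one item plus a small number of bytes.

struct CachedLevel has store, drop {
  price: u64,
  qty: u64,
}

struct PriceCache has store {
  // The price levels closest to the spread, kept sorted. Bounded at 25
  // entries so the 50 octas/byte write cost stays small; levels that fall
  // out of the cache live in the B+ tree instead.
  levels: vector<CachedLevel>,
}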

3. Crank Deferred Computation

Ferum’s order matching is completely atomic. Orders are executed the moment they are placed onto the book and don’t require cranking to be matched.

Ferum’s crank mechanism allows us to defer expensive computation to outside the “hot path”. Asset settlement, for example, is deferred until the crank is turned, and so is cache/tree rebalancing.

Most importantly, Ferum leverages pending quantities to defer expensive reads/writes. If a taker order were to execute against maker orders, the quantity of each of those maker orders would need to be modified. Since each read/write costs at least 6,000 octas, costs can quickly balloon. By deferring updates to order objects to the crank, an order execution only involves modifying a single counter in the cache/tree object, greatly reducing costs.
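
A rough sketch of the pending-quantity idea (names and structure are illustrative assumptions, not Ferum's actual code): executing a taker only bumps counters on the price level it crossed, and the maker order objects behind that level are only read and written later, when the crank settles them.

struct PriceLevel has store {
  price: u64,
  // Open quantity remaining at this price.
  qty: u64,
  // Quantity that has executed but not yet been settled against the
  // individual maker orders at this level.
  pendingExecQty: u64,
}

// Hot path: a taker execution modifies a single item instead of touching
// every crossed maker order (each of which costs 6,000+ octas to read/write).
fun execute_against_level(level: &mut PriceLevel, execQty: u64) {
    level.qty = level.qty - execQty;
    level.pendingExecQty = level.pendingExecQty + execQty;
}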

4. Multi Element Lists

A common theme has been balancing per-byte storage costs against per-item storage costs. Another place this comes into play is linked lists. Typically, a linked list stores each node as an item in a table (just as we did in our open-source implementation). Iterating through such a list is costly because of Aptos' high per-item storage gas cost.

One way to improve this is to store multiple elements of the list in a single node, as sketched below. At a high level, this approach works when the data stored in each node is localized (i.e. using element A means there's a high likelihood element B will also be used). It works particularly well for queues, where elements are only pushed/popped at the ends.
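
Here's a minimal sketch of a multi-element node (illustrative only, not the actual NodeList structure): each table item batches several consecutive list elements, so iterating the list touches far fewer global storage items.

struct MultiNode<T: store> has store {
  // A batch of consecutive list elements, stored in list order. Pushing or
  // popping at the ends usually only rewrites this one item.
  elements: vector<T>,
  // ID of the next node in the list; 0 marks the tail.
  next: u16,
}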

We’ve implemented this in the inlined NodeList structure, which you can find in the Ferum source code.

5. Global Storage Object Reuse

We’ve mostly talked about read/write costs for existing items, but the bigger elephant in the room is the creation of new items in global storage. Currently, each new item costs a daunting 50,000 octas, and there is no incentive for teams to remove these items once they are no longer needed.

Given the current state of the gas schedule, a common technique is to reuse objects. Luckily, this is pretty easy to do. Here is a quick generalized example:

struct Node<T: store> has store {
  next: u16,
  data: vector<T>,
}

struct ReuseTable<T: store> has store {
  // ID of the node at the top of the unused stack; 0 means the stack is empty.
  unusedStack: u16,
  nodes: table::Table<u16, Node<T>>,
  // Next ID to assign to a newly allocated node; starts at 1 so that 0 can
  // act as the empty sentinel.
  currNodeID: u16,
}

fun get_or_create_node<T: store>(table: &mut ReuseTable<T>): u16 {
    if (table.unusedStack == 0) {
        prealloc_nodes(table, 1)
    };
    // Pop a node off the unused stack and hand its ID to the caller.
    let nodeID = table.unusedStack;
    let node = table::borrow_mut(&mut table.nodes, nodeID);
    table.unusedStack = node.next;
    node.next = 0;
    nodeID
}

fun prealloc_nodes<T: store>(table: &mut ReuseTable<T>, count: u8) {
    let i = 0;
    while (i < count) {
        // Each new node is pushed onto the front of the unused stack.
        let node = Node<T> {
            next: table.unusedStack,
            data: vector[],
        };
        let nodeID = table.currNodeID;
        table.currNodeID = table.currNodeID + 1;
        table.unusedStack = nodeID;
        table::add(&mut table.nodes, nodeID, node);
        i = i + 1;
    }
}

ReuseTable in the above example uses u16 node IDs, with 0 reserved as the empty-stack sentinel, so it can hold up to 65,535 reusable nodes (and each prealloc_nodes call can create up to 255 of them at a time). A one-time cost can be paid to preallocate these nodes into global storage so that nodes can be reused when get_or_create_node is called. When a node is no longer being used, it just gets added back onto the unused node stack.
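
For completeness, the return path could look like the sketch below. The return_node helper is hypothetical (it isn't part of the example above), but it shows the idea: the freed node is pushed back onto the unused stack instead of being deleted from global storage.

fun return_node<T: store>(table: &mut ReuseTable<T>, nodeID: u16) {
    // The caller is expected to have drained node.data before returning it.
    let node = table::borrow_mut(&mut table.nodes, nodeID);
    // Push the node back onto the unused stack; the storage item sticks
    // around so a later get_or_create_node call can reuse it.
    node.next = table.unusedStack;
    table.unusedStack = nodeID;
}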

We’ve inlined this reuse scheme into PriceLevelReuseTable, OrderReuseTable, and NodeList.

Closing

The above optimizations have been a ton of fun to work on and have allowed the Ferum team to really flex its engineering muscle. Stay tuned as we start to deploy Ferum and release benchmarks.

That being said, gas does present a significant challenge to novice teams and a big barrier to the creation of amazing DeFi protocols. We’re continuously working with the Aptos team to provide feedback on improvements to the core blockchain layer so optimizations like the ones above aren’t needed anymore!

We hope you enjoyed this post. Don't hesitate to reach out on Twitter if you have questions!
