May 07, 2025
Hash What You Mean
There seems to be no doubt about how to hash a single message, under the assumption that the hash function H itself is not prone to length-extension attacks and that the encoding of the message is unambiguous: the message may be split internally into blocks, the blocks may be padded, etc., but the process generalizes simply to \( m \mapsto H(m) \). When it comes to hashing collections of objects, however, the situation is different. A first option that comes to mind is to convert the collection to its string representation and then hash it. This approach, though, "delegates" the problem of ambiguity to the encoding or serialization function: there are indeed multiple ways to encode even a single string, depending on whether we prefer string-termination characters or an explicit length prefix, in which case one also needs to worry about the separator. For instance, a list such as ["storm", "rain"] could be encoded as "5storm,4rain", but what about ["5storm,", "a"*68]? With the same encoding, this would become "75storm,,68aaa...", which would put the deserializer in trouble. We take for granted here that there exists an encoding function which makes the representation unambiguous, provided its implementation is correct and that multiple implementations agree with one another. This can be anything from ASN.1 to JSON and Protobuf. Equipped with such a function, the first approach works, and when the function also provides means to declare the type of each object, and that of the container itself, it fulfills the Horton principle, adapted here to our context: hash what is being meant, not what is being said; that is, the hashed content must convey indications of the way the message was constructed. If the collection happens to be a list, then the ordering of the objects matters, and this method preserves the ordering as well.
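To make the ambiguity concrete, here is a small sketch (plain Python, with `sha256` standing in for any H; `naive_encode` and `length_prefixed_encode` are illustrative names, not any library's API) showing that a separator-based encoding lets two distinct lists hash to the same value, while a length-prefixed one keeps these two apart:

```python
from hashlib import sha256

def naive_encode(items: list[str]) -> bytes:
    # Ambiguous: the separator can also appear inside the items themselves.
    return ",".join(items).encode()

def length_prefixed_encode(items: list[str]) -> bytes:
    # Each item is preceded by its length, as in the "5storm,4rain" example.
    # (Still imperfect, as the post notes: multi-digit lengths need care too.)
    return "".join(f"{len(s)}{s}," for s in items).encode()

a = ["storm,rain"]       # one element containing a comma
b = ["storm", "rain"]    # two elements

# The naive encodings collide, so the hashes collide too.
assert naive_encode(a) == naive_encode(b)
assert sha256(naive_encode(a)).digest() == sha256(naive_encode(b)).digest()

# The length-prefixed encoding distinguishes the two lists.
assert length_prefixed_encode(a) != length_prefixed_encode(b)
```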
Let's consider an alternative approach, using Merkle trees. We could arrange the collection into the leaves of a tree, hash pairs of leaves to obtain the nodes, and proceed recursively on pairs of nodes until we obtain the root, represented by a single hash. Even before any hashing, though, it becomes apparent that this approach presumes a whole new data structure built on top of the input collection: we have terminal leaves and internal nodes, depth and levels. We have also mentioned that the hashing is carried out over pairs of elements, so one may ask: even equipped with the "optimal" encoding function above, how do we pick the pairs? Supposing the collection is a list, how do we pair its elements? Do we treat each element as a single leaf, therefore hashing pairs of elements? In that case, what happens when the list's length is not a power of two? Furthermore, do we start pairing from the "left", i.e. from the beginning of the list, or from the tail? Each of these questions leads to at least two different, opinionated answers. Let's explore them by taking a curious look at how they are implemented in the blockchain industry, which made Merkle trees popular. We will show that these differences, taken on their own, are basically harmless but, as is often the case with cryptographic constructions, when blended together lightheartedly they can lead to unintended consequences.
We start with the simplest choice, the binary Merkle tree. This structure provides an immediate answer to the questions about the shape of the tree and the pairing of the elements. We refer to the standard construction used for certificate transparency logs, as specified in RFC-6962: the input is an ordered list L of elements, which is split into two segments by an index k, where k is the largest power of two smaller than the length of L. This solves the problem of lists whose length is not a power of two. Each of the two segments is then hashed individually and the two resulting hashes are concatenated and hashed together. This implies that we must either use the first method above to hash the sublists, or perform the same step recursively on them. The latter case, which the specification mandates, introduces the need to discern between the hash of leaves and the hash of nodes, and this is where domain separation comes into play: we must prepend a different byte when we hash one-entry lists and multi-entry ones, 0x00 and 0x01 respectively. Without domain separation, the lists \( [a, b] \) and \( [H(a)||H(b)] \) would yield the same root. In addition, the specification requires that the empty list is hashed as the empty string. This introduces no ambiguity because the inputs to the algorithm are always lists, never bare strings.
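The RFC-6962 computation can be sketched in a few lines of Python, using `sha256` as the RFC does (`mth` is my name for its Merkle Tree Hash, and this is a sketch of the recursion rather than a full implementation):

```python
from hashlib import sha256

LEAF, NODE = b"\x00", b"\x01"  # domain-separation prefixes from RFC 6962

def mth(entries: list[bytes]) -> bytes:
    """Merkle Tree Hash of an ordered list of entries."""
    n = len(entries)
    if n == 0:
        return sha256(b"").digest()          # empty list: hash of the empty string
    if n == 1:
        return sha256(LEAF + entries[0]).digest()
    k = 2 ** ((n - 1).bit_length() - 1)      # largest power of two < n
    return sha256(NODE + mth(entries[:k]) + mth(entries[k:])).digest()

a, b = b"storm", b"rain"
la, lb = sha256(LEAF + a).digest(), sha256(LEAF + b).digest()

# The root of [a, b] is the prefixed hash of the two leaf hashes...
assert mth([a, b]) == sha256(NODE + la + lb).digest()
# ...and domain separation keeps [a, b] distinct from [H(a)||H(b)].
assert mth([a, b]) != mth([la + lb])
```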
The method is adopted by CometBFT to hash byte slices, and it's concisely implemented here. Note their emphasis on the fact that the byte slices are ordered, and that the construction is order-preserving.
A different strategy is defined in the Ethereum Consensus Specs, where Merkle trees enjoy first-class-citizen status. In this case the resulting tree is balanced, because the input list is padded with empty leaves until its length is a power of two. This makes it easy to index the left and right children of a node k as 2k and 2k+1, respectively. Each node is then the hash of its children, and the pairing starts from the tail of the augmented input list. Furthermore, the fact that the tree is balanced, combined with the fact that an element's position inherently determines whether it is a leaf or a node, makes domain separation between single-item lists and nodes unnecessary. This construction is the backbone of Ethereum's serialization scheme, SSZ, which is used to serialize complex objects such as entire blocks, but also to store the deposit transactions in the deposit contract. Although not explicitly mentioned, both these settings require a strict order of the input sequence: for instance, a block being serialized is broken into fixed-size chunks, which become the input leaves, and it is rather obvious that different orderings would result not only in different Merkle trees but also in different (and potentially invalid) blocks altogether when deserialized.
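A minimal sketch of this padding-based merkleization, with `sha256` and 32-byte zero chunks. This is only the spirit of SSZ's merkleization, not the full spec, which also handles chunk limits and length mix-ins:

```python
from hashlib import sha256

ZERO_CHUNK = b"\x00" * 32

def merkleize(chunks: list[bytes]) -> bytes:
    # Pad with zero chunks up to the next power of two, so the tree is balanced.
    n = 1
    while n < max(len(chunks), 1):
        n *= 2
    nodes = chunks + [ZERO_CHUNK] * (n - len(chunks))
    # Hash pairs level by level until a single root remains.
    while len(nodes) > 1:
        nodes = [sha256(nodes[i] + nodes[i + 1]).digest()
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

c1, c2, c3 = b"\x01" * 32, b"\x02" * 32, b"\x03" * 32
# Three chunks are padded to four; the right subtree absorbs a zero chunk.
left  = sha256(c1 + c2).digest()
right = sha256(c3 + ZERO_CHUNK).digest()
assert merkleize([c1, c2, c3]) == sha256(left + right).digest()
```

Note how the balanced shape removes any need for 0x00/0x01 prefixes: a 32-byte value at the bottom level is a leaf by construction, never a node.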
We could say that both RFC-6962-styled and Ethereum's Merkle trees satisfy the Horton principle.
Is there any relevant variation to the above "standards"? Apparently, yes. OpenZeppelin's Merkle tree library does things differently. The library refers to its trees as "standard", as designed for Ethereum smart contracts: the input is an array of leaves of length L, and the tree is an array of size 2*L - 1; the pairing starts from the tail, and each node at index i in the tree is the hash of the concatenation of its children at indices 2*i + 1 and 2*i + 2 in the same array, with leaves being double-hashed, as seen here. The double hashing disambiguates between internal nodes and single leaves without the need for explicit domain separation, and was introduced as a response to this issue, which affected previous versions. Another major deviation is that each pair of leaves is sorted prior to hashing, for some reason. This implies that the tree does not preserve the order of the input sequence and, as they explain in their documentation, presumes that the input sequence is pre-sorted. This requirement makes sense for the only use case this library is advertised for: inclusion proofs for on-chain verification. In other words, the Merkle tree has been thought of as a structure for proving membership, and only that, disqualifying it for other uses such as the hashing of ordered collections, namely lists. For this latter use, the API provides an option whereby the sorting of pairs of leaves can be disabled.
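The effect of sorting a pair before hashing is that the node hash becomes commutative: the verifier of an inclusion proof need not know whether a sibling sits on the left or the right, which is convenient on-chain but destroys order information. A sketch, with `sha256` standing in for the library's keccak256 and `hash_sorted_pair` as my own name for the operation:

```python
from hashlib import sha256

def hash_sorted_pair(a: bytes, b: bytes) -> bytes:
    # Sort the pair lexicographically before concatenating, as
    # OpenZeppelin-style sorted-pair trees do.
    lo, hi = sorted([a, b])
    return sha256(lo + hi).digest()

x, y = b"storm", b"rain"
# The node hash no longer depends on the order of its children:
assert hash_sorted_pair(x, y) == hash_sorted_pair(y, x)
```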
This is where the fun begins. As seen too often with cryptographic constructions, too many moving parts and options become the ingredients for the proverbial "hazardous material". With this tenet in mind, one only needs to search for projects using this library inappropriately. One such project is Omni Network, some sort of so-called "L1". They decided to port OpenZeppelin's Merkle tree library for their own off-chain use, for some reason. Their rendition can be found here. Predictably, they dropped the option to disable the sorting of leaf pairs: their version always sorts each pair of leaves prior to hashing. One may think that their only use for the library is the intended one, i.e. generating and verifying inclusion proofs, but it turns out it's not. They define a data type they call "XMsgs", used for cross-chain communication. The type has a field LogIndex, which is the index of an event log in a block, and the field is used to "enforce" the order of the messages, as they claim in this specific PR. It is therefore correct to assume that these XMsgs represent an ordered collection of messages. This is also explicitly mentioned in this bug finding competition, which precedes the PR: "We rely on xmsgs being ordered by log index (ascending) to build the xblock merkle tree".
The function that generates the Merkle tree of the message sequences "enforces" the ordering by checking that the LogIndex is monotonically increasing. Each leaf is then encoded prior to hashing with this function. The encoding is not something Mr. Horton would be proud of: the function encodeMsg refers to this message marshalling descriptor, which clearly neglects to take the LogIndex into account, meaning that the latter is neither encoded nor hashed into the final tree. This omission, combined with the pre-sorting of leaf pairs (given that they left out the original OpenZeppelin option), means that the ordering of messages is not preserved, despite all their efforts to do so.
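The failure can be reproduced in miniature. Below is a hypothetical sketch in plain Python with `sha256`; the names `Msg`, `leaf_hash`, and `root` are mine, not Omni's. The leaf encoding omits the log index and pairs are sorted before hashing, so two sequences with swapped payloads and reassigned indices produce the same root:

```python
from dataclasses import dataclass
from hashlib import sha256

@dataclass
class Msg:
    payload: bytes
    log_index: int  # meant to enforce ordering, but never hashed

def leaf_hash(m: Msg) -> bytes:
    # The encoding leaves out log_index, mirroring the omission described above.
    return sha256(m.payload).digest()

def root(msgs: list[Msg]) -> bytes:
    # Single-pair tree with pre-sorted leaves, in the sorted-pair style.
    assert len(msgs) == 2
    lo, hi = sorted(leaf_hash(m) for m in msgs)
    return sha256(lo + hi).digest()

original = [Msg(b"Ping", 1), Msg(b"Pong", 2)]
forged   = [Msg(b"Pong", 1), Msg(b"Ping", 2)]  # payloads swapped, indices reassigned
# Both pass a monotonic-log-index check, yet yield the same root:
assert root(original) == root(forged)
```

Note that the forged sequence still has strictly increasing log indices, so the monotonicity check offers no protection at all here.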
The net result is that differently ordered xchain.Msg sequences lead to the same Merkle tree root, as demonstrated in this PoC: not only could one swap the relative position of the leaves in a pair but also, more importantly, one could alter and replace their original LogIndex arbitrarily. This means that, given a pair of cross-chain messages {"Ping", 1} and {"Pong", 2}, one could obtain the same Merkle root with the pair {"Pong", 1} and {"Ping", 2}, in reverse order, defying any law of cause and effect. The Merkle trees are not used exclusively for membership verification but also for verifying the block's message roots against attestations' roots, as found in the relayer code, defeating the conservative design of the original library. It probably doesn't matter: as of their chain's current tip, their network runs on a BFT protocol with only four validators, so hashing sequences correctly might be the last of their concerns.