Summary: Harmony One proposed an asynchronous cross-shard CALL2 extension to the EVM, to complement its synchronous CALL of a contract on the same shard. Unfortunately, it is all too easy to introduce security vulnerabilities to existing contracts with such an extension, and very hard to make sure you don’t. Meanwhile, this model cannot bridge the essential gap between the EVM’s synchronous execution model and the necessary asynchronous nature of cross-shard communication. We instead propose an inherently more secure yet just as usable model in which contracts must explicitly send or receive asynchronous messages and cannot be directly CALLed across chains.
Cross-chain CALLs and the danger of leaky abstractions
On August 10, 2021, the Poly Network experienced the biggest cryptocurrency attack ever (so far), with over $600 million worth of cryptocurrency tokens stolen, including Ethereum, Bitcoin, and more. The attacker, unequipped to fence the tokens, later returned them in exchange for a reward and being made an advisor—after all, he did identify the vulnerability and secure the funds before a hardened criminal would have exploited the network. To read about the Poly Network hack, see for instance these two articles:
The attack shows how vulnerable cross-chain CALLs can be: the Poly Network implemented a messaging interface based on issuing low-level Ethereum contract CALLs from outside the network. But the messaging implementation is not part of Ethereum itself: it is built as a layer on top of it. And so the implementation necessarily introduces details that are not part of the intended computation model: new contracts with special powers, that implement the messaging. The attacker could get the tokens by exploiting unintended interactions between the cross-chain messages being routed and the contracts implementing the routing itself.
In Computer Science, we have a name for implementations that introduce new unintended interactions: leaky abstractions. An implementation that doesn’t leak at all is called a full abstraction. It is usually very hard or impossible to achieve full abstraction of a system as a layer on top of the system without forcing all new interactions to go through the new layer, at which point you often have to reimplement large parts of the system on top of itself. In simple enough cases, you might be able to “just” take extra steps to tightly restrict what kinds of interactions are allowed. But before you may even think of whether you have a full abstraction, first you have to identify what is the computation model being implemented, and what is the computation model being used to implement it (often the former is an extension of the latter). Only then can you start to analyze which interactions are or aren’t problematic. Sadly, it seems the authors of the Poly Network did not follow these basic principles for designing secure software.
After understanding that the Poly Network attack was at heart due to a leaky abstraction, it is natural to wonder about the security of Harmony One’s CALL2 proposal, which has a broadly similar intent, though with very different details.
CALLs across chains or shards
Harmony One is a sharded blockchain, where each shard supports smart contracts written for the EVM, the Ethereum Virtual Machine. The sharding enables Harmony One to scale way beyond the limits of a single-shard blockchain like Ethereum. The EVM allows Harmony One to reuse the wealth of existing smart contracts from the Ethereum ecosystem, most of them unmodified. But the EVM as it exists today does not support interaction between contracts across multiple shards.
On the EVM, users can CALL contracts, sending a message with some data, at which point the contract call may succeed or fail, and also return some data as part of the call. While they are being executed, contracts can recursively issue synchronous CALLs to other contracts, which will process a message, succeed or fail, and return data, all as part of the same transaction. On the plus side, synchronous calls are a simple mechanism familiar to all programmers, who can keep using their usual programming model. On the minus side, synchronous calls introduce security issues, notably reentrancy (which caused the famous DAO bug). Also on the minus side, and importantly, synchronous CALLs are incompatible with interaction with contracts on other chains or other shards.
That is why the Poly Network introduced a protocol for CALLs to Ethereum contracts to be issued from other chains. This is also why Harmony One proposed a mechanism to asynchronously CALL contracts from other shards: CALL2.
Who is the CALLER?
When communicating between chains or shards, sending a message is part of a transaction in the chain or shard that issues the message, whereas processing the message will be part of a transaction in the chain or shard that receives the message. Some routing protocols will somehow take the message on the issuing chain or shard, and ensure it is processed once and only once on the destination chain or shard.
Now, contracts on the EVM manage authentication by looking at the identity of the caller, as provided by the protocol. But in the case of an asynchronous cross-chain message, who is the caller?
Because Poly was a layer on top of the EVM, the contract CALLs it issued had the special Poly contract as the caller, whose identity the contract being called could check for authentication purposes. This makes such CALLs much less useful than it would seem at first glance: contracts being called cannot use the EVM’s builtin authentication mechanism anymore, but must reimplement their own. Despite the EVM’s native CALL being used, the mechanism is useless to interact with regular Ethereum contracts not specifically meant to be called from another chain—it doesn’t provide access to the entire functionality of CALL. Meanwhile, the mechanism also leaks the authority of the implementing contract as the caller, another abstraction leak that made the attack possible. Oops.
Now, because Harmony One is modifying the EVM implementation itself, it can do much better than the Poly Network: it can remember the identity of the caller on the issuing shard as well as the shard number identifying this shard, and provide those details to the contract being called. In the existing CALL2 proposal, the caller could for instance be made available using the usual CALLER opcode, and the shard of the caller could be in a new CALLER_SHARDID opcode. The entire functionality of CALL is thus available—well, not the entire functionality, since there is no mechanism for the asynchronous caller to receive the synchronous results from the CALL.
Still, this is much more powerful than the Poly model… so much more powerful, actually, that it makes the contract calls into a leaky abstraction of its own—and yet it does nothing to solve the problem of the essential mismatch between the EVM’s synchronous CALL model and the asynchronous nature of cross-shard messaging.
CALL2 caller credential leak
Let’s suppose that CALL2 passes the identity of the caller to the contract being called as if the caller were on the same chain, which the called contract can check with the usual CALLER opcode. Then, a contract could for instance control an ERC20 token on another shard by sending asynchronous calls to its API. But then those tokens would be vulnerable to an attack using the same caller address on another shard!
Indeed, let’s imagine a Harmony One clone of Uniswap, where some swap contracts control ERC20 tokens and with which they automatically process swap orders. The swap contracts, as well as the ERC20 contracts, would all have been audited on Ethereum, and would be as trusted as such contracts go. And yet, they would become vulnerable the moment that CALL2 is enabled, if implemented as in the paragraph above: the creator of the swap contract could create a contract on another shard, that would have the same address, but completely different code and/or state. That other contract would then send a transfer order to steal all the ERC20 tokens in the original contract. Because the ERC20 contract was designed and compiled for a single-shard chain, it will check the CALLER but not the CALLER_SHARDID, and will authorize the transaction. Money stolen. Enabling CALL2 turned existing secure contracts into insecure ones.
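The attack above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `Token`, `call2`, and the addresses are hypothetical names invented for illustration, and the model ignores everything about the EVM except the fact that the token authenticates by caller address alone.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Toy ERC20-like ledger on one shard; it authenticates by CALLER only."""
    shard: int
    balances: dict = field(default_factory=dict)

    def transfer(self, caller: str, to: str, amount: int) -> None:
        # Single-shard logic: the caller's address is the only credential checked.
        if self.balances.get(caller, 0) < amount:
            raise PermissionError("insufficient balance")
        self.balances[caller] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


def call2(token: Token, caller_address: str, caller_shard: int,
          to: str, amount: int) -> None:
    """Naive CALL2 (hypothetical): forwards the raw caller address.

    The shard of origin is known to the protocol but never reaches the
    token's authorization check, because the token was written before
    shards existed.
    """
    token.transfer(caller_address, to, amount)


SWAP = "0xswap"          # address of the audited swap contract on shard 0
ATTACKER = "0xattacker"  # where the stolen tokens end up

token = Token(shard=0, balances={SWAP: 1000})

# A different contract with the same address is deployed on shard 1,
# and it sends a cross-shard transfer order.
call2(token, caller_address=SWAP, caller_shard=1, to=ATTACKER, amount=1000)
```

After the last line, the token ledger on shard 0 has moved the swap contract's entire balance to the attacker, even though the real swap contract on shard 0 never issued any call.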
Fixing CALLER for asynchronous calls
To address this particular vulnerability, the CALLER opcode could be made to instead return some hash of the actual asynchronous caller’s address and its shard ID, possibly with a prefix to prevent clashes with future kinds of EVM addresses of the same length. The CALLER value would then uniquely identify the actual caller across shards (under the regular cryptocurrency assumption that no hash collision will be found). The particular attack above would be averted.
Furthermore, if the specific values of the actual asynchronous caller’s address and its shard ID are required, e.g. to send an asynchronous reply with useful return data via CALL2, then they could be made accessible with separate opcodes ASYNC_CALLER and ASYNC_CALLER_SHARDID. These opcodes would return 0 for synchronous calls on the same shard.
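A minimal sketch of this fix, under several assumptions: the prefix value, the 4-byte shard encoding, and the function names are all hypothetical, and Python's `hashlib.sha3_256` is used as a stand-in because the Keccak-256 variant used by the EVM is not in the standard library.

```python
import hashlib

# Hypothetical domain-separation prefix; the proposal only says "possibly with a prefix".
ASYNC_PREFIX = b"async-caller:"


def caller_for_async(address: bytes, shard_id: int) -> bytes:
    """Derive the value CALLER would return for an asynchronous call.

    Hashing the shard ID together with the address means the same address
    on two different shards yields two different CALLER values.
    """
    data = ASYNC_PREFIX + shard_id.to_bytes(4, "big") + address
    return hashlib.sha3_256(data).digest()[-20:]  # truncate to address length


def call_context(address: bytes, shard_id: int, is_async: bool):
    """Return (CALLER, ASYNC_CALLER, ASYNC_CALLER_SHARDID) as seen by the callee."""
    if not is_async:
        # Synchronous same-shard call: CALLER is unchanged, the new opcodes return 0.
        return address, bytes(20), 0
    return caller_for_async(address, shard_id), address, shard_id


swap_address = bytes.fromhex("11" * 20)
on_shard_0 = caller_for_async(swap_address, 0)
on_shard_1 = caller_for_async(swap_address, 1)
```

With this scheme a token that only checks CALLER no longer confuses the swap contract with its same-address impostor on another shard, since `on_shard_0` and `on_shard_1` differ and neither equals the raw address.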
But are we sure the above vulnerability was the only one?
Once we add a new way to CALL existing contracts, one of the assumptions that could be made by those contracts when auditing their security is suddenly found invalid. In logical terms, we say that the language extension is not conservative. Therefore, all contracts must be audited again to see if they relied on that assumption. Some, possibly a lot, might be found wanting in the new context. This basically cancels the main advantage of having EVM compatibility.
Maybe we can avoid the need to re-audit all contracts by proving that the new way to CALL existing contracts does not introduce any dangerous new interaction for any contract whatsoever? But establishing that would require a deep analysis of this CALL extension. Just fixing one obvious security issue isn’t enough to be confident that there are no other, subtler, issues.
And subtler issues there are! ORIGIN should be treated like CALLER and be split into three opcodes too. And what if a program uses EXTCODESIZE, EXTCODECOPY or EXTCODEHASH as the basis for access control? Then it could possibly be fooled too. Who knows what obscure EVM feature, present or future, a contract might rely upon that isn’t valid anymore in this new context? And if the solution were to add opcodes, then soon enough EVM incompatibility would follow as opcodes keep getting added both by Harmony One and upstream Ethereum.
A sane alternative: explicit messaging
Once we realize that adding new ways to CALL contracts is a security nightmare, and that most existing EVM APIs are not meant to be called asynchronously, we can design simpler, better, and more suitably secure programming interfaces: explicitly sending and receiving asynchronous messages.
The extension is obviously conservative: a contract that doesn’t explicitly deal with cross-chain messaging doesn’t have to. If the contract was audited to work with arbitrary other contracts on the same chain, it will still work after the extension. Any trust toward arbitrary contracts can continue when these extensions are added: if the other contract doesn’t use these extensions, it can be trusted like before. And if the other contract uses remote messaging, it can be trusted no less than if it had used local messaging. There are no possible remote exploits of existing contracts that wouldn’t also be local exploits on a single shard.
Messaging could use a new opcode, but it might be simpler and more compatible with future upstream EVM extensions instead to call a special “precompiled contract” that implements the messaging extension. The format for messages could be essentially the same as in the previous CALL2 proposal. However, the semantics would be different: each contract would have a message queue, in the style of an Erlang process. And each contract would have to explicitly watch, filter, and dequeue messages in its queue to partake in the protocol extension.
Depending on the contract, its operations would typically query the contents of the queue, and then either dequeue a specific message or dequeue the “next” message in the queue. The contract would be responsible for processing the message. And if the “next” message is used, the contract would also be responsible for making sure the message can be dequeued even if this processing fails, but only if enough GAS was provided; otherwise there would be an attack whereby one could cause messages to be dropped by calling the processor with insufficient GAS. All in all, writing a safe message processor is hard, and it is even harder if you want to ensure every message will be processed without the assistance of someone specifically interested in making the message processing succeed. A library should be provided to help users do it right.
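The queue discipline described above can be sketched as follows. This is an illustrative model, not a proposed API: `Inbox`, `Message`, the method names, and the `MIN_GAS` value are all assumptions, and gas is reduced to a single number checked up front.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: bytes
    sender_shard: int
    payload: bytes


class Inbox:
    """Toy per-contract message queue (all names here are hypothetical)."""

    MIN_GAS = 50_000  # hypothetical floor below which processing is refused

    def __init__(self) -> None:
        self._queue = deque()

    def enqueue(self, msg: Message) -> None:
        self._queue.append(msg)

    def peek(self) -> list:
        """Query the queue contents without removing anything."""
        return list(self._queue)

    def take(self, index: int) -> Message:
        """Dequeue one specific message chosen by the contract."""
        msg = self._queue[index]
        del self._queue[index]
        return msg

    def process_next(self, handler, gas: int) -> bool:
        """Dequeue and process the next message.

        With too little gas, refuse before touching the queue, so that an
        attacker cannot drop messages by under-funding the call. With
        enough gas, the message is dequeued even when the handler fails,
        so that one bad message cannot block the queue forever.
        """
        if gas < self.MIN_GAS:
            raise RuntimeError("insufficient gas: message left in queue")
        msg = self._queue.popleft()
        try:
            handler(msg)
            return True
        except Exception:
            return False


def failing_handler(msg: Message) -> None:
    raise ValueError("cannot process this payload")


inbox = Inbox()
inbox.enqueue(Message(b"\x01" * 20, 1, b"first"))
inbox.enqueue(Message(b"\x02" * 20, 2, b"second"))
```

Calling `inbox.process_next(failing_handler, gas=10)` raises and leaves both messages queued, whereas the same call with `gas=60_000` returns `False` and removes the first message. Whether a failed message should be discarded, logged, or moved to a dead-letter queue is a design choice the text leaves open.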
Synchronous Calls from Asynchronous Messages
Contracts can still explicitly issue synchronous CALLs to contracts on the same shard, based on the asynchronous messages they receive; and can then in turn process the results from the call and post a reply message. Contracts that translate between messages and calls can be thin proxies, or they can be elaborate in their processing. Either way, anything you could hope to do with a direct asynchronous CALL interface, you can still do, just in a safer way: the indirection of going through one proxy contract per user per shard ensures that security is preserved. To save on GAS costs, these proxy contracts may or may not share code with other such contracts from other users on the same shard.
Note that even with the “direct” call interface, you needed a proxy if you wanted to handle the return data and send a response to the sender. Thus, having to use a proxy doesn’t restrict the functionality of the system at all, it only makes it more secure.
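A thin proxy of the kind described in this section might look like the sketch below. It is a toy model with hypothetical names (`Proxy`, `Message`, `add`): the "synchronous CALL" is an ordinary Python function call, and the reply is appended to a list standing in for the outgoing message queue.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender: str
    sender_shard: int
    payload: dict


@dataclass
class Proxy:
    """Toy per-user, per-shard proxy (hypothetical names throughout).

    It accepts messages only from its owner, performs a synchronous call
    to a contract on its own shard, and posts the outcome as a reply.
    """
    owner: str
    owner_shard: int
    target: object                      # a callable standing in for a local contract
    outbox: list = field(default_factory=list)

    def handle(self, msg: Message) -> None:
        # Authentication checks both the address and the shard of origin.
        if (msg.sender, msg.sender_shard) != (self.owner, self.owner_shard):
            raise PermissionError("message is not from this proxy's owner")
        try:
            result = self.target(**msg.payload)   # synchronous local call
            reply = {"ok": True, "result": result}
        except Exception as exc:
            reply = {"ok": False, "error": str(exc)}
        # The reply travels back as an asynchronous message to the owner.
        self.outbox.append((self.owner, self.owner_shard, reply))


def add(a: int, b: int) -> int:
    """Stand-in for an existing, unmodified contract function."""
    return a + b


proxy = Proxy(owner="0xalice", owner_shard=1, target=add)
proxy.handle(Message("0xalice", 1, {"a": 2, "b": 3}))
```

The target contract sees only the proxy as its caller, so its existing single-shard authorization logic stays valid, and the return data reaches the remote sender through the reply message.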
Gas Price and Spam Routing Issues
In addition to abstraction leaks, there is an important issue with inter-shard messaging, whether a variant of CALL2 or of our explicit messaging: the implications for GAS and spam prevention.
If gas prices vary a lot, what if users use shards where the price is low to send lots of messages to a specific shard where the price is high, so as to overload it? Who’s going to pay for the destination shard to actually process all those messages from all the spamming shards?
Any stable solution requires some interested user to pay some GAS on the destination chain for messages to be properly transformed and timely processed. And then, special care must be taken so that a user interested in making the message processing fail cannot do it by somehow calling into the processing infrastructure with insufficient GAS.
A complete solution would therefore maintain on each shard a queue of outgoing messages as well as a queue of incoming messages for each contract. The miners of the recipient shard would not have to accept any specific message from any other shard—indeed receiving those messages would have a gas cost that contributes to the block limit, thus preventing spam. In practice, inter-shard communication would partake in the same auction for space as other transactions. Furthermore, to prevent “insufficient GAS” attacks by message senders, every outgoing message would be assigned a unique address that doesn’t clash with previous addresses, and it would be possible to send tokens to that address to increase the fee associated with the message after the fact, to increase the chances the message will reach its destination in a timely manner.
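The fee mechanism described above can be sketched as a toy model. Everything here is an assumption made for illustration: the class and method names, the `msg-N` address format, and in particular the greedy fee-per-gas selection, which merely stands in for whatever auction the recipient shard's miners actually run.

```python
from dataclasses import dataclass


@dataclass
class PendingMessage:
    address: str    # unique address assigned to this outgoing message
    gas_cost: int   # gas the recipient shard would spend to receive it
    fee: int = 0    # tokens currently attached to the message


class CrossShardQueue:
    """Toy queue of messages waiting to be accepted by a recipient shard."""

    def __init__(self) -> None:
        self._pending = {}
        self._counter = 0

    def send(self, gas_cost: int, fee: int) -> str:
        # Each message gets a fresh address that never clashes with earlier ones.
        self._counter += 1
        address = f"msg-{self._counter}"
        self._pending[address] = PendingMessage(address, gas_cost, fee)
        return address

    def top_up(self, address: str, amount: int) -> None:
        """Anyone interested in delivery can raise the fee after the fact."""
        self._pending[address].fee += amount

    def select_for_block(self, gas_limit: int) -> list:
        """Recipient-side choice: best fee per gas first, within the block limit."""
        chosen = []
        remaining = gas_limit
        ranked = sorted(self._pending.values(),
                        key=lambda m: m.fee / m.gas_cost, reverse=True)
        for msg in ranked:
            if msg.gas_cost <= remaining:
                chosen.append(msg.address)
                remaining -= msg.gas_cost
        for address in chosen:
            del self._pending[address]
        return chosen


queue = CrossShardQueue()
cheap = queue.send(gas_cost=100, fee=1)
rich = queue.send(gas_cost=100, fee=5)
queue.top_up(cheap, 10)                       # sender was under-funded; fix it later
first_block = queue.select_for_block(gas_limit=100)
second_block = queue.select_for_block(gas_limit=100)
```

Because receiving a message consumes part of the block's gas limit, a flood of low-fee messages simply waits in the queue, and an under-funded message can still be rescued by topping up its fee.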
More advanced contracts may do load-balancing between shards to avoid the bottleneck and high fees associated with running on a single shard: they would use CREATE2 or some user-agnostic CREATE3 variant to ensure that identical replicas can be created on every shard, present and future, with the exact same code and address. Because messages are queued until processed rather than CALLed immediately, it also becomes possible to send messages to shards on which the contract wasn’t created yet, and they will still be processed correctly.
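The reason CREATE2 yields the same address everywhere is that the address depends only on the deployer, a salt, and the init code, never on the chain or shard. The sketch below follows the CREATE2 derivation from EIP-1014, with one loud caveat: it uses `hashlib.sha3_256` as a stand-in for the EVM's Keccak-256 (which is not in Python's standard library), so the bytes it produces will not match real EVM addresses. The deployer, salt, and code values are made up.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    # Stand-in for Keccak-256; real EVM addresses would differ.
    return hashlib.sha3_256(data).digest()


def create2_address(deployer: bytes, salt: bytes, init_code: bytes) -> bytes:
    """Shape of the CREATE2 derivation: H(0xff ++ deployer ++ salt ++ H(init_code))[12:]."""
    if len(deployer) != 20 or len(salt) != 32:
        raise ValueError("deployer must be 20 bytes and salt 32 bytes")
    return _hash(b"\xff" + deployer + salt + _hash(init_code))[12:]


deployer = bytes.fromhex("22" * 20)   # hypothetical factory contract address
salt = (1).to_bytes(32, "big")
replica_code = b"replica-init-code"   # placeholder for real init code

address_on_shard_0 = create2_address(deployer, salt, replica_code)
address_on_shard_7 = create2_address(deployer, salt, replica_code)
address_other_code = create2_address(deployer, salt, b"different-code")
```

Since no shard identifier enters the computation, the replica's address is known before it exists on a given shard, which is what lets a sender queue messages for a replica that has not been created yet. It also means a different code body cannot occupy that address through the same factory and salt.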
A secure solution
MuKn is preparing a proposal that we will send to Harmony One, to develop a secure solution for cross-shard contract interactions by creating an extension to the EVM. For security reasons, our solution will reject the remote call model and instead adopt the message queue model. A Solidity library will be provided to facilitate secure use of this functionality.