HIP-260: Smart Contract Traceability

Author	Danno Ferrin
Discussions-To	https://github.com/hashgraph/hedera-improvement-proposal/discussions/262
Status	Replaced ⓘ
Needs Council Approval	Yes ⓘ
Review period ends ⓘ	Tue, 21 Dec 2021 07:00:00 +0000
Type	Standards Track ⓘ
Category	Service ⓘ
Created	2021-12-02
Updated	2021-12-07, 2021-12-14, 2022-08-15
Requires	206
Superseded by	513

Abstract
Motivation
Rationale
User stories
Specification
Backwards Compatibility
Security Implications
How to Teach This
Reference Implementation
Rejected Ideas
- Trace Cross-contract calls in the record stream
Open Issues
References
Copyright/license

Abstract

Proposes additional data placed in transaction records to enable full replay of transactions by mirror nodes and downstream users of Hedera.

Motivation

Enabling full replay of smart contract service transactions will improve the usability, auditability, and debuggability of smart contract service transactions. Current facilities are insufficient to inspect “internal” transactions executed across contracts, and provide insufficient error messages when correcting errors during smart contract development.

Rationale

Creating a traceable output is always the classic balancing act of time vs. memory. The service node could easily put an entire operation-by-operation trace in the record stream but the storage and data transmission costs would quickly become unmanageable.

Instead, we add the minimum needed data that allows a user to fully replay the transaction and derive the same set of results. We also need to presume that the user does not have access to the full historical state of the chain. However, we do presume some access to select immutable data such as the bytecode of the contracts that are called as part of the transaction.

This will allow for the same level of transaction inspection that popular ethereum chains have on their block explorer sites without requiring speculative calculation of traces from transient execution data.

User stories

As a user of a mirror node I want to be able to see a list of state changes, a full representation of the calls between contracts, and a step by step evaluation of all EVM operations. I want to be able to create these records disconnected from the Hedera Network with only the record streams of the transactions.

Specification

Add ContractStateChange to ContractFunctionResult

Add and populate a new repeated field stateChanges of type ContractStateChange to the ContractFunctionResult message. This will be populated whenever a transaction causes contract storage states to change. This field will include the changes across all contract storage and not just the initial contract in the transaction.

// existing protobuf
message ContractFunctionResult {
    // existing fields
    ContractID contractID = 1;
    bytes contractCallResult = 2;
    string errorMessage = 3;
    bytes bloom = 4;
    uint64 gasUsed = 5;
    repeated ContractLoginfo logInfo = 6;
    repeated ContractID createdContractIDs = 7;
    
    // new field  
    repeated ContractStateChange stateChanges = 8;
}  

// new messages
message ContractStateChange {
    ContractID contractID = 1;
    repeated StorageChange storageChanges = 2;
}

message StorageChange {
    bytes slot = 1;
    bytes valueRead = 2;
    google.protobuf.BytesValue valueWritten = 3;
}

For each slot read or written by the smart contract transaction a ContactStateChange record should be added to the transaction record. As there can be multiple contracts called in one transaction the storage is grouped by contract.

The slot, valueRead, and valueWritten values are byte strings of up to 32 bytes. Semantically they are uint256 stored in big endian notation and they will have leading zero bytes stripped. Hence, any value less than 32 bytes is semantically identical to a value left padded with zeros.

A slot that is only read and not updated MUST omit the valueWritten field.

Values that are updated without first being read must have their valueRead field set regardless. This is required as gas calculation is dependent on the original value of the field updated. A message without a valueRead field will be treated as though a value of zero had been read, as that is what will be encoded.

Canonically the fields are sorted by the contractID then by slot. Contracts addressed by account alias (when supported) will sort after accounts addressed by contractID. Canonically the slot, valueRead, and valueWritten are stripped of preceding zero bytes. Because of this zero values will not be written by the protobuf encoder. Hence, a contract that reads and writes zeros to slot zero will have a message with only an empty valueWritten field and a contract that only reads a zero value from slot zero will have an empty message.

The value in valueRead reflects the storage value prior to the execution of the smart contract. The value in valueWritten, if present, represents the final updated value of the storage slot after the completion of the smart contract call. Transient states between the start and finish of the contract are not stored in the record stream.

Populating `contractCallResult` field in Precompiled Contract Records

As an update to HIP-206 smart contracts will need to populate the contractCallResult for all transaction records generated by precompiled contracts recording key facts about the smart contract call. The semantics of each field will be as follows:

contractID will be set to the address of the precompile that generated the record.
gasUsed will be set to the gas that the precompiled contract charged for its execution.
contractCallResult will be populated with the return of the call, if any.
bloom will be populated if the precompile produces EVM Logs.
logInfo will be populated if the precompiled contract produces logs.
stateChanges will be populated if the precompiled contract changes state in any contract.
createdContractIDs will not be populated as it is not anticipated any precompiled contracts will create new contracts.

Replaying the Transaction

When replaying the transaction it is presumed the program has access to the smart contracts deployed on the chain or can quickly address them on demand. From this byte code along with precompiled system contract results a replay can be performed.

First, all the read values of the storage changes are loaded into the local world state. The entire state of the contracts involved are not needed and for large contracts is not desirable for efficient execution.

Next the EVM operations are executed against this world state. For calls that go to other contracts the bytecode is provided by the mirror node. For calls that go to precompiled system contracts the results are mocked out by the related record stream.

Finally, the execution can compare the record stream results with its own results. This includes gas used and storage value changes.

A tracer can be attached to the EVM to capture intermediate data for presentation by block explorers. Three kinds of data are most relevant. State changes can be extracted from the record stream directly without replay. Internal Transactions or “flat traces” can be calculated by storing select data prior to a CALL, DELEGATECALL, CALLCODE, or STATICCALL operation and then using that after the call returns to produce the flat trace. A full EVM operation trace can be extracted in a similar manner.

Clients are encouraged to produce standard trace outputs, such as traces the OpenEthereum Trace Module would produce or that EIP-3155 would produce.

Backwards Compatibility

The added fields can be ignored by clients not wishing to trace or replay smart contract transactions.

Prior transactions lacking these fields cannot be replayed efficiently, and will require historical state reconstruction. This reflects current practice.

Security Implications

All data stored can be recreated by anyone who already has full access to the historical records. This EIP makes the data more efficient and economical to access.

In adversarial conditions it is estimated the record stream can have 30 KB of data added per million gas spent. This does not make smart contracts the most efficient attack surface for record stream bloating.

Because we are not matching external gas pricing for our precompiled system contracts we will be able to adjust the gas price of precompiled contracts if they become an economical way to bloat the record stream size.

How to Teach This

For creating a transaction replay clients should be pointed to the reference implementation.

For end users looking for the benefits of state, flat, and operation traces user help documentation and possibly a technical blog post should be sufficient.

Reference Implementation

None yet. Expected no sooner than 0.22.0

Rejected Ideas

Trace Cross-contract calls in the record stream

The initial proposal included adding the full “flat trace” as a part of the record. Early development showed that in adversarial interactions the record stream would produce about one byte of data per unit of gas spent on the transaction. When targeting millions of gas per second this would result in megabytes of record data generated per second. This load in the record stream is not sustainable for further scaling efforts.

Open Issues

References

OpenEthereum Trace Module
EIP-3155 EVM trace specification

Copyright/license

This document is licensed under the Apache License, Version 2.0 – see LICENSE or (https://www.apache.org/licenses/LICENSE-2.0)

Citation

Please cite this document as: