Decoding the EVM: How We Index Traces and Logs Without ABIs

By Joel Nordell, Lead Accounting Engineer
At Unit 410, we’re obsessed with building a complete and accurate picture of on-chain accounting activity for our clients. This mission requires us to translate the raw, cryptic data from the Ethereum Virtual Machine (EVM) into a clear accounting narrative. To provide meaningful support, we need to understand the function calls (traces) and events (logs) emitted by smart contracts. The traditional approach to this problem relies on Application Binary Interfaces (ABIs), which are JSON files that describe a contract’s functions and events.
However, in my experience, relying on ABIs introduces a significant bottleneck. They aren’t always available, especially for new or closed-source contracts. Managing a large number of them is a complex task in itself, and an outdated ABI can be worse than no ABI at all. We needed a better way—a system that could decode EVM data universally and at scale.
This post explores how we built a robust and scalable decoding pipeline that does not depend on ABIs. I’ll walk through the basics of EVM data, the specific challenges that led us to this approach, and the innovative solutions we implemented to overcome them.
EVM Signatures 101: The Hashing Problem
To understand our approach, one first needs to understand how the EVM identifies functions and events. Every function and event in a smart contract has a “signature.” For a function, this is its name and argument types (e.g., transfer(address,uint256)). For an event, it’s the same pattern (e.g., Transfer(address,address,uint256)).
This signature text is then hashed using Keccak-256. It’s critical to understand that hashing is a one-way operation; it is computationally infeasible to reverse the hash to recover the original signature. This is the foundational challenge of ABI-less decoding: given the correct signature, we can decode the binary data, but how do we find the signature (without using an ABI) when all we have is a hash?
For Events: The full 32-byte hash is used as the first “topic” (topic0) of a log. Because the full hash is used, it is effectively unique for all practical purposes. A collision is theoretically possible, but so astronomically unlikely that we can reasonably treat it as a unique identifier.
For Functions: The process is different, and this is where the main problem arises. The Keccak-256 hash of the function signature is truncated to just the first 4 bytes. This 4-byte value is called the “function selector.” Because of this severe truncation, it is not a rare or theoretical problem for different function signatures to have the same selector—it’s a common, practical issue known as a “hash collision.”
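To make this concrete, here is a minimal Python sketch of both computations, assuming the eth-utils package is installed; the hashes shown are the well-known ERC-20 values.
# Minimal sketch using the eth-utils package (pip install eth-utils).
from eth_utils import keccak

# Hash the canonical signature text with Keccak-256.
event_topic0 = keccak(text="Transfer(address,address,uint256)")
selector = keccak(text="transfer(address,uint256)")[:4]

# Events use the full 32-byte hash as topic0...
assert event_topic0.hex() == "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
# ...while function calls keep only the first 4 bytes: the selector.
assert selector.hex() == "a9059cbb"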
This ambiguity is more than a data-parsing nuisance; it’s a security vulnerability. Malicious actors can deliberately craft contracts with function signatures that collide with common, legitimate ones (like transfer(address,uint256)). Their goal is to trick users and wallets into misinterpreting a malicious transaction as a benign one, a technique often used in phishing and other scams. A robust decoding system must be able to navigate this ambiguity, providing a clear and accurate picture of what a transaction is truly doing.
Our Solution: A Centralized Signature Database
Since reversing the hashes is infeasible, we work in the forward direction instead: we hash every signature we know of and store the results in a massive lookup table. The heart of our system is a comprehensive database that maps these hashes back to their human-readable signatures. We work with an open-source database that has crowd-sourced millions of function and event signatures from the community. We consult this database to fill in the gaps whenever we can’t find a signature ourselves, and we regularly contribute back by submitting new signatures we discover through our own indexing efforts.
This database acts as our dictionary. When we encounter a function selector or an event topic hash, we query the database to find all known matching signatures. But what happens when multiple signatures produce the same hash? Let’s explore how we handle that in our decoding pipeline.
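Conceptually, the database behaves like the two lookup tables sketched below in Python; the entries and names here are illustrative, and a real deployment backs them with a persistent, indexed store.
# Hypothetical in-memory stand-in for the signature database.
SELECTOR_TO_SIGNATURES = {
    # 4-byte selectors can collide, so each one maps to a LIST of candidates.
    "a9059cbb": ["transfer(address,uint256)"],
}
TOPIC0_TO_SIGNATURE = {
    # Full 32-byte topic hashes are effectively unique: one entry each.
    "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef":
        "Transfer(address,address,uint256)",
}

def lookup_function_candidates(selector: bytes) -> list:
    # Return every known signature whose hash begins with this selector.
    return SELECTOR_TO_SIGNATURES.get(selector.hex(), [])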
The Decoding Pipeline in Action
With our signature database in place, we can now process blocks in real time. As our indexer ingests each new trace and log, it sends them through the following decoding pipeline.
A. Decoding Logs (Events): The Straightforward Case
For logs, the process is refreshingly straightforward, thanks to the full 32-byte topic hash. We extract the topic0 from the log, query our database, and get a single, unambiguous event signature.
Just as with traces (covered below), this signature is the crucial schema we need. It tells us the data types for both the indexed arguments (which are stored in the log’s other topics) and the non-indexed arguments (which are packed into the log’s data field). Using the signature as our guide, we can parse all the event’s parameters into a structured, human-readable format.
-- Simplified pseudo-code for our log decoding logic
FUNCTION DecodeLog(log):
  -- For logs, the topic hash is unique, so we expect only one signature
  IF log.signature is null THEN:
    RETURN error "No signature found for this log"
  END IF
  -- Decode the log's data and topics using the signature as a schema
  decoded_arguments, error = DecodeEvent(log.data, log.topics, log.signature)
  IF error is not null THEN:
    RETURN error "Failed to decode log"
  END IF
  RETURN decoded_arguments
END FUNCTION
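As a concrete companion to the pseudo-code, here is a minimal Python sketch that applies the ERC-20 Transfer signature as the schema for one specific log; it assumes the eth-abi package and stands in for the generic DecodeEvent above.
# Minimal sketch using the eth-abi package (pip install eth-abi).
from eth_abi import decode

def decode_transfer_log(topics: list, data: bytes) -> dict:
    # For Transfer(address,address,uint256), "from" and "to" are indexed
    # (stored in topics[1] and topics[2]) while "value" is non-indexed
    # (ABI-encoded in the data field).
    (from_addr,) = decode(["address"], topics[1])
    (to_addr,) = decode(["address"], topics[2])
    (value,) = decode(["uint256"], data)
    return {"from": from_addr, "to": to_addr, "value": value}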
B. Decoding Traces (Function Calls): The Real Challenge
When our indexer processes a trace (a function call), the first step is to extract the 4-byte selector from the calldata.
Because of the collisions caused by truncation, this selector is not a unique identifier. A query to our database for that selector might return multiple candidate signatures. So why do we need the signature text at all?
This is the most important part of the process: the signature string, like transfer(address,uint256), is the essential “schema” or “guide” we need to parse the rest of the calldata. Without this guide, the calldata is just an opaque, meaningless blob of bytes. The signature tells us the exact data types (address, uint256) and the order of the arguments packed into that binary data.
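For example, with the signature in hand, parsing a transfer(address,uint256) call from raw calldata takes only a few lines of Python (a sketch assuming the eth-abi package):
from eth_abi import decode

def parse_transfer_calldata(calldata: bytes):
    # The first 4 bytes are the function selector; the remainder is the
    # ABI-encoded argument data.
    selector, argument_data = calldata[:4], calldata[4:]
    assert selector.hex() == "a9059cbb"  # transfer(address,uint256)
    # The signature supplies the schema: an address followed by a uint256.
    to_addr, amount = decode(["address", "uint256"], argument_data)
    return to_addr, amount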
Our resolution process is a multi-stage filter designed to find the single best match. We treat each candidate signature as a potential schema and attempt to parse the calldata.
First, we apply a “strict mode” check. A signature might successfully decode its required arguments but leave a trail of unused, extraneous bytes at the end of the calldata. This is a “sloppy” match. To prevent this, our strict mode performs a clever test: after a successful decode, it tries to decode the calldata again with the last few bytes truncated. If this second, shorter version also decodes successfully, it proves the original data had extra bytes. We discard these sloppy matches, as they don’t perfectly account for the entire calldata.
After filtering for only the most precise matches with strict mode, what if we still have multiple candidates? This is where our second layer of defense comes in: prioritization. We don’t treat all signatures equally; each signature in our database carries a priority, and we prefer the highest-priority candidate that decodes successfully. If multiple high-priority signatures still match, our system flags the transaction for human review to help ensure the highest level of accuracy.
This two-step process of strict validation followed by prioritization allows us to resolve collisions with a reasonably high degree of accuracy.
-- Simplified pseudo-code for DecodeCalldata, showing the strict mode check
FUNCTION DecodeCalldata(input_data, signature, strict_mode):
  -- First, try to decode the arguments from the input data
  decoded_arguments, error = UnpackArguments(input_data, signature)
  IF error is not null THEN:
    -- The data doesn't match the signature at all
    RETURN null, error
  END IF
  -- If we're in strict mode, we perform an extra check
  IF strict_mode is true THEN:
    -- Remove the last few bytes from the input data
    truncated_input = remove_last_bytes(input_data)
    -- Try to decode AGAIN with the truncated data.
    -- We call it non-strictly this time, as we only care if it succeeds at all.
    _, error = DecodeCalldata(truncated_input, signature, false)
    IF error is null THEN:
      -- If the SHORTER version also decodes, it means the original
      -- input had extra, unused data. This is a "sloppy" match.
      RETURN null, error "Strict mode failure: input data has extraneous bytes"
    END IF
  END IF
  -- If we get here, the decode was successful and passed the strict check
  RETURN decoded_arguments, null
END FUNCTION
-- Simplified pseudo-code for our trace decoding logic
FUNCTION DecodeTrace(trace):
  -- If we have multiple candidates, we must use strict mode
  use_strict_mode = (length of trace.signatures > 1)
  successful_decodes = []
  FOR EACH signature IN trace.signatures:
    -- Try to decode the trace's input data using the current signature
    decoded_arguments, error = DecodeCalldata(trace.input, signature, use_strict_mode)
    IF error is null THEN:
      -- If successful, add the result to our list of candidates
      result = new DecodeResult(signature, signature.priority, decoded_arguments)
      add result to successful_decodes
    END IF
  END FOR
  IF successful_decodes is empty THEN:
    RETURN error "Could not decode trace with any signature"
  END IF
  -- From our list of successful candidates, find the one with the highest priority
  best_match = FindHighestPriority(successful_decodes)
  RETURN best_match.arguments
END FUNCTION
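To ground the strict-mode check in runnable code, here is one way the idea could be re-implemented in Python with the eth-abi package. This is a sketch of the technique, not our production implementation; it truncates by one 32-byte word, since ABI argument data is word-aligned.
from eth_abi import decode
from eth_abi.exceptions import DecodingError

def strict_decode(argument_types: list, argument_data: bytes):
    # First pass: does this candidate signature parse the data at all?
    values = decode(argument_types, argument_data)
    # Second pass: drop the final 32-byte word and retry. If the shorter
    # input ALSO decodes, the original buffer contained bytes this
    # signature never consumed -- a "sloppy" match.
    if len(argument_data) >= 32:
        try:
            decode(argument_types, argument_data[:-32])
        except DecodingError:
            return values  # every byte was needed: a precise match
        raise ValueError("sloppy match: trailing bytes were not consumed")
    return values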
Innovative Solutions for Complex Cases: Safe Wallet & Proxies
The power of this ABI-less approach is most apparent when dealing with the complexities of modern smart contracts, particularly proxy patterns. A great example is the Safe Wallet, a popular multi-signature wallet.
When a user interacts with a Safe, the top-level trace isn’t the true function call (e.g., a transfer), but rather a call to the Safe’s own execTransaction function. To a naive indexer, the real transaction is hidden inside the calldata of this outer function.
Our pipeline handles this recursively. When we successfully decode a call as execTransaction, we don’t stop there. We extract the internal transaction data from its arguments and feed that data right back into the start of the decoding pipeline. This allows us to unwrap the proxy call and decode the true underlying function, giving us high confidence that we capture a complete and accurate picture of the on-chain activity.
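Sketched in Python, the unwrapping step could look like the following; the Safe v1 execTransaction signature is the canonical one, while decode_trace() is a hypothetical entry point into the pipeline described above.
# The canonical Safe v1 execTransaction signature (selector 0x6a761202).
SAFE_EXEC_TRANSACTION = (
    "execTransaction(address,uint256,bytes,uint8,uint256,"
    "uint256,uint256,address,address,bytes)"
)

def maybe_unwrap_safe(signature: str, decoded_args: tuple):
    # If this call is a Safe execTransaction, argument 0 is the inner
    # target address and argument 2 is the inner calldata. Feed that
    # calldata back through the pipeline (decode_trace is hypothetical).
    if signature == SAFE_EXEC_TRANSACTION:
        inner_to, inner_data = decoded_args[0], decoded_args[2]
        return decode_trace(inner_to, inner_data)
    return decoded_args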
Why Go ABI-less? The Advantages
Building this system was a significant effort, but it provides powerful advantages:
- Universality: We can decode data from virtually any smart contract, whether it’s newly deployed, closed-source, or a complex proxy. As long as we can determine the correct signature, we don’t need to wait for an ABI.
- Scalability: The signature database is highly optimized, which lets us perform this decoding in real time at massive scale.
- Resilience: The system is self-contained. We are not dependent on external ABI sources, which can be a single point of failure.
Conclusion
Transforming opaque blockchain data into meaningful, structured information is a fascinating challenge. By moving away from a dependency on ABIs and instead tackling the decoding problem head-on with a comprehensive signature database, we’ve built a system that is more scalable, resilient, and complete than traditional approaches. This kind of deep data work is what allows us to provide industry-leading on-chain accounting data.