What is the 'stale identity' vulnerability in the Identity Registry?

The `register()` function calls `_safeMint` before finishing internal bookkeeping and emitting events. The receiver can hijack the transaction mid-flight via `onERC721Received`, transferring the NFT away while the outer call continues to finalize URI writes and emit events after ownership has changed. This creates a 'stale identity' that can potentially throw off indexers and monitoring systems.

How can reputation be manipulated in the Reputation Registry?

The `giveFeedback` function is permissionless, so a single address can submit repeated entries for the same agent, and `getSummary` treats each with equal weight, allowing one entity to artificially inflate or deflate reputation. Additionally, due to the Decimal Trap, a submitted value can be rescaled during normalization to become far larger than the max value an int128 can represent, and downcasting can produce a negative summary value even with all non-negative feedback.

What is the Escrow Liveness Trap in EIP-8183?

If a client funds an open-assignment job (where the provider is `address(0)`) before naming a specific provider, the money becomes trapped in a 'digital purgatory'. Funding the job moves it from Open to Funded state, but `setProvider` only works in the Open state, so the client is locked out. The only way to recover the funds is to wait for the `expiredAt` timestamp to pass.

What happens at the expiration of an escrow transaction?

After the `expiredAt` timestamp passes, both the client and the evaluator have conflicting rights: the client can call `claimRefund()` and the evaluator can call `complete()` to pay the provider. The code does not define a priority, creating a 'digital standoff' where the winner is determined by whichever transaction hits the block first or by whoever can frontrun the other transaction.

How does the Decimal Trap in the Reputation Registry cause unsafe behavior?

The `giveFeedback` function validates that a value is less than the `MAX_ABS_VALUE`, but its decimal can be anywhere from 0 to 18. When `getSummary` rescales all values to 18 decimals before averaging, the sum can become significantly larger than the max int128 value. Since the modeDecimals can be any integer between 0 and 18, the subsequent downcasting to int128 can be unsafe, potentially showing a negative summary feedback value even when all feedback is non-negative.

The Rise of the Agent Economy, Part 2: Security Deep Dive into EIP-8004, EIP-8183, Hooks, and Evaluators

In Part 1 of this series, we described how EIP-8004, EIP-8183, and x402 can combine into a practical machine economy stack: identity, transaction coordination, and payment. The harder question is what happens when that stack is exposed to adversarial behavior.

1. Tracing the Identity Flow

Our investigation began with the Identity Registry, the agent’s digital "resume". We started by tracing the register() function in the IdentityRegistryUpgradeable contract.

We noticed a familiar pattern: the contract calls _safeMint before it finishes its internal bookkeeping and emits its events. Because _safeMint can invoke onERC721Received, the receiver could "hijack" the transaction mid-flight. For example, the receiver can transfer the NFT away and let the outer call continue to finalize URI writes, metadata writes, and Registered or MetadataSet events after ownership has already changed. While this is not identity theft, it creates a "stale identity" that can potentially throw off indexers and monitoring systems.

Figure 1: IdentityRegistryUpgradeable.register(string, MetadataEntry[]) (L83-92)
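
To make the ordering problem concrete, here is a minimal Python sketch of the pattern described above. It is not the registry's code: the Registry class, the on_received callback, and the account names are hypothetical stand-ins for _safeMint, onERC721Received, and real addresses.

```python
class Registry:
    """Toy model: ownership is assigned, the receiver callback runs, then events are emitted."""

    def __init__(self):
        self.owner_of = {}
        self.events = []
        self.next_id = 1

    def transfer(self, token_id, sender, to):
        if self.owner_of[token_id] != sender:
            raise PermissionError("not the owner")
        self.owner_of[token_id] = to

    def register(self, caller, uri, on_received=None):
        token_id = self.next_id
        self.next_id += 1
        # Models _safeMint: ownership is set, then control passes to the receiver.
        self.owner_of[token_id] = caller
        if on_received is not None:
            on_received(self, token_id)
        # Bookkeeping and events run after the external call has returned.
        self.events.append(("Registered", token_id, caller, uri))
        return token_id


def hijack(registry, token_id):
    # Hypothetical receiver hook: moves the token away during the mint.
    registry.transfer(token_id, "attacker", "elsewhere")


reg = Registry()
tid = reg.register("attacker", "ipfs://example", on_received=hijack)
event_owner = reg.events[-1][2]   # what an indexer would record
actual_owner = reg.owner_of[tid]  # what the ownership mapping says
```

In this model the Registered event names "attacker" while the ownership mapping already points to "elsewhere", which is the mismatch an indexer would inherit.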

2. Manipulating Reputation

Next, we turned our attention to the Reputation Registry. Ideally, it should prevent bad actors from scamming the system, but as we looked at the code, we realized reputation scoring can be gamed.

First, the most obvious observation is that a single address can submit repeated entries for the same agent, and getSummary treats each one with equal weight. We realized that one entity could effectively flood the feedback for a particular agent to artificially inflate or deflate its reputation.

Figure 2: ReputationRegistryUpgradeable.giveFeedback (L115-124)
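
A small Python sketch shows how little effort the flooding takes when every entry carries equal weight. The submitter names and scores are made up for illustration, and the plain arithmetic mean is only a simplified stand-in for what getSummary computes.

```python
from statistics import mean

# (submitter, score) pairs; all names and numbers are illustrative.
feedback = [("alice", 20), ("bob", 30), ("carol", 25)]
honest_avg = mean(score for _, score in feedback)

# One address submits 97 additional maximal entries for the same agent.
flooded = feedback + [("mallory", 100)] * 97
flooded_avg = mean(score for _, score in flooded)

# Share of all entries that came from the single flooding address.
mallory_share = sum(1 for who, _ in flooded if who == "mallory") / len(flooded)
```

Three honest reviewers average 25, but after the flood the equal-weight average is 97.75, with 97% of the entries coming from one address.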

But the real "crooked scale" was in the Decimal Trap. In the giveFeedback() function, there is validation that the value is less than MAX_ABS_VALUE, which is close to the max value an int128 can represent, but its decimal count can be anywhere between 0 and 18.

Figure 3: ReputationRegistryUpgradeable.getSummary (L95-106)

When someone queries the summary for the agent, the contract rescales all values to 18 decimals before averaging them, and we found a mismatch: a value that looks bounded at write time can become far larger during normalization. Specifically, the sum below can be significantly greater than the max value an int128 can represent, even when fb.value is bounded by it.

Figure 4: ReputationRegistryUpgradeable.getSummary (L223-228)

Furthermore, the decimal precision used for the summary is the one that appears most frequently across all the feedback. Given the permissionless nature of giveFeedback, the modeDecimals can be any integer between 0 and 18.

Figure 5: ReputationRegistryUpgradeable.getSummary (L236-244)

Lastly, the summaryValue is derived by scaling the sum to the most frequently used decimal, dividing it by the number of feedback entries received, and then downcasting the result to int128. When modeDecimals is 18 and the sum is much larger than the max value an int128 can represent, the downcast can be unsafe, potentially showing a negative summary value even though all feedback is non-negative.

Figure 6: ReputationRegistryUpgradeable.getSummary (L247-249)
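
The arithmetic can be reproduced off-chain. The Python sketch below is a simplified model of the steps described above, not the contract's code. It assumes MAX_ABS_VALUE is 10**38 and that the final cast simply keeps the low 128 bits; the get_summary and to_int128 helpers are hypothetical names.

```python
from collections import Counter

INT128_MAX = 2**127 - 1
MAX_ABS_VALUE = 10**38  # assumed write-time bound, just below INT128_MAX


def to_int128(x: int) -> int:
    """Keep the low 128 bits and reinterpret them as signed (a truncating cast)."""
    x &= (1 << 128) - 1
    return x - (1 << 128) if x >= (1 << 127) else x


def get_summary(feedback):
    # Rescale every (value, decimals) pair to 18 decimals and add them up.
    total = sum(value * 10 ** (18 - decimals) for value, decimals in feedback)
    # The summary precision is the most frequently used decimal count.
    mode_decimals = Counter(d for _, d in feedback).most_common(1)[0][0]
    scaled = total // 10 ** (18 - mode_decimals)
    return to_int128(scaled // len(feedback)), mode_decimals


# Two tiny entries at 18 decimals set the mode; one large entry uses 17 decimals.
feedback = [(1, 18), (1, 18), (6 * 10**37, 17)]
within_bounds = all(0 <= v <= MAX_ABS_VALUE for v, _ in feedback)
summary, mode_decimals = get_summary(feedback)
```

Every input is non-negative and within the assumed bound, yet the average of 2 * 10**38 exceeds INT128_MAX and wraps to a negative number after the cast.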

3. The Escrow Liveness Trap

Turning to EIP-8183 (The Escrow Kernel), we looked for ways to trap capital, either intentionally or unintentionally, and found a potential "liveness" bug hidden in the ACPCore lifecycle.

The system allows for "open-assignment" jobs where the provider is initially set to address(0). We discovered that if a Client funds one of these jobs before naming a specific Provider, the money enters a state of digital purgatory. Because the setProvider function only works while a job is in the Open state, and funding moves the job to Funded, the Client is suddenly locked out of their own transaction. The only way for the Client to recover the funds is to wait for the expiredAt timestamp to pass. Disallowing funding until a provider is set would prevent this kind of accidental trapping of funds.

Figure 7: ACPCore (L134-152, L93-101)
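
The lifecycle can be sketched as a small state machine. This Python model is a simplification under stated assumptions: the Job class, the Refunded label, the timestamps, and the require_provider flag (which illustrates the mitigation suggested above) are hypothetical and do not come from ACPCore.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    FUNDED = "Funded"
    REFUNDED = "Refunded"  # hypothetical label for the post-refund state


class Job:
    def __init__(self, expired_at: int):
        self.provider = None  # stands in for address(0)
        self.status = Status.OPEN
        self.expired_at = expired_at

    def set_provider(self, provider: str) -> None:
        if self.status is not Status.OPEN:
            raise RuntimeError("setProvider: job is not Open")
        self.provider = provider

    def fund(self, require_provider: bool = False) -> None:
        if self.status is not Status.OPEN:
            raise RuntimeError("fund: job is not Open")
        if require_provider and self.provider is None:
            raise RuntimeError("fund: provider not set")
        self.status = Status.FUNDED

    def claim_refund(self, now: int) -> None:
        if self.status is not Status.FUNDED or now < self.expired_at:
            raise RuntimeError("claimRefund: not refundable yet")
        self.status = Status.REFUNDED


def fails(action) -> bool:
    """Return True if the call reverts (raises) in this model."""
    try:
        action()
    except RuntimeError:
        return True
    return False


job = Job(expired_at=1_000)
job.fund()  # funded while the provider is still unset
locked_out = fails(lambda: job.set_provider("provider-1"))
early_refund_blocked = fails(lambda: job.claim_refund(now=500))
job.claim_refund(now=1_001)  # the only exit: wait for expiry

guarded = Job(expired_at=1_000)
guard_blocks_funding = fails(lambda: guarded.fund(require_provider=True))
```

Once funded, the job in this model can neither receive a provider nor be refunded before expiry, while the guarded variant rejects the funding call up front and leaves the job Open.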

4. A Standoff at Expiry

One of the most high-stakes moments in any agent transaction is the expiration date. We looked at the code for complete() and claimRefund() and realized they were heading for a head-on collision.

Imagine it is one second after the expiredAt timestamp. The Client now has the right to call claimRefund() and recover their money. Simultaneously, the Evaluator (the "judge" of the deal) still has the authority to call complete() to pay the Provider. We saw that the code doesn't define a priority. It is a digital standoff where the winner is determined by whichever transaction hits the block first, or by whoever is more willing and able to frontrun the other transaction.

Figure 8: ACPCore.complete (L185-192)

Figure 9: ACPCore.claimRefund (L255-268)
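
The race can be illustrated with a toy model in Python. It collapses the job to a single "settled" flag and assumes, as described above, that complete() carries no expiry check while claimRefund() requires the expiry to have passed; the settle function and its timestamps are hypothetical.

```python
def settle(call_order, now, expired_at=1_000):
    """Apply calls in block order and return who receives the escrowed funds."""
    settled, paid_to = False, None
    for call in call_order:
        if settled:
            continue  # the later call would revert: the job is already terminal
        if call == "complete":
            # Evaluator path: no expiry check in this model.
            settled, paid_to = True, "provider"
        elif call == "claimRefund" and now > expired_at:
            # Client path: only valid once the expiry has passed.
            settled, paid_to = True, "client"
    return paid_to


one_second_after = 1_001
evaluator_first = settle(["complete", "claimRefund"], one_second_after)
client_first = settle(["claimRefund", "complete"], one_second_after)
before_expiry = settle(["claimRefund", "complete"], now=999)
```

With identical inputs, the funds go to the provider or to the client depending only on transaction ordering, which is exactly what a frontrunner can influence.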

5. Holding Payments Hostage

The most substantial risk we found was in the Hooks, the optional logic extensions for things like KYC or reputation updates.

In both the complete() and reject() functions, the contract performs the settlement transfer and then calls the afterAction hook. Because the code does not catch hook failures, we realized a "parasitic" hook can revert at the very end, unwinding the entire payment. Settlement is only as reliable as the least reliable post-settlement extension attached to the job. The ACPCore implementation protects only the client by making claimRefund() unhookable; the provider doesn't enjoy the same level of protection in the complete() function.

Figure 10: ACPCore.complete (L203-209)
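
The effect of an uncaught hook failure can be modeled in Python by treating a transaction as all-or-nothing. This is only a sketch: run_tx, the state dictionary, and the tolerant variant that catches the hook failure are hypothetical, and the latter illustrates one possible design rather than anything ACPCore does.

```python
import copy


class Revert(Exception):
    """Stands in for an EVM revert."""


def run_tx(state, fn):
    """Run fn atomically: a Revert restores the state as it was before the call."""
    snapshot = copy.deepcopy(state)
    try:
        fn(state)
        return True
    except Revert:
        state.clear()
        state.update(snapshot)
        return False


def complete(state, after_action):
    # Settlement first, then the hook, with no error handling around it.
    state["escrow"] -= state["budget"]
    state["provider"] += state["budget"]
    state["status"] = "Completed"
    after_action()


def complete_tolerant(state, after_action):
    # Hypothetical variant: a failing hook no longer unwinds the settlement.
    state["escrow"] -= state["budget"]
    state["provider"] += state["budget"]
    state["status"] = "Completed"
    try:
        after_action()
    except Revert:
        state["hook_failed"] = True


def parasitic_hook():
    raise Revert("afterAction failed")


fresh = {"escrow": 100, "provider": 0, "budget": 100, "status": "Submitted"}

strict_state = copy.deepcopy(fresh)
strict_ok = run_tx(strict_state, lambda s: complete(s, parasitic_hook))

tolerant_state = copy.deepcopy(fresh)
tolerant_ok = run_tx(tolerant_state, lambda s: complete_tolerant(s, parasitic_hook))
```

In the strict path the provider's balance returns to zero because the hook's revert unwinds the transfer; in the tolerant path the payment stands and the failure is merely recorded.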

6. When the Evaluator Stops Looking

The ultimate trust assumption in this agent commerce framework is the Evaluator (the "logical brain" that decides who gets paid). Our investigation into the WebScrapingEvaluator relay contract revealed a shocking gap in the "brain's" vision.

We discovered that requestEvaluation() accepts an arbitrary caller-supplied deliverableUrl without verifying that it hashes to the committed deliverable already stored on-chain. We realized an attacker could front-run a legitimate request, inject a fake URL, and force the AI to judge a "ghost" artifact that the Provider never actually submitted. The "Judge" was essentially ruling on a case while looking at the wrong evidence.

Figure 11: WebScrapingEvaluator.requestEvaluation (L48-67)
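
The missing check is easy to state in code. The Python sketch below is not the relay contract: the Relay class and its method names are hypothetical, the URLs are placeholders, and SHA-256 stands in for whatever hash the on-chain commitment actually uses.

```python
import hashlib


def commit(url: str) -> bytes:
    # SHA-256 is a stand-in here; the on-chain commitment may use a different hash.
    return hashlib.sha256(url.encode()).digest()


class Relay:
    def __init__(self):
        self.committed = {}  # job_id -> hash of the submitted deliverable
        self.requests = []   # (job_id, url) pairs handed to the evaluator

    def submit(self, job_id: int, deliverable_hash: bytes) -> None:
        self.committed[job_id] = deliverable_hash

    def request_evaluation(self, job_id: int, url: str, verify: bool) -> None:
        # With verify=False the caller-supplied URL is accepted as-is.
        if verify and commit(url) != self.committed[job_id]:
            raise ValueError("deliverableUrl does not match the committed hash")
        self.requests.append((job_id, url))


real_url = "https://example.com/deliverable.json"
fake_url = "https://example.com/ghost.json"

relay = Relay()
relay.submit(1, commit(real_url))

# Unverified path: the evaluator is pointed at an artifact nobody submitted.
relay.request_evaluation(1, fake_url, verify=False)
fake_accepted = relay.requests[-1] == (1, fake_url)

# Verified path: the fake URL is rejected, the real one passes.
try:
    relay.request_evaluation(1, fake_url, verify=True)
    fake_rejected = False
except ValueError:
    fake_rejected = True
relay.request_evaluation(1, real_url, verify=True)
```

Comparing the hash of the supplied URL against the stored commitment is enough, in this model, to keep the evaluator from being handed evidence the Provider never submitted.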

7. The Architect’s Burden

Our "hunt" through the Agentic Economy ended with a realization. Standards like EIP-8004 and EIP-8183 are strong foundations, but as we build more complex layers, such as scoring systems, hooks, and AI evaluators, new risks are introduced.

In this new era, security is about closing the gap between what the code says it does and what it actually executes. As humans, our role shifts from active participants to the architects and auditors of the code that governs us. In the agentic frontier, the cracks in the foundation are where the real stories and the real dangers begin.