The Rise of the Agent Economy, Part 2: Security Deep Dive into EIP-8004, EIP-8183, Hooks, and Evaluators

In Part 1 of this series, we described how EIP-8004, EIP-8183, and x402 can combine into a practical machine economy stack: identity, transaction coordination, and payment. The harder question is what happens when that stack is exposed to adversarial behavior.

1. Tracing the Identity Flow

Our investigation began with the Identity Registry, the agent’s digital "resume". We started by tracing the register() function in the IdentityRegistryUpgradeable contract.

We noticed a familiar pattern: the contract calls _safeMint before it finishes its internal bookkeeping and emits its events. Because _safeMint invokes onERC721Received when the recipient is a contract, the receiver can "hijack" the transaction mid-flight. For example, the receiver can transfer the NFT away and let the outer call continue to finalize URI writes, metadata writes, and the Registered or MetadataSet events after ownership has already changed. This is not identity theft, but it creates a "stale identity" that can throw off indexers and monitoring systems.

Figure 1: IdentityRegistryUpgradeable.register(string, MetadataEntry[]) (L83-92)
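To make the ordering concrete, here is a minimal Python sketch of the pattern. It is a toy model, not the IdentityRegistryUpgradeable code: ToyRegistry, naive_indexer, and the event tuples are hypothetical, and we assume the Registered event carries the registering address as its owner field. The receiver callback moves the token before Registered is logged, so an indexer that trusts that event ends up with a stale owner.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyRegistry:
    """Hypothetical model; not the real IdentityRegistryUpgradeable code."""
    owners: dict = field(default_factory=dict)
    uris: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered event log an indexer would read
    next_id: int = 1

    def transfer(self, token_id: int, frm: str, to: str) -> None:
        assert self.owners[token_id] == frm, "caller is not the owner"
        self.owners[token_id] = to
        self.log.append(("Transfer", token_id, frm, to))

    def register(self, sender: str, uri: str,
                 on_received: Optional[Callable[["ToyRegistry", int], None]] = None) -> int:
        token_id = self.next_id
        self.next_id += 1
        # _safeMint: ownership is assigned and the receiver hook runs immediately
        self.owners[token_id] = sender
        self.log.append(("Transfer", token_id, None, sender))
        if on_received is not None:
            on_received(self, token_id)  # the receiver can move the NFT right here
        # Bookkeeping and the Registered event happen only after the callback
        self.uris[token_id] = uri
        self.log.append(("Registered", token_id, uri, sender))
        return token_id


def naive_indexer(log: list) -> dict:
    """Applies events in order, trusting the owner field of Registered."""
    owners = {}
    for event in log:
        if event[0] in ("Transfer", "Registered"):
            owners[event[1]] = event[3]
    return owners


reg = ToyRegistry()
agent_id = reg.register("alice", "ipfs://resume",
                        lambda r, tid: r.transfer(tid, "alice", "mallory"))
print(reg.owners[agent_id])              # mallory (on-chain truth)
print(naive_indexer(reg.log)[agent_id])  # alice (stale view)
```

In this model, either minting with _mint (no receiver callback) or moving the mint after the bookkeeping and event emission removes the stale view; which option fits the real contract depends on details not shown here.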

2. Manipulating Reputation

Next, we turned our attention to the Reputation Registry. Ideally, it should prevent bad actors from scamming the system, but as we looked at the code, we realized reputation scoring can be gamed.

First, the most obvious observation is that a single address can submit repeated entries for the same agent, and getSummary treats each one with equal weight. We realized that one entity could effectively flood the feedback for a particular agent to artificially inflate or deflate its reputation.

Figure 2: ReputationRegistryUpgradeable.giveFeedback (L115-124)
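The flooding concern is simple arithmetic. The sketch below is a hypothetical Python model, not the ReputationRegistryUpgradeable logic: equal_weight_average treats every entry the same, as described above, while one_vote_per_client is one possible mitigation we assume for illustration.

```python
def equal_weight_average(feedback):
    """feedback: list of (client, score); every entry counts once."""
    scores = [score for _, score in feedback]
    return sum(scores) / len(scores)


def one_vote_per_client(feedback):
    """Hypothetical mitigation: keep only each client's most recent score."""
    latest = {}
    for client, score in feedback:
        latest[client] = score
    return sum(latest.values()) / len(latest)


honest = [("c1", 90), ("c2", 85), ("c3", 95)]
flood = [("attacker", 0)] * 97  # one address, 97 identical entries
feedback = honest + flood

print(equal_weight_average(feedback))  # 2.7
print(one_vote_per_client(feedback))   # 67.5
```

Deduplicating by address only limits a single address; an attacker who controls many addresses can still flood the feedback, so a per-address limit is a partial fix at best.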

But the real "crooked scale" was in the Decimal Trap. The giveFeedback() function checks the value's magnitude against MAX_ABS_VALUE, which is close to the maximum value an int128 can represent, but the number of decimals attached to that value can be anything from 0 to 18.

Figure 3: ReputationRegistryUpgradeable.getSummary (L95-106)

When someone queries an agent's summary, the contract rescales every value to 18 decimals before averaging, and here we found a mismatch: a value that looks bounded at write time can grow far larger during normalization. A value submitted with 0 decimals, for instance, is multiplied by 10^18. As a result, the sum below can be significantly greater than the maximum value an int128 can represent, even when every fb.value is bounded by it.

Figure 4: ReputationRegistryUpgradeable.getSummary (L223-228)
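A few lines of Python show how far normalization can push a value that passed the write-time check. Python integers are unbounded, so we can compare against the int128 limit directly. MAX_ABS_VALUE = 10**38 is an assumed stand-in for the real constant, and normalize_to_18 is a hypothetical helper.

```python
INT128_MAX = 2**127 - 1
MAX_ABS_VALUE = 10**38  # assumed stand-in; only described as close to INT128_MAX


def normalize_to_18(value: int, decimals: int) -> int:
    """Rescale a fixed-point value with `decimals` places to 18 places (hypothetical helper)."""
    if abs(value) > MAX_ABS_VALUE or not 0 <= decimals <= 18:
        raise ValueError("rejected at write time")
    return value * 10 ** (18 - decimals)


accepted = normalize_to_18(MAX_ABS_VALUE, 0)  # passes the write-time bound
print(accepted > INT128_MAX)  # True
print(len(str(accepted)))     # 57 digits, versus 39 for INT128_MAX
```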

Furthermore, the precision of the result is the most frequently used decimals value across all feedback (modeDecimals). Because anyone can call giveFeedback, submitters can steer modeDecimals to any integer between 0 and 18.

Figure 5: ReputationRegistryUpgradeable.getSummary (L236-244)
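The sketch below models how cheaply the mode can be steered. mode_decimals is a hypothetical helper, and its tie-breaking rule (collections.Counter keeps first-seen order among equal counts) is our assumption; the contract's rule may differ.

```python
from collections import Counter


def mode_decimals(decimals_seen):
    """Most frequent decimals value; ties resolve to the first value seen (assumption)."""
    return Counter(decimals_seen).most_common(1)[0][0]


honest = [0, 0, 0]  # three integer-valued scores
spam = [18] * 4     # four entries with 18 decimals, e.g. value 0, which passes any bound

print(mode_decimals(honest))         # 0
print(mode_decimals(honest + spam))  # 18
```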

Finally, summaryValue is derived by rescaling sum from 18 decimals down to modeDecimals, dividing by the number of feedback entries, and downcasting the result to int128. The risk is greatest when modeDecimals is 18, because no downscaling happens: if the average exceeds the maximum int128 value, the downcast can truncate the high bits and produce a negative summary even though every feedback value is non-negative.

Figure 6: ReputationRegistryUpgradeable.getSummary (L247-249)
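Putting the pieces together, here is an end-to-end Python model of the summary path under stated assumptions: toy_summary, sdiv, and to_int128 are hypothetical helpers, the rescale-then-divide order follows the description above, and to_int128 mimics Solidity's explicit int256-to-int128 conversion, which keeps the low 128 bits rather than reverting. All three feedback values are non-negative, and the first (6 * 10^20 at 0 decimals) is far below the int128 limit, yet the summary comes out negative.

```python
def to_int128(x: int) -> int:
    """Mimic Solidity's explicit int128(x): keep the low 128 bits as two's complement."""
    x &= (1 << 128) - 1
    return x - (1 << 128) if x >= (1 << 127) else x


def sdiv(a: int, b: int) -> int:
    """Signed division that truncates toward zero, as Solidity does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def toy_summary(entries):
    """entries: list of (value, decimals). Hypothetical model of the getSummary path."""
    counts = {}
    for _, d in entries:
        counts[d] = counts.get(d, 0) + 1
    mode = max(counts, key=counts.get)                    # modeDecimals
    total = sum(v * 10 ** (18 - d) for v, d in entries)  # all rescaled to 18 decimals
    scaled = sdiv(total, 10 ** (18 - mode))               # back down to modeDecimals
    return to_int128(sdiv(scaled, len(entries))), mode    # unchecked downcast


entries = [(6 * 10**20, 0), (0, 18), (0, 18)]  # every value is non-negative
summary, mode = toy_summary(entries)
print(mode)                            # 18
print(summary < 0)                     # True
print(summary == 2 * 10**38 - 2**128)  # True
```

A checked conversion such as OpenZeppelin's SafeCast.toInt128 would revert instead of wrapping; whether a read-only summary should revert, clamp, or return a wider type is a design choice.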

3. The Escrow Liveness Trap

Turning to EIP-8183 (The Escrow Kernel), we looked for ways to trap capital, either intentionally or unintentionally, and found a potential "liveness" bug hidden in the ACPCore lifecycle.

The system allows "open-assignment" jobs in which the provider is initially set to address(0). We discovered that if a Client funds one of these jobs before naming a Provider, the money enters a kind of digital purgatory. Because setProvider only works while a job is in the Open state, and fund() moves the job to Funded, the Client is suddenly locked out of their own transaction. The only way for the Client to recover the funds is to wait for the expiredAt timestamp to pass. Disallowing funding until a provider is set would prevent this kind of accidental lock-up.

Figure 7: ACPCore (L134-152, L93-101)
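The trap is a two-state transition problem. This Python sketch is a hypothetical model of the Open to Funded flow, not ACPCore itself: ToyJob, the ZERO placeholder for address(0), and the assumption that only the client may call set_provider are ours. The optional require_provider flag models the fix suggested above.

```python
from enum import Enum, auto

ZERO = "0x0"  # placeholder for address(0)


class State(Enum):
    OPEN = auto()
    FUNDED = auto()


class ToyJob:
    """Hypothetical Open -> Funded job model; not ACPCore's actual code."""

    def __init__(self, client: str, provider: str = ZERO, require_provider: bool = False):
        self.client = client
        self.provider = provider
        self.state = State.OPEN
        self.require_provider = require_provider  # the suggested guard, off by default

    def fund(self, caller: str) -> None:
        if caller != self.client or self.state is not State.OPEN:
            raise ValueError("cannot fund")
        if self.require_provider and self.provider == ZERO:
            raise ValueError("provider not set")
        self.state = State.FUNDED

    def set_provider(self, caller: str, provider: str) -> None:
        if caller != self.client:
            raise ValueError("only the client")
        if self.state is not State.OPEN:
            raise ValueError("job is not Open")
        self.provider = provider


job = ToyJob("client")
job.fund("client")  # funded while the provider is still address(0)
try:
    job.set_provider("client", "provider")
except ValueError as err:
    print(err)  # job is not Open

guarded = ToyJob("client", require_provider=True)
try:
    guarded.fund("client")
except ValueError as err:
    print(err)  # provider not set
```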

4. A Standoff at Expiry

One of the highest-stakes moments in any agent transaction is expiry. We looked at the code for complete() and claimRefund() and realized they were heading for a head-on collision.

Imagine it is one second after the expiredAt timestamp. The Client now has the right to call claimRefund() and recover their money, while the Evaluator (the "judge" of the deal) still has the authority to call complete() to pay the Provider. The code does not define which action takes priority. It is a digital standoff in which the winner is whichever transaction is included in a block first, or whoever is more willing and able to front-run the other.

Figure 8: ACPCore.complete (L185-192)

Figure 9: ACPCore.claimRefund (L255-268)
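The race can be modeled by running the two calls in either block order. In this hypothetical Python sketch, ToyEscrow, its state names, and the exact expiry comparisons are assumptions. The strict_complete flag shows one possible priority rule (no completion at or after expiry), not the only reasonable one; a grace period for the Evaluator would be another.

```python
class ToyEscrow:
    """Hypothetical escrow: complete() ignores expiry unless strict_complete is set."""

    def __init__(self, expired_at: int, strict_complete: bool = False):
        self.expired_at = expired_at
        self.strict_complete = strict_complete  # possible fix: no completion after expiry
        self.state = "Submitted"
        self.paid_to = None

    def complete(self, now: int) -> None:
        if self.state != "Submitted":
            raise RuntimeError("job already settled")
        if self.strict_complete and now >= self.expired_at:
            raise RuntimeError("job expired")
        self.state, self.paid_to = "Completed", "provider"

    def claim_refund(self, now: int) -> None:
        if self.state != "Submitted":
            raise RuntimeError("job already settled")
        if now < self.expired_at:
            raise RuntimeError("job not expired yet")
        self.state, self.paid_to = "Refunded", "client"


def settle(order, now, strict=False):
    """Apply calls in block order; a failing call reverts and changes nothing."""
    escrow = ToyEscrow(expired_at=1_000, strict_complete=strict)
    for call in order:
        try:
            getattr(escrow, call)(now)
        except RuntimeError:
            pass
    return escrow.paid_to


print(settle(["complete", "claim_refund"], now=1_001))               # provider
print(settle(["claim_refund", "complete"], now=1_001))               # client
print(settle(["complete", "claim_refund"], now=1_001, strict=True))  # client
```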

5. Holding Payments Hostage

The most substantial risk we found was in the Hooks, the optional logic extensions used for things like KYC checks or reputation updates.

In both complete() and reject(), the contract performs the settlement transfer and then calls the afterAction hook. Because the code does not catch hook failures, a "parasitic" hook can revert at the very end and unwind the entire payment. Settlement is therefore only as reliable as the least reliable post-settlement extension attached to the job. ACPCore protects the client by making claimRefund() unhookable, but the provider enjoys no equivalent protection in complete().

Figure 10: ACPCore.complete (L203-209)
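The following Python sketch models EVM-style atomicity to show why an uncaught hook failure undoes the payment. run_tx, complete, and parasitic_hook are hypothetical; isolate_hook=True stands in for wrapping the hook call in a Solidity try/catch.

```python
import copy


class Revert(Exception):
    pass


def run_tx(state: dict, fn) -> bool:
    """All-or-nothing execution: any Revert restores the pre-call state."""
    snapshot = copy.deepcopy(state)
    try:
        fn(state)
        return True
    except Revert:
        state.clear()
        state.update(snapshot)
        return False


def parasitic_hook(state: dict) -> None:
    raise Revert("hook refuses")


def complete(state: dict, hook, isolate_hook: bool = False) -> None:
    """Hypothetical complete(): pay the provider, then call the afterAction hook."""
    state["escrow"] -= 100
    state["provider"] += 100
    state["job"] = "Completed"
    if isolate_hook:
        # Like try/catch around the hook; a real catch also discards the hook's own
        # writes, and this toy hook makes none before failing.
        try:
            hook(state)
        except Revert:
            state["hook_failed"] = True
    else:
        hook(state)  # an uncaught revert unwinds the payment as well


s1 = {"escrow": 100, "provider": 0, "job": "Submitted"}
ok1 = run_tx(s1, lambda s: complete(s, parasitic_hook))
print(ok1, s1["provider"], s1["job"])  # False 0 Submitted

s2 = {"escrow": 100, "provider": 0, "job": "Submitted"}
ok2 = run_tx(s2, lambda s: complete(s, parasitic_hook, isolate_hook=True))
print(ok2, s2["provider"], s2["job"])  # True 100 Completed
```

Isolating the hook has a cost: a hook meant to veto settlement (for example, a KYC gate) can no longer do so once the transfer has happened, so such checks would need to run before funds move.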

6. When the Evaluator Stops Looking

The ultimate trust assumption in this agent commerce framework is the Evaluator (the "logical brain" that decides who gets paid). Our investigation into the WebScrapingEvaluator relay contract revealed a serious gap in the "brain's" vision.

We discovered that requestEvaluation() accepts an arbitrary caller-supplied deliverableUrl without verifying that it hashes to the committed deliverable already stored on-chain. We realized an attacker could front-run a legitimate request, inject a fake URL, and force the AI to judge a "ghost" artifact that the Provider never actually submitted. The "Judge" was essentially ruling on a case while looking at the wrong evidence.

Figure 11: WebScrapingEvaluator.requestEvaluation (L48-67)
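A minimal guard is to recompute the commitment from the caller's URL and compare it with what the Provider stored. In this hypothetical Python sketch, ToyRelay and its methods are illustrative, and SHA-256 stands in for whatever hash the contract actually uses (keccak256 is common on-chain but is not in Python's standard library). We also assume the commitment covers the URL itself; if it covers the artifact's content, the off-chain evaluator would need to hash the fetched content as well.

```python
import hashlib


def commitment(data: bytes) -> str:
    # SHA-256 is a stand-in for the contract's real commitment hash.
    return hashlib.sha256(data).hexdigest()


class ToyRelay:
    """Hypothetical evaluator relay that checks URLs against the stored commitment."""

    def __init__(self):
        self.committed = {}  # job_id -> hash recorded when the Provider submitted
        self.requests = {}   # job_id -> URL forwarded to the AI evaluator

    def submit(self, job_id: int, url: str) -> None:
        self.committed[job_id] = commitment(url.encode())

    def request_evaluation(self, job_id: int, url: str) -> None:
        if commitment(url.encode()) != self.committed.get(job_id):
            raise ValueError("URL does not match the committed deliverable")
        self.requests.setdefault(job_id, url)


relay = ToyRelay()
relay.submit(7, "https://provider.example/report.json")
try:  # a front-running attacker's fake URL is now rejected
    relay.request_evaluation(7, "https://attacker.example/fake.json")
except ValueError as err:
    print(err)
relay.request_evaluation(7, "https://provider.example/report.json")
print(relay.requests[7])  # https://provider.example/report.json
```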

7. The Architect’s Burden

Our "hunt" through the agentic economy ended with a realization. Standards like EIP-8004 and EIP-8183 are strong foundations, but every more complex layer built on top of them, such as scoring systems, hooks, and AI evaluators, introduces new risks.

In this new era, security is about closing the gap between what the code says it does and what it actually executes. As humans, our role shifts from active participants to the architects and auditors of the code that governs us. In the agentic frontier, the cracks in the foundation are where the real stories and the real dangers begin.
