Cloudflare’s content delivery network (CDN) is frequently used by websites and web apps to optimize load times and reduce traffic to the origin server. Cloudflare provides a reverse proxy service that sits between the origin server and its visitors, thereby hiding the origin server. This makes it hard for outsiders to locate the IP address of the origin server and identify the hosting company behind the domain. While this offers protection against cyberattacks such as denial-of-service attacks, it can be a double-edged sword, as the gained anonymity can also be used for malicious purposes.
Using our historical data, we’ve developed a method that can find the origin servers behind Cloudflare-protected hostnames. By making an HTTP request for the website directly to the candidate origin server, we can verify that the website is still hosted at this location.
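The verification step can be illustrated with a short Python sketch. This is not our production code: the function name `probe_origin`, the timeout, and the decision to skip certificate validation are assumptions made for the example. The idea is to connect to the candidate IP address and send the protected hostname in the `Host` header, so the server answers as if the request had arrived through the proxy.

```python
import http.client
import ssl


def probe_origin(ip, hostname, port=443, use_tls=True, timeout=5.0):
    """Request "/" from a candidate origin IP while naming the site in the
    Host header. Returns the HTTP status code, or None if the request fails.
    (Hypothetical helper, for illustration only.)"""
    try:
        if use_tls:
            # Assumption: a certificate served on a bare IP usually won't
            # validate, so verification is disabled for this probe.
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            conn = http.client.HTTPSConnection(ip, port, timeout=timeout, context=ctx)
        else:
            conn = http.client.HTTPConnection(ip, port, timeout=timeout)
        try:
            # Supplying our own Host header stops http.client from adding
            # one derived from the IP address.
            conn.request("GET", "/", headers={"Host": hostname})
            return conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and TLS errors.
        return None


# Example call with placeholder values (documentation IP range):
# probe_origin("203.0.113.10", "example.com")
```

A status code of 200 suggests the site is still served from that address. One limitation of this sketch: when connecting by IP address, no SNI hostname is sent during the TLS handshake, so servers that pick a virtual host by SNI may respond differently than they would to a real visitor.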
We apply our method to 100,000 records that we’ve randomly sampled from the total of approximately five million unique domains that use Cloudflare’s CDN. We find that 30% (~30,000 domains) of the sampled websites respond with a status code 200 from the origin server, meaning they’re still hosted there. Of these websites, around 45% (~13,500 domains) are hosted in the US.
To demonstrate another use case of our method, we’ll now investigate a set of untrustworthy eCommerce websites that use Cloudflare. Our proprietary Trust Grade is based on features such as the presence (or lack) of an SSL certificate, contact information, products, prices and number of changes. After filtering for eCommerce websites with a Trust Grade of D, E or F, which means their legitimacy is doubtful, we identify nearly 29,000 unique domains that are using Cloudflare and are therefore hiding their IP address and the hosting company behind the domain.
Each domain that has ever been registered belongs to someone and, in many cases, this information is publicly available. Anyone can obtain this information through the WHOIS protocol, which is helpful when any issue related to a website arises. A large number of domains in our set of suspicious eCommerce websites are anonymous and seem to have used a domain privacy service that obscures the domain ownership records available through WHOIS.
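For readers unfamiliar with WHOIS, the protocol itself is very simple (RFC 3912): open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The Python sketch below shows that exchange. The helper names and the keyword list used to guess at privacy protection are assumptions for illustration, not the checks we actually run. Also note that the default server here, the .com/.net registry, is a "thin" registry: it points to the registrar’s WHOIS server, which is where registrant details (or their redaction) would appear.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10.0):
    """Send a single WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumption: a crude keyword heuristic; real records vary by registrar.
PRIVACY_HINTS = ("redacted for privacy", "privacy service", "whois privacy")


def looks_private(whois_text):
    """Guess whether a WHOIS record hides the registrant behind a privacy service."""
    text = whois_text.lower()
    return any(hint in text for hint in PRIVACY_HINTS)


# Example call with a placeholder domain:
# looks_private(whois_query("example.com"))
```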
But are these websites truly anonymous, or can we still find out where they’re hosted? After running our method on the dataset, we find that 49% (around 14,000) of Cloudflare-protected shady online stores still respond to requests on their old IP address. Looking at the distribution of hosting companies for the origin servers (Figure 1), we find that one third of these untrustworthy eCommerce websites are hosted by Shopify, followed by Google LLC (10%) and Amazon.com, Inc. (5%).
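To make the aggregation concrete, here is a minimal Python sketch of how such a breakdown could be computed from probe results. The record layout (`status`, `host_org`) and the four sample records are invented for the example and are not taken from our dataset.

```python
from collections import Counter


def summarize(results):
    """Return (share of domains answering 200, [(hosting org, share), ...]).

    Shares in the second element are relative to the responding domains,
    ordered from most to least common.
    """
    live = [r for r in results if r["status"] == 200]
    share_live = len(live) / len(results) if results else 0.0
    counts = Counter(r["host_org"] for r in live)
    distribution = [(org, n / len(live)) for org, n in counts.most_common()]
    return share_live, distribution


# Made-up sample records, for illustration only.
sample = [
    {"domain": "shop-a.example", "status": 200, "host_org": "Shopify"},
    {"domain": "shop-b.example", "status": 200, "host_org": "Shopify"},
    {"domain": "shop-c.example", "status": 200, "host_org": "Google LLC"},
    {"domain": "shop-d.example", "status": 403, "host_org": None},
]

share_live, distribution = summarize(sample)
```

With this toy input, three of the four domains respond, and Shopify accounts for two of those three. The same grouping applied to the server’s country instead of `host_org` would give a per-country breakdown.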
This makes sense as Shopify is one of the most popular eCommerce platforms. The combination of using Cloudflare while hosting on Shopify, however, doesn't necessarily mean that malicious practices are going on. If we use our platform to look at the complete set of websites hosted on the Shopify platform, we find that while 99% of Shopify websites use Cloudflare’s CDN service, only 1.5% of them are classified with a Trust Grade lower than C.
Next, we take a look at the countries where the servers are located. We find that around 37% of untrustworthy online stores are hosted in the US, followed by Canada with 18%. Figure 2 shows the top 10 hosting countries of the origin servers behind untrustworthy online stores making use of Cloudflare’s CDN.
Fraudulent eCommerce websites pose a serious problem in today’s digital world. The method applied in this article allows us to detect the origin server of domains protected by Cloudflare’s CDN and can be used to track down malicious usage. It has helped us uncover a large set of anonymous, untrustworthy online stores and gain insight into where and by whom they’re hosted, which shows that it can serve as a useful tool in the fight against cybercrime.