
Uncovering networks of websites spreading misinformation about COVID-19

Andriana Boyrikova
  • 7 months ago

As the COVID-19 pandemic has taken over the world, so has misinformation. The pandemic has led to an overabundance of inaccurate information and conspiracy theories: from sources claiming they offer treatments and miracle cures to outlets deliberately spreading false information to sway public opinion.

Websites that spread such content often cite each other as sources, forming networks. Combating misinformation has become a crucial part of dealing with the pandemic, and this is where technology can step in to uncover online networks that spread false and potentially harmful information.

Identifying websites that spread misinformation

We tend to call misinformation spread online (and offline) fake news, yet the term covers not only fabrications and legitimate news that has been deliberately twisted but also unresearched information and stories that contain some truth but are not completely accurate. This breadth makes fake news harder to identify and makes it all the more important to read and analyze news critically.

Naturally, the most important rule is to use common sense to evaluate whether a message is plausible and to collect information from multiple sources before drawing any conclusions. People tend to believe information that confirms their beliefs and discount information that contradicts them, so keeping emotional responses in check and searching for what well-known, trusted sources have published on the subject is a crucial first step.

Beyond that, checking the rest of the website in question takes little time and effort but says a lot about how trustworthy it is: what other stories has it posted, and does it provide any information on the mission, staff members or physical location of the organization or company behind it?

When reading a piece of online content, it's also important to pay attention to the sources it uses, for example quotes, interviews, reports, survey data or official statistics, and to check that these sources actually exist, since some articles refer to studies that do not.

Uncovering networks of COVID-19 fake news

Many of these fake news websites are spin-offs of notorious websites that publish false health-related content and conspiracy theories, and they tend to cite each other as sources. This suggests that once we find one website spreading misinformation about COVID-19, we can find more, for example by following the links on that website.

We test whether this holds true through an online investigation that uses our database of incoming links (links pointing to a particular website from other websites).
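Conceptually, a database of incoming links is an inverted view of the web's link graph: instead of storing the websites a page links to, it stores, for each website, the websites that link to it. A minimal sketch, assuming a crawl produces (source, target) pairs of domain names; the function name `build_incoming_index` and the example domains are hypothetical and do not describe our actual database schema.

```python
from collections import defaultdict


def build_incoming_index(links):
    """Map each target website to the set of websites that link to it.

    `links` is an iterable of (source, target) pairs, e.g. from a crawl.
    Self-links are skipped because they say nothing about the network.
    """
    incoming = defaultdict(set)
    for source, target in links:
        if source != target:
            incoming[target].add(source)
    return incoming


# Hypothetical crawl output: (linking site, linked site)
crawl = [
    ("site-b.example", "site-a.example"),
    ("site-c.example", "site-a.example"),
    ("site-c.example", "site-b.example"),
    ("site-a.example", "site-a.example"),  # self-link, skipped
]
index = build_incoming_index(crawl)
print(sorted(index["site-a.example"]))  # ['site-b.example', 'site-c.example']
```

Inverting the graph once up front means that looking up "who links here?" is a single dictionary access rather than a scan over every crawled page.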

Our starting point is the infamous US website The homepage of claims that the website publishes “scientific articles exposing vaccine myths and pharma foibles,” and states that it “promotes alternative health news.” However, it has published stories with misleading information and unsubstantiated claims about COVID-19. For example, an article from April 2020 headlined “Is 5G a Deadly Trigger for the Coronavirus?” suggests that 5G cell phone technology is related to the outbreaks of the new strain of coronavirus in Wuhan, Milan and Iran. So far, there is no research that establishes a link between the COVID-19 pandemic and 5G. Additionally, the website doesn’t provide any ‘About’ information, physical location or contact details. 

Homepage of

Starting our search from, we collect 10,000 websites that link to it directly or through other websites. Among them, a cluster of 865 websites that are most likely related to fake news stands out as worth investigating. By applying a genetic clustering algorithm, we can determine that these 865 websites form a cluster: they have a relatively large number of links among themselves but comparatively few links to other websites.
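The collection step can be pictured as a breadth-first search that walks incoming links backwards from the starting website until a size limit is reached. A minimal sketch, assuming the incoming-link data is available as a dict mapping each website to the set of websites that link to it; the function name and the toy data are hypothetical.

```python
from collections import deque


def collect_linking_sites(incoming, seed, limit):
    """Collect websites linking to `seed` directly or via other websites.

    `incoming` maps each website to the set of websites that link to it.
    Returns {website: distance}, where distance 1 means a direct link,
    2 means a link to a direct linker, and so on. Stops at `limit` sites.
    """
    distance = {}
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue and len(distance) < limit:
        site, hops = queue.popleft()
        # sorted() only makes the traversal order reproducible
        for linker in sorted(incoming.get(site, ())):
            if linker in seen:
                continue
            seen.add(linker)
            distance[linker] = hops + 1
            queue.append((linker, hops + 1))
            if len(distance) >= limit:
                break
    return distance


# Hypothetical toy data; the seed also links to "b", so it is met again and skipped.
toy_incoming = {"seed": {"a", "b"}, "a": {"c"}, "b": {"seed"}, "c": {"d"}}
print(collect_linking_sites(toy_incoming, "seed", 10))
# {'a': 1, 'b': 1, 'c': 2, 'd': 3}
print(collect_linking_sites(toy_incoming, "seed", 2))
# {'a': 1, 'b': 1}
```

Breadth-first order means that when the limit cuts the search short, the websites kept are the ones closest to the starting point.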

A cluster of 865 websites identified by a genetic clustering algorithm. Each red dot represents a website, and each arrow represents a website linking to another website.
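The genetic clustering algorithm itself is not reproduced here, but the property it looks for ("many links among them, few links to other websites") can be made concrete. One standard score for a candidate cluster is conductance: the number of links crossing the cluster boundary divided by the total degree on the smaller side. A genetic algorithm could use a score like this (or modularity) as its fitness function; the sketch below assumes links are treated as undirected and is not the algorithm behind the figure.

```python
def conductance(edges, cluster):
    """Score how well `cluster` separates from the rest of an undirected graph.

    cut    = number of links with exactly one endpoint in the cluster
    volume = sum of node degrees on one side of the boundary
    conductance = cut / min(volume inside, volume outside)
    Lower values mean many links inside the cluster and few leaving it.
    """
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        inside = (u in cluster) + (v in cluster)  # endpoints inside: 0, 1 or 2
        if inside == 1:
            cut += 1
        vol_in += inside  # each endpoint adds one to its node's degree
        vol_out += 2 - inside
    smaller = min(vol_in, vol_out)
    if smaller == 0:
        raise ValueError("both sides of the boundary need at least one link")
    return cut / smaller


# Two triangles of websites joined by a single link (c - d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(round(conductance(edges, {"a", "b", "c"}), 3))  # 0.143: a natural cluster
print(round(conductance(edges, {"a", "b", "d"}), 3))  # 0.714: an arbitrary split
```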

We pick one out of the 865 websites at random,, and take a look at it. The website claims to publish “independent news on natural cures, food lab tests, cannabis medicine, science, robotics, drones, and more.” At the same time, the story in its header is headlined “Why Trump will win - compelling analysis no one else will report.” What’s more, an article from January 2020 headlined “Is coronavirus a manufactured bioweapon that Chinese spies stole from Canada?” claims that “these Chinese agents [...] may have infiltrated North America for the sole purpose of hijacking this deadly virus in order to unleash it at a later date.” So far, there is no evidence to support this claim. To top it all, the website doesn’t provide any information on its mission, staff members, contact details or physical location.

We can dig deeper into the cluster of 865 websites by looking at keywords. Running a keyword analysis gives us a frequency plot of the 33 most frequent keywords found on these websites. Among them are “vaccine,” “coronavirus,” “5g,” “children,” “truth,” “covid-19,” “medic,” “science,” “billionaires,” and “freedom.” Legitimate news or scientific articles may well write about 5G, the coronavirus or billionaires, but when these keywords appear all together, this can be a red flag signaling fake news.
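A keyword frequency plot like this can be built by tokenizing the text of every website in the cluster and counting words. A minimal sketch, assuming each website's text has already been extracted as a string; the tokenizer, the short stop-word list and the sample pages are illustrative assumptions, not the pipeline we used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on", "are"}
# Lowercase runs of letters/digits, keeping inner hyphens ("covid-19").
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def top_keywords(pages, k):
    """Count word frequencies across all pages and return the k most common."""
    counts = Counter()
    for text in pages:
        tokens = TOKEN.findall(text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(k)


pages = [
    "The truth about 5G and the coronavirus",
    "Vaccine truth: what COVID-19 billionaires hide",
    "5G, vaccine and COVID-19: the truth",
]
print(top_keywords(pages, 3))  # [('truth', 3), ('5g', 2), ('vaccine', 2)]
```

`Counter.most_common` breaks ties by first occurrence, which is why "5g" is listed before "vaccine" and "covid-19", which also appear twice.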

A keyword frequency plot of the 33 most common keywords in the cluster. The longest bars in the graph represent the most frequent keywords. The frequencies follow Zipf's law.
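Zipf's law says that a keyword's frequency is roughly inversely proportional to its rank: the second most frequent keyword appears about half as often as the first, the third about a third as often, and so on. One way to check this, sketched below under the assumption that the plotted counts are available as a list sorted from most to least frequent, is to fit a straight line on a log-log scale; an exponent close to 1 is consistent with Zipf's law. The counts in the example are synthetic.

```python
import math


def zipf_exponent(frequencies):
    """Estimate the Zipf exponent s from frequencies sorted in descending order.

    Zipf's law: f(r) ~ C / r**s, i.e. log f = log C - s * log r,
    so s is minus the least-squares slope through (log r, log f).
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic, perfectly Zipfian counts for 33 keywords (1000 / rank).
counts = [1000 / r for r in range(1, 34)]
print(round(zipf_exponent(counts), 3))  # 1.0
```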

Interestingly, some of these keywords (for example, “corona” and “5g”) also appear on websites that are outside the cluster but connected to it; yet the further we get from the cluster, the less frequently they show up.
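One way to quantify how keywords fade with distance is to compute, for every website, how many link hops separate it from the nearest cluster member, and then average a keyword score over all websites at the same distance. A minimal sketch, assuming links are treated as undirected when measuring distance and that a per-website keyword score (for example, how often “5g” appears) has already been computed; the toy network and scores are made up for illustration. The same averaging works for any per-website score, including a risk score.

```python
from collections import defaultdict, deque


def distance_from_cluster(neighbors, cluster):
    """Multi-source BFS: link hops from each website to the nearest cluster member.

    `neighbors` maps a website to the websites it links to or is linked from.
    Cluster members are at distance 0; unreachable websites are omitted.
    """
    distance = {site: 0 for site in cluster}
    queue = deque(distance)
    while queue:
        site = queue.popleft()
        for other in neighbors.get(site, ()):
            if other not in distance:
                distance[other] = distance[site] + 1
                queue.append(other)
    return distance


def mean_score_by_distance(distance, score):
    """Average a per-website score (e.g. keyword mentions) at each distance."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for site, hops in distance.items():
        totals[hops] += score.get(site, 0.0)
        counts[hops] += 1
    return {hops: totals[hops] / counts[hops] for hops in sorted(totals)}


# Toy network: "a" and "b" form the cluster; "e" is three hops away.
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c", "e"], "e": ["d"]}
mentions_of_5g = {"a": 8, "b": 6, "c": 3, "d": 1, "e": 0}
dist = distance_from_cluster(neighbors, ["a", "b"])
print(mean_score_by_distance(dist, mentions_of_5g))
# {0: 7.0, 1: 3.0, 2: 1.0, 3: 0.0}
```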

The graph below highlights the high-risk websites in the cluster: brighter colors signal a higher risk of COVID-19 misinformation, estimated from the keywords in the frequency plot.

The cluster of 865 websites with a fake-news risk identification: a few high-risk websites are scattered across the cluster, shown in a brighter color and a larger size.
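The exact risk formula behind this graph is not described here. As an illustration only, one simple score that captures the co-occurrence idea from the keyword analysis is the share of a watch list of keywords that appear together on a website. The watch list, the tokenizer and the scoring rule below are all assumptions.

```python
import re

# Hypothetical watch list drawn from the cluster's most frequent keywords.
FLAGGED = {"vaccine", "coronavirus", "5g", "truth", "covid-19", "billionaires"}
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def keyword_risk(text, flagged=FLAGGED):
    """Share of flagged keywords that co-occur on one website (0.0 to 1.0).

    A single keyword such as "5g" scores low on its own; the score rises
    only when many flagged keywords appear together.
    """
    present = set(TOKEN.findall(text.lower())) & flagged
    return len(present) / len(flagged)


def rank_by_risk(sites):
    """Return (risk, site) pairs sorted from highest to lowest risk."""
    return sorted(((keyword_risk(text), site) for site, text in sites), reverse=True)


sites = [
    ("tech-news.example", "Operators expand 5G coverage"),
    ("health-truth.example", "The truth about 5G, the coronavirus vaccine and billionaires"),
]
for risk, site in rank_by_risk(sites):
    print(f"{site}: {risk:.2f}")
# health-truth.example: 0.83
# tech-news.example: 0.17
```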

We select one of the websites from the highest-risk group,, and take a look at it. The homepage welcomes visitors with the statement “Unmask the truth: end compelled masks in Ohio,” and some of its featured headlines read “Masks ineffective” or “Most already immune.” In the section titled “The Science,” we find stories claiming that “Covid tests are not fit for purpose” and “Former Pfizer VP: ‘No need for vaccines,’ ‘the pandemic is effectively over.’” What’s more, we can’t find any contact or ‘About’ information anywhere on the website.

Going back to our initial search, if we compare the cluster under investigation to all 10,000 websites connected to, we can see that the risk of fake news is highest in the cluster where the websites link to directly or through other websites. The further away we get from the cluster, the lower the risk gets.

The network of all 10,000 websites, with the cluster of 865 websites marked in red. The graph shows that websites within or near the cluster have a much higher risk of fake news, while websites further away from it tend to have a lower risk.


To investigate the spread of online misinformation and fake news about the COVID-19 pandemic, we conducted research centered on a US website notorious for failing to meet basic standards of transparency and credibility.

By using our database of incoming links, we found a cluster of websites tightly connected to that publish inaccurate information or unsubstantiated claims about the COVID-19 pandemic. Our analysis indicates that websites spreading inaccurate information or conspiracy theories tend to link to each other, and that technology can play an important role in discovering them.