Your Web search for data will often land on statista.com. Unless you’re a member, though, the data will be missing the source, which makes it worse than useless. Statista’s policies undermine the integrity of research and data and contribute to the flood of bad statistics and dubious surveys you read in the media.
Let’s take an example. While researching an article, I Googled “share of internet users who get news from social media sites.” Google provided the following featured snippet:
Like many searches for survey and other statistics, this search lands on Statista. This is no coincidence — Statista’s visibility at the top of statistics searches is, I’m certain, a result of significant effort in search optimization on the part of the company.
When you click on the link, you see this:
Hurrah! A cool bar chart. Just clip it and paste it into whatever you’re working on — or just cite a number — and you’re done, right? Or if you’re not quite that sloppy and are trying to do the right thing, maybe you put “Source: Statista” in whatever you’re working on.
But the source is not actually Statista — Statista didn’t do the survey, it is just an aggregator. Somebody else did the study. If we want to judge the credibility, we need to know who conducted the research, when, and how.
We can see from the box on the right that this data comes from a survey conducted in July of 2019 with 5,107 respondents. But who conducted it? Unless you are a paying subscriber of Statista, if you click on “Show sources information,” you hit a paywall that displays this message:
So you need to sign up to pay $468 a year to find the source. Maybe 1% of those landing on a statistic like this will do that. The rest will just settle for “Source: Statista” or omit the source altogether.
Why don’t they just show the source link outside the paywall? Because if they did, you’d click away and forget about Statista. They’d rather have a small chance to get your subscription money than do the right thing, as a responsible research company should.
Not providing the source link is malpractice. And Statista is not only making it easy to falsely cite them as the source, they’re actively discouraging people from finding the original source.
Finding the original source
If you go back to the original Google search, you’ll also see a link to a report by Pew Research Center, called “Americans Are Wary of the Role Social Media Sites Play in Delivering the News.” Pew is a highly respected, nonprofit, independent research organization, and its research methods and analysis are excellent. And Pew makes all of its research available for free. Its report is full of valuable and detailed information on Americans’ behaviors around news and social media. If you dig into the report a bit, you find this graphic:
Clearly, this is where Statista got its data. But even in this graphic, you get a bit more context, namely how many people use each social network site, regardless of whether they’re getting news from it.
Unless you are a premium customer of Statista, you cannot click on their source link to get to this research. It’s fine for Statista to show statistics from Pew (provided they have permission, of course). But if you’re showing statistics that you didn’t create, you should always provide the original source, with a link, so readers can judge the credibility of the data. It’s also the right thing to do to give appropriate credit to the original researchers.
Statista is undermining the integrity of statistics
I have nothing but admiration for diligent researchers at Pew, universities, and reputable private research companies like Edison Research and Forrester. These companies and organizations invest significant effort and resources in generating primary data that we can use to make decisions. If you want to share research from such sources, you must cite the source and provide a link. It will add to your own credibility and provide the credit that such companies deserve for their work. (Every link you provide adds to the researchers’ reputation, as it should.)
I have no issue with Statista aggregating statistics, with appropriate permission, and charging a subscription for that. The aggregation serves a useful function.
But I have two problems with the role Statista now plays in the research ecosystem.
First, like any other organization that cites research, they should provide a link to the original source — outside the paywall. Failing to name the source or provide a link is research malpractice. Shame on them.
Second, Google should demote Statista links until Statista changes its policy. Google’s snippet link to this research is a key element contributing to Statista’s role in the spread of unsourced research. Shame on Google, too.
Until this changes, the responsibility lies with you as you use the Web to source research. Always look for the original source. And if you omit the citation, or just write “Source: Statista,” we’ll know you’re too lazy to do the work needed to verify where your stats are coming from. That reflects poorly, not just on Statista, but on you.