Trump now says hospitals should send patient information to a database in the Department of Health and Human Services, bypassing the Centers for Disease Control and Prevention (CDC). This will doubtless be followed by complaints about it being bad data, after which they’ll cut off access to the data — data which at the CDC was publicly available.
All data is imperfect. (And yes, I’ll be sticking to the singular “data” for this post.) This does not make it right or wrong, or good or bad. Any time you hear complaints about “bad data,” especially in a political setting, keep this in mind.
For a politician, “good data” means “data that backs up my position” and “bad data” is “numbers that make us look bad.” Consider unemployment rates, which were cited widely by the Trump administration except when they made it look bad.
The pandemic numbers — cases, hospitalizations, deaths, mortality rates, and infection rates, for example — are noisy and flawed. Among other problems, they include:
- Repeat testing of some of the same people.
- Asymptomatic people who don’t get tested, but may have the virus.
- Inaccuracies in test results.
- Variations in reporting criteria among states.
- Variations in what is reported as a case of COVID-19 vs. pneumonia or other conditions.
- Lags in reporting of cases and deaths.
- Variations in what is reported as a “probable” case.
- Questions about whether COVID-19 or other health conditions caused deaths.
This is less than ideal. It’s also typical of any data collection. Data always has problems. These problems include inaccuracies, false positives and negatives, human errors, missing data, biases, imprecise definitions, and uncertainty.
Do you throw out the data? No. It is better to have a flawed set of indicators than no information at all.
What to do about flawed data
When data has problems and you know about them, you keep working to improve it — as well as to improve the ways you use it. These are all strategies that work:
- Find and fix problems. If there’s a reporting issue, improve reporting systems.
- Create consistent methods and definitions. For example, there should be a single definition of a death from COVID-19, or a set of definitions like “caused by COVID-19,” “COVID-19 contributing factor,” and “suspected COVID-19.” The pursuit of consistency is an endless task, but it’s always possible to improve.
- Learn from others. How did other countries or states do it? Can you use their methods?
- Identify sources of bias. Is something skewing the data? Can you compensate?
- Smooth out noise. This is why you often see COVID-19 statistics with a 7- or 14-day moving average. Averaging removes daily variations caused by coincidences or glitches in data collection, producing a steadier and more accurate picture.
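To make the last point concrete, here’s a minimal sketch of how a moving average works; the daily counts below are invented for illustration, not real COVID-19 data:

```python
# Sketch: smoothing noisy daily counts with a moving average.
# The numbers are made up to mimic weekend reporting dips and a
# data-entry glitch; they are not real case counts.

def moving_average(values, window=7):
    """Average each value with up to window-1 preceding values."""
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

daily_cases = [100, 120, 90, 300, 95, 40, 110,
               105, 115, 98, 102, 35, 108, 112]
smoothed = moving_average(daily_cases, window=7)
print([round(x) for x in smoothed])
```

The one-day spike (300) and the one-day dips (40, 35) barely move the averaged curve, which is exactly why dashboards plot the 7-day average rather than the raw daily totals.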
Even with flawed data, it’s possible to see trends. You can make comparisons between states with different strategies. You can observe lag times between case reports, hospitalizations, deaths, and recoveries. You can identify correlations between mask usage and infections, or treatments and deaths. Most importantly, you can see how an indicator, flawed though it might be, changes over time, to see when the news is good, and when disaster is on the way.
Hide the data and you’re flying blind — and you can do none of these things. We, the public, deserve to know the data that is available.
Epidemiologists and statisticians can address the flaws in the data and the limitations in the conclusions you can draw from it. They can tell you where the uncertainty is — and where the certainty is, as well. They can help you understand what you are seeing, and suggest possible interpretations.
None of this is perfect. It’s all subject to debate. But there can be no debate and no interpretation if the data is not public, or is being manipulated to a specific political end.
So here’s to flawed data. We’ll work to make it better. But when a politician says “this is bad data” as an excuse to hide it or twist it, then you’d better recognize that you’re being hoodwinked.