Public Media: Evaluating their Use of Health Data

By Terri Lewis, PhD.

Editor’s Note: Dr. Lewis sent this essay to the National Pain Report recently. With the vast number of data-driven news stories about health care, chronic pain and, particularly, the effects of opioid use, it seemed appropriate to publish this today.

Many of us lack the mathematical background, beyond high school or college coursework, to judge whether, when, or how statistics about health data are being properly applied in the public conversation. So, to empower all of our readers, this is a basic primer on the theory behind public health information databases. These ideas can be confusing, and they deserve attention.

Much of the public conversation around substance abuse, addiction, drug diversion, and their relationship (or lack thereof) to the treatment of persons who have chronic pain is derived from statistics that are broadly extracted from public indexes like the CDC’s death reporting system. This system relies on death certificate reporting from local jurisdictions to develop statistics in specific categories of indicators for each state. The system is located at, . Annually, each state reports statistics to the CDC; you can find your state’s reports by searching (Your state name) + (death) + (statistics).

What is Data?

Data is information (a result) generated by an activity. There are two types of data: quantitative data, which can be described and manipulated numerically, and qualitative data, which records descriptions, categories, or judgments about quality or worth rather than numeric amounts. There are typically four levels of measurement used to collect information:

  • Nominal data assigns a name to classify the information that is collected (e.g., Male, Female); it is the weakest form of data
  • Ordinal data places information units in a rank order or sequence (1, 2, 3, 4… or A, B, C, D…), although the distances between ranks are not necessarily equal
  • Interval data uses metrics in which the distance between values has the same meaning everywhere on the scale, but zero is arbitrary (e.g., temperature in degrees Fahrenheit or Celsius)
  • Ratio data relies on measures with a meaningful absolute zero (e.g., weight, age, or a count of prescriptions), and it is the strongest form of data for interpretation
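To make the four levels concrete, here is a minimal Python sketch. It assumes the common textbook convention that a summary statistic is meaningful at a given level and at every level above it (the mode for nominal data, the median for ordinal, the mean for interval, and "twice as much" ratios only for ratio data). The names `Level`, `MIN_LEVEL`, and `is_meaningful` are hypothetical, made up for illustration. The temperature lines show why interval data cannot support "twice as much" statements while ratio data can.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Hypothetical lookup: the lowest level at which each summary makes sense
# (a common textbook convention, not a formal standard).
MIN_LEVEL = {
    "mode": Level.NOMINAL,     # most frequent category, e.g. "Female"
    "median": Level.ORDINAL,   # middle rank, e.g. of a health ranking
    "mean": Level.INTERVAL,    # needs equal distances between values
    "ratio": Level.RATIO,      # "twice as much" needs a true zero
}


def is_meaningful(summary: str, level: Level) -> bool:
    """Return True if the summary is meaningful for data at this level."""
    return level >= MIN_LEVEL[summary]


def celsius_to_kelvin(c: float) -> float:
    """Kelvin has an absolute zero (ratio data); Celsius does not (interval data)."""
    return c + 273.15


if __name__ == "__main__":
    print(is_meaningful("mean", Level.ORDINAL))  # averaging ranks is not meaningful
    print(is_meaningful("ratio", Level.RATIO))
    # 20 °C / 10 °C gives 2.0, yet 20 °C is not "twice as hot" as 10 °C:
    print(20 / 10)
    print(celsius_to_kelvin(20) / celsius_to_kelvin(10))  # about 1.035
```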

In health care, the most common tool for capturing health data is the electronic health record. This tool collects both qualitative and quantitative data. It allows the clinician to select from menus and to enter impressions using descriptive or qualitative terms. It also allows copy and paste, which can introduce errors. There is little standardization between systems, and the resulting lack of interoperability makes it harder to interpret data collected across different users.

Statistics is the science of collecting, classifying, measuring, controlling, analyzing, and interpreting numerical facts in order to draw inferences and communicate uncertainty. It offers a set of tools and rules for evaluating the data drawn from health care activities and a method for determining the relative importance of results. Statistics offers clear methods for making sense of the data derived from individual, population, and public health activities.

We frequently find statements that discuss the statistical significance of a particular finding, often expressed as a p-value (for example, p < 0.05). A significance test asks whether the difference between one set of results or conditions and another, after we apply an analysis, is larger than we would expect from chance alone. It is important to note that a statistically significant result does not mean that the finding itself is important. It means that, under the assumptions of the test, the observed difference is unlikely to be due to chance; by itself it says nothing about whether the difference is large or small. (Statistical power is a related but different idea: the probability that a study will detect a real difference of a given size.) There is no value assumed in whether this difference is good or bad or right or wrong. Results are always interpreted in relation to the hypothesis or question asked about the relationship between two conditions. A highly significant result can mean something is really wrong just as easily as a finding of no demonstrated difference can mean something is really right. This statistic is often misreported, which leads to a great deal of data abuse by researchers and the media.
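A minimal Python sketch can show why "significant" is not the same as "important." It uses a standard two-proportion z-test (one common significance test, chosen here for illustration); the function name and the event counts are hypothetical. The same 0.2 percentage-point difference in an event rate (10.0% versus 10.2%) is nowhere near significant with 1,000 people per group, but highly significant with 1,000,000 people per group.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test of H0: groups A and B share the same event rate.

    Returns (z, p_value). Uses the pooled-proportion standard error and the
    normal approximation, which is reasonable for large samples.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 10.0% vs 10.2% event rates at two sample sizes.
    for n in (1_000, 1_000_000):
        events_a = n // 10           # 10.0% of the group
        events_b = n * 102 // 1000   # 10.2% of the group
        z, p = two_proportion_z_test(events_a, n, events_b, n)
        print(f"n per group = {n:>9,}: z = {z:.2f}, p = {p:.2g}")
```

The p-value shrinks because the sample grew, not because the difference became any larger or more meaningful. Reporting the size of the difference (here, 0.2 percentage points) alongside the p-value guards against reading "significant" as "important."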

Errors of association are also a large problem in interpreting the relationship between two phenomena. We hear the term correlation used for an association drawn between two conditions. Correlations are weak or strong, but they do not mean that one condition causes the other to occur. Correlation is not causation, even though the media tends to interpret these relationships through ‘if-then’ statements (if two events occurred together, then they are associated, and it is likely that one caused the other).
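The sketch below computes a Pearson correlation coefficient in plain Python for a handful of made-up numbers (the variables, values, and the `pearson_r` helper are hypothetical, chosen only for illustration). In this toy data, both prescriptions and falls were set up to rise with age, so they correlate strongly with each other even though neither causes the other. The coefficient is also symmetric: r(x, y) equals r(y, x), so by itself it cannot say which condition, if either, is the cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical toy data for six people of increasing age.
    ages = [30, 40, 50, 60, 70, 80]
    prescriptions = [1, 2, 2, 3, 4, 5]  # rises with age...
    falls = [0, 0, 1, 1, 2, 3]          # ...and so does this
    print(round(pearson_r(prescriptions, falls), 2))
    print(round(pearson_r(falls, prescriptions), 2))  # same value: no direction
    # Each one also tracks age closely, the shared driver in this toy setup:
    print(round(pearson_r(ages, prescriptions), 2), round(pearson_r(ages, falls), 2))
```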

We tend to think of data as providing the evidence basis for our decisions, which is where the overused term ‘evidence-based practice’ comes from. But statistics can also lead us to very large errors of conclusion if we violate the rules, don’t address errors, ask the wrong questions, or overinterpret beyond the data context. It is perfectly possible to obtain perfect results from questions that are imperfectly asked or from data that is perfectly wrong. That has serious implications for us when it comes to informing our decisions with evidence. It is very easy to abuse statistics when we ignore the contexts from which the information is derived. The media does this all the time, as do those who wish to perpetuate a specific agenda.

Health: Individual, Population, and Public Health

Individual health data occurs in the context of the individual person, their socioeconomic circumstances, and other health determinants related to where they were conceived, born, and raised; how they were shaped by their environment and communities; and the influence of certain health exposures over the lifetime. There is more to health than the absence of disease. The context (what some call the “statistically normal environment” or the “standard circumstances”) is needed to understand how health is promoted, maintained, or disrupted, and how health data are collected and interpreted. The context treats health behavior as a continuum of activities rather than “health versus disease.” This contextual continuum relies on data that is nominal (attributes, demographics, descriptive terms), ordinal (health rankings), interval (often reflected in laboratory tests), and ratio (clinical indicators of health).

Limitations of individual health data. For the most part, individual health is measured in units of clinical integrity and comparisons of social, economic and mental health well-being of the individual to others who are similar. Humans are variable, their interactions with their environment are unique, and therefore any data collected about individual interactions or diagnoses must be carefully defined for comparability.

That is not always the case. Because individual health data is neither collected nor reported under controlled conditions at the clinical level, it may be incomplete, incorrect, or inconsistently applied from one clinician to another. The individual data reported to public agencies relies on ICD-9 and ICD-10 codes and clinical notes to support billable services, and there are many hands in the work. There is wide variation in how clinicians and their supporting employees attend to generating this type of information. Because of this, data collected under these conditions cannot be generalized for most types of research studies. Data sets derived from government data warehouses should be interpreted cautiously when used for research, and their limitations (see below) should be spelled out.

Population health data reflects “the health outcomes of a group of individuals, including the distribution of such outcomes within the group of study.” Kindig and Stoddart propose that the field of population health includes health outcomes, patterns of health determinants, and policies and interventions that link these two. Within this definition, groups vary in their composition, so it is important to understand that health outcomes and determinants are related to the group being studied or investigated, and NOT to the population as a whole. Population health data often relies on summarized individual health data, which makes it an unreliable basis for determining public policy. Summary measures of population health (SMPH) are aggregated, single indices of the quantity, and sometimes the distribution, of health within a given population of investigation. These measures combine data obtained from the population of study or extrapolated from “similar contemporary” populations. Here, one sees metrics such as healthy life expectancy (HALE), disability-adjusted life years (DALY), the prevalence of specific health conditions within the population of study (morbidity), and death statistics (mortality).
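Summary measures such as the DALY rest on simple arithmetic and on assumed inputs. The Python sketch below uses the basic, simplified form DALY = YLL + YLD, where years of life lost (YLL) = deaths × remaining life expectancy at the age of death, and years lived with disability (YLD) = cases × disability weight × average duration. It omits the discounting and age weighting that some published estimates apply; the function names and every number in the example are hypothetical. Notice how much the total moves when only the assumed disability weight changes, which matters when inputs are extrapolated from “similar” populations.

```python
def years_of_life_lost(deaths: float, years_remaining_at_death: float) -> float:
    """YLL: premature deaths times the life expectancy remaining at death."""
    return deaths * years_remaining_at_death


def years_lived_with_disability(cases: float, disability_weight: float,
                                avg_duration_years: float) -> float:
    """YLD: cases times a 0-to-1 disability weight times average duration."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must be between 0 and 1")
    return cases * disability_weight * avg_duration_years


def daly(deaths, years_remaining_at_death, cases, disability_weight,
         avg_duration_years):
    """Simplified DALY = YLL + YLD (no discounting or age weighting)."""
    return (years_of_life_lost(deaths, years_remaining_at_death)
            + years_lived_with_disability(cases, disability_weight,
                                          avg_duration_years))


if __name__ == "__main__":
    # Hypothetical condition: 10 deaths with 30 years of life expectancy left,
    # plus 200 cases lasting 5 years at a disability weight of 0.2.
    print(daly(10, 30, 200, 0.2, 5))
    # Same deaths and cases, different assumed disability weight:
    print(daly(10, 30, 200, 0.4, 5))
```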

Public health refers to all organized measures (whether public or private) to prevent disease, promote health, and prolong life among the population as a whole. Its activities aim to provide conditions in which people can be healthy, and they focus on entire populations, not on individual patients or diseases. Public health incorporates measures of lifestyle, environment, human biology, and health care. In general, the United States lacks a mandate to ensure that major determinants of health, such as access to medical care, education, and income, receive the attention and resources needed to carry out traditional and emerging public health functions. Making broad pronouncements about public health based on population data is a very dicey proposition.

Morbidity. Morbidity refers to illness or disease; the morbidity rate describes how often a given sickness or disease occurs in the population. As a concept, morbidity can be applied to an individual (e.g., someone with Ehlers-Danlos Syndrome) or to a population (e.g., the incidence of EDS in the population). There is also comorbidity, which refers to two or more illnesses affecting an individual at the same time. For example, chronic pain and depression are often comorbid. Chronic pain may be comorbid with substance abuse. Cancer of the liver may be comorbid with abuse of alcohol or OTC medications. Morbidity rates vary depending on the disease in question, and a given disease may affect one demographic group more than another. Morbidity rates are used to project risks associated with treatment and to make recommendations for personal health matters. The USA relies on these datasets -
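Morbidity is usually summarized either as incidence (new cases arising over a period among people at risk) or as prevalence (all existing cases at one point in time). A minimal Python sketch, with hypothetical function names and made-up numbers, shows how the two figures describe the same population differently. A long-lasting chronic condition can have a modest incidence but a much higher prevalence, because cases accumulate over the years.

```python
def incidence_rate(new_cases: int, population_at_risk: int, per: int = 1000) -> float:
    """New cases in a period per `per` people who were free of the condition at the start."""
    return new_cases / population_at_risk * per


def prevalence(existing_cases: int, population: int) -> float:
    """Share of the whole population that has the condition at one point in time."""
    return existing_cases / population


if __name__ == "__main__":
    # Hypothetical town of 100,000 people; 2,500 already have a chronic condition.
    population = 100_000
    existing = 2_500
    new_this_year = 300  # arising among the 97,500 people who did not have it
    print(f"prevalence: {prevalence(existing, population):.1%}")
    rate = incidence_rate(new_this_year, population - existing)
    print(f"incidence: {rate:.2f} new cases per 1,000 at risk per year")
```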

The term “relative risk” (RR) gives an indication of the “strength of association” based on the rate or frequency with which an event occurs in an exposed group compared with an unexposed group. Relative risk is a simple ratio: the risk of an outcome among those who were exposed divided by the risk of the same outcome among those who were not. It expresses how many times more (or less) likely the outcome is to occur in the exposed group, and errors of interpretation tend to occur when the terms “more” or “less” are used.

When interpreting a risk ratio, one will always be correct to say, “Those who had (name the exposure) had RR ‘times the risk’ compared to those who (did not have the exposure).” Or, “The risk of (overdose due to opiates) among those who (are regular users) was RR ‘times as high as’ the risk of (overdose due to opiates) among those who did not (use opiates regularly).”
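Because relative risk is a ratio, it is worth computing alongside the absolute numbers. In the hypothetical Python sketch below (the `Group` class, helper names, and counts are invented for illustration), the exposed group has twice the risk of the unexposed group (RR = 2.0), yet the absolute difference is only one additional event per 10,000 people. A headline that reports only the “times as high” figure leaves out that second number.

```python
from dataclasses import dataclass


@dataclass
class Group:
    events: int  # people who had the outcome (e.g., an overdose)
    total: int   # everyone in the group

    @property
    def risk(self) -> float:
        return self.events / self.total


def relative_risk(exposed: Group, unexposed: Group) -> float:
    """Risk of the SAME outcome in the exposed group divided by the unexposed group."""
    return exposed.risk / unexposed.risk


def risk_difference_per(exposed: Group, unexposed: Group, per: int = 10_000) -> float:
    """Absolute difference in risk, expressed as events per `per` people."""
    return (exposed.risk - unexposed.risk) * per


if __name__ == "__main__":
    exposed = Group(events=2, total=10_000)
    unexposed = Group(events=1, total=10_000)
    rr = relative_risk(exposed, unexposed)
    extra = risk_difference_per(exposed, unexposed)
    print(f"Those who had the exposure had {rr:.1f} times the risk compared "
          f"to those who did not ({extra:.1f} extra events per 10,000 people).")
```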

In a recent study about the safety of buprenorphine versus methadone, the authors (Marteau, McDonald, and Patel, 2015) attempted to determine the relative risk of overdose per 1,000 prescriptions issued for methadone as opposed to buprenorphine. They concluded, “Our analysis of the relative safety of buprenorphine and methadone for opioid substitution treatment reveals that buprenorphine is six times safer than methadone with regard to overdose risk among the general population.”

Mortality. The “crude death rate” is calculated as the total number of deaths in a year per 1,000 people in the population (usually the mid-year population). The mortality rate varies tremendously by geographic location, wealth, incidence of illness (morbidity), age, and other factors. Various types of mortality rates are calculated to give a more accurate picture of global health and well-being.
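Because the crude rate ignores who makes up the population, two places with identical age-specific death rates can report very different crude rates. One of the adjusted rates used to correct for this is the age-adjusted rate, computed by direct standardization: each age group’s rate is weighted by that group’s share of a chosen standard population. The Python sketch below is a simplified illustration with two age groups, a 50/50 standard population, and made-up counts, all hypothetical (published U.S. statistics use many more age groups and a reference such as the 2000 U.S. standard population).

```python
def crude_death_rate(deaths: int, population: int, per: int = 1000) -> float:
    """Deaths in a year per `per` people in the population."""
    return deaths / population * per


def age_adjusted_rate(deaths_by_age, population_by_age, standard_weights, per=1000):
    """Direct standardization: weight each age-specific rate by a standard population share."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    return per * sum(
        weight * deaths / pop
        for deaths, pop, weight in zip(deaths_by_age, population_by_age, standard_weights)
    )


if __name__ == "__main__":
    standard = [0.5, 0.5]  # hypothetical standard population: 50% under 65, 50% 65+
    # Both towns share the same age-specific rates: 1 per 1,000 (<65), 20 per 1,000 (65+).
    younger_town = {"deaths": [80, 400], "population": [80_000, 20_000]}
    older_town = {"deaths": [40, 1_200], "population": [40_000, 60_000]}
    for name, town in (("younger", younger_town), ("older", older_town)):
        crude = crude_death_rate(sum(town["deaths"]), sum(town["population"]))
        adjusted = age_adjusted_rate(town["deaths"], town["population"], standard)
        print(f"{name} town: crude {crude:.1f}, age-adjusted {adjusted:.1f} per 1,000")
```

The older town’s crude rate is more than twice the younger town’s, yet once age is accounted for, the two are identical.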

The Limitations of Health Data Sourced from Government Databases

As utilized in the public conversation at large, individual health data faces a number of threats to its validity. Chief among these are missing data, wrong data, incomplete data, and incorrectly reported data. These limitations are frequent, and they affect the interpretation of population and public health datasets used for research that compares individuals to the population at large:

  • The public databases have large amounts of missing data because, quite frankly, not everyone interacts with health care, or interacts in the same way. Some rarely do; others are frequent fliers.
  • Missing data (remember that the population from 19 states is missing from public health datasets for Medicaid as we speak) makes it difficult to project morbidity or mortality incidence and prevalence rates into the population at large.
  • The existing datasets are biased by the attributes of their users and the conditions under which data was collected, so data cannot be generalized to others who do not share those characteristics or conditions.
  • The data may be misclassified in the ICD-9 or ICD-10 classification systems, or may have been erroneous at the time it was collected and coded.
  • A diagnosis must be provided for reimbursement. In the absence of a clear diagnosis, the physician may code acute symptoms into the electronic health record which are not necessarily diseases or comorbidities.
  • Diagnosis information may not be comprehensive enough in some cases to allow detailed analysis or to identify needs for treatment.
  • The data do contain information on chronic diseases; however, knowing that someone has a chronic disease does not reveal how long they have had the condition (incidence vs. prevalence) or the severity of their morbid condition and may provide no direction for prescribing drugs that have multiple indications.
  • When providers know in advance that payment for services will be denied, their bills may be inconsistently submitted and, therefore, inconsistently recorded in the files.
  • Different care settings use different coding systems for procedures performed in inpatient and outpatient settings. For example, inpatient care is coded using ICD-9 or ICD-10 procedure codes, while physician/supplier and durable medical equipment data are coded using CPT and HCPCS. Furthermore, hospital outpatient care is coded as a mix of CPT and revenue center (hospital billing center) codes.
  • Covered services for which claims are not submitted are not included in the data. Where the information affects payment, its quality is likely to be better.
  • Different types of care may be subject to different payment rules, so comorbidity and severity-of-illness information may be recorded inconsistently across them. Some components of treatment may not be included in bills (and therefore in the claims data) where reimbursement rates are very low, even if the treatment is provided.
  • Mortality data generally omits morbidity or underlying disease data and reflects the presumed (and often unverified) cause of death where autopsies are not performed. Physiological measurements such as blood pressure, pulse, and cardiac ejection fraction are absent from the utilization files, and this missing data may impair effective cause of death certification.
  • The incidence of certain disease statuses may remain under-recognized and under-reported due to limitations of diagnostic processes or physician experience.
  • Prevalence of a morbid condition may remain under-detected because individuals do not have the financial resources to interact with health care providers and seek treatment.
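The effect of these gaps can be sketched with a little arithmetic. In this hypothetical Python example (the subgroups, counts, and capture shares are all invented for illustration), most cases among people with regular care make it into the records, while few cases among people who cannot afford care do. A prevalence figure built only from recorded cases understates the true figure, and the size of the gap depends entirely on who interacts with health care.

```python
from typing import NamedTuple


class Subgroup(NamedTuple):
    people: int
    true_cases: int
    share_recorded: float  # fraction of true cases that appear in the records


def true_prevalence(groups) -> float:
    """Prevalence if every case were known."""
    return sum(g.true_cases for g in groups) / sum(g.people for g in groups)


def recorded_prevalence(groups) -> float:
    """Prevalence computed from recorded cases only, over the full population."""
    recorded = sum(g.true_cases * g.share_recorded for g in groups)
    return recorded / sum(g.people for g in groups)


if __name__ == "__main__":
    population = [
        Subgroup(people=8_000, true_cases=400, share_recorded=0.9),  # regular care
        Subgroup(people=2_000, true_cases=200, share_recorded=0.3),  # cannot afford care
    ]
    print(f"true prevalence:     {true_prevalence(population):.1%}")
    print(f"recorded prevalence: {recorded_prevalence(population):.1%}")
```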

All of these limitations influence how the public conversation is managed around the use and abuse of prescribed medications, suicides, and accidental poisonings. In point of fact, we don’t really know as much as we claim to know, because the data we have to work with is so inconsistent and was not collected under research conditions. This weakens the public conversation. Further, we should be far more cautious than we are about using our current datasets to construct public policy. But anyone can spin a tale, and ripping headlines from the news out of their data contexts seems to go with election-year activities. Unless you are aware of how these things work, it is easy to be misled by statements about ‘significance,’ correlations, or numbers magnified from one condition to another.

But now you are a little better equipped to respond to the parts of the public data conversation that bother you, and that might even embolden you to learn more or participate with more confidence. You might even find your own experience with medical data reflected in the limitations identified here.

Finally, the enclosed links offer you, the consumer, a patient-friendly strategy for requesting that your own health records be corrected to properly reflect your health status and needs, should you find wrong, incomplete, or missing data.

Below are some useful additional readings for those of you who are a little more interested in taking on this topic.

Institute of Medicine. (2002). The Future of the Public’s Health in the 21st Century. Washington, DC: The National Academies Press.

Kindig, D., & Stoddart, G. (2003). What is population health? American Journal of Public Health, 93(3), 380-383. PMC1447747. Retrieved from

Marteau, McDonald, & Patel. (2015). The relative risk of fatal poisoning by methadone or buprenorphine within the wider population of England and Wales. BMJ Open. Retrieved from

ResDAC. (n.d.). Strengths and Limitations of CMS Administrative Data in Research. Retrieved from

David True

Good article

Carol Reinhardt

Well-written, clear, factual…I only wish the sensationally-driven media would read and heed this….

Terri Lewis PhD

Thank you, Dr. Hennessey, for suggesting that editing for clarification might be useful.

The Research Data Assistance Center (ResDAC) has extensively analyzed CMS claims data and in this process has identified a number of limitations that impair the use of claims data for research purposes. Their observations, published online, address the manner in which coding systems are used across settings and the potential for errors that this creates. They note that “different care settings use different coding systems for procedures treated in inpatient and outpatient settings.”

For example, inpatient care is coded using ICD-9 procedure codes (for those transitioning to ICD-10, a crosswalk lookup is located at ), while physician/supplier and durable medical equipment data are coded using CPT and HCPCS.
Hospital outpatient care is coded as a mix of CPT and revenue center (hospital billing center) codes. Currently, there are several less-than-perfect crosswalks between ICD-9 codes and CPT codes. Online crosswalk links are located at -

Terri Lewis PhD

I would like to hear from those of you who can be silent no longer, who have skin in this game.
We have work to do. If we are not personally affected today, we will be tomorrow. Because…life happens.



Steve: “Cui Bono”? 1. Big-name academic CME engines that were churning out “Opiates BAD!” CME programs at pricey resort venues the very week the CDC “Guidelines” were published. 2. Big Pharma: look for a bolus of high cost “new” opiate alternatives and delivery systems (e.g. implants) that are so outrageously priced as to be largely beyond the reach of all but the wealthiest self-pay patients. Most are just re-badged versions of generic medications that have existed for decades. (This, from the industry that brought us OxyContin. Res ipsa loquitur.) Timely coincidence? I think not. 3. “Churn & Earn” chemical dependency treatment centers, again, often with big name institutional branding & slick, aggressive sales-proselytizing activities, that give but lip service to “evidence-based medicine” & “medication-assisted treatment”. And once in their clutches, patients are subjected to “therapies” that are in reality, neither. Once discharged (often w payor patience &/or resources exhausted), the patient is left w subacute withdrawal, craving, and reduced physiological tolerance. Far too often, patients with legitimate chronic pain are left in agony from precipitated withdrawal, hyperalgesia, and their underlying chronic pain condition-and with naught but an NSAID for relief. And when (as is all too often the case) they relapse, they are put back through the same failed cycle of “treatment”-IF they survive the relapse. This, in a professional culture that defines insanity as, “…doing the same thing repeatedly, and expecting a different outcome…”!
EBM (albeit imperfect, underfunded, and based on all-too-short study duration) suggests that combination of early (often physician in-office based brief) intervention, extended psychosocial support (including structured, secure, supported housing), relapse prevention training, and extended, supervised and supported use of appropriate medications individualized to the patient’s substance use disorder AND med-psych coexisting conditions, offers the best chances for sustained recovery (indeed, survival). And it is highly cost-effective. Yet how often do our patients actually receive that? And if available, how often is it actually funded? The currently-fomented “all meds=bad drugs” mantra makes the untenable status quo saleable to a shamed, miserable and vulnerable public. 4. Health insurers, facing ever-diminishing returns from the increasingly onerous machinations of their inadequately-overseen and monopolistic delegated pharmacy benefits management (PBM) vendors (often in league with, if not owned outright by, big pharma), seeking to lower Rx PMPM’s by shaming and hassling pain patients and legitimate opiate-prescribing physicians. Similar attacks on cancer patients because of the staggeringly-manipulated anti-cancer drug prices and often wastefully-inappropriate and futile use of these medications is largely under-the-radar. Cancer patients, especially young and mid-life ones, generate sympathy and are often highly visible. Pain patients are often invisible and largely unsympathetic, especially in our current climate of stigmatization, vilification, and over-simplification. So declare a war on opiates, pain patients, and illicit addicts. Maybe the public won’t notice unconscionable price-gouging in US pharmaceuticals and the gross failures of current “managed” care industry and its fellow travelers (e.g. NCQA), which have arguably failed in their professed mission to control cost and improve quality-often, egregiously-so. 
And another election cycle will pass without…

Cathy M

Thanks for the clear explanations. I haven’t cracked a statistics book since 1975, but I’ve learned how to “read between the columns” in these statistical studies. You are right that the headlines are often wildly incorrect about what the data asserts, and that the data can be wildly incorrect to start with (and they don’t always mention this - though, it’s to CDC’s credit that they admit the data on which they base their opioid guidelines is a “3”, meaning “poor”).

And the checkbox/ numerical scale/ ICD method of “recording” a medical session is ludicrous, IMO - and I was against it from the beginning, knowing that the “shorthand” would be taken as “truth” once the bean counters got to it! Just one example: my doctor had added a diagnosis of “Adjustment Disorder” to my chart, and as a retired therapist, I knew that was wrong - so I read her the criteria and pointed out it didn’t fit. She replied that she’d only wanted to indicate that I was ‘very stressed’ (she didn’t seem that bothered at the mistake) - I replied that any other doc reading that in a few years wouldn’t take that interpretation! She removed it, but had it stayed, it would be one more “opioid user” with a “mental health diagnosis”! She also refuses to identify the illnesses I’m being treated for - (ex: she lists “bladder pain” though she’s giving me Elmiron for interstitial cystitis). And when they upgraded their computer system, my medications got wiped out of the record! My chart alone is a very good example of why individual medical session data shouldn’t be aggregated!!

Patrick T. Hennessey, MD, MPH, FACP

(Dr. Lewis, please re-edit your paragraph, “Different care settings use … coded as a mix of CPT and revenue center (hospital billing center) code.” I do not believe it accurately reflects actual practice in use of ICD-10 (diseases/conditions associated with) vs. CPT (Procedures and services) & HCPCS (DME & medical supplies) codes.

Sorry to seem nitpicky. But if we are going to take on the purveyors of junk science, sensationalism, politicized public policy wonkery, thinly-masked emotional agendas and lies-by-statistics, we have to have our i’s dotted and t’s crossed.)

Otherwise, a cogent & timely article.


Thank you for posting this. It is quite informative, educational and, yes, dry, but necessary for greater understanding. I am not sure how many will read it, understand it, and use it for their benefit. Also, as a suggestion, you may consider an expert post on what to look for when reading medical journals, studies, and the like. It can be pretty simple if you know what to look for. In addition, general media quotes and summaries of studies never supply the ultimate representation of a study, and neither do abstracts, as they also convey the authors’ slants on the studies. Always go to the original source of the study you are researching and avoid summaries. Perhaps this has already been done, so please direct me to it if it has been posted previously. I’m new at this site.


Thank you for this article. Sometimes the facts and statistics are so obviously manipulated to serve an end that even a child can see it. Unfortunately there are professional obfuscators and smoke-blowers around every corner of this issue. As they say, “There are lies, damned lies, and statistics.”


Whenever we have a “War” on something, someone somewhere is behind it making money. My guess is that this is no different, especially since “Wars” always hurt the innocent, in this case, chronic pain sufferers. Cui Bono? Maybe money to drug rehab programs? Who else?

Michael G Langley, MD

Agreed, Jeremy. I find it interesting that the entire success of chronic pain management is ignored. If I had a mortality rate of less than 1% in fields like oncology or cardiology, I would be lauded. But a >98% success rate in pain management is ignored. Too much emphasis is being placed on the small portion of people who intentionally abuse the life-saving medications used by us chronic pain patients! If the true addiction rate is around 500,000 (0.16%?, including street drugs), that is a pretty small part of the population as a whole, and only a slightly larger portion of the millions in our country who use opiates to control their chronic pain. All pain patients want is enough pain control to keep them from thinking that death is better than their quality of life (suicide).

Jeremy Goodwin, MS, MD

All I wish to say is THANK YOU!

I am frustrated with for a number of reasons and even more so with my colleagues for failing to stem the politically driven and disgracefully unscientific manner in which the currently fabricated ‘opioid death crisis’ is based.

This was very, very useful. Statistics is not a glamorous subject and it is hard to make it interesting to the majority of people. But you did an excellent job of pulling out the basics that should help many read and learn with greater understanding of the material being scrutinized.

Thank you.