Exposing the Flaws in a Common Quality Metric

June 25, 2012

The problem. The health care field has adopted techniques from industry, such as continuous quality improvement (CQI), to improve patient care processes and outcomes. However, CQI in health care has come to mean different things to different audiences. For improvement practitioners and researchers to use it, information about quality improvement needs to be defined, measured, and reported in a consistent manner.

An early exposure to workflow. Michael D. Howell, MD, MPH, grew up in Clear Lake, Texas, a Houston suburb near the Johnson Space Center. While in college at Rice University, he worked in the aerospace industry at Rockwell Space Operations Co., the prime contractor for the space shuttle, doing workflow mapping and automation for purchasing and materials management. "It was fun and it was a useful way to think about the world," he recalled, "but it was not something I expected to ever use again."

After college, Howell went on to medical school at Baylor College of Medicine, and then to Boston for what he thought would be a three-year stint for his internship and residency at Beth Israel Deaconess Medical Center.

He remembers the first time he went to place a central line in a patient: "I had to go to 13 places to gather up things—and it was a fairly urgent clinical situation. I realized that this was exactly the same problem that Rockwell had with buying things. The process had evolved. It wasn't designed."

Howell stayed at Beth Israel Deaconess as chief resident and as a fellow in the Harvard Combined Program in Pulmonary and Critical Care Medicine. He found himself drawn to projects that "organized the way that information and people flow around the needs of patients." As his fellowship was finishing, Beth Israel Deaconess was looking for someone to organize and improve the quality and safety in the medical center's nine adult ICUs. He moved right into that position.

Connecting with RWJF. As Howell became more involved with quality improvement, he grew increasingly frustrated by the lack of "recognizable" science. He saw two components to the problem: industrial tools such as statistical process control, while valid, were not familiar or recognizable as science to physicians, many of whom had spent a decade learning clinical epidemiology and related methods; and changes in health care settings were often made in the name of quality improvement without much supporting evidence.

Howell was hunting for funding so he could explore this area when he came upon the Robert Wood Johnson Foundation's (RWJF) new program, Improving the Science of Continuous Quality Improvement Program and Evaluation. (For more information, see the Program Results Report.) "I was actively looking for something that was investing in emerging markets," he says, "and that is what this program seemed to be: a meaningful investment by the Foundation in what I hoped was going to be a market that was growing—the application of recognizable science to health care operations."

Assessing the reliability of quality metrics. Howell became interested in the reliability of common quality improvement metrics while considering the problem of ventilator-associated pneumonia (VAP) in the ICU. Most strategies to prevent VAP focus on getting patients off the ventilator sooner. But the standard national metric was "pneumonias per 1,000 ventilator days." What if, Howell wondered, the risk were higher during the first few days a patient is intubated? Couldn't shortening the time a patient remained on the ventilator perversely cause the VAP metric to go up, falsely indicating worse performance when in fact the patient's care had improved?
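To make the concern concrete, here is a minimal sketch with invented numbers (not drawn from Howell's data), assuming the daily risk of VAP is higher on the first three days of intubation than on later days. Under that assumption, weaning patients sooner lowers the expected pneumonias per patient, yet the "per 1,000 ventilator days" rate rises, because the days removed from the denominator were mostly low-risk ones.

```python
# Toy numbers only (not from the study): assume the daily risk of VAP is
# 3% on ventilator days 1-3 and 0.5% on each later day.
def vap_per_patient_and_rate(vent_days, n_patients=1000):
    """Expected pneumonias per patient and per 1,000 ventilator days."""
    def daily_risk(day):
        return 0.03 if day <= 3 else 0.005

    per_patient = sum(daily_risk(d) for d in range(1, vent_days + 1))
    total_vap = n_patients * per_patient
    total_vent_days = n_patients * vent_days
    rate_per_1000 = 1000 * total_vap / total_vent_days
    return per_patient, rate_per_1000

for label, days in [("Before (10 vent days/patient)", 10),
                    ("After  ( 5 vent days/patient)", 5)]:
    per_patient, rate = vap_per_patient_and_rate(days)
    print(f"{label}: {per_patient:.3f} VAP per patient, "
          f"{rate:.1f} per 1,000 ventilator days")

# Weaning sooner cuts expected VAP per patient (0.125 -> 0.100), but the
# per-1,000-ventilator-days rate climbs (12.5 -> 20.0), because mostly
# low-risk late days were removed from the denominator.
```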

"The metrics we are using for real world improvement tracking might not be giving us the answers that they should," Howell realized. He had hoped to study VAP further, but when the necessary data wasn't available, he considered other common metrics and settled on falls per 1,000 patient days.

"Falls were attractive," Howell said. "They affect a lot people around the country, there is a lot of interest in them, and we could get data that would let us test the hypothesis."

A flawed metric. Fall risk varied greatly across a patient's length of stay, Howell discovered, with the risk on day 14 twice that on day 1. Yet, he said, "The metrics implicitly assume that every patient day has the same risk as every other patient day. But it's not the case. Not all patient days are equivalent in terms of risk. The implicit assumptions of the metric—basically an averaging problem—cause you to come to the wrong conclusion."

Combining very detailed Beth Israel Deaconess data with a larger national dataset, Howell found that the metric could cause some hospitals to erroneously conclude that their fall risk had improved or worsened by a clinically meaningful amount from one year to the next when, in fact, the risk had remained the same. Howell also found that, in a model where fall risk was identical across hospitals, the flawed metric could lead a substantial percentage of hospitals to be erroneously found to differ from one another. This could cause policy-makers and payers to draw the wrong conclusion: calling out one hospital as worse or better than another, when in fact all they are seeing is a mathematical flaw in the metric itself.
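The "averaging problem" can be illustrated with a minimal sketch using invented length-of-stay mixes (not the study's data). Both hypothetical hospitals below share exactly the same day-specific fall risk, rising so that day 14 carries twice the risk of day 1, yet the crude falls-per-1,000-patient-days metric differs between them simply because their patients stay for different lengths of time.

```python
# Invented length-of-stay mixes (not the study's data). Both hospitals have
# the same day-specific fall risk, rising linearly so that day 14 carries
# twice the risk of day 1.
def daily_fall_risk(day):
    base = 0.002  # assumed risk per patient-day on day 1
    return base * (1 + min(day - 1, 13) / 13)

def falls_per_1000_patient_days(los_counts):
    """los_counts maps length of stay in days -> number of patients."""
    falls = patient_days = 0.0
    for los, n_patients in los_counts.items():
        falls += n_patients * sum(daily_fall_risk(d) for d in range(1, los + 1))
        patient_days += n_patients * los
    return 1000 * falls / patient_days

hospital_a = {2: 900, 14: 100}  # mostly short, low-risk stays
hospital_b = {2: 100, 14: 900}  # mostly long, high-risk stays

print(f"Hospital A: {falls_per_1000_patient_days(hospital_a):.2f} per 1,000 patient days")
print(f"Hospital B: {falls_per_1000_patient_days(hospital_b):.2f} per 1,000 patient days")

# Identical per-day risk, yet the crude metric makes B look roughly 20%
# worse than A, purely because the metric averages over different mixes of
# low-risk early days and high-risk late days.
```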

"The most important result is that subtle things in quality measurement can really matter," Howell said. "And that naïve denominators [i.e., those that do not account for variation in risk by patient day] can make your metrics so flawed as to give you the wrong answer, both at a hospital level and at a policy level.

Expanding beyond metrics. Howell and his team continue to work on the issue. More than simply pointing out the flaws in the metrics, Howell hopes to offer a reliable and useful alternative. He says that they are "probably 80 percent of the way to making a public statement about the right answer," which would likely be called "length-of-stay-adjusted falls per 1,000 patient days" or "LOS-standardized falls per patient days."
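The article does not describe how the adjusted metric would actually be computed. One standard form such an adjustment could resemble is indirect standardization, sketched below with an invented reference risk curve: compare each hospital's observed falls with the number expected given its own mix of lengths of stay.

```python
# Hedged sketch only: this is not Howell's published method, and the
# reference risk curve is invented. Indirect standardization compares
# observed falls with the falls expected under a reference day-of-stay
# risk curve applied to the hospital's own length-of-stay mix.
REFERENCE_DAILY_RISK = {d: 0.002 * (1 + min(d - 1, 13) / 13) for d in range(1, 31)}

def expected_falls(los_counts):
    """Expected falls under the reference day-of-stay risk curve."""
    total = 0.0
    for los, n_patients in los_counts.items():
        total += n_patients * sum(
            REFERENCE_DAILY_RISK.get(d, REFERENCE_DAILY_RISK[30])
            for d in range(1, los + 1)
        )
    return total

def los_adjusted_fall_ratio(observed_falls, los_counts):
    """Observed/expected ratio; ~1.0 means falls match what the hospital's
    own case mix of short and long stays would predict."""
    return observed_falls / expected_falls(los_counts)

# A hospital with many long (high-risk) stays and 40 observed falls:
print(f"{los_adjusted_fall_ratio(40, {2: 100, 14: 900}):.2f}")  # ~1.05
```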

Howell also has done some preliminary work on VAP rates and plans to address other metrics, such as central line-associated bloodstream infections and catheter-related urinary tract infections.

Other activities have flowed from this work. Howell is finishing up as a member of the final cohort of RWJF Physician Faculty Scholars. (His project is titled "Preventing the need for rescue care"—Grant ID 66350.) In addition, Beth Israel Deaconess has funded the INSIGHT [Integration of Standard Information Gathered using Health Care Technology] Research Core, which Howell directs. INSIGHT has 20 to 25 ongoing projects that use the hospital's operational data to study quality and patient safety. And a new research center at the hospital, the Center for Health Care Delivery Science, of which Howell is executive director, will focus on the intersection of rigorous science and hospital operations.

Having the "breathing and thinking space" afforded by RWJF support has "led to a much more expansive view as a medical center that invests resources in this important area," Howell said. "We clearly would not have been able to do that without the funding and support from the Foundation. Having this little window of space was really transformational for me."

RWJF perspective. Improving the Science of Continuous Quality Improvement Program and Evaluation funded teams of researchers to address a core question within health care environments: "How will we know that change is an improvement?" Research teams tackled an array of projects aimed at improving evaluation frameworks, quality improvement measures, and data collection and methodology. RWJF authorized the program for up to $1.5 million for 48 months, from August 2007 through August 2011.

"There is a lot of talk about the way we do quality improvement but not a lot of organized initiatives about how we actually do research about quality improvement," says RWJF director Lori Melichar, PhD, MA. "I am proud of the results of these nine projects. They addressed the challenges we were experiencing by creating and testing survey instruments, developing new research and evaluation methods, and exploring the importance of context in quality improvement. We have created something of a community."