August 2004

Grant Results

SUMMARY

From 2001 to 2003, investigators at the Tufts University School of Medicine tested physician-level measures of quality of care for patients with diabetes using a database from a previous study of ambulatory care quality for Medicare beneficiaries.

Key Findings

  • Researchers found that it is possible to develop a precise, reliable score by combining five or more measures based on actions taken and tests ordered by the physician, using data from either medical records or health insurance claims.

Funding
The Robert Wood Johnson Foundation (RWJF) supported this project through a grant of $386,196.



THE PROBLEM

Individual physicians are the basic link between patients and medical care. Yet patients have less information with which to choose a physician than they have for selecting any other feature of their health care. Some health care leaders believe that physician profiling — defining and measuring physician work efforts and linking those efforts to practical and measurable clinical outcomes — has the potential to improve patient satisfaction and increase the quality of care.

A number of public and private funding agencies and accrediting organizations are pursuing physician profiling with various goals in mind. The federal Centers for Medicare & Medicaid Services (CMS), through its Value-Based Purchasing Initiative, is seeking to identify and reward individual physicians who provide best-practice care for their patients. Health plans are interested in using profiles to improve efficiency and to guide decisions about hiring and compensating individual physicians. Professional societies, certifying organizations and health care institutions are interested in using profiles to guide the granting of clinical privileges, certification and recertification; to target professional education; and to reduce clinical error rates. Yet several recent studies contend that current approaches to profiling are not methodologically sound for any of these purposes.

The field needs reliable measures and methods to estimate physician performance.



RWJF STRATEGY

This project promotes RWJF's goal to improve the quality of care and support for people with chronic health conditions. RWJF is pursuing a four-part strategy that builds on previous work to improve the quality of chronic care. The four elements of the strategy are to:

  • Define and measure the quality of outpatient care at the individual practice and provider levels.
  • Build consumer and purchaser demand for public reporting of quality measures and for the delivery of high-quality care.
  • Line up forces within specific local markets to achieve public reporting and facilitate improvements in care (e.g., purchaser demand, measurement, consumer demand, provider improvement efforts, provider competency, leadership).
  • Track and communicate national progress on the Foundation's quality strategy.

This grant directly supports the first and second parts of the strategy. Previous RWJF grants supported the National Committee for Quality Assurance, a nonprofit organization that assesses and reports on the quality of managed health care plans, in developing performance measures to assess how well managed care organizations are caring for their enrollees with chronic conditions (Chronic Care Initiatives in HMOs Grant Results on ID#s 024586 and 028316).

The measures developed for diabetes care were incorporated into HEDIS® (the Health Plan Employer Data and Information Set, a report card on managed care plans produced by the National Committee for Quality Assurance) and endorsed by CMS, the American Diabetes Association's Provider Recognition Program, the Veterans Administration and the Foundation for Accountability. (See Grant Results on ID#s 037080 and 029663.)

This project leverages CMS's considerable past and current efforts in the areas of measures development, specification and testing, data collection methods (e.g., medical record abstraction tools, training and software) and database construction.



THE PROJECT

The project sought to address the methodological challenges and the feasibility of using physician profiling to improve chronic disease care in individual physicians' practices. Co-investigators Sheldon Greenfield, M.D., and Sherrie Kaplan, Ph.D., then at Tufts University School of Medicine (now at University of California-Irvine), identified and tested physician-level measures of quality of care for patients with diabetes using a longitudinal database (repeated observations over a period of time) from a previous study of ambulatory care quality for Medicare beneficiaries.

The Ambulatory Care Quality Improvement Project, funded and directed by CMS from 1994 to 1998, included performance feedback to individual physicians along with a wide range of voluntary, well-documented quality improvement interventions (see Methodology).

The investigators' secondary analyses of the database addressed eight major challenges to profiling the quality of individual physicians' diabetes care:

  • Choosing appropriate quality of care measures for use in assessing physician-level performance. The investigators sought to test whether process measures used at the patient level could be used to measure a physician's performance, as well.
  • Assessing the extent to which patients with certain characteristics gravitate toward physicians with certain characteristics — "nonrandom clustering" of patients by physician for each quality measure.
  • Assessing how consistent the physician is in the quality of care he or she provides across multiple samples of patients in his or her practice — physician-level reliability for each measure.
  • Determining the smallest number of patients per physician that must be sampled to ensure reliability — determining the power related to patient and physician sampling.
  • Developing scoring methods for creating profile scores. Because diabetes is a multidimensional disease, no single measure can capture quality of care; instead, several measures must be combined into one aggregate score.
  • Determining whether aggregate profile scores measure a physician's consistent behavior across patients — reliability of the scores — and actually measure what they are intended to measure — validity of the scores.
  • Identifying appropriate methods for adjusting profile scores to account for the kinds of patients a physician typically sees in his or her practice — case-mix adjustment.
  • Identifying physician/practice characteristics related to profile scores. The investigators wanted to know how much of an individual physician's performance was influenced by the setting in which he or she worked — for example, having a good computerized data system, having a skilled nurse practitioner on staff, or working with other physicians in a group practice.
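The report does not specify how nonrandom clustering and physician-level reliability were quantified. A common approach, assumed here for illustration, is the one-way intraclass correlation ICC(1): the share of variation in patient-level scores that lies between physicians rather than among one physician's own patients. The function name `icc1` and the input layout (a dict of per-patient scores keyed by a hypothetical physician ID) are illustrative, not the investigators' code.

```python
from statistics import mean


def icc1(scores_by_physician: dict[str, list[float]]) -> float:
    """One-way random-effects ICC(1) for patient scores nested in physicians.

    Hypothetical input: {physician_id: [score for each sampled patient]},
    e.g. 1.0/0.0 for whether a patient received a given annual test.
    Needs at least two physicians and more patients than physicians, and
    raises ZeroDivisionError if every score is identical.
    """
    groups = list(scores_by_physician.values())
    k = len(groups)                         # number of physicians
    n_total = sum(len(g) for g in groups)   # number of patients
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between- and within-physician sums of squares (one-way ANOVA)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective patients per physician; equals n when every group has n patients
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

Values near 0 suggest that which physician a patient sees explains little of the variation in a measure. Higher values mean scores cluster by physician, whether because of the physician's own practices or because similar patients gravitate to the same physician; ICC alone cannot tell these apart, which is one reason case-mix adjustment matters.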

Methodology

The Ambulatory Care Quality Improvement Project, funded and directed by CMS, sampled physicians (293 in 1995 and 214 in 1998) practicing in fee-for-service settings in Alabama, Iowa and Maryland. The study also sampled patients from these practices who were older than 65, had diabetes and had seen the physician at least three times in either 1995 or 1998. The study used quality-of-care measures for diabetes that were well established in the field: process measures, such as annual tests of lipids, creatinine, urine protein and blood glucose control (hemoglobin A1C) and foot and eye examinations; and outcome measures, such as patients' achieving certain thresholds for cholesterol levels, blood pressure and blood glucose levels. These measures were originally intended to assess quality of care at the patient level; little or no empirical support was available at the time to determine their appropriateness at the physician level.

However, the project was a longitudinal study (involving repeated observations over a period of time) and included performance feedback to individual physicians along with a wide range of voluntary, well-documented quality improvement interventions. The project provided a unique opportunity to examine the sensitivity of physician-level scores to quality improvement activities in relatively diverse practice settings and among diverse patient and physician populations.

The current researchers determined that the most prudent use of these measures would be to identify physicians in the highest and lowest quartiles (the top 25 percent and the bottom 25 percent). For the purposes of their study, the investigators limited their sample to physicians who had 19 or more patients and who had both medical record and claims data for both observation years (1995 and 1998). These constraints yielded a sample with an average of 22 patients per physician.
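The two inclusion rules stated above (at least 19 patients, and both medical record and claims data in 1995 and 1998) and the quartile split can be sketched as follows. The `Physician` record layout is hypothetical, and using Python's `statistics.quantiles` (its default "exclusive" method) for the cutoffs is an assumption; the report does not say how quartile boundaries were computed.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Physician:
    # Hypothetical record layout; field names are illustrative only.
    physician_id: str
    n_patients: int
    years_with_both_sources: set[int]  # years with medical record AND claims data
    score: float                        # aggregate process score


def eligible(doctors, min_patients=19, years=(1995, 1998)):
    """Keep physicians meeting the study's stated inclusion rules."""
    return [d for d in doctors
            if d.n_patients >= min_patients
            and set(years) <= d.years_with_both_sources]


def quartile_flags(doctors):
    """Label each physician 'top', 'bottom' or 'middle' by score quartile."""
    q1, _, q3 = quantiles([d.score for d in doctors], n=4)
    flags = {}
    for d in doctors:
        if d.score >= q3:
            flags[d.physician_id] = "top"
        elif d.score <= q1:
            flags[d.physician_id] = "bottom"
        else:
            flags[d.physician_id] = "middle"
    return flags
```

Physicians tied exactly at a cutoff are placed in the outer quartile here; that tie rule, like the quantile method, is a design choice that could move a borderline physician.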

According to the researchers, because the patient samples in the original study design did not fully overlap, the current data should be treated as a cross-sectional study (an analysis that provides a "snapshot" view of data at a single moment or period of time) at two observation points, rather than as a longitudinal study. The investigators examined two different random samples of patients and physicians, in 1995 and in 1998.



FINDINGS

The investigators reported the following findings to RWJF in 2003:

  • It is possible to develop a precise, reliable score for diabetes care performance using five or more process measures (actions taken by the physician and tests ordered), with 20 to 25 patients per doctor, using data from either medical records or claims. Including outcome measures (patient test results) did not produce as consistent a score; their use will require further research.
  • The aggregate score (which combines the process measures) discriminates among doctors who are average, above average and below average with minimal overlap. This held in both observation years, 1995 and 1998, and across different patient samples. Doctors who perform well on one measure perform well on the other measures.
  • Case-mix adjustment — adjusting for the kinds of patients a physician has in his or her practice — shifts the scores overall, as expected, but does not change individual physicians' ratings. Whether and how to adjust process profile scores for differences in case mix will require further research.
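The finding that 20 to 25 patients per doctor suffice can be reasoned about with the Spearman-Brown prophecy formula, which predicts the reliability of a physician's average score over n patients from the single-patient intraclass correlation (ICC). The report does not say the investigators used this formula; it is offered as a standard sketch, and the ICC and target values in the usage note are illustrative, not study results.

```python
import math


def reliability_of_mean(icc: float, n_patients: int) -> float:
    """Spearman-Brown: reliability of a physician's mean score over n patients."""
    return n_patients * icc / (1 + (n_patients - 1) * icc)


def patients_needed(icc: float, target: float) -> int:
    """Smallest n whose Spearman-Brown reliability reaches `target`."""
    if not (0 < icc < 1 and 0 < target < 1):
        raise ValueError("icc and target must lie strictly between 0 and 1")
    # Solve target = n*icc / (1 + (n-1)*icc) for n
    n = target * (1 - icc) / (icc * (1 - target))
    # Subtract a tiny tolerance so float round-off (e.g. 16.000000000000004)
    # does not push an exact integer answer up by one.
    return math.ceil(n - 1e-9)
```

With an illustrative ICC of 0.1, reaching a reliability of 0.7 takes 21 patients, while 20 patients fall just short; a measure with weaker physician-level clustering would need more.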

Limitations

  • The standard of care for diabetes today differs slightly from that practiced in 1995 and 1998, when the data for the Ambulatory Care Quality Improvement Project were collected. However, the investigators note that an aggregate score (averaging five or six measures) makes it simple to substitute new measures as standards of care for diabetes evolve.
  • This study should be considered a good demonstration of methods for generating a sound profile for measuring physician quality of care. More research is needed using different diseases, different patient and physician samples, and better case-mix measures.
  • The investigators caution that replication of their study is needed. In addition, they note that the varied uses of physician profiling (e.g., accountability vs. improvement) affect the methodological rigor required. For example, if profiles are presented to the public as a guide for choosing physicians, very little error can be tolerated. For internal quality improvement work with physicians (by health plans or physicians' practices, for instance), profiles may not need the same degree of rigor.

Communications

Investigators presented their findings at the American Health Services Academy meeting in June 2002, at the Physician Level Performance Measurement Conference in Washington in October 2002 and at the Commonwealth Fund national meeting in fall 2003. The investigators expect to submit several articles on their findings to peer-reviewed journals by summer 2004.



SIGNIFICANCE TO THE FIELD

The investigators contend that the results of their study may help the field get past the two major obstacles to implementing physician profiling — fear and cost. "We were compulsive about the critical need for case mix adjustment, and we are finding that the impact of case mix may not be as dramatic as we thought," says principal investigator Greenfield. Being able to overcome concerns about the impact of case mix on performance scores may win over physicians who are leery of profiling.

"In addition," Greenfield continues, "the scores we got using the medical record were surprisingly parallel to those we got with the claims data. That's a big lesson, because it gets expensive if you have to use the medical record." Being able to get reliable measures from claims data would significantly streamline the profiling process and save the health system considerable money, according to Greenfield.
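Greenfield's observation that record-based and claims-based scores were "surprisingly parallel" could be checked with a simple agreement analysis. The report does not name the statistic used; this sketch assumes a Pearson correlation plus the mean absolute difference between the two scores, computed over physicians scored from both sources. The function names and dict inputs (hypothetical physician ID mapped to aggregate score) are illustrative.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare_sources(record: dict[str, float], claims: dict[str, float]):
    """Correlation and mean absolute difference of per-physician scores."""
    ids = sorted(record.keys() & claims.keys())  # physicians scored both ways
    r = pearson([record[i] for i in ids], [claims[i] for i in ids])
    mad = sum(abs(record[i] - claims[i]) for i in ids) / len(ids)
    return r, mad
```

A correlation near 1 with a small mean difference would support substituting claims for costly chart abstraction; a high correlation alone only shows that the two sources rank physicians similarly, not that their score levels match.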



LESSONS LEARNED

  1. When working in a politically charged area, researchers must be good educators as well as scientists. This project's measurement experts were surprised at the level of resistance to their work on physician profiling. "We did not anticipate the education that was needed before many physicians and physician organizations could understand the science of measurement well enough to endorse its products," wrote principal investigator Greenfield in his final report to RWJF. Health researchers must "prepare information that is friendly to clinicians." (Principal Investigator)
  2. The field needs more and better venues for addressing the significant differences of opinion in how physician profiling should be used. Gaps remain between clinicians and health services researchers, between guideline developers and performance measure developers, and between those who think doctor performance must be made public and those who feel such data should be used only for quality improvement. "It cannot be assumed that the science that produces fair, reliable, valid and credible measures will automatically be accepted by even leaders of the…medical community," wrote the Principal Investigator in his final report to RWJF. (Principal Investigator)
  3. Transparency about measurement strengths and weaknesses is critical for advancing provider buy-in and appropriate measurement use. (RWJF Program Officer, C. Tracy Orleans)



AFTER THE GRANT

With funding from the Commonwealth Fund, the investigators tested their diabetes care profiling methods with another database, provided by the National Committee for Quality Assurance's Provider Recognition Program. CMS has also included the diabetes care measures developed for this project in the standardized ambulatory measurement set used in its Doctor's Office Quality Project, which focuses on developing measures that will be used to improve care for a set of chronic conditions in office practices in several states. The investigators are also using data from the Doctor's Office Quality Project to test the diabetes care measures developed in this project.

In a parallel project, researchers at the National Committee for Quality Assurance are testing the physician-level diabetes care measures developed in this project against "systems" measures — those factors related to the context in which a physician practices.

The RWJF program officer notes that professional societies are beginning to require physicians to participate in quality improvement efforts and are using physician performance measures as part of recertification. The investigators are convening focus groups and panels to determine how best to communicate physician profiling methodology to various groups interested in using this information.



GRANT DETAILS & CONTACT INFORMATION

Project

Conducting a Physician-Level Quality Assessment of Chronic Illness Care

Grantee

Tufts University School of Medicine (Boston, MA)

  • Amount: $386,196
    Dates: August 2001 to March 2003
    ID#: 039225

Contact

Sheldon Greenfield, M.D.
(949) 824-7286
sgreenfi@uci.edu
Sherrie H. Kaplan, Ph.D., M.P.H.
(949) 824-7286
skaplan@uci.edu



APPENDICES


Appendix 1

Glossary

Aggregate score — the way in which separate quality of care measures are combined into one score. In this study, investigators used a simple algebraic sum to create separate process and outcome scores for each physician in the sample. For example, the aggregate process score reflected the proportion of a physician's diabetic patients who received the annual tests deemed essential for quality diabetes care. The aggregate outcome score reflected the proportion of a physician's diabetic patients whose test results met certain quality-of-care thresholds.
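A minimal sketch of an aggregate process score, assuming each measure's rate is the share of a physician's sampled patients who received that annual test or exam, and that the rates are summed and divided by the number of measures (the averaging of five or six measures described under Limitations). The measure names echo the tests listed in the Methodology section but are hypothetical identifiers, and treating a missing entry as "not received" is an assumption.

```python
# Hypothetical identifiers for the annual tests and exams named in the report
PROCESS_MEASURES = ("lipids", "creatinine", "urine_protein",
                    "hba1c", "foot_exam", "eye_exam")


def aggregate_process_score(patients: list[dict[str, bool]],
                            measures: tuple[str, ...] = PROCESS_MEASURES) -> float:
    """Average over measures of the share of patients who received each one.

    Each patient is a dict mapping measure name -> True if the service was
    documented that year; a missing entry is treated as not received.
    """
    if not patients or not measures:
        raise ValueError("need at least one patient and one measure")
    rates = [sum(p.get(m, False) for p in patients) / len(patients)
             for m in measures]
    # Simple algebraic sum of the per-measure rates, scaled to 0..1
    return sum(rates) / len(rates)
```

Because the measures are just a tuple, updating the score as standards of care change means editing `PROCESS_MEASURES`, not the scoring logic.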

Case mix — the mix of patients treated within a particular institutional setting, such as a hospital.

Case-mix adjustment — the statistical adjustment of outcomes measures to account for risk factors that are independent of the quality of care provided and beyond the control of the plan or provider, such as the patient's gender and age, the seriousness of the patient's illness, and any other illnesses the patient might have.

Case-mix bias — the nonrandom tendency of patients with certain characteristics (such as age, co-morbid conditions or severity of illness) to choose physicians with certain characteristics (such as specialty), thereby affecting that physician's measured outcomes. For example, a physician who sees more older and sicker patients may appear to provide lower quality of care than a physician seeing younger patients with fewer chronic illnesses. Case-mix adjustment attempts to correct this problem.
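The report does not describe the case-mix model the investigators used. One widely used technique, indirect standardization, is sketched here as an assumption: each physician's observed count of patients meeting a target is compared with the count expected if each patient did as well as the sample-wide average for his or her risk group, and the observed-to-expected ratio is scaled by the overall rate. The risk strata and function name are hypothetical.

```python
from collections import defaultdict


def indirect_standardize(patients):
    """Case-mix-adjusted rate per physician via indirect standardization.

    `patients` is a list of (physician_id, stratum, met) tuples, where
    `stratum` is any hashable risk group (hypothetical, e.g. an age band)
    and `met` is True if that patient met the quality target.
    """
    # Sample-wide rate of meeting the target within each risk stratum
    totals, hits = defaultdict(int), defaultdict(int)
    for _, stratum, met in patients:
        totals[stratum] += 1
        hits[stratum] += met
    stratum_rate = {s: hits[s] / totals[s] for s in totals}
    overall = sum(hits.values()) / sum(totals.values())

    # Observed vs. expected counts for each physician's own patient mix
    observed, expected = defaultdict(float), defaultdict(float)
    for doc, stratum, met in patients:
        observed[doc] += met
        expected[doc] += stratum_rate[stratum]

    # Adjusted rate = (observed / expected) * overall; NaN if nothing is expected
    return {doc: (observed[doc] / expected[doc] * overall if expected[doc] else float("nan"))
            for doc in observed}
```

For example, if physician A's two older patients meet the target once and physician B's two younger patients both meet it, the crude rates are 0.5 and 1.0. Older patients overall meet it half the time and younger patients always do, so each physician matches expectations and both adjusted rates equal the overall rate of 0.75.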

Cross-sectional study — the kind of analysis that provides a "snapshot" view of data as it appears at a single moment or period of time.

Longitudinal database — a collection of repeated observations or examinations of a set of subjects over time with respect to one or more study variables (such as general health, the state of a disease, or mortality).

Longitudinal study — research involving repeated observations or examinations of a set of subjects over time with respect to one or more study variables (such as general health, the state of a disease, or mortality).

Nonrandom clustering — the tendency of patients with certain characteristics to gravitate toward physicians with certain characteristics. Patients in such a cluster may resemble one another more than they resemble patients drawn to another physician's practice. The smaller and more homogeneous the unit (for example, the individual physician's office rather than the hospital), the greater the potential for nonrandom clustering.

Outcome measures — health care quality indicators that gauge the extent to which health care services succeed in improving patient health.

Physician-level reliability — how consistent the physician is in the quality of care he or she provides across multiple samples of patients in his or her practice.

Power — the number of patients that must be included in a study sample for measures to be reliable at the physician level.

Process measures — health care quality indicators related to the methods and procedures that providers use to furnish care.

Reliability — whether a quality of care measure reflects a physician's consistent behavior across a sample of his or her patients and over time.

Validity — the extent to which a score actually measures what it was intended to measure.



Report prepared by: Kelsey Menehan
Reviewed by: Robert Crum
Reviewed by: Marian Bass
Program Officer: C. Tracy Orleans