The Reliability of Medical Group Performance Measurement in a Single Insurer's Pay for Performance Program
Even for a large insurer with a significant market share, the reliable measurement of performance is challenging due to data limitations, according to this study. This suggests mechanisms must be developed for multiple stakeholders to collaborate and pool patient data.
Most Pay for Performance (P4P) programs in the U.S. are implemented by a single insurer. Each insurer uses data from all of its covered medical groups to assess the performance of each group. But there is concern that a single insurer does not have sufficient data to create accurate and reliable performance standards. Using seven years (2001-2007) of patient data (a total of 197,905 “person-years”) from a single insurer that covers 20 medical groups in Washington state, this study examines whether annual sample sizes are large enough to establish reliable standards for eight clinical care process measures.
- Only 45 percent of the annual data groups for the various measures had sufficient patient counts to produce accurate performance measures.
- Some measures involving larger numbers of patients did reliably have sufficient data, including those related to breast and cervical cancer screenings and diabetes care.
- Other measures in this study, including those related to asthma treatment, coronary artery disease, and the provision of comprehensive well-child visits, did not have sufficient data.
- Statistical adjustments related to patient age, sex, and disease comorbidities greatly improved the precision of performance measures.
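
To illustrate why patient counts matter for the findings above, the following is a minimal sketch rather than the study's own method, which this summary does not describe. It assumes a simple binomial model of a process measure (each eligible patient either receives the recommended care or does not). The 70 percent rate, the 5-point margin, and the function names are hypothetical values chosen only for illustration.

```python
import math


def standard_error(rate, n):
    """Standard error of an observed proportion under a simple binomial model.

    rate: assumed true share of patients receiving recommended care (0 to 1)
    n:    number of eligible patients in one medical group for one year
    """
    return math.sqrt(rate * (1.0 - rate) / n)


def min_patients(rate, margin, z=1.96):
    """Smallest n whose approximate 95% confidence half-width is <= margin.

    Uses the normal approximation; z=1.96 corresponds to roughly 95% coverage.
    """
    return math.ceil((z ** 2) * rate * (1.0 - rate) / margin ** 2)


if __name__ == "__main__":
    # Hypothetical measure with a 70% performance rate.
    for n in (25, 100, 400, 1600):
        print(f"n={n:5d}  standard error={standard_error(0.70, n):.4f}")

    # Patients needed to pin the rate down to within +/- 5 percentage points.
    print("patients needed:", min_patients(0.70, 0.05))
```

Under these assumptions, quadrupling the number of eligible patients only halves the standard error, which is consistent with the finding that measures covering small patient populations struggle to reach reliable estimates within a single insurer's data.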
The authors note that the “fragmented organization and implementation” of P4P initiatives runs a “high risk” of unreliable measurement of medical group performance, as this study shows. They call for the development of collaborative mechanisms that allow multiple payers and insurers to pool data, but acknowledge that significant competitive and practical challenges remain.