Improving Data Quality in New York City's Immunization Registry

In October 1997, when CIR applied for the All Kids Count grant, 60 percent of private, office-based providers were sending immunization reports to the registry in response to a mandate that had gone into effect in January of that year. By the end of the All Kids Count program, that share had risen to 91.5 percent, a gain of 31.5 percentage points.

In 1998, it became clear that the quality of CIR data was poor. Some record fragmentation and duplication was expected in a database containing records on 1.7 million children provided by some 1,200 different sources (104 public providers, 1,083 private providers, and 30 managed care organizations), but duplication approached three records for every two children.

To address data quality on the "back end," two computer programs were developed with expert consultants: MEDD (Maximum Entropy De-Duplication) and Smart Search. Together they identify and merge large volumes of fragmented or duplicate records with up to 99 percent accuracy.
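
To make the idea of merging fragmented records concrete, here is a minimal sketch in Python. It is not MEDD or Smart Search, whose internals are not described here. It substitutes a much simpler technique, exact matching on a normalized name and birth date, and the field names (first_name, last_name, birth_date, shots) and sample records are hypothetical.

```python
from collections import defaultdict


def normalize(record):
    """Build a comparison key from name and birth date (hypothetical fields)."""
    first = record["first_name"].strip().lower()
    last = record["last_name"].strip().lower()
    return (first, last, record["birth_date"])


def merge_duplicates(records):
    """Group records that share a key and union their immunization events."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record)].append(record)

    merged = []
    for (first, last, birth_date), group in groups.items():
        # A set drops events reported twice; sorting gives a stable order.
        shots = sorted({shot for r in group for shot in r["shots"]})
        merged.append({
            "first_name": first,
            "last_name": last,
            "birth_date": birth_date,
            "shots": shots,
        })
    return merged


# Invented sample data: the first two records describe the same child.
records = [
    {"first_name": "Ana", "last_name": "Diaz", "birth_date": "1996-03-02",
     "shots": [("DTaP", "1996-05-01")]},
    {"first_name": " ana ", "last_name": "DIAZ", "birth_date": "1996-03-02",
     "shots": [("MMR", "1997-03-10")]},
    {"first_name": "Ben", "last_name": "Lee", "birth_date": "1995-11-20",
     "shots": [("DTaP", "1996-01-15")]},
]
merged = merge_duplicates(records)
```

Exact matching of this kind misses misspelled names and mistyped birth dates, which is presumably why a registry of this size needed probabilistic tools rather than a simple key comparison.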

In the longer term, CIR and the Lead Quest program within the New York Department of Health are developing an integrated system that will leverage resources of the immunization and lead programs.