Accurately linking data from disparate systems and sources is often the most challenging aspect of any business analytics or big data initiative. The challenge becomes far harder when the data being linked comes from many different organizations and describes people.

For the past several years, iBusiness Solutions has had the privilege of solving this linking challenge for the State of Minnesota on a scale never before attempted. The new Minnesota P20W data warehouse, with its integral data linking engine, provides a data repository for educational and employment analytics. It serves many audiences working to improve the educational experiences and outcomes of individuals from early childhood through college graduation, and to increase the likelihood of meaningful, related, and sustained employment. The P20W data warehouse serves as an umbrella structure for three separate but overlapping data projects: the Early Childhood Longitudinal Data System (ECLDS, birth to grade 3 data), the Statewide Longitudinal Education Data System (SLEDS, kindergarten through postsecondary and workforce), and the Workforce Data Quality Initiative (WDQI, education and work).

The P20W data warehouse brings together data from education and workforce to:

  • Identify the most viable pathways for individuals in achieving successful outcomes in education and work,
  • Inform decisions to support and improve education and workforce policy and practice, and
  • Assist in creating a more seamless education and workforce system for all Minnesotans.

The P20W data warehouse covers the longitudinal life of an individual, from early childhood through elementary and secondary education, higher education, and employment, including the various paths people take through education, into employment, and back again. To produce a longitudinal view of a person, all records for a given individual must be matched and linked across dozens of data collection systems. These systems capture different data elements for personally identifying information (PII), and the quality and accuracy of that PII vary from system to system. As a result, person linking is part science and part art: it uses probabilistic matching algorithms to match and link people longitudinally with a relatively high degree of confidence, but always with an estimated error rate.
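To make the idea of probabilistic matching concrete, here is a minimal Python sketch in the style of the classic Fellegi-Sunter approach. It is an illustration only, not the P20W linking engine: the field names, the m/u probabilities, and the thresholds are all hypothetical values chosen for the example. Each PII field adds evidence for a match when it agrees and evidence against when it disagrees, and a missing value adds nothing.

```python
import math

# Hypothetical per-field parameters (assumptions for illustration only):
#   m = probability the field agrees when two records are the same person
#   u = probability the field agrees when they are different people
FIELD_PARAMS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.92, 0.005),
    "birth_date": (0.97, 0.001),
}


def normalize(value):
    """Reduce trivial formatting differences before comparing values."""
    return str(value).strip().lower()


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing PII contributes no evidence either way
        if normalize(a) == normalize(b):
            total += math.log2(m / u)  # agreement: positive weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement: negative weight
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Map a weight to a decision; the thresholds here are hypothetical."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"


if __name__ == "__main__":
    k12 = {"first_name": "Maria", "last_name": "Olson", "birth_date": "2001-04-17"}
    college = {"first_name": " maria", "last_name": "OLSON", "birth_date": "2001-04-17"}
    weight = match_weight(k12, college)
    print(round(weight, 2), classify(weight))
```

The band between the two thresholds is where the estimated error rate comes from. Raising the upper threshold reduces false links but sends more pairs to review or leaves true matches unlinked, so the thresholds are a tuning decision rather than a fixed rule.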

For a complete overview of the architecture, the linking logic, the lessons we learned, and more, please read our whitepaper on this topic, "Person Linking for the State of Minnesota’s P20W Initiative."