Written by Rosa Bianca Gallo, Alessia Peviani & Sofia Bazakou
Choosing the right data model is never a trivial task. In the biomedical data space there are tons of proprietary and open models available: one could create a custom format or work with a widely adopted industry standard. Whatever choice is made, one always has to deal with versioning and consider integration with other formats and systems used within the specific data ecosystem. We at The Hyve help our clients make the right choice by evaluating fitness for purpose based on their analysis use cases, and by developing data harmonization strategies. When needed, we also create ad hoc adaptations or newer versions of the data models. A standard we advocate and specialise in is the OMOP / OHDSI Common Data Model. In this article we compare OMOP CDM version 5.3.1 with CDM version 6.0, to help you decide which version would work best in your data landscape.
Before we dive into a version comparison, let’s start with a short history of the OMOP CDM.
The Observational Medical Outcomes Partnership Common Data Model, aka OMOP CDM, aims to harmonize disparate coding systems to a standardized vocabulary with minimal information loss.
To quote Patrick Ryan, leader of the OHDSI community: “OMOP was initially formed in 2008. The majority of the effort in the initial months was designing the series of methodological experiments. There was discussion on the need to bring together a community of disparate databases and to test a centralized model and distributed network approach, but the specific details of what the CDM would look like to enable that research didn’t come out until 2009.”
If you are completely new to the OMOP / OHDSI community, we recommend reading these first:
If you are familiar with the OMOP CDM, you may have worked with version 5. Subversion v5.3.1 is the latest of these and is supported by all the recent OHDSI tooling. With each subversion, the community added new tables (e.g. cost, visit_detail) and new fields (e.g. condition_status, admitting_source). All the details of added features can be found in the release notes.
The most recent version, v6.0, was released in October 2018 and introduced a number of groundbreaking changes, most notably the removal of the death table.
Which version of the OMOP CDM researchers choose depends heavily on their use cases or research questions. One major advantage of CDM v5.3.1 over v6.0 is that it is supported by all analysis tools (Achilles, Atlas, etc.). Therefore, if the aim of converting your data to OMOP is to use the OHDSI tools, you are better off with v5.3.1.
However, as the community grew and the number of use cases increased, the need for some changes emerged. This is exactly what CDM v6.0 takes into consideration: it aims to accommodate this wider range of use cases in a single model.
Examples of analysis use cases with OMOP.
One of the main developments in the newer version is the introduction of tables that help represent more varied datasets. One of the new tables is the “location history” table, which connects person or care site information with geographic locations over time. “Survey conduct” is another addition to v6.0, intended to store an instance of a completed survey or questionnaire. Arguably one of the most important additions is the oncology extension module.
Proper characterization of cancer requires a high level of detail, such as anatomical site, biomarkers and so on. Typical observational data sources cannot capture this level of detail. Therefore, the oncology extension aims to provide a foundation for representing cancer data at the level of granularity and abstraction required to support observational cancer research.
Version v6.0 also contains a number of smaller but significant changes. For example, a number of fields have changed from optional to required, to safeguard against missing information that is vital for later steps of the analysis process, such as building study cohorts. Other changes address more technical issues: the data type of concept_id fields, for example, has been changed from integer to bigint to minimise errors in handling the information.
Overview of the pros & cons between versions
If your data has been converted from an earlier OMOP version to v5.3.1 and you are considering an upgrade, these are the main challenges that such a migration implies.
Challenges with migrating from version 5.3.1 to 6.0
The latest version of the OMOP CDM brings some structural changes that can present a challenge to users interested in upgrading from v5.3.1. The most striking issues are:
- the complete removal of the “Death” table
- a profound restructuring of the “Cost” table
- newly added tables and fields
Mapping from CDM v5.3.1 Death table to CDM v6.0 Person, Observation, and Condition Occurrence tables
With the “Death” table gone, death records in v6.0 are captured in the “Observation” table with a death-specific observation type, and the cause of death can be linked to these observations through a record in the “Condition Occurrence” table. The time of death is recorded in the “Person” table using the new v6.0 field death_datetime. Even though this is not a simple mapping from the old to the new CDM, its rules are sufficiently well defined to allow for an automated migration process. The Hyve’s engineering team will launch a mapping tool that accelerates this migration process.
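The death mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not The Hyve's mapping tool: the two concept IDs are placeholders (0) that must be replaced with real standard concepts from your vocabulary release, the link fields observation_event_id and obs_event_field_concept_id follow our reading of the v6.0 Observation table, and reusing death_type_concept_id as the type concept of the new records is a simplification.

```python
from datetime import date, datetime, time

# Placeholders (assumptions): look up the real standard concepts before use.
DEATH_OBSERVATION_CONCEPT_ID = 0      # concept meaning "death"
CONDITION_ID_FIELD_CONCEPT_ID = 0     # concept identifying condition_occurrence_id


def migrate_death_record(death, person, next_observation_id, next_condition_id):
    """Split one v5.3.1 death row into v6.0-style person, observation and
    (optional) condition_occurrence rows. Inputs are plain dicts."""
    # Fall back to midnight when only a date is available (assumed convention).
    death_dt = death.get("death_datetime") or datetime.combine(
        death["death_date"], time.min
    )

    person_v6 = {**person, "death_datetime": death_dt}

    # The cause of death, if known, becomes a condition record.
    condition = None
    if death.get("cause_concept_id"):
        condition = {
            "condition_occurrence_id": next_condition_id,
            "person_id": death["person_id"],
            "condition_concept_id": death["cause_concept_id"],
            "condition_start_date": death_dt.date(),
            "condition_start_datetime": death_dt,
            # Simplification: a real ETL would pick a condition type concept.
            "condition_type_concept_id": death["death_type_concept_id"],
            "condition_source_value": death.get("cause_source_value"),
        }

    # The death itself becomes an observation that points at the condition.
    observation = {
        "observation_id": next_observation_id,
        "person_id": death["person_id"],
        "observation_concept_id": DEATH_OBSERVATION_CONCEPT_ID,
        "observation_date": death_dt.date(),
        "observation_datetime": death_dt,
        "observation_type_concept_id": death["death_type_concept_id"],
        "observation_event_id": next_condition_id if condition else None,
        "obs_event_field_concept_id": (
            CONDITION_ID_FIELD_CONCEPT_ID if condition else None
        ),
    }
    return person_v6, observation, condition


# Usage with made-up values (the concept IDs 111 and 222 are not real).
old_death = {
    "person_id": 1,
    "death_date": date(2015, 3, 1),
    "death_type_concept_id": 111,
    "cause_concept_id": 222,
}
new_person, new_observation, new_condition = migrate_death_record(
    old_death, {"person_id": 1}, next_observation_id=500, next_condition_id=10
)
```

Records without a cause of death yield only the person update and the observation; the condition slot is returned as None.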
Mapping from CDM v5.3.1 to CDM v6.0 Cost table fields (note: only some fields are shown, and mapping details might vary depending on the source data).
The changes to the “Cost” table have introduced a more flexible and straightforward way of capturing different types of costs, removing previous breakdown fields. Translating older records into this new format is likely to require custom mapping logic to succeed. This change will be mainly a concern for users dealing with U.S. healthcare system data as this table is rarely used in other regions.
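To make the restructuring more concrete: as we read the two specifications, a v5.3.1 cost record holds several amounts side by side in breakdown columns, while a v6.0 record holds a single amount qualified by a cost concept. One possible custom mapping is therefore an unpivot, sketched below. The column-to-concept table is hypothetical (the negative IDs are deliberately fake), and person_id is passed in because it has to be looked up from the linked event.

```python
# Hypothetical mapping from v5.3.1 breakdown columns to cost concepts.
# The negative IDs are fake placeholders, not real vocabulary entries.
COST_FIELD_TO_CONCEPT = {
    "total_charge": -1,
    "total_cost": -2,
    "total_paid": -3,
    "paid_by_payer": -4,
    "paid_by_patient": -5,
}


def unpivot_cost_row(old_row, person_id, first_cost_id,
                     field_to_concept=COST_FIELD_TO_CONCEPT):
    """Turn one wide v5.3.1-style cost row into one v6.0-style row per
    populated amount. Rows are plain dicts."""
    new_rows = []
    cost_id = first_cost_id
    for field, concept_id in field_to_concept.items():
        amount = old_row.get(field)
        if amount is None:
            continue  # nothing recorded for this breakdown column
        new_rows.append({
            "cost_id": cost_id,
            "person_id": person_id,
            "cost_event_id": old_row["cost_event_id"],
            "cost_concept_id": concept_id,
            "cost_type_concept_id": old_row["cost_type_concept_id"],
            "currency_concept_id": old_row.get("currency_concept_id"),
            "cost": amount,
            # Keep the original column name for traceability.
            "cost_source_value": field,
        })
        cost_id += 1
    return new_rows


# Usage with made-up values.
old_cost = {
    "cost_event_id": 42,
    "cost_type_concept_id": 0,
    "total_charge": 100.0,
    "total_paid": 80.0,
    "paid_by_patient": None,
}
new_costs = unpivot_cost_row(old_cost, person_id=1, first_cost_id=1000)
```

Which breakdown columns are worth carrying over, and to which concepts, depends on the source data, so the mapping table is the part most likely to need tailoring.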
Tables newly added to CDM v6.0 (Location History, Survey Conduct) should in principle be of no concern to current users, unless there is a good reason to recapture some of the existing data into them. As for changes at the single column level, the datetime field is now mandatory in several tables, but populating it from the existing date fields should be fairly straightforward. New visit_detail_id, event_id and event_field_concept_id fields have also been added to some tables (Observation, Cost, Note) to facilitate linking to records in other CDM tables, but they can simply be ignored. The only exception is when mapping old death records to observation records, as mentioned earlier.
Users who are currently working with a customized version of CDM v5.3.1 should additionally consider the following:
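Populating a mandatory datetime field from an existing date field could look like the sketch below. It assumes that midnight is an acceptable time of day when the source only records a date; that convention is our assumption and should be agreed with the data consumers.

```python
from datetime import date, datetime, time


def fill_datetime(row, date_field, datetime_field):
    """Return a copy of row with datetime_field derived from date_field
    when the datetime is missing. Existing datetimes are left untouched."""
    filled = dict(row)
    if filled.get(datetime_field) is None and filled.get(date_field) is not None:
        # Assumed convention: use 00:00:00 when no time of day is known.
        filled[datetime_field] = datetime.combine(filled[date_field], time.min)
    return filled


# Usage with a made-up condition record.
condition = {"condition_start_date": date(2019, 5, 17)}
condition_v6 = fill_datetime(
    condition, "condition_start_date", "condition_start_datetime"
)
```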
- any modification to existing CDM v5.3.1 tables (custom columns, changes to the properties of standard columns) would need to be properly documented and ported to the new CDM version, where applicable.
- custom tables and vocabularies could in principle be migrated as they are.
Hybrid solution: why and how?
An intermediate solution could be a hybrid of the two versions that would support the new tables along with the oncology extension and any other custom tables but at the same time keep the “Death” table. In such a hybrid version, the ETL would be built to populate all the fields that are required in either version.
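One way to picture such a hybrid is as the per-table union of the required fields of both versions, which an ETL can then validate against. The sketch below uses short, illustrative field lists that are incomplete and should be checked against the official specifications; the helper names are our own.

```python
# Illustrative, incomplete lists of required fields per table (assumptions).
REQUIRED_V531 = {
    "observation": {"observation_id", "person_id", "observation_concept_id",
                    "observation_date", "observation_type_concept_id"},
    "death": {"person_id", "death_date", "death_type_concept_id"},
}
REQUIRED_V60 = {
    "observation": {"observation_id", "person_id", "observation_concept_id",
                    "observation_datetime", "observation_type_concept_id"},
}


def hybrid_required(first, second):
    """Union the required fields per table; a table that exists in only
    one version (such as death) is kept as it is."""
    tables = set(first) | set(second)
    return {t: first.get(t, set()) | second.get(t, set()) for t in tables}


def missing_required(row, required_fields):
    """List the required fields that are absent or empty in a record."""
    return sorted(f for f in required_fields if row.get(f) is None)


# Usage with a made-up observation that lacks a datetime.
hybrid = hybrid_required(REQUIRED_V531, REQUIRED_V60)
record = {
    "observation_id": 1,
    "person_id": 1,
    "observation_concept_id": 0,
    "observation_date": "2019-05-17",
    "observation_type_concept_id": 0,
}
gaps = missing_required(record, hybrid["observation"])
```

A record that passes this check carries every field that either version requires, at the cost of a somewhat stricter ETL.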
Oftentimes, the work needed to achieve a successful conversion of data into any OMOP CDM version is underestimated or simply unclear to data consumers. This blog post has highlighted some of the complexity and has identified key advantages of the latest CDM versions.
Our team of data engineers and semantic experts can help you choose the right analysis use cases for your data and help accelerate the adoption of the open-source toolkit. A fundamental question for such projects is always: "What questions are you trying to answer with OMOP?"
Many clients have found the OMOP CDM beneficial for exploring multiple datasets in order to better inform research decisions. Observational data can also be linked to innovative trial designs or help accelerate the time to market of new drugs. We support our clients with gap analysis across different data sources to help them understand safety and efficacy profiles. We are experts in converting EHR, claims and commercial data; such data can support label expansion by improving the evidence of economic value and reducing the need for new randomized controlled trials.
Overall, The Hyve supports and advocates the open-source OMOP/OHDSI ecosystem by offering a modular package of services that includes:
- integration of your own data model with the OMOP CDM and ontologies;
- deployment of the stack and custom development of new features;
- data mapping and conversion of EHR, EMR, RWD, claims and commercial data;
- ETL design.
Last but not least, our Health Data Infrastructure team can support the integration of RWD with genomics and oncology data and tools. Get in touch with our sales team if you want to know more!