On September 4th 1998 in Menlo Park, California, Larry Page and Sergey Brin founded Google with a mission to organise the world's information and make it universally accessible and useful. In 2020, the world witnessed the global outbreak of the SARS-CoV-2 virus, and there has since been a tidal wave of effort to collect data to understand everything from associated symptoms, risk factors and prevalence through to biomolecular data including protein structures, biochemical pathways, compounds and drug targets.

Our work at Connected Diagnostics has seen us design and implement connectivity for a range of medical devices, so that digital data such as clinical results and valuable epidemiological data can be sent to appropriate repositories for stakeholders to use.

The question of what format these data should be made available in (FHIR, HL7 etc.), and the process of actually getting them into databases, i.e. through Application Programming Interfaces (APIs), are not the subject of this article. The question here is rather: ‘to which data repositories exactly should epidemiological data be sent?’

The deployment location of connected instruments will normally govern where clinical test results for patients are sent, typically to a Laboratory Information Management System (LIMS) or the patient's Electronic Medical Record (EMR). But what about anonymised data for global disease research and surveillance? Is there just one central database that is the obvious choice? Is there one database that rules them all?

Data collection is nothing new in the fight against disease, although there are still too many instances around the world where data are collected and reported manually using paper records. The vast majority of health systems now use various digital data repositories to store, and hopefully make use of, the information they collect. The recent SARS-CoV-2 pandemic has, however, seen a dramatic rise in demand for data along with increased recognition of their value, yet many of these data are still stored in silos across fragmented organisations, institutions and locations.

This isn’t intended to be a peer-reviewed article published in a noted journal, so you’ll forgive the following general and sweeping statements. Even the briefest of searches will throw up many results for databases and portals that have been established to collect, and offer for use, different subsets of SARS-CoV-2 data. The purpose of these portals is to try to offer as much data as possible through a single site.

Whilst this approach isn’t new and clearly provides more benefit than not, where does it stop? In trying to explain the issue to my wife recently, I used the analogy of car insurance comparison sites. First came the offering of car insurance directly through company websites. Shortly after came the first comparison site, offering to aggregate quotes from the individual companies. Then came the inevitable: a comparison site that offered to aggregate quotes from other comparison sites.

With the many databases that now exist holding various SARS-CoV-2 data, who is the aggregator? Will we see an aggregator of the aggregate databases, or does one already exist? Where should anyone with data to offer to the global cause of fighting this pandemic send them? The world’s largest data aggregator, Google, whilst useful for many things, isn’t where many academics and researchers would head for trusted disease-related data that aren’t influenced by advertising revenue or political agenda.

So for the time being, I think we will continue to see collected data stored in siloed databases with simple APIs that allow data mining by anyone who is interested. We will continue to see various portals attempting to aggregate listings of these databases, and at the same time continue to hear calls from the global health community for reliable and trusted data that are easily accessible without having to search through hundreds of databases to get them.

For manufacturers of diagnostics, this is a rapidly changing environment where flexibility and control of data routing are required to enable data to end up in the right database for the right stakeholder, no matter where that might be.

Chris Isaacs

CEO – Connected Diagnostics