The COVID-19 pandemic is challenging science and society to an unprecedented degree. Containment of the virus and effective medical treatment will ultimately depend on having robust multivariate clinical and genomic data sets, and the infrastructure necessary to support a large-scale, coordinated scientific and public health effort.
The COVID-19 Data Forum is an ongoing series of multidisciplinary webinars and online meetings for topic experts to discuss data-related aspects of the scientific response to the pandemic. These topics include identifying the various kinds of data required for epidemiological modeling and public health decision making; data collection, curation, and sharing; setting standards for accuracy and other measures of data quality; data privacy issues; and establishing best practices for harmonizing the data collected by multiple institutions and for supporting public health officials and decision makers.
The Forum is a series of private and public discussions involving active participants in the wide range of disciplines concerned with COVID-19-related data. These discussions will also actively seek dialog with decision-makers and others relying on information from the data and from models or analysis based on it. The Forum places particular emphasis on being open to all relevant interested groups and, with respect to computing, to considering all useful tools, languages and environments. We hope that the COVID-19 Data Forum discussions can usefully proceed through three stages of questions:
At all stages, there are many specific topics that need discussion. To sort them out, it is useful to group them into three kinds of activities: obtaining the data; using the data; and communicating about the data.
The COVID-19 data challenges begin with just acquiring data of the range and quality needed. A very wide range of data is needed, along three dimensions: geography, time, and domain. Depending on the purpose, data may be needed at a very specific local level or at the widest global level. Both are challenging: finding reliable local sources and reconciling hugely variable international ones, for example. Particularly on the global (or even national) scale, variable quality will often be a challenge. Timeliness of the data is clearly essential, particularly as public health regulations and other societal responses change. But scientific models and analysis may also need to have data over a long time span.
The pandemic has touched our lives in many ways: directly in our health but also in nearly all aspects of our economy and society. As the world responds, data science will need to consider all these aspects, requiring data from the microscopic level of the virus to the population data for epidemiology, social science, and economics.
The response to the COVID-19 pandemic from the community continues to generate crucial data-based results. Epidemiologists, public health experts, data scientists, and other researchers have produced a large number of predictive models, interactive resource allocation applications, and disease tracking dashboards.
Moving ahead, it will be important to have easy, consistent access to the best data for all these efforts. Co-operation and co-ordination among the teams involved can enhance the scope and help ensure that model results and comparisons use consistent, well-defined data sources.
A key goal of the Data Forum is to improve communication between decision-makers (in public health, government, and elsewhere) and the data science and general research community. Many tools have been developed for visualizing and interacting with data. It’s important to understand how these can be used and enhanced for the decision-making community. We look forward to participation in our meetings by interested members of this community.
Another important goal is to improve the information flow to the broader community, with emphasis on giving insight and avoiding misdirection.