Intro to the first meeting
Welcome, everyone, to the first session of the COVID-19 Data Forum!
Today, and in the discussions, talks, and other events we hope will follow, our goal is to provide a forum where those active in data-related efforts to address the COVID-19 pandemic can share experiences, suggestions, and ideas for the future.
We see these sessions as evolving through three stages:
1- Sharing current experiences: what has been important and helpful; what challenges have had to be met.
2- Ideas for steps to improve what’s possible, perhaps by sharing or augmenting current data or software resources, particularly steps that could be taken fairly directly now.
3- More ambitiously, looking for ways to make sharing and extending these resources more effective, perhaps by adopting shared standards or interfaces among existing software and data structures.
A natural question is: “Why a Data Forum? Are data-specific questions important enough to merit this attention?” We strongly believe so.
Data questions arise in a scientific study at three broad stages.
- First, obtaining the data. How does what you record relate to what you wanted to record? There will usually be errors, but more fundamentally, the observed quantity may not be what you would really like to measure. Variations over geography or time may be important. Combining data from different domains (e.g., public health data with economic data) raises key questions of consistency.
- The second stage is using the data for analysis and modeling. One aspect not always appreciated is the importance of matching the structure the software provides to the valid scientific meaning of the data. The analyst needs to be able to express meaningful operations on the data naturally. At least as important, the structure should prevent, or at least make unnatural, operations that produce meaningless combinations of the data. (There is a whole chapter possible here on the data frame as such a structure!)
- Then, the key third stage: communicating about the data, often augmented by the results of models or other analyses. The central challenge is to produce clear, interpretable visualizations or other summaries, without distortions or implications that the actual results do not support. This is perhaps the most challenging of all the data-related tasks.
We predict, fairly confidently, that today’s session and future discussions will bring out important examples of these and other data-related aspects of our response to this challenge.