data owner ↔ public interest
2013-2015 West African Ebola virus outbreak
Two years later, we have a detailed understanding of the outbreak. Why did this take so long?
Sharing of Ebola virus sequences
- Baize et al.: Samples collected in March 2014, sequences in GenBank within a month
- Gire et al.: released data as it was generated
- Followed by a long gap...
- Early insights into transmission dynamics are important
- Sharing has to be immediate -- not upon publication
Recurring problem
- 2002 Severe acute respiratory syndrome (SARS)
- 2003 H5N1 influenza outbreak. Some countries stopped sharing any data
- 2013-2015 Ebola virus outbreak in West Africa
- 2014-2016 Zika virus outbreak: Controversies about attribution and reuse
- 2013- H7N9 influenza outbreak: Controversies about attribution and reuse
Different disease -- different scientists and institutions.
→ Lessons need to be relearned.
Barriers to data sharing: scientists
- Privacy of study participants
- Fear of being scooped; desire to ensure maximal return on one's own data
- Secondary analysis perceived as freeloading: "data parasites"
- Reluctance to be second-guessed
- Release and curation are laborious
- Sloppy records
Barriers to data sharing: organizations and governments
- Economic consequences of outbreaks (tourism, agriculture)
- Conflicts between high-income and low/middle-income countries
- Concerns about IP and commercial exploitation
- Legislative/regulatory barriers
Overcoming Barriers
Open vs restricted sharing
- Alternatives to GenBank exist: GenBank data are "public domain", with no requirement to credit data producers
- GISAID/EpiFlu: users sign up and agree to terms and conditions
- virological.org: Platform for sharing and discussing molecular epidemiology
- Explicit data reuse terms
- Outline planned projects in a white paper
- Caveat: Very difficult to enforce...
Building Trust
- Peter Bogner coordinated influenza data sharing
- Andrew Rambaut coordinated Ebola virus data sharing
- During the Ebola virus outbreak, WHO and journals explicitly encouraged data sharing
Make sharing easy and provide incentives!
nextstrain.org
- Global analysis provides context for new sequences
- Once nextstrain became the largest collection of Ebola/Zika sequences, everybody wanted their sequences on nextstrain
- We take care to highlight the scientists contributing data
- Link to original source, rather than reshare
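One way to picture the last two points is a small credits table built from sequence metadata. The sketch below is only an illustration, not how nextstrain actually does it: the field names (`strain`, `lab`, `source_url`), the records, and the example.org URLs are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"strain": "sample_A", "lab": "Lab One", "source_url": "https://example.org/ACC0001"},
    {"strain": "sample_B", "lab": "Lab Two", "source_url": "https://example.org/ACC0002"},
    {"strain": "sample_C", "lab": "Lab One", "source_url": "https://example.org/ACC0003"},
]


def credits_by_lab(records):
    """Group sequences by contributing lab, keeping a link to each original record."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["lab"]].append((rec["strain"], rec["source_url"]))
    # Sort labs and their entries so the credits listing is stable.
    return {lab: sorted(entries) for lab, entries in sorted(grouped.items())}


def format_credits(grouped):
    """Render one heading per lab, followed by its strains and source links."""
    lines = []
    for lab, entries in grouped.items():
        lines.append(f"{lab} ({len(entries)} sequence(s))")
        for strain, url in entries:
            lines.append(f"  {strain}: {url}")
    return "\n".join(lines)


credits = credits_by_lab(records)
print(format_credits(credits))
```

The listing stores only a pointer to each original record, so the data producer's own entry remains the authoritative copy.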
Outlook
- Response has to be fast
→ pre-existing trusted framework to share data
- Wider acceptance of preprints should help
→ establishes priority and a citation hook
- We need incentives for high-quality data sets
- We need better ways to credit data producers
→ data citations
- Not only data: open-source code and analysis pipelines
nextstrain.org
- Trevor Bedford
- Colin Megill
- Pavel Sagulenko
- Sidney Bell
- James Hadfield
- Wei Ding