Summary
The 2017 GBIF Ebbe Nielsen Challenge will award a total of €14,000 to developers and data scientists who create tools capable of liberating species records from open data repositories for scientific discovery and reuse.
Background
This year's Challenge seeks to leverage the growth of open data policies among scientific journals and research funders, which require researchers to make the data underlying their findings publicly available. Adoption of these policies represents an important first step toward increasing openness, transparency and reproducibility across all scientific domains, including biodiversity-related research.
To abide by these requirements, researchers often deposit datasets in public open-access repositories. Potential users are then able to find and access the data through the repositories themselves as well as through data aggregators like OpenAIRE and DataONE. Many of these datasets are already structured in tables that contain the basic elements of biodiversity information needed to build species occurrence records: scientific names, dates, and geographic locations, among others.
However, the practices adopted by most repositories, funders and journals do not yet encourage the use of standardized formats, which significantly limits the interoperability and reuse of these datasets. As a result, the wider reuse of data implied, if not stated, by many open data policies falls short, even in cases where open licensing designations (like those provided through Creative Commons) seem to encourage it.
The Challenge
The 2017 GBIF Ebbe Nielsen Challenge seeks submissions that repurpose these datasets and adapt them into the Darwin Core Archive format (DwC-A), the interoperable and reusable standard that powers the publication of almost 800 million species occurrence records from the nearly 1,000 worldwide institutions now active in the GBIF network.
The Challenge tasks developers and data scientists with creating web applications, scripts or other tools that automate the discovery and extraction of relevant biodiversity data from open data repositories. Such tools might generate datasets ready for publication on GBIF.org by:
Automating searches of open data available in public repositories
Effectively mining the information needed to generate checklists, species occurrence and sampling-event datasets (e.g. scientific names, dates and locations of occurrence) from datasets in these repositories
Mapping datasets’ column headings and/or contents to standardized Darwin Core terms
Routinely converting the reformatted data into the Darwin Core Archive format, ready for publication through GBIF.org
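The last two steps in this list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an official GBIF tool: the synonym table `HEADER_SYNONYMS`, the helpers `map_headers` and `write_dwca`, the placeholder `occurrenceID` values, the default `basisOfRecord` value and the sample row are all hypothetical. The Darwin Core term names and the layout of the `meta.xml` descriptor follow the Darwin Core text guidelines. A publishable archive would normally also carry an `eml.xml` metadata file and should be checked with the DwC-A Validator listed below.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical synonym table: lower-cased source headings -> Darwin Core terms.
HEADER_SYNONYMS = {
    "species": "scientificName",
    "scientific name": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "longitude": "decimalLongitude",
}


def map_headers(headers):
    """Return {column index: Darwin Core term} for the headings we recognise."""
    mapping = {}
    for index, heading in enumerate(headers):
        term = HEADER_SYNONYMS.get(heading.strip().lower())
        if term and term not in mapping.values():
            mapping[index] = term
    return mapping


def write_dwca(rows, headers, target, basis_of_record="HumanObservation"):
    """Write a minimal occurrence archive (occurrence.txt + meta.xml) to target.

    target may be a file path or a binary file-like object.
    basis_of_record is an assumed default; set it to match the source data.
    """
    mapping = map_headers(headers)
    terms = ["occurrenceID", "basisOfRecord"] + list(mapping.values())

    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(terms)
    for number, row in enumerate(rows, start=1):
        # Placeholder identifier: real occurrenceIDs should be stable and unique.
        writer.writerow([f"row-{number}", basis_of_record] + [row[i] for i in mapping])

    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>' for i, term in enumerate(terms)
    )
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )

    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data.getvalue())
        archive.writestr("meta.xml", meta)
    return terms


# Example: one made-up row with an unmapped "Notes" column, which is dropped.
example = io.BytesIO()
write_dwca(
    [["Aedes aegypti", "2014-06-01", "-3.72", "-38.54", "trap 4"]],
    ["Species", "Date", "Lat", "Lon", "Notes"],
    example,
)
```

Exact-match lookup on headings is the simplest possible strategy. A competitive entry would more likely inspect column contents as well (for example, recognising ISO dates or coordinate ranges), since headings in deposited datasets vary widely.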
Resources and reference material
Background on Darwin Core and Darwin Core Archives
What is Darwin Core (and why does it matter)?
Darwin Core Archive: A how-to guide
Explainer on GBIF dataset types/classes
DwC-A templates for checklists, occurrence datasets and sampling-event datasets
Data quality recommendations
Recommended terms for sampling events
DwC-A Validator
Examples of datasets manually harvested and published from open-data repositories
Global compendium of Aedes aegypti and Ae. albopictus occurrence
Kraemer MUG, Sinka ME, Duda KA, Mylne A, Shearer FM, Brady OJ, Messina JP, Barker CM, Moore CG, Carvalho RG, Coelho GE, Van Bortel W, Hendrickx G, Schaffner F, Wint GRW, Elyazar IRF, Teng H, Hay SI (2015) Data from: The global compendium of Aedes aegypti and Ae. albopictus occurrence. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.47v3c.2
Originally published in:
Kraemer MUG, Sinka ME, Duda KA et al. (2015) The global distribution of the arbovirus vectors Aedes aegypti and Ae. albopictus. eLife 4:e08347. http://dx.doi.org/10.7554/eLife.08347
Kraemer MUG, Sinka ME, Duda KA et al. (2015) The global compendium of Aedes aegypti and Ae. albopictus occurrence. Scientific Data 2(7): 150035. http://dx.doi.org/10.1038/sdata.2015.35
On new GBIF.org: Global compendium of Aedes albopictus occurrence: https://demo.gbif.org/dataset/33614778-513a-4ec0-814d-125021cca5fe
On new GBIF.org: Global compendium of Aedes aegypti occurrence: https://demo.gbif.org/dataset/d4eb19bc-fdce-415f-9a61-49b036009840
LTER sampling-event dataset, Bird census at the beach of Doñana Natural Space
On DataONE: https://search.dataone.org/#view/knb-lter-europe-deims.13610.15384
On LTER-Europe: https://data.lter-europe.net/deims/dataset/2a0762f2-4630-11e3-aeb9-005056ab003f
On GBIF Spain IPT: http://www.gbif.es/ipt/resource?r=donana
On new GBIF.org: https://demo.gbif.org/dataset/9a57e938-3616-4f8c-985a-c9b66e7a1347
Open-data repositories and aggregators
The following list is by no means exhaustive. We welcome suggestions on other relevant services to highlight for prospective Challenge entrants.
Dryad | Data access | rdryad
FigShare | API feature list | rfigshare
Zenodo | Developers site | rzenodo
Mendeley Data | Dataset API
OpenAIRE | API documentation
DataONE | API reference | R for DataONE
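As a starting point for automating searches against one of these services, the sketch below builds a dataset search against Zenodo's records endpoint and picks out files that look like tables. Treat it as an illustration under assumptions rather than a reference client: the helper names (`build_search_url`, `tabular_files`, `search`) and the list of file suffixes are hypothetical, and the response shape assumed here (a `hits.hits` list whose records carry a `files` list with `key` names) should be checked against the Developers site linked above before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ZENODO_RECORDS = "https://zenodo.org/api/records"
# Hypothetical list of suffixes that suggest a tabular file.
TABULAR_SUFFIXES = (".csv", ".tsv", ".txt", ".xlsx")


def build_search_url(query, size=20):
    """Build a Zenodo records search URL restricted to datasets."""
    return ZENODO_RECORDS + "?" + urlencode({"q": query, "type": "dataset", "size": size})


def tabular_files(record):
    """Return the names of files in a record that look like tables.

    Assumes each record has a "files" list whose entries carry a "key"
    (the file name); records without one simply yield an empty list.
    """
    names = [entry.get("key", "") for entry in record.get("files", [])]
    return [name for name in names if name.lower().endswith(TABULAR_SUFFIXES)]


def search(query, size=20):
    """Run the search and return (title, tabular file names) for each hit."""
    with urlopen(build_search_url(query, size), timeout=30) as response:
        payload = json.load(response)
    hits = payload.get("hits", {}).get("hits", [])
    return [(hit.get("metadata", {}).get("title"), tabular_files(hit)) for hit in hits]
```

Calling `search("species occurrence")` performs a live request, so it is kept out of module import. The other repositories in the list expose their own APIs and would each need a separate adapter.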
Extra credit
Keeping the 2016 Ebbe Nielsen Challenge in mind, GBIF is particularly interested in tools that address data biases and fill gaps by mobilizing occurrences from under-represented geographies, taxa, time periods, or thematic areas like vectors of human disease or alien and invasive species. GBIF is also eager to see tools capable of converting open-access repository datasets into the quantitative 'sampling-event' format recently supported in the Darwin Core standard. Such datasets can capture richer information like species abundance, presence/absence, level of effort, and standard sampling methodologies and protocols.
Sponsor
Special thanks to the Swedish Research Council for its support of the 2017 Ebbe Nielsen Challenge.
June 15, 2017 - Sept. 5, 2017
GBIF
Online
€14,000