Open Science in Big Data

A workshop of the IEEE BigData Conference exploring the challenges and opportunities of Open Science philosophy and practice in Big Data

Research

Fundamental research into novel tools and theoretical methods to advance the frontier of big data applications.

Reproducibility

Practicing open and reproducible research and development, addressing the unique challenges of this undertaking in big data.

Accessibility

Ensuring tools and techniques are accessible to everyone interested in big data research, regardless of background.

Education

Incorporating the latest in big data research and implementation into higher education.

Workshop Topics

Exploring the intersection of Open Science and Big Data

"Open science" encompasses efforts by scientists to improve the reproducibility of original research. This includes publishing datasets, providing free and open access to the resulting publications, and releasing code under open source licenses. Proprietary software and closely guarded datasets have given way to vibrant open source communities and open access journals.

Big data applications, however, have been uniquely challenging to bring into open science. The complications include datasets too large to host publicly, extensive codebases tied to highly customized compute platforms, and a lack of available computing resources with which to replicate the original research environment efficiently.

This workshop will focus on current practices and future directions in democratizing big data analytics and improving the reproducibility of big data research. Topics include, but are not limited to:

  • Core research in big data that uses open source frameworks
  • Open source tools and subprojects for specific big data use cases such as neuroimaging or multimodal data integration (e.g. DL4J, thunder, Alluxio, Arrow)
  • Next-generation open source big data paradigms (e.g. Beam, Flink, Apex)
  • Creating, sharing, and maintaining large and open datasets (e.g. dat)
  • Open cloud resources for interacting with big data (e.g. Databricks Community Edition, mybinder)
  • Containerized and open source big data applications and environments (e.g. Docker)
  • Open science and big data in the classroom
  • Other open science use cases in big data analytics

As such, submissions must have a strong open source / open science component, in addition to being relevant to big data analytics.