Future data analysis services at European and national photon and neutron facilities

Z. Matěj, J. Brudvik, A. Salnikov

MAX IV Laboratory, Lund University, Lund, Sweden

zdenek.matej@maxiv.lu.se

The most powerful X-ray laser facilities and multiple 4th-generation synchrotron light sources are in user operation across Europe. Preparation for the European Spallation Source is ramping up, and more photon and neutron (PaN) facilities are planning upgrades. With excellent brightness and larger, faster detectors, enormous volumes of scientific data are produced. The variety of experiments and the PaN user communities are broadening, while the requirements for scientific data management arising from publishing practices and academic institutions, including interest in open data, are increasing. In late 2018, within the European Open Science Cloud (EOSC) initiative, several major European PaN facilities started a project called PaNOSC [1], which was complemented one year later by the ExPaNDS [2] project at the national PaN research institutes. Both projects aim to expand scientific data catalogues and analysis services so that scientific data at PaN facilities comply with the FAIR data principles. This includes adjustments to scientific data policies, extension of the scientific data retention period, tools to search for datasets of possible scientific or scholarly interest, improvements to data accessibility, data formats and metadata catalogues, and finally the possibility to reproduce scientific results by means of remote data analysis services. Within the wide scope of the projects, several application use cases have been chosen to prototype all the services, including data analysis. The selected scientific use cases cover multiple methods, including crystallography (for the ExPaNDS project, serial crystallography [3] in particular), CryoEM and powder diffraction [4], as well as other techniques such as small-angle scattering, reflectometry and ptychographic X-ray computed tomography [5].
The use cases represent several types of PaN science analysis workflows, including Python Jupyter notebooks, conventional high-performance distributed computing, cloud-like containerization for data science and remote desktops for visualization. The idea is to match the environments needed to run scientific software with archived datasets and records in metadata catalogues. The project outcomes include the definition of application interfaces and a functional prototype that can be deployed at research facilities, interconnected with other tools, and developed and extended in a sustainable way, allowing more scientific data to be bridged into EOSC.

1. PaNOSC: Photon and Neutron Open Science Cloud, https://www.panosc.eu (Nov 9, 2020).

2. ExPaNDS: EOSC Photon and Neutron Data Services, https://expands.eu (Nov 9, 2020).

3. X. E. Zhou et al., Sci. Data, 3, (2016), 160021. doi:10.1038/sdata.2016.21

4. B. H. Toby, R. B. Von Dreele, J. Appl. Crystallogr., 46, (2013), 544. doi:10.1107/S0021889813003531

5. M. Kahnt et al., J. Appl. Crystallogr., 53, (2020), 1. doi:10.1107/S160057672001211X