e-INFRA CZ Interactive and FAIR: Beyond Qsub Computing

I. Křenková1, A. Křenek2

1CESNET, z.s.p.o, Generála Píky 430/26, 160 00 Prague 6

2Institute of Computer Science, Masaryk University, Šumavská 416/15, 602 00 Brno

ivana.krenkova@cesnet.cz

 

For decades, resources provided by large computing and data infrastructures have been accessed through non-interactive batch job submission, with data stored on shared filesystems. There are good reasons to retain this approach: it still offers the best performance and the most efficient utilization of scarce resources. On the other hand, it offers limited user comfort and poor support for the recently emerging requirements of reproducible science.

Therefore, e-INFRA CZ provides and fully supports alternative computing environments:

Jupyter notebooks have gained attention in recent years. The user's work is recorded as a sequence of computational steps ("cells"), which may include simple calculations and data processing, visualization of intermediate results, and the launching of extensive calculations that dynamically allocate further resources (CPU, GPU, memory).

At the technical level, those calculations are containerized applications running in the same Kubernetes environment as the notebooks.
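As a rough illustration of how a notebook cell could describe such a calculation, the sketch below builds a Kubernetes Job manifest that requests CPU, memory and GPU for a containerized application. It is not the e-INFRA CZ implementation: the function name `build_job_manifest`, the job name, the image URL and the GROMACS command line are all hypothetical, and the `nvidia.com/gpu` resource name assumes that the cluster exposes GPUs through the NVIDIA device plugin.

```python
import json


def build_job_manifest(name, image, command, cpu="4", memory="8Gi", gpus=0):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    if gpus > 0:
        # GPUs are requested as an extended resource under "limits".
        # "nvidia.com/gpu" assumes the NVIDIA device plugin is installed.
        resources["limits"]["nvidia.com/gpu"] = gpus

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed calculation
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "command": list(command),
                            "resources": resources,
                        }
                    ],
                }
            },
        },
    }


manifest = build_job_manifest(
    name="md-run-example",                            # hypothetical job name
    image="registry.example.org/md/gromacs:latest",   # hypothetical image
    command=["gmx", "mdrun", "-deffnm", "md"],        # illustrative MD run
    cpu="8",
    memory="16Gi",
    gpus=1,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON is a plain Job description; on a cluster where the user has the necessary permissions, it could be submitted with `kubectl apply -f -` or through a Kubernetes client library. Building the manifest as data keeps the resource request visible in the notebook alongside the rest of the recorded steps.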

We provide examples of molecular dynamics simulations that are easy to adopt and extend, and we refer to examples of more complex workflows (molecular dynamics combined with machine learning, ab initio corrections of molecular mechanics force fields).

With the use of widget libraries, experimental Jupyter notebooks can evolve gradually into full-featured GUIs. Currently, we provide such interfaces for AlphaFold/OmegaFold/ESMFold and a pilot for the molecular dynamics hub; others may emerge according to community requirements.

Galaxy is a widely supported community framework in which thousands of pre-canned tools (originating in genomics, proteomics and metabolomics but no longer limited to those fields) are composed into workflows. We provide a general-purpose installation, usegalaxy.cz, which mirrors the tool set of the reference usegalaxy.eu but offers considerably larger resources to e-INFRA CZ users, as well as a few specialized installations for specific communities (Repeat Explorer, UMSA).

The OnDemand platform provides a similar environment for launching graphical applications.

We also provide cloud-native access through Kubernetes, enabling researchers to execute large-scale containerized computations seamlessly. 

Both Jupyter notebooks and Galaxy natively address reusability ("R" in FAIR), and reproducibility in particular, by recording the history of the user's calculations. To some extent they also address interoperability ("I"), for example by supporting data format conversions.

e-INFRA CZ is gradually developing tools to address findability ("F"), e.g. a pilot of semi-automated metadata provisioning for molecular dynamics calculations. Accessibility ("A") is the principal goal of the current development of the National Repository Platform.

This comprehensive approach positions e-INFRA CZ at the forefront of empowering researchers through the synergy of advanced computing and innovative data storage services.