The Proceedings of the PASC Conference are published in the Association for Computing Machinery’s (ACM’s) Digital Library. In recognition of the high quality of the PASC Conference papers track, the ACM continues to provide the proceedings as an Open Table of Contents (OpenTOC). This means that the definitive versions of PASC Conference papers are available to everyone at no charge to the author and without any pay-wall constraints for readers.
The OpenTOC for the PASC Conference is hosted on the ACM’s SIGHPC website. PASC papers can be accessed for free at: www.sighpc.org/for-our-community/acm-open-tocs.
The following papers will be presented as talks at PASC26 and will be accessible via the OpenTOC after the conference.
Accelerate Radiative Transfer Simulations for the Cosmic Epoch of Reionisation
The most time-consuming part of radiative transfer cosmological simulations for the Epoch of Reionisation is the ray-tracing algorithm, which computes the hydrogen column density along the path of the ionising photons. In this work, we present an updated version of pyC2Ray, a massively parallel ray-tracing and chemistry library that supersedes C2Ray, which has been extensively employed in reionisation simulations and often requires millions of CPU-core hours across several thousand computing nodes on high-performance computers to simulate large-scale volumes. pyC2Ray implements the Accelerated Short-characteristics Octahedral RAy-tracing (ASORA) algorithm, a novel ray-tracing method designed explicitly for multi-source ray-tracing on GPUs. The algorithm is written in C++/CUDA and wrapped in a Python interface that enables easy, customised code usage without compromising computational efficiency. We demonstrate that the ASORA updates introduced in this work lead to a speed-up factor of 800 compared to C2Ray on a multi-core CPU. This indicates that ray-tracing algorithms for cosmological simulations strongly benefit from GPU acceleration and specialised software design to achieve computational efficiency and physical accuracy.
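As a concrete picture of the kernel being accelerated, the following is a minimal NumPy sketch of accumulating hydrogen column density along one ray through a density grid; it is an illustrative stand-in using simple midpoint sampling, not the ASORA octahedral traversal, and all names and sizes are hypothetical.

```python
import numpy as np

def column_density(n_H, src, dst, cell_size=1.0, n_steps=256):
    """Approximate the integral of n_H along the segment src -> dst."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    length = np.linalg.norm(dst - src) * cell_size
    ts = (np.arange(n_steps) + 0.5) / n_steps        # midpoint quadrature nodes
    pts = src + ts[:, None] * (dst - src)            # sample points on the ray
    idx = np.clip(pts.astype(int), 0, np.array(n_H.shape) - 1)
    return n_H[idx[:, 0], idx[:, 1], idx[:, 2]].sum() * length / n_steps

rng = np.random.default_rng(0)
n_H = rng.lognormal(sigma=1.0, size=(64, 64, 64))    # toy hydrogen density grid
print(column_density(n_H, src=(32, 32, 32), dst=(5, 60, 12)))
```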
Organizer(s): Michele Bianco (ETH Zurich), Tommaso Boschi (ETH Zurich), Christopher Bignamini (ETH Zurich / CSCS), Osman Seckin Simsek (University of Basel), Sebastian Keller (ETH Zurich / CSCS), Florina M. Ciorba (University of Basel), Yves Revaz (EPFL), and Alexandre Refregier (ETH Zurich)
Domain: Physics
Accelerating Electrostatically-Embedded Fragmentation Methods using Graphics Processing Units
Predicting the physicochemical properties of large molecular systems requires quantum chemistry methods that are both accurate and computationally scalable. Electrostatically embedded fragmentation approaches, such as the Fragment Molecular Orbital (FMO) method, have been highly successful in extending Hartree–Fock (HF) and post-HF theories to systems with thousands of atoms, but their iterative electrostatic potential (ESP) cycles and communication patterns pose challenges on modern heterogeneous architectures. In this work, we develop distributed-memory, multi-GPU algorithms for electrostatically embedded fragmentation, targeting both FMO and the recently proposed Coulomb-Perturbed Fragmentation (CPF) method. CPF removes the expensive self-consistent ESP iterations at the monomer level by converging monomers once in vacuo and then using these fixed densities to construct the electrostatic embedding for all fragments. Our implementations in the Extreme-Scale Electronic Structure System (EXESS) feature GPU-accelerated HF and RI-MP2 kernels, specialised ERI kernels for ESP terms, a multi-layer dynamic load balancing scheme, and MPI Remote Memory Access (RMA) to efficiently distribute monomer densities across nodes. Accuracy is assessed for water hexamers, neutral molecular crystals, and ionic liquids at the HF and RI-MP2 levels, where CPF3 and FMO3 reproduce full-system energies with mean absolute deviations of only a few kJ mol⁻¹ and correctly recover subtle energetic orderings. Performance benchmarks on Gadi and Perlmutter demonstrate speedups of up to ~6× over the parallel CPU FMO implementation in GAMESS on a single node, and strong scaling efficiencies approaching 90% on up to 128 GPU nodes. Overall, CPF emerges as a highly accurate and markedly more scalable alternative to FMO for large-scale electrostatically embedded quantum chemistry on GPU-accelerated supercomputers.
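The algorithmic contrast between FMO and CPF can be pictured with a point-charge toy: FMO would iterate the embedding to self-consistency, whereas CPF builds it in a single pass from densities converged once in vacuo. Everything below (fragment sites, charges, shapes) is a schematic placeholder, not EXESS code.

```python
import numpy as np

rng = np.random.default_rng(0)
frags = [rng.normal(size=(3, 3)) for _ in range(4)]   # 4 fragments, 3 sites each
charges = [0.1 * rng.normal(size=3) for _ in frags]   # "converged once in vacuo"

def esp_at(sites, others):
    """Electrostatic potential at `sites` from the other fragments' charges."""
    v = np.zeros(len(sites))
    for pos, q in others:
        d = np.linalg.norm(sites[:, None, :] - pos[None, :, :], axis=2)
        v += (q / d).sum(axis=1)
    return v

# CPF-style embedding: one pass over fixed densities, no self-consistency loop
embedding = [esp_at(frags[i], [(frags[j], charges[j]) for j in range(4) if j != i])
             for i in range(4)]
print(np.round(embedding[0], 3))
```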
Organizer(s): Fazeleh Kazemian (The Monash University, The Australian National University), Jorge Galvez-Vallejo (The Australian National University), and Giuseppe Barca (The Monash University, The Australian National University)
Domain: Chemistry and Materials
An Algorithm for Feature Extraction from Large Scale Meteorological Data Stores on Irregular Grids
In recent years, the volume of meteorological data has grown rapidly due to higher model resolutions and more frequent data output. To handle this increase efficiently, the European Centre for Medium-Range Weather Forecasts has developed new ways to extract data without accessing full meteorological fields through their feature extraction capabilities. This includes in particular the Polytope algorithm, which lets users extract time series or regional subsets directly from the data backends, greatly reducing data transfer and processing time. However, the current algorithm only works on structured iso-latitude grids such as octahedral or HEALPix grids. Many modern meteorological models, by contrast, use unstructured grids where points do not align along latitudes, such as Lambert conformal or ICON icosahedral grids. In this paper, we extend the Polytope algorithm to support these irregular grids. We first explain why the original approach was limited to structured data before introducing a new extraction method, which uses the well-studied quadtree data structure. As a popular data structure in geospatial applications, we focus on describing how quadtrees can be integrated within the Polytope feature extraction framework to handle complex grid geometries. We then assess the performance and scalability of this algorithm, which performs on par with state-of-the-art alternatives used in geospatial applications. In particular, the extended algorithm outperforms other methods when accessing large or complex regions and therefore offers a compelling alternative to current data extraction techniques for large-scale datasets. Finally, we conclude by discussing how this algorithm can be integrated into an operational data service.
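The quadtree at the heart of the extension is a standard spatial index; a minimal Python version with rectangular range queries is sketched below (capacity, coordinates, and the flat point representation are illustrative choices, not the Polytope implementation).

```python
# Point quadtree: leaves buffer up to `capacity` points, then split into four
# quadrants; range queries prune whole subtrees whose boxes miss the query.
class QuadTree:
    def __init__(self, x0, y0, x1, y1, capacity=8):
        self.box = (x0, y0, x1, y1)
        self.capacity, self.points, self.children = capacity, [], None

    def insert(self, p):
        x0, y0, x1, y1 = self.box
        if not (x0 <= p[0] < x1 and y0 <= p[1] < y1):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._split()
        return any(c.insert(p) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(a, b, c, d, self.capacity) for a, b, c, d in
                         [(x0, y0, mx, my), (mx, y0, x1, my),
                          (x0, my, mx, y1), (mx, my, x1, y1)]]
        for q in self.points:          # push buffered points down to children
            any(c.insert(q) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []                  # query box misses this node entirely
        hits = [p for p in self.points
                if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        for c in self.children or []:
            hits += c.query(qx0, qy0, qx1, qy1)
        return hits

import random
tree = QuadTree(0, 0, 360, 180)        # e.g., (lon, lat) degrees
for _ in range(10_000):
    tree.insert((random.uniform(0, 360), random.uniform(0, 180)))
print(len(tree.query(10, 20, 40, 60))) # points inside a regional box
```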
Organizer(s): Mathilde Leuridan (ECMWF, University of Cologne), James Hawkes (ECMWF), Tiago Quintino (ECMWF), and Martin Schultz (Forschungszentrum Jülich, University of Cologne)
Domain: Climate, Weather and Earth Sciences
astroCAMP: A Co-design Framework for Sustainable Radio-Interferometric Imaging
The Square Kilometre Array (SKA) will operate one of the world’s largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes. Yet current radio-interferometric pipelines typically achieve only 4–14% of hardware peak because of memory and I/O bottlenecks, resulting in high energy, operational, and carbon costs. Progress is further constrained by the absence of standardised cross-layer metrics and survey-level fidelity tolerances for principled hardware–software co-design. We present astroCAMP, a reproducible benchmarking and co-design framework for SKA-scale imaging. astroCAMP contributes: (1) a unified metric suite spanning performance, utilisation, memory/data-movement behavior, sustainability, economics, and scientific fidelity; (2) standardised SKA-representative datasets, reference outputs, and benchmark configurations for reproducible cross-platform evaluation; (3) a multi-objective co-design formulation linking quality constraints to time-, energy-, carbon-, and cost-to-solution; and (4) a reproducible design-space exploration workflow to derive Pareto-optimal operating regions. We release datasets, scripts, benchmark results, and a reproducibility kit, and evaluate WSClean+IDG on an AMD EPYC 9334 CPU and an NVIDIA H100 GPU. The evaluation shows substantial end-to-end orchestration and synchronization bottlenecks despite efficient kernels in active phases, limited CPU strong scaling, and location-dependent carbon/cost efficiency under realistic grid and electricity-price assumptions. We further illustrate the use of astroCAMP for heterogeneous CPU–FPGA design-space exploration, and its potential to facilitate the identification of Pareto-optimal operating points for SKA-scale imaging deployments. Lastly, we call on the SKA community to define quantifiable fidelity metrics and thresholds to accelerate principled optimisation for SKA-scale imaging.
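The final step of the design-space exploration, deriving Pareto-optimal operating points, reduces to non-dominance filtering over measured objective tuples. A minimal sketch, assuming all four objectives are minimized and using made-up numbers:

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated rows (smaller is better everywhere)."""
    obj = np.asarray(objectives, float)
    keep = []
    for i, row in enumerate(obj):
        # row i is dominated if some row is <= everywhere and < somewhere
        dominated = np.any(np.all(obj <= row, axis=1) & np.any(obj < row, axis=1))
        if not dominated:
            keep.append(i)
    return keep

configs = np.array([[120, 5.0, 1.1, 30],   # [time s, energy kWh, kg CO2, $]
                    [ 90, 7.5, 1.6, 45],
                    [ 95, 7.6, 1.7, 46],   # dominated by the row above
                    [200, 3.0, 0.7, 20]])
print(pareto_front(configs))               # -> [0, 1, 3]
```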
Organizer(s): Denisa-Andreea Constantinescu (EPFL), Rubén Rodríguez Álvarez (Embedded Systems Laboratory, EPFL), Jacques Morin (Univ Rennes, INSA Rennes), Etienne Orliac (SCITAS, EPFL), Mickaël Dardaillon (Univ Rennes, INSA Rennes), Sunrise Wang (Université Côte d’Azur, Côte d’Azur Observatory), Hugo Miomandre (Univ Rennes, INSA Rennes), Miguel Peón-Quirós (EcoCloud, EPFL), Jean-François Nezan (Univ Rennes, INSA Rennes), and David Atienza (Embedded Systems Laboratory, EPFL)
Domain: Computational Methods and Applied Mathematics
CelloAI: Leveraging Large Language Models for HPC Software Development in High Energy Physics
Next-generation High Energy Physics (HEP) experiments will generate unprecedented data volumes, necessitating High Performance Computing (HPC) integration alongside traditional high-throughput computing. However, HPC adoption in HEP is hindered by the challenge of porting legacy software to heterogeneous architectures and the sparse documentation of these complex scientific codebases. We present CelloAI, a locally hosted coding assistant that leverages Large Language Models with Retrieval-Augmented Generation to support High Energy Physics code documentation and generation. This local deployment ensures data privacy, eliminates recurring costs, and provides access to large context windows without external dependencies. CelloAI addresses code documentation and code generation through specialized components. For code documentation, the assistant provides: (a) Doxygen-style comment generation by retrieving relevant information from text sources, (b) file-level summary generation, and (c) an interactive chatbot for code comprehension queries. For code generation, CelloAI employs syntax-aware chunking that preserves syntactic boundaries during embedding, thus improving retrieval accuracy in large codebases. The system integrates call-graph knowledge to maintain dependency awareness during code modifications and provides AI-generated suggestions for performance optimization and accurate refactoring. Our results demonstrate that CelloAI can enhance code understanding and streamline certain development workflows; however, domain-expert oversight and validation are critical for the reliable use of LLM assistants in scientific computing contexts.
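Syntax-aware chunking is the most directly illustrable component: instead of fixed-size splits, chunk boundaries follow syntactic units so each embedded chunk is complete. A toy stand-in using Python's ast module on Python sources (CelloAI targets HEP codebases in other languages, so this is an analogy, not the actual chunker):

```python
import ast

def syntax_aware_chunks(source: str):
    """Split source at top-level function/class boundaries, never mid-unit."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks

code = '''
def f(x):
    return x + 1

class A:
    def g(self):
        return 2
'''
for chunk in syntax_aware_chunks(code):
    print("--- chunk ---")
    print(chunk)
```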
Organizer(s): Mohammad Atif (Brookhaven National Laboratory), Kriti Chopra (Brookhaven National Laboratory), Ozgur Kilic (Brookhaven National Laboratory), Tianle Wang (Brookhaven National Laboratory), Zhihua Dong (Brookhaven National Laboratory), Charles Leggett (Lawrence Berkeley National Laboratory (LBNL)), Meifeng Lin (Brookhaven National Laboratory), Paolo Calafiura (Lawrence Berkeley National Laboratory (LBNL)), and Salman Habib (Argonne National Laboratory (ANL))
Domain: Physics
Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor
<em>Context:</em> Innovation thrives on scientific software, with useful code review feedback enhancing its correctness and impact. However, unlike general-purpose commercial and open-source software, the usefulness of code review feedback (<em>CR Comments</em>) in scientific software remains largely unstudied. Objective: This paper aims to characterize the usefulness of CR Comments in scientific open-source software (Sci-OSS), leveraging existing research on useful <em>CR Comments</em>. Method: To achieve this objective, we mine successful Sci-OSS from GitHub, analyze their <em>CR Comments</em> with usefulness-related features, and compare the findings from prior research on general-purpose commercial and open-source <em>CR Comments</em>. <em>Results</em>: The investigation on the usefulness of <em>CR Comments</em> in Sci-OSS confirms many characteristics that prior research identified in general-purpose software. For example, subjective or negative <em>CR Comments</em> remain not useful for the Sci-OSS. We also find CR Comments which receive negative emoji reactions have a very small correlation with not useful <em>comments</em>, whereas the positive emojis show mixed correlations.<br>Importantly, 6-33% <em>CR Comments</em> in Sci-OSS are not useful in our mined repositories. <em>Conclusions:</em> Our investigation into Sci-OSS extends findings from <em>CR Comments</em> usefulness research on general-purpose software, benefiting developers, scientists, and researchers in the Sci-OSS community.
Organizer(s): Sharif Ahmed (University of Central Arkansas), and Nasir Eisty (University of Tennessee)
Domain: Computational Methods and Applied Mathematics
Combining Domain and Tensor Parallelism to Train Multi-Billion-Parameter AI Weather Models
AI-based methods have recently revolutionized atmospheric modeling. Successes in medium-range forecasting have led to rapid development of AI-based models. However, accurate modeling of complex atmospheric dynamics at high spatial resolutions requires billions of neural network parameters and gigabyte-sized data samples, making accelerator memory and I/O bandwidth the bottlenecks for model training. To overcome these limitations, we introduce Jigsaw, a distributed training and inference scheme that leverages domain and tensor parallelism to eliminate memory redundancy across model-parallel processes and reduce I/O demands. We apply the Jigsaw parallelization scheme to WeatherMixer, a multi-layer-perceptron-based MLP-Mixer architecture with global vision that is well-suited for learning weather phenomena. Using Jigsaw, we train WeatherMixer with up to 3.2B parameters, achieving predictive performance competitive with numerical weather prediction and state-of-the-art AI models. To highlight the computational performance, we perform scaling experiments on global 0.25° (≈ 30 km resolution) ERA5 data across two HPC systems. Anticipating that future reanalysis datasets will include even higher resolutions, we demonstrate, for the first time, training on 0.125° data. Scaling experiments demonstrate that the dataloading bottlenecks arising from high-resolution input data samples are reduced through domain parallelism, subsequently improving per-GPU computational throughput. In compute- and communication-limited regimes, Jigsaw achieves state-of-the-art performance in distributed model training, with 97% of theoretical peak performance on 4 GPUs and a strong-scaling speedup of 6.4 when training across 8 GPUs. By combining domain, tensor, and data parallelism at larger scales, training on 256 GPUs reaches 11 PFLOPs with a scaling efficiency of 72%, compared to 51% without Jigsaw.
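The I/O side of domain parallelism can be pictured in a few lines: each model-parallel rank loads only its slab of a global sample, so per-GPU memory and dataloading shrink with the rank count. A schematic NumPy sketch with hypothetical shapes (a small halo is kept for locality near the cut; WeatherMixer's actual decomposition is not shown here):

```python
import numpy as np

def load_subdomain(sample, rank, n_ranks, halo=1):
    """Return this rank's latitude band of a global field, plus a halo."""
    n_lat = sample.shape[0]
    lo = rank * n_lat // n_ranks
    hi = (rank + 1) * n_lat // n_ranks
    return sample[max(lo - halo, 0):min(hi + halo, n_lat)]

global_field = np.zeros((721, 1440, 20), dtype=np.float32)  # 0.25 deg, 20 vars
shard = load_subdomain(global_field, rank=2, n_ranks=8)
print(shard.shape, shard.nbytes / global_field.nbytes)      # ~1/8 of the data
```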
Organizer(s): Deifilia Kieckhefen (Karlsruhe Institute of Technology (KIT)), Markus Götz (Karlsruhe Institute of Technology, Helmholtz AI), Lars Helge Heyen (Karlsruhe Institute of Technology (KIT)), Achim Streit (Karlsruhe Institute of Technology (KIT)), and Charlotte Debus (Karlsruhe Institute of Technology (KIT))
Domain: Computational Methods and Applied Mathematics
A Comparison of Massively Parallel Performance Portable Particle-in-Cell Schemes for Electrostatic Kinetic Plasma Simulations
We compare different Poisson solvers within the context of an electrostatic Vlasov-Poisson system. These schemes are implemented as part of the IPPL (Independent Parallel Particle Layer) library, which provides performance-portable and dimension-independent building blocks for scientific simulations requiring particle-mesh methods, with Eulerian (mesh-based) and Lagrangian (particle-based) approaches. The simulation used to compare the performance and portability of the schemes is Landau damping, part of a set of mini-applications implemented to benchmark and showcase the capabilities of the IPPL library. We use grid sizes of $512^3$ and $1024^3$ with 8 particles per cell, running with different algorithms in the solve phase of the Particle-in-Cell (PIC) loop: a fast Fourier transform (FFT) pseudo-spectral solver, a matrix-free finite difference Conjugate Gradient solver, and a matrix-free Finite Element (FEM) solver. We also compare these PIC schemes to the novel Particle-in-Fourier (PIF) scheme, which performs interpolations using non-uniform FFTs, thereby avoiding a grid in real space. We obtain results on different computing architectures, such as AMD GPUs (LUMI at CSC) and NVIDIA GPUs (Alps at CSCS and JUWELS Booster at Jülich Supercomputing Center), showcasing portability. In terms of absolute time, the FFT solver is advantageous but limited in its applicability. All other field solvers in the PIC scheme are an order of magnitude slower, but scale similarly to the FFT case in the electrostatic PIC context. The PIF scheme serves as a high-fidelity alternative to standard PIC; while costlier than the FFT-based PIC scheme, it shows excellent scalability on all architectures.
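For readers unfamiliar with the structure being benchmarked, here is a toy 1D electrostatic PIC loop with an FFT pseudo-spectral solve in its solve phase; the paper's runs are 3D, performance-portable IPPL implementations at $512^3$–$1024^3$, so this is purely schematic (normalized units, CIC interpolation, periodic box, neutralizing ion background).

```python
import numpy as np

ng, ppc, L, dt, nsteps = 64, 8, 4 * np.pi, 0.1, 200
n_p, dx = ng * ppc, (4 * np.pi) / 64
k0 = 2 * np.pi / L                                   # perturbed mode
rng = np.random.default_rng(0)
x0 = np.linspace(0, L, n_p, endpoint=False)
x = (x0 + 0.02 * np.sin(k0 * x0)) % L                # small position perturbation
v = rng.normal(0.0, 1.0, n_p)                        # Maxwellian electrons
k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)

for step in range(nsteps):
    # scatter: CIC deposit of electron density onto the grid
    xg = x / dx
    i0 = np.floor(xg).astype(int)
    w = xg - i0
    ne = (np.bincount(i0 % ng, 1 - w, minlength=ng)
          + np.bincount((i0 + 1) % ng, w, minlength=ng)) / ppc
    rho = 1.0 - ne                                   # ion background minus electrons
    # solve: pseudo-spectral Poisson, laplacian(phi) = -rho, E = -dphi/dx
    rho_k = np.fft.fft(rho)
    phi_k = np.zeros(ng, complex)
    phi_k[1:] = rho_k[1:] / k[1:] ** 2
    E_grid = np.real(np.fft.ifft(-1j * k * phi_k))
    # gather + push: electrons have charge -1, mass 1
    E_p = (1 - w) * E_grid[i0 % ng] + w * E_grid[(i0 + 1) % ng]
    v -= dt * E_p
    x = (x + dt * v) % L
    if step % 50 == 0:
        print(step, 0.5 * np.sum(E_grid**2) * dx)    # field energy damps over time
```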
Organizer(s): Sonali Mayani (Paul Scherrer Institute, ETH Zurich), Paul Fischill (ETH Zurich), Sriramkrishnan Muralikrishnan (Forschungszentrum Jülich), and Andreas Adelmann (Paul Scherrer Institute, ETH Zurich)
Domain: Physics
Contiguous Storage of Grid Data for Heterogeneous Computing
Structured Cartesian grids are a fundamental component in numerical simulations. Although these grids facilitate straightforward discretization schemes, their naïve use in sparse domains leads to excessive memory overhead and inefficient computation. Existing frameworks are primarily optimized for CPU execution and exhibit performance bottlenecks on GPU architectures due to limited parallelism and high memory access latency. This work presents a redesigned storage architecture optimized for GPU compatibility and efficient execution across heterogeneous platforms. By abstracting low-level GPU-specific details and adopting a unified programming model based on SYCL, the proposed data structure enables seamless integration across host and device environments. This architecture simplifies GPU programming for end-users while improving scalability and portability in sparse-grid and grid-particle coupling numerical simulations.
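The storage idea can be sketched as a pool of equally sized blocks packed contiguously in one allocation, with a map from block coordinates to pool slots; only occupied blocks consume memory, and the flat pool is a single buffer suitable for device residency. This is an illustrative NumPy stand-in for the SYCL data structure, with all names and sizes hypothetical.

```python
import numpy as np

class SparseBlockGrid:
    def __init__(self, block=8, capacity=1024, dtype=np.float32):
        self.block = block
        # one contiguous allocation holding every active block back-to-back
        self.pool = np.zeros((capacity, block, block, block), dtype=dtype)
        self.slot = {}                       # (bi, bj, bk) -> pool index

    def cell(self, i, j, k):
        """Return (block view, local index) for global cell (i, j, k)."""
        key = (i // self.block, j // self.block, k // self.block)
        if key not in self.slot:             # activate block on first touch
            self.slot[key] = len(self.slot)  # (overflow handling omitted)
        b = self.pool[self.slot[key]]
        return b, (i % self.block, j % self.block, k % self.block)

g = SparseBlockGrid()
blk, local = g.cell(100, 3, 250)
blk[local] = 1.0                             # write through the block view
print(len(g.slot), "active blocks;", g.pool.nbytes // 2**20, "MiB reserved")
```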
Organizer(s): Xiangyu Hu (Technical University of Munich)
Domain: Engineering
Cross-Platform GPU Implementation of OpenFOAM Using Only ISO C++ Standard Parallelism
In this paper we present our port of OpenFOAM to GPUs using the C++ standard parallelism execution model (stdpar hereafter) introduced with ISO C++17. With a minimally intrusive approach, which mainly consists of replacing the serial loops with the stdpar programming model, we managed to offload to multicore and many-core architectures the full workload occurring during typical CFD simulations. This approach is vendor-agnostic and allows us to retain a single version of the code which can be easily integrated into the main release. Results are presented using the icoFoam and simpleFoam solvers for four different test cases: the 3D lid-driven cavity, the 3D conical diffuser, the HPC motorbike, and the drivAer automotive test case. We tested different NVIDIA and AMD architectures, including CPU-only, hybrid CPU-GPU, and unified-memory CPU-GPU hardware. Speedup versus a full-socket 32-core CPU ranges from 0.5x to 18x, depending on the complexity of boundary conditions, turbulence model, and solver type. Details on the porting and performance are given.
Organizer(s): Mayank Kumar (STFC), Jony Castagna (STFC), Mattijs Janssens (Keysight (OpenCFD)), Raynold Tan (STFC), Wendi Liu (STFC), Gavin Tabor (University of Exeter), and Liam Berrisford (University of Exeter)
Domain: Engineering
Customized Precision for Discontinuous Galerkin Methods Using Adaptive Spectral Block Floating Point
Discontinuous Galerkin (DG) methods offer high-order accuracy and geometric flexibility, but come with significant memory demands for storing the degrees of freedom of the numerical solution, which remains a major performance bottleneck for large-scale simulations. Building on prior work introducing a 64-bit Adaptive Spectral Block Floating Point (ASBFP) format for modal 1D DG discretizations, we develop a more general framework that supports arbitrary polynomial order and arbitrary bit-width allocations. The extended ASBFP design constructs shared- and biased-exponent structures tailored to exploit the spectral decay of solution coefficients in modal DG bases, enabling fine-grained control over precision while providing both reduced- and extended-precision representations within a unified encoding model. Numerical tests in one dimension show that the generalized ASBFP format maintains the expected accuracy and convergence behaviour while substantially reducing the memory footprint across a wide range of DG orders. We further extend the ASBFP methodology to multi-dimensional DG discretizations based on tensor-product polynomial spaces. By identifying patterns in the decay of modal coefficients for multidimensional tensor-product bases and encoding hierarchical exponent offsets accordingly, this tensor-product-aware scheme enables more aggressive compression while maintaining numerical fidelity comparable to FP64 baselines. Together, these developments provide a flexible family of degree-aware spectral block floating-point formats for high-order DG methods in one and multiple dimensions.
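The basic mechanism ASBFP builds on, one shared exponent per block with short integer mantissas, fits in a few lines; the paper's format adds per-mode exponent biases matched to spectral decay, which this minimal sketch omits (the bit widths and encoding details below are illustrative assumptions).

```python
import numpy as np

def bfp_encode(block, mantissa_bits=10):
    """Encode a block with one shared exponent and int16 mantissas."""
    exp = int(np.floor(np.log2(np.max(np.abs(block)) + 1e-300))) + 1
    scale = 2.0 ** (mantissa_bits - 1 - exp)
    half = 2 ** (mantissa_bits - 1)
    mant = np.clip(np.round(block * scale), -half, half - 1).astype(np.int16)
    return exp, mant

def bfp_decode(exp, mant, mantissa_bits=10):
    return mant.astype(np.float64) / 2.0 ** (mantissa_bits - 1 - exp)

coeffs = np.exp(-np.arange(8.0))          # spectrally decaying modal coefficients
exp, mant = bfp_encode(coeffs)
print(np.max(np.abs(coeffs - bfp_decode(exp, mant))))  # quantization error
```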
Organizer(s): Shivam Sundriyal (University of Bayreuth), Markus Büttner (University of Bayreuth), Tobias Kenter (Paderborn University), and Vadym Aizinger (University of Bayreuth)
Domain: Computational Methods and Applied Mathematics
Exploring Sustainability in Scientific Software through Code Quality & Test Coverage Metrics
Context: Scientific open-source software (SciOSS) plays a foundational role in research and engineering, yet its long-term sustainability has often been overlooked and remains a significant concern. Objective: This study investigates the long-term sustainability of SciOSS through code and test quality metrics. Method: We analyze CASS Software Portfolio projects, classifying them by sustainability and comparing their code structure, test coverage, and links between code quality and testing across the dataset. Results: Sustainable projects show higher, more consistent test coverage and clearer code–test correlations, while unsustainable ones show weaker patterns. Overall, test coverage is low in scientific software, and high complexity and coupling reduce testability. Conclusion: In this study, we present a practical, data-driven approach for assessing sustainability in scientific software, offering a foundation for evaluating long-term software health and supporting future efforts in quality assurance and sustainability monitoring.
Organizer(s): Sheikh Md. Mushfiqur Rahman (University of Tennessee), Gregory Watson (Oak Ridge National Laboratory), and Nasir Eisty (University of Tennessee)
Domain: Computational Methods and Applied Mathematics
GPU Halo Replay: Lossless Twin Simulations for Flexible In Situ Analysis of Stencil-Based Solvers
We introduce GPU Halo Replay, a solver‐aware in situ framework for stencil‐based applications that creates a lossless simulation “twin” for advanced visualization and analysis. By decoupling analysis from the primary simulation and delegating it to dedicated twin simulations, GPU Halo Replay overcomes the synchronization and fidelity limitations of traditional post hoc and in situ methods. We demonstrate the functionality and feasibility of GPU Halo Replay on the computational fluid dynamics code HARVEY, a massively parallel solver. Compared to existing in situ approaches, it is co-designed with stencil solvers and exposes new analysis patterns. To meet the diverse needs of real-world applications, we developed GPU Halo Replay variants tailored to common usage scenarios. We illustrate the adaptability of these variants across three representative case studies: concurrent simulation replay, spatially-targeted subdomain analysis, and temporally selective domain evaluation. These results illustrate the potential of simulation twins to enhance analysis capabilities without sacrificing performance or fidelity.
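The replay idea itself can be demonstrated on a 1D stencil: the primary run logs only the halo values entering a chosen subdomain each step, and a twin later re-integrates just that subdomain bit-for-bit. A toy sketch (HARVEY applies this to 3D fluid flow on GPUs; the diffusion stencil and subdomain below are hypothetical):

```python
import numpy as np

def step(u):                      # simple explicit diffusion stencil
    return u + 0.25 * (np.roll(u, 1) - 2 * u + np.roll(u, -1))

n, lo, hi, nsteps = 256, 100, 140, 50
u = np.sin(np.linspace(0, 2 * np.pi, n, endpoint=False))
halo_log = []
for _ in range(nsteps):           # primary simulation: log halos, then step
    halo_log.append((u[lo - 1], u[hi]))
    u = step(u)
primary_sub = u[lo:hi].copy()

# twin replay: same subdomain only, driven by the recorded halos
v = np.sin(np.linspace(0, 2 * np.pi, n, endpoint=False))[lo:hi]
for left, right in halo_log:
    ext = np.concatenate(([left], v, [right]))
    v = (ext + 0.25 * (np.roll(ext, 1) - 2 * ext + np.roll(ext, -1)))[1:-1]
print(np.array_equal(primary_sub, v))   # True: the twin is lossless
```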
Organizer(s): Ayman Yousef (Duke University), Aristotle Martin (Duke University), and Amanda Randles (Duke University)
Domain: Engineering
A High-Performance Elliptic Solver for Plasma Boundary Turbulence Codes
Elliptic equations play a crucial role in turbulence models for magnetic confinement fusion. Regardless of the chosen modeling approach – whether gyrokinetic, gyrofluid, or drift-fluid – the Poisson equation and Ampere’s law lead to elliptic problems that must be solved on 2D planes perpendicular to the magnetic field. In this work, we present an efficient solver for such generalised elliptic problems, especially suited for the conditions in the boundary region. A finite difference discretisation is employed, and the solver is based on a flexible generalised minimal residual method (fGMRES) with a geometric multigrid preconditioner. We present implementations with OpenMP parallelisation and GPU acceleration, with backends in CUDA and HIP. On the node level, significant speed-ups are achieved with the GPU implementation, exceeding external library solutions such as rocALUTION. In accordance with theoretical scaling laws for multigrid methods, we observe linear scaling of the solver with problem size, O(N). This solver is implemented in the PARALLAX/PAccX libraries and serves as a central component of the plasma boundary turbulence codes GRILLIX and GENE-X.
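The solver structure, a Krylov method wrapped around a geometric multigrid cycle, can be sketched for a 1D Poisson model problem with SciPy; the sketch uses plain GMRES with a fixed two-grid preconditioner, whereas the paper needs flexible GMRES because its preconditioner varies between iterations (grid size, smoother, and transfer operators below are illustrative choices).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 255                                     # fine grid: h = 1/(n+1)
h = 1.0 / (n + 1)
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr") / h**2
nc = (n - 1) // 2                           # coarse nodes at every other point
Ac = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(nc, nc),
              format="csc") / (2 * h) ** 2

def jacobi(x, b, sweeps=2, omega=2 / 3):
    for _ in range(sweeps):                 # weighted-Jacobi smoother
        x = x + omega * (h**2 / 2.0) * (b - A @ x)
    return x

def two_grid(r):
    """One two-grid V-cycle for A e = r, starting from a zero guess."""
    x = jacobi(np.zeros(n), r)              # pre-smooth
    rc = (r - A @ x)[1::2]                  # restrict residual (injection)
    ec = spla.spsolve(Ac, rc)               # exact coarse-grid solve
    e = np.zeros(n)
    e[1::2] = ec                            # prolongate: copy coarse nodes,
    pad = np.concatenate(([0.0], ec, [0.0]))
    e[0::2] = 0.5 * (pad[:-1] + pad[1:])    # linearly interpolate the rest
    return jacobi(x + e, r)                 # post-smooth

M = spla.LinearOperator((n, n), matvec=two_grid, dtype=float)
b = np.ones(n)
x, info = spla.gmres(A, b, M=M)
print(info, np.linalg.norm(b - A @ x))      # converges in a few iterations
```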
Organizer(s): Andreas Stegmeir (Max Planck Institute for Plasma Physics), Cristian Lalescu (Max Planck Computing and Data Facility), Mou Lin (Max Planck Computing and Data Facility), Jordy Trilaksono (Max Planck Institute for Plasma Physics), Nicola Varini (EPFL), and Tilman Dannert (Max Planck Computing and Data Facility)
Domain: Physics
Inline Lossy Compressed MPI Communications in GPU-based Plasma Turbulence Simulations
The performance gap between computing power and data movement bandwidth across the HPC hardware stack is one of the primary obstacles to application scaling at exascale. Fusion plasma simulations are particularly affected by this due to their high-dimensional phase-space representation and communication requirements. This work explores the use of application-agnostic lossy compression inline with MPI communications as a way to reduce network contention and improve application performance. By striking a balance between compressor throughput, ratio, and fidelity preservation, communication-bound HPC applications could potentially benefit from this data reduction technique. We develop a set of performance models that help determine whether compression libraries provide enough throughput for inline usage. We then evaluate three state-of-the-art lossy compressors with the proposed models: ZFP, SZ, and MGARD. Additionally, we illustrate the process of compressing MPI boundary exchanges and analyze the impact of applying lossy reduction in-simulation to target quantities of interest in two plasma turbulence applications: GENE and BSL6D. Our main findings show that in both applications, message sizes can be reduced by a factor of 8 while still providing sound physics information, achieving up to 3x and 6x faster boundary exchanges for GENE and BSL6D, respectively.
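The gist of the throughput models is a break-even check: inline compression pays off only when compressing, shipping the smaller message, and decompressing together beat shipping the raw message. A deliberately serialized sketch (the paper's models are more detailed; all numbers below are made up):

```python
def inline_compression_pays_off(size_bytes, net_bw, comp_bw, decomp_bw, ratio):
    """True if compress -> send -> decompress beats sending raw data."""
    t_raw = size_bytes / net_bw
    t_comp = (size_bytes / comp_bw            # compress on the sender
              + size_bytes / ratio / net_bw   # ship the reduced message
              + size_bytes / decomp_bw)       # decompress on the receiver
    return t_comp < t_raw

# 64 MiB halo, 25 GB/s link, 8x ratio, ~100 GB/s compressor throughput
print(inline_compression_pays_off(64 * 2**20, 25e9, 100e9, 100e9, 8.0))
```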
Organizer(s): Diego Jimenez (Max Planck Computing and Data Facility), Felix Jung (Technical University of Munich), Nils Schild (Max Planck Institute for Plasma Physics), Jan Laukemann (Friedrich-Alexander-Universität Erlangen-Nürnberg), Carl-Martin Pfeiler (Max Planck Institute for Plasma Physics), Tilman Dannert (Max Planck Computing and Data Facility), Martin Schulz (Technical University of Munich), and Erwin Laure (Max Planck Computing and Data Facility)
Domain: Computational Methods and Applied Mathematics
KBase Research Agent: Automated Multi-Agent Workflow Construction for Reproducible Genome Analysis
Constructing multi-step bioinformatics workflows, from read quality control through genome assembly to functional annotation, requires expertise in both biology and computational tool selection, creating a bottleneck for scalable and reproducible analysis. We present the KBase Research Agent, a multi-agent system for automating such workflows within the DOE Systems Biology Knowledgebase (KBase). Given a set of sequencing reads and a research objective, the agent constructs an analysis plan grounded in KBase documentation and a Knowledge Graph (KG) of the KBase application catalog, then selects, parameterizes, validates and executes appropriate KBase applications to carry out the workflow. The resulting analysis is preserved as a reproducible KBase Narrative. We evaluate the system’s planning and execution quality against ground truth constructed from reference workflows derived from peer-reviewed Microbiology Resource Announcements. We further apply the agent to 100 previously unanalyzed bacterial isolate genomes from the JGI IMG/M database, where it autonomously performed read quality control, genome assembly, taxonomic classification with GTDB-Tk, and downstream analysis producing annotated genomes, reproducible Narratives, and draft manuscripts without human intervention. Across these experiments, the KBase Research Agent demonstrates the feasibility of domain-grounded, end-to-end scientific workflow automation in a production bioinformatics platform.
Organizer(s): Prachi Gupta (Lawrence Berkeley National Laboratory), William Riehl (Lawrence Berkeley National Laboratory), Mikaela Cashman (Lawrence Berkeley National Laboratory), Dylan Chivian (Lawrence Berkeley National Laboratory), Christopher Neely (Lawrence Berkeley National Laboratory), Shane Canon (Lawrence Berkeley National Laboratory), Robert Cottingham (Oak Ridge National Laboratory), Chris Henry (Argonne National Laboratory), Adam Arkin (Lawrence Berkeley National Laboratory), and Paramvir Dehal (Lawrence Berkeley National Laboratory)
Domain: Engineering
Kokkos Comm: Performance Portable Communication for Distributed Kokkos Applications
This paper introduces Kokkos Comm, a new library and API specification for enhanced performance, portability, and productivity. Kokkos is a widely used C++ performance portability ecosystem that addresses performance portability on-node through advanced C++ metaprogramming, including GPU and OpenMP targeting support. Kokkos Comm addresses challenges of integrating Kokkos with distributed memory programming models. Kokkos Comm alleviates accidental complexity associated with coordinating non-blocking communication operations and Kokkos execution spaces, as well as handling non-contiguous Kokkos::Views. Automation of common low-level, error-prone implementation details, such as packing and unpacking of non-contiguous data, increases programmer productivity while decreasing code complexity with the added potential for higher performance (e.g., vs. MPI derived datatypes). Further, this library serves as a platform for researching improved methods for managing non-contiguous data and exploring new communication APIs with performance portability across various underlying transports and accelerators. For instance, Kokkos Comm enables use of varied internal implementations of the data transfer functionality (e.g., MPI RMA, NCCL, SHMEM dialects, libfabric), while maintaining overall support for MPI elsewhere in an application (e.g., ScaLAPACK).
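What packing and unpacking mean here can be shown in a NumPy sketch: a strided halo slice is copied into a contiguous staging buffer before the send and scattered back after the receive; Kokkos Comm automates this step on device for Kokkos::Views (the actual MPI calls are elided below).

```python
import numpy as np

field = np.arange(16.0).reshape(4, 4)
face = field[:, 0]                        # non-contiguous column "halo"
print(face.flags["C_CONTIGUOUS"])         # False: a strided view

send_buf = np.ascontiguousarray(face)     # pack into a contiguous staging buffer
# ... a transport call such as MPI_Isend(send_buf, ...) would go here ...
recv_buf = send_buf.copy()                # stand-in for the received message
field[:, 3] = recv_buf                    # unpack into the destination column
print(field)
```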
Organizer(s): C. Nicole Avans (Sandia National Laboratories, Tennessee Technological University), Gabriel Dos Santos (CEA, Université Paris-Saclay), Cédric Chevalier (CEA, Université Paris-Saclay), Hugo Taboada (CEA, Université Paris-Saclay), Marc Pérache (CEA, Université Paris-Saclay), Matthew G. F. Dosanjh (Sandia National Laboratories), Stephen L. Olivier (Sandia National Laboratories), Carl Pearson (Sandia National Laboratories), Evan D. Suggs (Tennessee Technological University), Vivek Kale (Sandia National Laboratories), and Anthony Skjellum (Tennessee Technological University)
Domain: Computational Methods and Applied Mathematics
The Memory Scaling of Reverse-Mode Differentiation in Particle Accelerator Simulations with Space Charge
The recent development of differentiable simulation codes for particle accelerators has enabled gradient-based workflows that promise finer control and more realistic modeling of accelerator facilities. However, when using reverse-mode automatic differentiation, the memory usage continuously increases during the simulation and can potentially exceed the available hardware memory, especially when costly space charge computation is included. To study the memory requirements for differentiable simulations, we have implemented space charge in Cheetah, a PyTorch-based beam tracking code that supports reverse-mode differentiation. We find that the memory usage for reverse-mode differentiation grows linearly with the number of macroparticles and cells, and that it is proportional to the number of space charge kicks involved in the simulation. This general scaling can be used to evaluate whether a given differentiable simulation is feasible given hardware memory constraints.
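The reported scaling directly yields a planning formula; a sketch of such an estimator follows, where the per-kick byte constants are hypothetical placeholders that would in practice be calibrated by measurement, as done in the paper.

```python
def reverse_ad_memory_gb(n_particles, n_cells, n_kicks,
                         bytes_per_particle=48, bytes_per_cell=24):
    """Linear in particles and cells, proportional to the kick count."""
    per_kick = n_particles * bytes_per_particle + n_cells * bytes_per_cell
    return n_kicks * per_kick / 1e9

for kicks in (10, 100, 1000):
    print(kicks, reverse_ad_memory_gb(1_000_000, 128**3, kicks), "GB")
```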
Organizer(s): Arjun Dhamrait (Lawrence Berkeley National Laboratory), Edoardo Zoni (Lawrence Berkeley National Laboratory), Axel Huebl (Lawrence Berkeley National Laboratory), Ji Qiang (Lawrence Berkeley National Laboratory), Chad Mitchell (Lawrence Berkeley National Laboratory), Ryan Roussel (SLAC), Jan Kaiser (DESY), Chenran Xu (Argonne National Laboratory), Jean-Luc Vay (Lawrence Berkeley National Laboratory), and Remi Lehe (Lawrence Berkeley National Laboratory)
Domain: Physics
Multi-Artifact Analysis of Self-Admitted Technical Debt in Scientific Software
Context: Self-admitted technical debt (SATD) occurs when developers acknowledge shortcuts in code. In scientific software (SSW), such debt poses unique risks to the validity and reproducibility of results. Objective: This study aims to identify, categorize, and evaluate scientific debt, a specialized form of SATD in SSW, and assess the extent to which traditional SATD categories capture these domain-specific issues. Method: We conduct a multi-artifact analysis across code comments, commit messages, pull requests, and issue trackers from 23 open-source SSW projects. We construct and validate a curated dataset of scientific debt, develop a multi-source SATD classifier to guide SATD management, and conduct a practitioner validation to assess the practical relevance of scientific debt. Results: Our classifier performs strongly across 900,358 artifacts from 23 SSW projects. SATD is most prevalent in pull requests and issue trackers, underscoring the value of multi-artifact analysis. Models trained on traditional SATD often miss scientific debt, emphasizing the need for its explicit detection in SSW. Practitioner validation confirmed that scientific debt is both recognizable and useful in practice. Conclusions: Scientific debt represents a unique form of SATD in SSW that is not adequately captured by traditional categories and requires specialized identification and management. Our dataset, classification analysis, and practitioner validation results provide the first formal multi-artifact perspective on scientific debt, highlighting the need for tailored SATD detection approaches in SSW.
Organizer(s): Eric Melin (Boise State University, Oak Ridge National Laboratory), Nasir Eisty (University of Tennessee), Gregory Watson (Oak Ridge National Laboratory), and Addi Malviya-Thakur (Oak Ridge National Laboratory, University of Tennessee)
Domain: Computational Methods and Applied Mathematics
Multi-Device Shallow Water Simulations on CPUs, GPUs, and FPGAs with SYCL
Shallow water models are an essential tool for simulating tsunamis and storm surges, where they need to execute efficiently for different spatial resolutions and time scales. In this work, we present a discontinuous Galerkin shallow water solver implemented in SYCL, providing a common numerical code base suitable for portable and scalable multi-device execution on CPUs, GPUs, and FPGAs. To this end, different communication strategies are adapted to the available device capabilities. The implementation is validated using a range of Mediterranean Sea meshes of increasing resolution. In a challenging strong-scaling scenario, FPGAs reach the highest aggregate performance. GPUs by AMD, Intel, and NVIDIA from three different clusters can bring their superior peak performance to bear in a weak-scaling benchmark, achieving a parallel efficiency of up to 0.8–0.9 on 64 GPUs.
Organizer(s): Markus Büttner (University of Bayreuth), Christoph Alt (Paderborn University, Friedrich-Alexander-Universität Erlangen-Nürnberg), Tobias Kenter (Paderborn University), Harald Köstler (Friedrich-Alexander-Universität Erlangen-Nürnberg), Christian Plessl (Paderborn University), and Vadym Aizinger (University of Bayreuth)
Domain: Computational Methods and Applied Mathematics
OpenMP Target Offloading for Hybrid Fluid–Kinetic Plasma Simulations in JOREK: Accelerating Fusion Research on GPU Enabled Clusters
The study of plasma instabilities in magnetic confinement fusion devices is crucial for the design and operation of future fusion power plants. Numerical simulations play a key role in understanding these complex physical phenomena. The finite element code JOREK implements magnetohydrodynamic (MHD) and hybrid fluid–kinetic models to simulate these instabilities and explore potential mitigation strategies. With the increasing prevalence of GPU-accelerated architectures in high-performance computing, adapting JOREK to efficiently exploit these systems has become essential. The complexity of the code presents significant challenges due to its use of high-order finite elements, various physics models, including a kinetic description of runaway electrons (REs), and the fully implicit time stepping required by extreme scale separations in fusion plasmas. We have ported the matrix construction, the iterative solver, and the particle loop for the runaway-electron kinetic model to GPUs using OpenMP target offloading and optimized GPU libraries, achieving good performance and scaling across multiple nodes. These developments mark an important step, enabling large-scale, high-fidelity hybrid fluid–kinetic simulations of tokamak plasmas on GPU-accelerated clusters, accelerating progress toward predictive, numerical, physics-based fusion reactor design.
Organizer(s): Patrik Rác (Max Planck Institute for Plasma Physics), Edoardo Carrà (Max Planck Institute for Plasma Physics), Ihor Holod (Max Planck Institute for Plasma Physics), and Matthias Hölzl (Max Planck Institute for Plasma Physics)
Domain: Physics
A Performance-Portable, Massively Parallel Distributed Nonuniform FFT
The nonuniform fast Fourier transform (NUFFT) enables spectral methods for problems with irregularly spaced samples, with applications in medical imaging, molecular dynamics, and kinetic plasma simulations. Existing implementations are limited to shared-memory execution, restricting problem sizes to what fits on a single node. We present the first distributed, performance-portable NUFFT for heterogeneous supercomputers. Our Kokkos-based implementation runs without modification on NVIDIA and AMD GPUs. We develop multiple spreading and interpolation kernels optimized for different accuracy requirements and architectures. Our spreading kernels match or exceed the single-GPU throughput of the state-of-the-art CUDA-based NUFFT library cuFINUFFT at production particle densities, while our Kokkos-based implementation additionally supports AMD GPUs. Strong scaling experiments on Alps (NVIDIA GH200), JUWELS Booster (NVIDIA A100), and LUMI (AMD MI250X) demonstrate scaling up to 1024 GPUs. At scale, the distributed FFT is a significant part of the total runtime, making higher NUFFT accuracy less expensive. We apply the method to massively parallel Particle-in-Fourier simulations of Landau damping with up to 1024^3 Fourier modes and 8.6 billion particles on Alps, JUWELS, and LUMI, demonstrating that distributed NUFFTs enable kinetic plasma simulations at resolutions previously inaccessible to spectral particle methods.
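For orientation, the type-1 transform the library approximates is the direct nonuniform discrete Fourier sum below; the brute-force version costs O(N·M) and is only useful for checking small cases (sign and mode-ordering conventions vary between libraries).

```python
import numpy as np

def type1_nudft(x, c, n_modes):
    """f_k = sum_j c_j * exp(-i k x_j) for k = -n//2 .. n//2 - 1 (1D)."""
    k = np.arange(-(n_modes // 2), (n_modes + 1) // 2)
    return np.exp(-1j * np.outer(k, x)) @ c

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 1000)      # nonuniform sample points
c = rng.normal(size=1000) + 0j           # strengths (e.g., particle charges)
f = type1_nudft(x, c, n_modes=32)
print(f.shape)                           # 32 Fourier coefficients
```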
Organizer(s): Paul Fischill (ETH Zurich), Andreas Adelmann (Paul Scherrer Institute, ETH Zurich), and Sriramkrishnan Muralikrishnan (Forschungszentrum Jülich)
Domain: Computational Methods and Applied Mathematics
Physics-Aware Multi-Task Learning for Atmospheric Turbulence Parameterization: Auxiliary Tasks versus Architectural Conditioning
Dynamic subgrid-scale (SGS) turbulence parameterizations in Large Eddy Simulation (LES) achieve superior physical fidelity but impose 2–4× computational overhead compared to static schemes, creating a critical bottleneck for high-resolution atmospheric modeling on HPC systems. Neural-network-based emulation offers a pathway to comparable accuracy at reduced computational cost, but realizing this potential requires architectures that generalize reliably across diverse atmospheric conditions and variable grid configurations. We systematically compare two physics-aware multi-task learning strategies for emulating Smagorinsky-based SGS closure in the UK Met Office NERC Cloud Model (MONC): a baseline approach using Richardson number prediction as auxiliary gradient regularization, and an Ri-conditioned approach that explicitly feeds predicted stability into the coefficient (viscosity and diffusion) prediction heads. Evaluating 54 model configurations across three neural architectures (multi-layer perceptron (MLP), MLP with residual blocks (ResMLP), and Tabular Transformer (TabTransformer)) trained on mixed-resolution, multi-regime atmospheric data (66% coarse tropical convection, 34% fine shallow cumulus), we find that uncertainty-based task weighting consistently outperforms manual tuning and dynamic weighting alternatives. Simple MLPs with Richardson conditioning provide the best robustness-accuracy trade-off under distribution shift during inference, while architectural complexity amplifies cross-regime failures despite improving in-distribution metrics. Notably, models maintain physical constraint compliance even when predictive accuracy degrades substantially, suggesting that data coverage limitations, rather than any fundamental physics incompatibility, drive the cross-regime transfer failures. All results represent offline validation on static simulation data. Ongoing work focuses on online MONC integration to assess numerical stability, energy conservation, and computational performance under coupled feedback dynamics.
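The task-weighting scheme the study finds most robust, homoscedastic-uncertainty weighting in the style of Kendall et al., is compact enough to sketch in PyTorch; the task count and the dummy losses below are placeholders, not the study's models.

```python
import torch

class UncertaintyWeightedLoss(torch.nn.Module):
    def __init__(self, n_tasks):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(n_tasks))

    def forward(self, task_losses):
        losses = torch.stack(task_losses)
        # L = sum_i exp(-s_i) * L_i + s_i, with s_i = log sigma_i^2
        return torch.sum(torch.exp(-self.log_vars) * losses + self.log_vars)

weighting = UncertaintyWeightedLoss(n_tasks=3)   # viscosity, diffusion, Ri
dummy = [torch.rand(1, requires_grad=True).squeeze() for _ in range(3)]
total = weighting(dummy)
total.backward()          # the weights train jointly with the model
print(float(total))
```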
Organizer(s): Sambit Kumar Panda (University of Reading), Todd R. Jones (University of Reading), Muhammad Shahzad (University of Reading), Bryan N. Lawrence (University of Reading, National Centre for Atmospheric Science), and Anna-Louise Ellis (Met Office)
Domain: Climate, Weather and Earth Sciences
Porting the code_saturne CFD Solver to GPU: Methodology, Feedback, and Insights
This paper describes recent developments in the open source general purpose CFD solver code_saturne, to enable execution of its main code paths on GPU, while keeping the code base sustainable, limiting non-portable sections to a few performance-critical kernels and using portable constructs for the bulk of the code.<br>Developed since 1997, and originally written in Fortran, with pre- and post processing stages and MPI parallelism introducing C code, most of \CS\ had been migrated to C over the years. <br>Based on a co-located finite volume approach for unstructured meshes and with domain partitioning, the numerical approach allows for a broad scope of applications and can handle multi-billion cell meshes, at the cost of low computational intensity and synchronization points, whether for dot products or halo exchange.<br><br>For the bulk of the code, priority is given to ease and productivity of code adaptation over optimization, so as to minimize memory transfers. A simple migration of the initial C-based code to C++ allows for the replacement of traditional loops with parallel_for constructs which can use different back-ends, encapsulating the previous OpenMP constructs on CPU while using CUDA or SYCL on GPU.<br><br>Profiling has been essential to avoid performance pitfalls related to unified shared memory and GPU memory allocation. The solutions considered and their trade offs are described here, focusing on the memory handling tweaks that have been essential to improving performance.<br><br>Some results are presented for the case of the flow in a tube bundle, showing good performance on several GPU machines.
Organizer(s): Yvan Fournier (EDF France), and Charles Moulinec (STFC)
Domain: Climate, Weather and Earth Sciences
Sampling Parallelism for Fast and Efficient Bayesian Learning
Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is essential. However, many uncertainty quantification (UQ) methods remain difficult to apply due to their substantial computational cost. Sampling-based Bayesian learning approaches, such as Bayesian neural networks (BNNs), are particularly expensive since drawing and evaluating multiple parameter samples rapidly exhausts memory and compute resources. These constraints have limited the accessibility and exploration of Bayesian techniques thus far. To address these challenges, we introduce sampling parallelism, a simple yet powerful parallelization strategy that targets the primary bottleneck of sampling-based Bayesian learning: the samples themselves. By distributing sample evaluations across multiple GPUs, our method reduces memory pressure and training time without requiring architectural changes or extensive hyperparameter tuning. We detail the methodology and evaluate its performance on a few example tasks and architectures, comparing against distributed data parallelism (DDP) as a baseline. We further demonstrate that sampling parallelism is complementary to existing strategies by implementing a hybrid approach that combines sample and data parallelism. Our experiments show near-perfect weak scaling, confirming that sample evaluations parallelize cleanly. Although DDP achieves better raw speedups under strong scaling, sampling parallelism has a notable advantage: by applying independent stochastic augmentations to the same batch on each GPU, it increases augmentation diversity and thus reduces the number of epochs required for convergence.
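The core idea fits in a NumPy sketch: posterior weight samples are sharded across devices, each shard is evaluated on the same batch, and predictions are gathered for the Monte Carlo estimate (here "devices" are loop iterations and the linear model is a toy placeholder).

```python
import numpy as np

rng = np.random.default_rng(0)
S, D, B = 16, 8, 32                       # samples, input dim, batch size
weight_samples = rng.normal(size=(S, D))  # e.g., drawn posterior weights
batch = rng.normal(size=(B, D))

n_devices = 4
shards = np.array_split(weight_samples, n_devices)
preds = [batch @ w.T for w in shards]     # each device: only S/n_devices samples
preds = np.concatenate(preds, axis=1)     # gather -> (B, S) predictive samples
mean, std = preds.mean(axis=1), preds.std(axis=1)  # predictive uncertainty
print(mean.shape, std.shape)
```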
Organizer(s): Asena Karolin Özdemir (Karlsruhe Institute of Technology), Lars Helge Heyen (Karlsruhe Institute of Technology), Arvid Weyrauch (Karlsruhe Institute of Technology), Achim Streit (Karlsruhe Institute of Technology), Markus Götz (Karlsruhe Institute of Technology, Helmholtz AI), and Charlotte Debus (Karlsruhe Institute of Technology)
Domain: Computational Methods and Applied Mathematics
Scalable Agentic Reasoning for Designing Biologics Targeting Intrinsically Disordered Proteins
Intrinsically disordered proteins (IDPs) represent crucial therapeutic targets due to their significant role in disease (approximately 80% of cancer-related proteins contain long disordered regions), but their lack of stable secondary/tertiary structures makes them “undruggable.” While recent computational advances, such as diffusion models, can design high-affinity IDP binders, autonomous systems can accelerate their translation to practical drug discovery by reducing the need for expert intervention across large-scale design campaigns. To address this challenge, we designed and implemented StructBioReasoner, a scalable multi-agent system for designing biologics that can be used to target both IDPs and structured proteins. StructBioReasoner employs a novel tournament-based reasoning framework where specialized agents compete to generate and refine therapeutic hypotheses, naturally distributing computational load for efficient exploration of the vast design space. Agents integrate domain knowledge with access to literature synthesis, AI structure prediction, molecular simulations, and stability analysis, coordinating their execution on HPC infrastructure via an extensible federated agentic middleware, Academy. We benchmark StructBioReasoner on Der f 21 and NMNAT-2 and demonstrate that over 50% of 787 designed and validated candidates for Der f 21 outperformed the human-designed reference binders from the literature in terms of improved in silico binding free energy. For the more challenging NMNAT-2 protein, we identified three binding modes from 97,066 binders, including the well-studied NMNAT2:p53 interface. Thus, StructBioReasoner lays the groundwork for agentic reasoning systems for IDP therapeutic discovery on exascale platforms.
Organizer(s): Matthew Sinclair (Argonne National Laboratory), Moeen Meigooni (Argonne National Laboratory), Archit Vasan (Argonne National Laboratory), Ozan Gokdemir (University of Chicago), Xinran Lian (Argonne National Laboratory), Heng Ma (Argonne National Laboratory), Yadu Babuji (Argonne National Laboratory), Alexander Brace (University of Chicago, Argonne National Laboratory), Khalid Hossain (Argonne National Laboratory), Carlo Siebenschuh (University of Chicago), Thomas Brettin (Argonne National Laboratory), Kyle Chard (University of Chicago), Christopher Henry (Argonne National Laboratory), Daniel Schabacker (Argonne National Laboratory), Venkatram Vishwanath (Argonne National Laboratory), Rick Stevens (Argonne National Laboratory, University of Chicago), Ian Foster (Argonne National Laboratory, University of Chicago), and Arvind Ramanathan (Argonne National Laboratory)
Domain: Life Sciences
Serverless Computing for Life-Critical Science: Design Patterns and Co-Design Insights from a Real-Time Earthquake Loss Alert System
Urgent scientific computing for disaster response requires both sub-minute latency and constant availability, yet traditional always-on infrastructure is uneconomical for rare, unpredictable events. We present a serverless architecture for real-time earthquake loss estimation that achieves 2-minute end-to-end alerts while remaining idle 99.99% of the time. Our system, deployed as QLARM version 4 within the Horizon Europe project GOBEYOND, demonstrates that serverless computing can meet the demands of latency-critical scientific workflows while maintaining pay-per-use economics. We identify four design patterns that emerged from adapting urgent earthquake loss estimation to serverless constraints: custom orchestration for workflow coordination, selective containerization that balances startup time with computational capability, warm-up strategies to mitigate cold starts, and infrastructure-as-code (IaC) for reproducibility and auditability. Performance measurements show over 100x speedup for large earthquakes compared to the previous generation, with calculation times ranging from 6 seconds for small events to 22 seconds for major disasters affecting thousands of settlements. We include an ongoing investigation on workload partitioning for further latency reduction. Our experience reveals both strengths and limitations of serverless for scientific computing. Pay-per-use economics and IaC prove well-suited to urgent computing, while opaque container lifecycle management and coarse resource allocation present challenges. We discuss these findings as potential insights for HPC co-design, particularly for systems requiring instant availability with minimal idle costs.
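One of the four patterns, cold-start mitigation via warm-up pings, can be sketched generically: the handler short-circuits on a synthetic scheduled event so a cheap trigger keeps a container warm between earthquakes (QLARM's actual handler and event schema are not shown; all names below are hypothetical).

```python
import time

def handler(event):
    if event.get("warmup"):          # scheduled ping: do nothing, stay warm
        return {"status": "warm"}
    t0 = time.time()
    losses = estimate_losses(event["quake"])   # hypothetical heavy path
    return {"status": "ok", "seconds": time.time() - t0, "losses": losses}

def estimate_losses(quake):          # placeholder for the real loss model
    return {"settlements": len(quake.get("settlements", []))}

print(handler({"warmup": True}))
print(handler({"quake": {"settlements": ["a", "b"]}}))
```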
Organizer(s): Michel Speiser (ICES Foundation)
Domain: Climate, Weather and Earth Sciences
Solver-Integrated Lossy and Lossless Compression for Scalable Flow Simulations
Large-scale computational fluid dynamics (CFD) simulations routinely generate terabytes of data, making I/O and storage a dominant bottleneck for post-hoc analysis and data-driven workflows. This challenge is amplified on modern GPU-accelerated systems, where the cost of data movement and checkpointing can rival or exceed computation. We present a solver-integrated compression framework for high-order CFD that combines a portable compressed data representation, an embarrassingly parallel I/O pipeline, and a discretization-aware analysis of error propagation. The approach is implemented in the spectral element solver nekRS using Blosc2 for lossless compression and SZ3 for error-bounded lossy compression, while remaining applicable to a broader class of high-order discretizations. A central contribution is a quantity-of-interest (QoI)-aware analysis that accounts for spectral element interpolation, differentiation, and geometric mappings. This framework provides practical guidance for selecting compression tolerances based on workflow requirements, distinguishing between interpolation-dominated, derivative-sensitive, and projection-based QoIs. We evaluate the approach across representative workflows: (i) jet-in-crossflow simulations for visualization, (ii) turbulent channel flow for statistical analysis, (iii) reduced-order modeling (ROM), and (iv) graph neural network (GNN) training. Results show that moderate lossy compression achieves substantial data reduction and I/O speedups while preserving key QoIs, whereas derivative-based quantities require stricter tolerances consistent with the proposed analysis. These findings demonstrate that compression can be used in a principled, workflow-aware manner, enabling scalable and portable data management for exascale CFD and data-driven scientific computing.
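The distinction between interpolation-dominated and derivative-sensitive QoIs can be reproduced in a tiny experiment: quantize a smooth field to a pointwise tolerance and compare the induced error in the field with that in its spectral derivative (uniform quantization stands in for SZ3's error-bounded mode; sizes are illustrative).

```python
import numpy as np

n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(3 * x)
tol = 1e-3
u_c = np.round(u / (2 * tol)) * (2 * tol)       # uniform quantizer, |err| <= tol

k = np.fft.fftfreq(n, d=x[1] - x[0]) * 2 * np.pi
ddx = lambda f: np.real(np.fft.ifft(1j * k * np.fft.fft(f)))
print(np.max(np.abs(u - u_c)))                  # ~tol in the field itself
print(np.max(np.abs(ddx(u) - ddx(u_c))))        # amplified error in du/dx
```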
Organizer(s): Viral Sudip Shah (University of Illinois Urbana-Champaign), Harikrishna Tummalapalli (Argonne National Laboratory), Shivam Barwey (University of Notre Dame), Riccardo Balin (Argonne National Laboratory), Ramesh Balakrishnan (Argonne National Laboratory), Paul Fischer (University of Illinois Urbana-Champaign, Argonne National Laboratory), Sheng Di (Argonne National Laboratory), and Franck Cappello (Argonne National Laboratory)
Domain: Computational Methods and Applied Mathematics
Statistical Equivalence of AI Emulators and Earth System Models: A Large Ensemble Study with Ultra-Low-Resolution E3SM
We evaluate the statistical fidelity of a very large ensemble of an AI/ML emulator, FourCastNetv1, by comparing it against a similarly large ensemble of an ultra-low-resolution configuration of E3SMv3 for forecasts up to 10-day lead time. FourCastNetv1 is trained on this E3SMv3 configuration, and initial conditions for FourCastNetv1 forecasts are taken directly from trajectories of the E3SMv3 ensemble. We compare the emulator-generated and E3SMv3 ensembles for RMSE growth, multivariate error covariance, and extremes across the 20 prognostic variables emulated by FourCastNetv1. FourCastNetv1 ensembles are found to be strongly underdispersive: ensemble spread grows robustly with lead time in E3SMv3 while remaining much smaller in FourCastNetv1. For small initial perturbations (e.g., $O(10^{-4}\,\mathrm{K})$ in the temperature field), FourCastNetv1 produces essentially no spread and no dispersion growth, in strong contrast to E3SMv3. A nonparametric permutation test on the 20-dimensional RMSE covariance matrices shows that the covariance and correlation structures of the emulator and E3SMv3 differ significantly, with the observed Frobenius-norm statistic lying far in the tail of the permutation null distribution. Across variables, FourCastNetv1 also under-populates the upper tails of the ensemble distributions, indicating an under-representation of rare but dynamically important events. Our results indicate that AI/ML emulators of global atmosphere models, while offering orders-of-magnitude speedups for ensemble prediction, require explicit evaluation of ensemble spread, dependence structure, and extremes to establish their suitability as replacements for their parent dynamical systems.
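The nonparametric test described above is a standard permutation scheme; a minimal sketch on synthetic stand-in data, using the Frobenius distance between group covariance matrices as the statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 20))            # "emulator" ensemble, 20 variables
b = rng.normal(size=(100, 20)) * 1.5      # "E3SM" ensemble, inflated spread

def frob_stat(a, b):
    return np.linalg.norm(np.cov(a.T) - np.cov(b.T))

obs = frob_stat(a, b)
pooled = np.vstack([a, b])
null = []
for _ in range(1999):                     # shuffle members between groups
    perm = rng.permutation(len(pooled))
    null.append(frob_stat(pooled[perm[:100]], pooled[perm[100:]]))
p = (1 + np.sum(np.array(null) >= obs)) / (1 + len(null))
print(f"p = {p:.4f}")                     # small p: covariance structures differ
```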
Organizer(s): Salil Mahajan (Oak Ridge National Laboratory), Michael Kelleher (Oak Ridge National Laboratory), and Ming Fan (Oak Ridge National Laboratory)
Domain: Climate, Weather and Earth Sciences
XOR Bidding and Knapsack Formulations for HPC Network Resource Allocation
Modern High Performance Computing (HPC) centers face growing challenges in ingesting large and diverse data streams. These issues often create bottlenecks that limit bandwidth use and delay scientific progress. Traditional static allocation and simple queuing methods are not sufficient. This paper presents a dynamic and value-based approach to bandwidth allocation. We formalize the problem by incorporating both network and processing constraints. To address it, we introduce two new auction-based mechanisms: the Greedy Value Density Auction, which is fast to compute, and the Vickrey–Clarke–Groves (VCG) Knapsack Auction, which offers strong theoretical guarantees. Both mechanisms rely on user bids that include data needs and scientific value. The goal is to maximize the total value of successful transfers, often referred to as social welfare. Simulation results show that our auction mechanisms significantly outperform First Come First Served (FCFS) baselines. In high-load conditions, they reduce average and tail completion delays by over 80%. Predictability also improves, with the Coefficient of Variation of delay falling by 75–85%. Network stability increases as well, with load volatility (Peak to Average Ratio) dropping by up to 60–70%. This value-driven and adaptive strategy helps reduce congestion, improve bandwidth use, and ensure fairer access based on scientific importance.
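The simpler of the two mechanisms, greedy value density, fits in a few lines: admit bids in order of value per unit of bandwidth until the link budget is exhausted. A toy sketch (the paper's version additionally handles processing constraints, XOR bid sets, and VCG pricing in the second mechanism):

```python
def greedy_value_density_auction(bids, capacity):
    """bids: list of (bidder, value, bandwidth). Returns accepted bidders."""
    accepted, used = [], 0.0
    # sort by value density: scientific value per unit of bandwidth requested
    for bidder, value, bw in sorted(bids, key=lambda b: b[1] / b[2], reverse=True):
        if used + bw <= capacity:
            accepted.append(bidder)
            used += bw
    return accepted

bids = [("climate", 90, 30), ("genomics", 50, 10), ("astro", 120, 80)]
print(greedy_value_density_auction(bids, capacity=100))  # ['genomics', 'climate']
```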
Organizer(s): Abrar Hossain (University of Toledo), and Kishwar Ahmed (University of Toledo)
Domain: Engineering