BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260522T162632Z
LOCATION:Bldg. 6 - Room 104
DTSTART;TZID=Europe/Stockholm:20260629T133000
DTEND;TZID=Europe/Stockholm:20260629T140000
UID:submissions.pasc-conference.org_PASC26_sess109_msa161@linklings.com
SUMMARY:From Parallelism to Bytes in Flight: Modernising ecWAM for Memory-
 Bandwidth-Dominated AI-Era GPUs
DESCRIPTION:Anastasia Stulova and Lukas Mosimann (NVIDIA Inc.)\n\nWeather 
 and climate models are complex scientific applications, often written in F
 ortran, that require performance portability across diverse
  high-performance c
 omputing architectures. As these codes are mainly maintained by domain sci
 entists, extensive hardware-specific rewrites are often impractical. Direc
 tive-based programming models such as OpenACC therefore remain a practical
  path to GPU acceleration while preserving high-level code structure.\n\nR
 ecent NVIDIA GPU generations, driven by AI workloads, deliver unprecedente
 d memory bandwidth compared to earlier supercomputing systems. To fully ex
 ploit this capability, exposing parallelism alone is insufficient; applica
 tions must also maximise effective memory bandwidth utilisation. This requ
 ires increasing the number of bytes in flight while moving data closer to 
 the computing cores to hide memory latency and reduce long stalls.\n\nIn t
 his work, we analyse the scaling of the existing OpenACC parallelisation
  of ecWAM on modern GPUs and its memory bandwidth usage. We present
  bytes-in-fli
 ght–aware optimisation techniques, including loop tiling, prefetching, and
  kernel partitioning, and reassess traditional optimisations in the contex
 t of memory-bandwidth-dominated architectures. Finally, we discuss which o
 ptimisations can be expected from state-of-the-art compilers and which
  remain the responsibility of application developers or domain-specific tr
 anslation tools, and highlight the role of AI-assisted approaches in moder
 nising Fortran codes for sustained performance on future heterogeneous HPC
  systems.\n\nDomain: Climate, Weather, and Earth Sciences\n\nSession Chair
 : Ahmad Nawab (ECMWF)\n\n
END:VEVENT
END:VCALENDAR
