BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260522T162634Z
LOCATION:Bldg. 6 - Room 002
DTSTART;TZID=Europe/Stockholm:20260629T120000
DTEND;TZID=Europe/Stockholm:20260629T123000
UID:submissions.pasc-conference.org_PASC26_sess166_pap129@linklings.com
SUMMARY:A Performance-Portable, Massively Parallel Distributed Nonuniform 
 FFT
DESCRIPTION:Paul Fischill (ETH Zurich); Andreas Adelmann (Paul Scherrer In
 stitute, ETH Zurich); and Sriramkrishnan Muralikrishnan (Forschungszentrum
  Jülich)\n\nThe nonuniform fast Fourier transform (NUFFT) enables spectral
  methods for problems with irregularly spaced samples, with applications i
 n medical imaging, molecular dynamics, and kinetic plasma simulations. Exi
 sting implementations are limited to shared-memory execution, restricting
  problem sizes to what fits on a single node. We present the first
  distribu
 ted, performance-portable NUFFT for heterogeneous supercomputers. Our Kokk
 os-based implementation runs without modification on NVIDIA and AMD GPUs. 
 We develop multiple spreading and interpolation kernels optimized for diff
 erent accuracy requirements and architectures. Our spreading kernels match
  or exceed the single-GPU throughput of the state-of-the-art
  CUDA-based NU
 FFT library cuFINUFFT at production particle densities, while our Kokkos-b
 ased implementation additionally supports AMD GPUs. Strong scaling experim
 ents on Alps (NVIDIA GH200), JUWELS Booster (NVIDIA A100) and LUMI (AMD MI
 250X) demonstrate scaling up to 1024 GPUs. At scale, the distributed FFT i
 s a significant part of the total runtime, making higher NUFFT accuracy le
 ss expensive. We apply the method to massively parallel Particle-in-Fourie
 r simulations of Landau damping with up to 1024^3 Fourier modes and 8.6 bi
 llion particles on Alps, JUWELS, and LUMI, demonstrating that distributed 
 NUFFTs enable kinetic plasma simulations at resolutions previously inacces
 sible to spectral particle methods.\n\nSession Chair: Sriramkrishnan Mural
 ikrishnan (Forschungszentrum Jülich)\n\n
END:VEVENT
END:VCALENDAR
