BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260421T090512Z
LOCATION:Plenary Room (Bldg. 6 - 001)
DTSTART;TZID=Europe/Stockholm:20260629T194800
DTEND;TZID=Europe/Stockholm:20260629T194900
UID:submissions.pasc-conference.org_PASC26_sess124_pos144@linklings.com
SUMMARY:Optimizing the ICON Dynamical Core for GPUs Utilizing GT4Py and Da
 Ce
DESCRIPTION:Christoph Müller (MeteoSwiss) and Magdalena Luz, Nicoletta Far
 abullini, Till Ehrengruber, Chia Rui Ong, Daniel Hupp, Philip Müller, Edoa
 rdo Paone, Ioannis Magkanaris, Christos Kotsalos, Yilu Chen, Jacopo Canton
 , Hannes Vogt, Enrique González Paredes, Rico Häuselmann, Anurag Dipankar,
  Mauro Bianco, William Sawyer, and Mikael Simberg (ETH Zurich / CSCS)\n\nN
 umerical weather predictions are based on a numerical model running on a l
 arge supercomputer. Improving the performance of these models is an activ
 e field of research which benefits society. The ICON model is a finite-vol
 ume model running on an icosahedral mesh.\nFinite-volume stencil computati
 ons on an icosahedral mesh pose a memory-bound optimization problem whi
 ch profits heavily from inlining and fusion, resulting in the demotio
 n of fully realised fields, which are written to and read from global m
 emory, to scalars, which can exist in registers.\nIn this poster we sho
 wcase an optimiza
 tion pipeline that improves the performance of the ICON dynamical core f
 or production-relevant MeteoSwiss experiments by 1.3x over the OpenACC b
 aseline on NVIDIA H100 GPUs and by 1.15x on NVIDIA A100 GPUs.\nThe pipel
 ine consists of a code-elimination stage in GT4Py, which deletes all dy
 namical-core code branches not relevant to the current experiment, foll
 owed by an inlining and fusion stage in DaCe, which combines the remain
 ing stencil computations into as few CUDA kernels as possible.\nPerform
 ance results for production MeteoSwiss experiments on A100 and H100 GPU
 s are presented, and the difference from the OpenACC baseline is discus
 sed.\n\n
END:VEVENT
END:VCALENDAR
