BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260421T090513Z
LOCATION:Plenary Room (Bldg. 6 - 001)
DTSTART;TZID=Europe/Stockholm:20260629T194000
DTEND;TZID=Europe/Stockholm:20260629T194100
UID:submissions.pasc-conference.org_PASC26_sess124_pos118@linklings.com
SUMMARY:A High-Performance, GPGPU-Enabled Discontinuous Galerkin Solver Us
 ing OpenMP Offloading and MPI
DESCRIPTION:Marco Scarpelli, Paola Francesca Antonietti, Carlo De Falco, a
 nd Luca Formaggia (Politecnico di Milano) and Giovanni Viciconte (ENI S.p.
 A.)\n\nWe present a GPGPU-enabled modal Discontinuous Galerkin solver that
  uses OpenMP+MPI. Device code is generated from OpenMP offloading pragmas,
  and inter-device/inter-node communication is enabled by MPI.\nOur test
  case
  implements a diffusion-advection solver with a Runge-Kutta-Chebyshev time
  stepping scheme. The fully explicit, matrix-free operator evaluation aims
  to keep a minimal memory footprint to address both the comparatively low 
 amount of RAM on the device and the high latency of its access. Thanks to 
 the high locality of DG methods, differential operators can be evaluated o
 n a per-element basis, using a SIMD scheme. We use a structured grid that 
 is implicitly defined, to further align with the philosophy of reducing me
 mory accesses in favour of more arithmetical computation.\nThe grid is par
 titioned in rectangular regions, each assigned to an MPI process, with “gh
 ost” cells at the boundary between processes; ghost DoF values are updated
  using a halo exchange scheme.\nWe show efficiency and scalability results
  for a multi-device, multi-node configuration and compare it to the host-o
 nly counterpart, deriving efficiency metrics to assess whether the device-
 enabled implementation is actually advantageous from a time-to-solution st
 andpoint.\n\n
END:VEVENT
END:VCALENDAR
