BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260522T162632Z
LOCATION:Bldg. 6 - Room 004
DTSTART;TZID=Europe/Stockholm:20260701T113000
DTEND;TZID=Europe/Stockholm:20260701T120000
UID:submissions.pasc-conference.org_PASC26_sess174_pap127@linklings.com
SUMMARY:Porting the code_saturne CFD Solver to GPU: Methodology, Feedback,
  and Insights
DESCRIPTION:Yvan Fournier (EDF France) and Charles Moulinec (STFC)\n\nThis
  paper describes recent developments in the open source general purpose CF
 D solver code_saturne, to enable execution of its main code paths on GPU, 
 while keeping the code base sustainable, limiting non-portable sections to
  a few performance-critical kernels and using portable constructs for the 
 bulk of the code.\nDeveloped since 1997, and originally written in
  Fortran, with pre- and post-processing stages and MPI parallelism
  introducing C
  code, most of code_saturne had been migrated to C over the years.\nBased
  on a co-located finite volume approach for unstructured meshes and with
  domain partitioning, the numerical approach allows for a broad scope of
  applications and can handle multi-billion cell meshes, at the cost of
  low computational intensity and synchronization points, whether for dot
  products or halo exchange.\n\nFor the bulk of the code, priority is
  given to ease and
  productivity of code adaptation over optimization, so as to minimize memo
 ry transfers. A simple migration of the initial C-based code to C++ allows
  for the replacement of traditional loops with parallel_for constructs
  which can use different back-ends, encapsulating the previous OpenMP
  constructs on CPU while using CUDA or SYCL on GPU.\n\nProfiling has been
  essential to avoid performance pitfalls related to unified shared
  memory and GPU memory allocation. The solutions considered and their
  trade-offs are described here, focusing on the memory-handling tweaks
  that have been essential to improving performance.\n\nSome results are
  presented for the case of the flow in a tube bundle, showing good
  performance on several GPU machines.\n\nSession Chair: Rahul Bale
  (RIKEN)\n\n
END:VEVENT
END:VCALENDAR
