BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260625T133339Z
LOCATION:Bldg. 6 - 001 - Plenary Room
DTSTART;TZID=Europe/Stockholm:20260630T122300
DTEND;TZID=Europe/Stockholm:20260630T122400
UID:submissions.pasc-conference.org_PASC26_sess129_pos105@linklings.com
SUMMARY:P36 - Stencil Computation on Tenstorrent Wormhole
DESCRIPTION:Lorenzo Piarulli and Daniele De Sensi (Sapienza University)\n\
 nThe rapid ascent of large language models (LLMs) has prioritized domain-s
 pecific accelerators (DSAs) optimized for dense matrix-based deep learning
 . However, the suitability of these architectures for traditional high-per
 formance computing (HPC) kernels, like stencil-based partial differential 
 equation (PDE) solvers, remains largely unexplored. This research investig
 ates mapping stencil operations, the computational core of weather forecas
 ting, fluid dynamics, and seismic imaging, onto Tenstorrent’s Wormhole, a 
 RISC-V-based AI accelerator.\nThe study evaluates two heterogeneous CPU-Wo
 rmhole methodologies: an "axpy-style" approach using scaled vector additio
 ns and a matrix-multiplication formulation requiring complex stencil-to-ro
 w transformations. By delegating irregular scalar logic and boundary condi
 tions to the CPU while leveraging Wormhole for parallel tiled execution, t
 he design maximizes the accelerator’s throughput. Experimental results ind
 icate that while optimized CPUs currently lead in raw speed, Wormhole demo
 nstrates superior energy efficiency per grid update. Profiling identifies 
 bottlenecks in fixed tile layouts and scratchpad data movement. Consequent
 ly, this work proposes hardware and software refinements, including flexib
 le tile sizes, enhanced scalar units, and unified memory architectures, to
  evolve AI accelerators into competitive, general-purpose engines for scie
 ntific discovery. This work highlights that AI accelerators represent a pr
 omising path forward; their efficiency is vital for reducing energy consum
 ption in HPC environments while maintaining high performance.\n\nSession C
 hair: Tobias Hodel (University of Bern, Switzerland)
END:VEVENT
END:VCALENDAR
