BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260724T151406Z
LOCATION:Bldg. 6 - 004
DTSTART;TZID=Europe/Stockholm:20260629T160000
DTEND;TZID=Europe/Stockholm:20260629T163000
UID:submissions.pasc-conference.org_PASC26_sess159_msa152@linklings.com
SUMMARY:A Human-in-the-Loop Scoping Review Screening Pipeline Using Self-H
 osted Large Language Models: An Example in Sport Science
DESCRIPTION:Kai Michael Gensitz (University of Bern); Shawan Mohammed
  (RWTH Aachen University); Daniela E. Ströckl (Carinthia University
  of Applied Sciences); Marc Augustin (Protestant University of
  Applied Sciences, Bochum); Claudio R. Nigg (University of Bern); and
  Ciara McCormack (National University of Ireland, Maynooth)\n\n
 Scoping reviews require labor-intensive screening of papers and
  extraction of their data. Large Language Models (LLMs) promise
  efficiency gains, but cloud-based LLMs raise data privacy and
  reproducibility issues. To address these while adhering to
  JBI/PRISMA-ScR guidelines, we present a human-in-the-loop review
  pipeline using a self-hosted LLM (gpt-oss-120b via a GPUStack
  framework) as an application of a Proof of Concept infrastructure
  on University of Bern servers, in line with copyright and
  validation standards. The process is applied to a sport science
  dataset (N=2222) on smartphone-based artificial intelligence for
  physical activity promotion. Rigorous validation involved a
  human-screened 10% calibration sample (n=223; inter-rater
  reliability PABAK=0.53), which informed 20 iterations of prompt
  optimization (RISEN framework), and a 5% unseen test sample
  (n=110; PABAK=0.75). The LLM achieved 92.7% accuracy, 100.0%
  sensitivity, 91.6% specificity, and a PABAK of 0.86 on the test
  sample, saving 98% of human screening time. This calibrate-test
  method will be repeated for full-text screening and data
  extraction, but title/abstract screening already shows that, from
  a methodological point of view, self-hosted LLMs can efficiently
  and rigorously support evidence synthesis. Practically, this
  approach will produce an evidence map to inform future
  interventions in the field of sport science.\n\nDomain: Chemistry
  and Materials, Climate, Weather, and Earth Sciences, Applied Social
  Sciences and Humanities, Engineering, Life Sciences, Physics,
  Computational Methods and Applied Mathematics\n\nSession Chairs:
  Tobias Hodel (University of Bern, Switzerland) and Sukanya Nath
  (University of Bern, Data Science Lab)\n\n
END:VEVENT
END:VCALENDAR