BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260724T151409Z
LOCATION:Bldg. 6 - 103
DTSTART;TZID=Europe/Stockholm:20260629T170000
DTEND;TZID=Europe/Stockholm:20260629T173000
UID:submissions.pasc-conference.org_PASC26_sess158_msa174@linklings.com
SUMMARY:There’s Plenty of Room in the Data: Rethinking Genomic File Format
 s for the AI Decade
DESCRIPTION:Mohammed Alser (Georgia State University)\n\nGenomic data form
 ats such as FASTA, FASTQ, BAM, and VCF were designed for early sequencing 
 technologies with low throughput and short reads. Today, genomics is enter
 ing an AI-driven decade, where large-scale machine learning models increas
 ingly consume raw and processed data directly. This talk revisits the assu
 mptions behind our most widely used genomic file formats and asks whether 
 they remain fit for purpose—a question often overlooked, even when buildin
 g high-performance hardware accelerators for genomics. \n\nWe examine how 
 current formats encode sequence, quality, alignment, and metadata, highlig
 hting structural constraints that limit downstream analysis, accuracy, com
 pression, and random access. While these formats have enabled remarkable p
 rogress, they often prioritize human-centric pipelines and linear workflow
 s over the needs of modern AI systems: efficient batching, rich contextual
  metadata, and scalable access across distributed computing.\n\nWe also sh
 are ongoing efforts to enable intelligent genome analysis through novel al
 gorithms, hardware accelerators, and emerging technologies such as process
 ing-in-memory. Unfortunately, current formats impede these efforts to expl
 oit parallelism, as I/O and memory bandwidth remain significant bottleneck
 s. Finally,
  we present projects exploring new genomic data formats and outline future
  challenges, benefits, and research directions. These efforts aim to lay a
  foundation for AI-ready, scalable, and intelligent genome analysis.\n\nDo
 main: Engineering, Life Sciences, Computational Methods and Applied Mathem
 atics\n\nSession Chair: Gagandeep Singh (AMD)\n\n
END:VEVENT
END:VCALENDAR