NeurIPS 2025

I am working with Seonghoon Seo on optimizing the prefill stage of large language model (LLM) inference, with a focus on reducing memory and compute overhead during initial context processing. Our work is targeted for submission to NeurIPS 2025.