Hojoon Leo Kim

Undergraduate Student, Electrical and Computer Engineering, Seoul National University


hojoon.kim@snu.ac.kr

37, Samseong-ro 51-gil

Gangnam-gu, Seoul 06280

Republic of Korea

Hello, my name is Hojoon Kim, and I also go by Leo.

As an undergraduate researcher passionate about advancing ML systems through hardware-software co-design, I explore how traditional computer architecture principles such as caching, branch prediction, and multiprocessing can address inefficiencies in modern ML workloads. My work spans low-bit quantization, storage-assisted inference, and cache-driven planning for embodied AI agents, with publications at OSDI'25, ICML'25 (Spotlight), and MLSys'26, and submissions to ICML'26. I aim to develop practical system architectures that make next-generation ML applications both efficient and deployable at scale by rethinking system abstractions across the entire computing stack.

When I’m not deep in code, you can probably find me on the tennis court🎾!

News

Jan, 2026 Our work on QUESO, a storage-assisted quantization error compensation method for on-device LLM inference, has been submitted to ICML'26!
Jan, 2026 Our work on HELIOS, a scheduler for multi-robot serving on embedded devices, has been submitted to ICML'26! It hooks CUDA kernels using a lightweight shared object (.so).
Dec, 2025 Working on a scheduler for multi-robot serving on embedded devices, targeting ICML'26. It hooks CUDA kernels using a lightweight shared object (.so).
Nov, 2025 Our team won the Grand Prize (1st Place) at the 2025 AI Chip Contest - NPU Optimization Track hosted by the Ministry of Science and ICT (MSIT), Republic of Korea, receiving a prize of KRW 10,000,000.
Nov, 2025 Our work on AgenticCache, a cache-driven asynchronous planning system for embodied AI agents, has been accepted to MLSys'26!
May, 2025 Our DecDEC has been accepted to OSDI'25!
May, 2025 Excited to share that our ICML'25 submission, FlashTP, is under consideration for a Spotlight or Oral presentation. It has also already been deployed in real-world industrial applications.

Selected Publications

* indicates equal contribution

  1. ICML’26 (In Submission)
    HELIOS: Heterogeneous Lightweight VLA Model Serving System
    Jongheon Jeong*, Hojoon Kim*, Rokhee Lee, Yeonhong Park, Young H. Oh, and Jae W. Lee
    2026
    In submission
  2. MLSys’26
    AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents
    Hojoon Kim, Yuheng Wu, and Thierry Tambe
    2026
  3. ICML’26 (In Submission)
    QUESO: Storage-Assisted Quantization Error Compensation for On-Device LLM Inference
    Seong Hoon Seo, Donghyun Lee, Geonha Lee, Hojoon Kim, Yeonhong Park, and Jae W. Lee
    2026
    In submission
  4. ICML’25 Spotlight
    FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials
    Seung Yul Lee, Hojoon Kim, Yutack Park, Dawoon Jeong, Seungwu Han, Yeonhong Park, and Jae W. Lee
    2025
    Spotlight (313/12,107, 2.6%)
  5. OSDI’25
    DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
    Yeonhong Park*, Jake Hyun*, Hojoon Kim, and Jae W. Lee
    2025
    Acceptance Rate: 52/327 = 15.9%