2023_dsd
Ranked 1st on throughput and task completion time. Addressed both memory and compute bottlenecks: (1) implemented multi-bank memory so that data is fetched in parallel, relieving the memory-bound stages; (2) used multiple output-stationary systolic arrays to raise compute throughput; and (3) pipelined the Load and Compute states so that memory is preloaded ahead of computation.
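As an illustration of the output-stationary dataflow mentioned in (2), here is a minimal cycle-level sketch in Python. It is not the project's hardware design: the function name `systolic_matmul_os` and the single-array, software-only model are assumptions made for this example. Each processing element PE(i, j) keeps its own accumulator for one output element, while rows of `a` and columns of `b` stream past it with a skew of `i + j` cycles.

```python
def systolic_matmul_os(a, b):
    """Cycle-level model of one output-stationary systolic array computing a @ b.

    Hypothetical helper for illustration; assumes len(a[0]) == len(b).
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    # One stationary accumulator per PE: partial sums never move.
    acc = [[0] * n for _ in range(m)]
    # The last PE, (m-1, n-1), sees its final operand pair at cycle m+n+k_dim-3.
    cycles = m + n + k_dim - 2
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                # Input skew: at cycle t, PE(i, j) receives a[i][k] and b[k][j].
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul_os([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

In this model a 2x2 by 2x2 product finishes in 4 cycles, because the skewed operands need `m + n + k - 2` cycles to reach the farthest PE. Running several such arrays side by side, as the entry describes, would split the output tiles among them.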
[Video] [Report (in Korean)]