Computer Science / Computer Architecture (7)
-
[Paper Review] Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective
Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective https://arxiv.org/abs/2511.00739 Agentic AI serving converts monolithic LLM-based inference into autonomous problem-solvers that can plan, call tools, perform reasoning, and adapt on the fly. Due to diverse ..
19:39:45 -
[Paper Review] Processing in Memory: The Terasys Massively Parallel PIM Array
Processing in Memory: The Terasys Massively Parallel PIM Array (PACT'95) The notion of computing in memory has been with us for several decades. For example, Stone proposed a logic-in-memory computer consisting of an enhanced cache memory array that serves as a high-speed buffer between CPU and conventional memory. More recently, a group at the University of Toronto has designed a computati..
2026.04.22 -
[Paper Review] Hitting the Memory Wall: Implications of the Obvious
Hitting the Memory Wall: Implications of the Obvious (1994) This brief note points out something obvious—something the authors “knew” without really understanding. With apologies to those who did understand, we offer it to those others who, like us, missed the point.We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed; each is imp..
2026.04.17 -
[Paper Review] Near-Memory Computing: Past, Present, and Future
Near-Memory Computing: Past, Present, and Future The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory — call..
2026.04.17 -
[Paper Review] QuCo: Efficient and Flexible Hardware-Driven Automatic Configuration of Tile Transfers in GPUs
QuCo: Efficient and Flexible Hardware-Driven Automatic Configuration of Tile Transfers in GPUs (HPCA'26) Abstract The growing complexity and parallelism demands of modern GPU workloads have driven architectural innovations toward asynchronous tile transfers (ATTs) to overlap computation and data movement. While ATT units such as NVIDIA's Tensor Memory Accelerator (TMA) introduce high-thro..
2026.04.03 -
[Interconnection Networks] Chap24. Simulation
Interconnection Networks, Simulation Simulation is a double-edged sword — while it can provide excellent models of complex network designs, simulators and simulations are equally complex. To that end, the quality of simulation results is only as good as the methodology used to generate and measure these results. In this chapter, we address the basics of simulation input, measurement, and design...
2026.03.25 -
[Paper Review] WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization 2026.03.16