CPU(8)
-
[CUDA] Chap2. Heterogeneous data parallel computing
The structure of a CUDA C program reflects the coexistence of a host (CPU) and one or more devices (GPUs) in the computer. Each CUDA C source file can have a mixture of host code and device code. By default, any traditional C program is a CUDA program that contains only host code. One can add device code into any source file. The device code is clear..
2026.03.19 -
[UPMEM PIM] UPMEM-MHA DPU Programming
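2025. 11. 29 Saturday I finally fixed an error I had been stuck on for a whole month! I was so, so, so happy. .·⋆(⌒_⌒)⋆·. The taste of coding (debugging), for the first time in a long while. Even a daunting, difficult problem can be solved calmly if you think it through step by step! * dpu-diag - Confirms a total of 2546 DPUs at 350 MHz. - Expressing the cycle count returned by perfcounter_get in ms requires the exact Hz. * Fixed linearly increasing cycles - perfcounter_config(COUNT_CYCLES, true); - Adding this initialization statement fixed the per-head cycle error. * Removed float operations inside the DPU - Changed the struct definitions, improved PIM performance + optimal TILE_ROWS value..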
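The cycle-to-millisecond conversion mentioned in this entry can be sketched in plain host-side C. This is only an illustration under stated assumptions: `cycles_to_ms` and `DPU_FREQ_HZ` are hypothetical names, not part of the UPMEM SDK, and the 350 MHz value is the one the entry reports from dpu-diag; a different machine may report a different clock.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 350 MHz, the DPU clock that dpu-diag reports in the entry above. */
#define DPU_FREQ_HZ 350000000.0

/* Hypothetical helper: convert a raw cycle count (for example, a value
 * read with perfcounter_get on the DPU) into milliseconds.
 * ms = cycles / (cycles per second) * 1000 */
static double cycles_to_ms(uint64_t cycles, double freq_hz)
{
    assert(freq_hz > 0.0); /* a zero or negative clock would be meaningless */
    return (double)cycles / freq_hz * 1000.0;
}
```

With these assumptions, `cycles_to_ms(700000, DPU_FREQ_HZ)` comes out to about 2 ms. Using a wrong frequency scales every reported time by the same factor, which is why the exact Hz matters.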
2025.12.01 -
[PIM] PIM-Rec Design
2025. 05. 20. Tuesday Paper: https://open.library.ubc.ca/soa/cIRcle/collections/ubctheses/24/items/1.0435518 Offloading embedding lookups to processing-in-memory for deep learning recommender models. Recommender systems are an essential part of many industries and businesses. Generating accurate recommendations is critical for user engagement and business revenue. Currently, deep learning recomme..
2025.05.20 -
[PIM] CPU/DPU Programming Code Review
PrIM Benchmarks: Vector Addition Code Review https://github.com/SohyeonKim-dev/prim-benchmarks GitHub - SohyeonKim-dev/prim-benchmarks: PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is developed to evaluate, analyze, and charact..
2024.07.16 -
[Paper review] Xen and the Art of Virtualization
Abstract: Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100% binary compatibility at the expense of performance. Others sacrifice security or functionality for speed. Few offer resource isolation or..
2024.03.15 -
[Operating Systems] CFS in Linux
CFS: Completely Fair Process Scheduling in Linux
2023.04.10 -
[Operating Systems] GPGPU for Deep Learning
2023.04.02 -
[Operating Systems] CPU and Architecture _ SMP vs NUMA vs Clustered system
Bootstrapping in Linux - CPU - not smart -> it reads instructions from memory and executes them very quickly - ROM - Read-Only Memory - read-only, non-volatile memory that stores data permanently - persists even when the power is off -> the mode to run at first boot + RAM - Random Access Memory - volatile memory that temporarily holds work in progress. Hardware initialization and testing - BIOS - Basic Input/Output System - UEFI - Unified Extensible Firmware Interface -> firmware: software embedded in the hardware == software stored in ROM -> runs POST (power-on self-test) - memory and i..
2023.03.25