PIN(2)
-
[Pin] CoreBPE Memory Tracing by pinatrace
https://stackoverflow.com/questions/32026456/how-can-i-specify-an-area-of-code-to-instrument-it-by-pintool How can I specify an area of code to instrument it by pintool? There are four levels of granularity in Pin: routine, instruction, image, and trace. Can I specify limits or an area in which to start and stop inserting instrumentation code, maybe with a directive like ( # start.. # Frequently used..
2024.10.15
-
Byte-Pair Encoding tokenization and Tiktoken
Byte-Pair Encoding tokenization https://youtu.be/HEikzVL-lZU This video shows how the unit of tokenization is determined. + Byte-Pair Encoding (BPE) was initially developed as an algorithm to compress texts, and then used by OpenAI for tokenization when pretraining the GPT model. It's used by a lot of Transformer models, including GPT, GPT-2, RoBERTa, BART, and DeBERTa. 1) Split everything into individual characters 2) Count the frequency of each pair 3) The most frequent ..
2024.07.08
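
The BPE steps listed in the excerpt above (split into characters, count pair frequencies, then act on the most frequent pair) can be sketched roughly as follows. The third step is cut off in the excerpt, so this sketch assumes the standard BPE behavior of merging the most frequent adjacent pair into a new symbol. All function names here (`split_into_chars`, `count_pairs`, `merge_pair`, `train_bpe`) are hypothetical and chosen for illustration; this is not the tiktoken implementation, and byte-level tokenizers apply the same idea to bytes rather than characters.

```python
from collections import Counter


def split_into_chars(words):
    # Step 1: represent every word as a tuple of single-character symbols.
    return [tuple(word) for word in words]


def count_pairs(corpus):
    # Step 2: count how often each adjacent symbol pair occurs.
    counts = Counter()
    for symbols in corpus:
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += 1
    return counts


def merge_pair(corpus, pair):
    # Step 3 (assumed, standard BPE): replace each occurrence of `pair`
    # with one merged symbol, scanning left to right.
    merged = []
    for symbols in corpus:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(tuple(out))
    return merged


def train_bpe(words, num_merges):
    # Repeat steps 2 and 3 until `num_merges` merges are learned
    # or no adjacent pairs remain.
    corpus = split_into_chars(words)
    merges = []
    for _ in range(num_merges):
        counts = count_pairs(corpus)
        if not counts:
            break
        best = max(counts, key=counts.get)
        merges.append(best)
        corpus = merge_pair(corpus, best)
    return merges, corpus
```

For example, `train_bpe(["abab", "abc"], 1)` learns the single merge `("a", "b")`, because that pair occurs three times while every other pair occurs once. When several pairs tie, this sketch simply takes the first one encountered; real tokenizers may break ties differently.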