Transformer(6)
Transformer Tokenizer, Embedding and LLaMA
Tokenization and Embedding: The Science Behind Large Language Models. Every input we provide to GPT is nothing but a token (a numerical ID) or a sequence of tokens. GPT doesn't understand language the way humans do; it simply processes sequences of numerical IDs, which we call tokens. But how does it find the associations among words (tokens) and produce human-like responses? Here comes the c..
2024.07.06
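A minimal sketch of the token-ID idea from the post above, assuming the tiktoken package (any GPT-style tokenizer would behave similarly): the text is mapped to a sequence of integer IDs, and those IDs are all the model ever sees.

```python
# Sketch: text -> token IDs -> text (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent GPT models

text = "GPT doesn't understand language the way humans do."
token_ids = enc.encode(text)   # text -> list of integer token IDs
print(token_ids)               # the model only ever sees these integers
print(enc.decode(token_ids))   # IDs -> original text
```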
[GoogleML] Transformer Network (Final)
Transformer Network Intuition // The last Google lecture! :) Unlike the previous way of handling sequence data, everything is processed in parallel, and two concepts matter: self-attention and multi-head attention. Self-Attention: the three values Query, Key, and Value are what count, and the formula looks like a softmax. For example, if q3 is "what's happening there (Africa)", then q3 with k1 -> the answer to that question is Jane (a person), q3 with k2 -> the answer is visit (an action), and so on. This is also called dot-product attention. As before, key, value, query + attention Mult..
2023.10.30
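A minimal NumPy sketch of the scaled dot-product attention described in that post, i.e. softmax(QK^T / sqrt(d_k))V; the matrix sizes below are illustrative assumptions, not values from the lecture.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # each query scored against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries of dimension 8 (illustrative sizes)
K = rng.normal(size=(4, 8))   # 4 keys
V = rng.normal(size=(4, 8))   # 4 values
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
```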
[OpenAI] ChatGPT Prompt Development
https://platform.openai.com/examples OpenAI Platform: explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform. 👩‍💻 The UI is clean and really pretty. The structure laid out on screen is reflected as-is in the API JSON request, so it was very pleasant to develop with. :) 🥺 It even shows the API code being called from that screen. The best. 👍 I only tweaked the message structure a bit and developed in Colab. 👩‍💻 There seem to be many other fun features as well. Used in a project, it makes things really convenient and quick to implement..
2023.09.18
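A minimal sketch of the kind of messages-based chat request the post refers to, using the openai Python client; the model name and message contents are illustrative assumptions, not the author's actual Colab code.

```python
# Minimal chat request with a "messages" structure
# (assumes `pip install openai` and OPENAI_API_KEY set in the environment).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",   # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain self-attention in one sentence."},
    ],
)
print(response.choices[0].message.content)
```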
[Paper reading] Swin Transformer
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high..
2023.09.04
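As a small illustration of the window idea named in the title, here is a sketch of the standard non-overlapping window partition that Swin applies before computing self-attention inside each window; the feature-map and window sizes are illustrative assumptions, not taken from the post.

```python
# Partition a feature map into non-overlapping windows; self-attention is then
# computed only among the tokens inside each window (sizes are illustrative).
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into (num_windows * B, ws, ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

x = torch.randn(1, 8, 8, 96)          # one 8x8 feature map with 96 channels
print(window_partition(x, 4).shape)   # torch.Size([4, 4, 4, 96]) -> four 4x4 windows
```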
[Paper reading] Transformers for image recognition, ViT
Transformers for image recognition. Model overview: We split an image into fixed-size patches, linearly embed each of them, add position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder. In order to perform classification, we use the standard approach of adding an extra learnable "classification token" to the sequence. Abstract: While the Transformer archite..
2023.08.28
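A minimal PyTorch sketch of the patch-embedding step described in that overview: split the image into fixed-size patches, linearly embed them, prepend a learnable classification token, and add position embeddings. Patch size, embedding dimension, and image size are illustrative assumptions.

```python
# Image -> sequence of patch embeddings + [CLS] token + position embeddings (ViT-style).
import torch
import torch.nn as nn

image_size, patch_size, embed_dim = 224, 16, 768
num_patches = (image_size // patch_size) ** 2              # 14 * 14 = 196 patches

# A stride=patch_size convolution is a common way to "split and linearly embed" patches.
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))                 # learnable [CLS] token
pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))   # position embeddings

x = torch.randn(2, 3, image_size, image_size)              # a batch of 2 RGB images
patches = patch_embed(x).flatten(2).transpose(1, 2)        # (2, 196, 768)
tokens = torch.cat([cls_token.expand(2, -1, -1), patches], dim=1) + pos_embed
print(tokens.shape)                                        # torch.Size([2, 197, 768]) -> fed to the encoder
```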
[Paper reading] Attention is all you need, Transformer
Transformer. Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and conv..
2023.08.25
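The abstract above describes an architecture built solely from attention; complementing the single-head NumPy sketch earlier, this shows how multi-head self-attention can be invoked with PyTorch's built-in module (embedding size, head count, and sequence length are illustrative assumptions).

```python
# Multi-head self-attention via PyTorch's built-in module (illustrative sizes).
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 512)        # batch of 2 sequences, 10 tokens, model dimension 512
out, attn_weights = mha(x, x, x)   # self-attention: queries, keys, and values are all x
print(out.shape)                   # torch.Size([2, 10, 512])
print(attn_weights.shape)          # torch.Size([2, 10, 10]) (averaged over heads)
```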