attention(3)
Transformer Tokenizer, Embedding and LLaMA
Tokenization and Embedding: The Science Behind Large Language Models. Every input we provide to GPT is nothing but a token (a numerical id) or a sequence of tokens. GPT doesn't understand language the way humans do; it just processes sequences of numerical ids, which we call tokens. But how does it find associations among words (tokens) and produce human-like responses? Here comes the c..
2024.07.06
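The preview above describes model input as token ids that index an embedding table. Below is a minimal sketch of that idea, assuming a hand-made toy vocabulary and a random NumPy embedding table (the words, ids, and dimensions are invented for illustration, not taken from the post):

import numpy as np

# Toy vocabulary: a real LLM gets this from a trained tokenizer (e.g. BPE);
# here it is a hand-made mapping just to show text -> token ids.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(text: str) -> list[int]:
    """Map whitespace-separated words to integer token ids."""
    return [vocab[word] for word in text.lower().split()]

# Embedding table: one vector per token id (random here, learned in practice).
d_model = 8
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), d_model))

token_ids = tokenize("the cat sat on the mat")
embeddings = embedding_table[token_ids]   # shape: (seq_len, d_model)

print(token_ids)         # [0, 1, 2, 3, 0, 4]
print(embeddings.shape)  # (6, 8)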
[GoogleML] Sequence Models & Attention Mechanism
Basic Models: the part that takes the French words as input -> the encoder; the part that produces the English words as output -> the decoder. + Given enough input/output word pairs, this structure works. If the output sentence is not very long, image captioning is also possible: sequence-to-sequence, image-to-sequence. Picking the Most Likely Sentence: given a French sentence as the condition, predicting the probability of the English words -> conditional probability. Picking words at random can produce strange sentences, so it is better to predict the sentence that maximizes the probability, i.e. the most likely English sentence, which beco..
2023.10.30
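The preview above contrasts random sampling with picking the most likely sentence under the conditional probability of the English output given the French input. A minimal sketch of that contrast, assuming a made-up next-word distribution (in a real decoder the probabilities come from the network):

import numpy as np

rng = np.random.default_rng(42)

# Made-up conditional distribution P(next English word | French sentence, words so far),
# fixed here purely for illustration.
next_word_probs = {"jane": 0.5, "visits": 0.2, "in": 0.2, "september": 0.1}

words = list(next_word_probs)
probs = np.array([next_word_probs[w] for w in words])

# Random sampling: can pick low-probability words and drift into odd sentences.
sampled = rng.choice(words, p=probs)

# Greedy / most-likely choice: maximize the conditional probability at each step
# (beam search generalizes this by keeping several high-probability candidates).
most_likely = words[int(np.argmax(probs))]

print("sampled:", sampled)
print("most likely:", most_likely)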
[Paper reading] Attention is all you need, Transformer
Transformer Abstract The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and conv..
2023.08.25
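Since the abstract above introduces an architecture based solely on attention, here is a minimal sketch of the scaled dot-product attention it builds on, written in NumPy with random placeholder weights and made-up shapes (a sketch of the formula softmax(QK^T / sqrt(d_k))V, not a faithful reimplementation of the paper):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted sum of value vectors

# Toy inputs: 4 tokens, model dimension 8 (random placeholders for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8)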