[GoogleML] Sequence Models & Attention Mechanism

2023. 10. 30. 00:38 · ArtificialIntelligence/2023GoogleMLBootcamp

 

 

 

Basic Models

The part that takes in the input French words -> encoder

The part that outputs the English words -> decoder

+ Given enough input / output pairs, this architecture works

 

 

 

If the output is a reasonably short sentence, image captioning is also possible

sequence to sequence

image to sequence

 

 

 

Picking the Most Likely Sentence

Given the French words as the condition,

the model predicts the probability of the English words -> conditional probability

 

 

 

Sampling words at random can produce odd sentences.

So it makes more sense to predict the sentence that maximizes the probability

i.e. the most likely English sentence

 

 

 

The upper sentence is the more accurate translation

But a greedy algorithm picks "going" after "is" because that next-word probability is higher, which leads to the lower sentence

So greedy search is not the optimal approach.

Looking only at the first 3 words, the greedy result has the higher probability

Yet the upper sentence is the better sentence (the more fitting translation)
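A minimal sketch of why greedy decoding can miss the most likely sentence. The table `MODEL` and every probability in it are made up for illustration; they are not taken from the lecture.

```python
# Hypothetical next-word probabilities P(word | prefix); all numbers are made up.
MODEL = {
    (): {"Jane": 1.0},
    ("Jane",): {"is": 1.0},
    ("Jane", "is"): {"going": 0.6, "visiting": 0.4},
    ("Jane", "is", "going"): {"to": 0.5, "home": 0.5},
    ("Jane", "is", "visiting"): {"Africa": 0.9, "Asia": 0.1},
}


def sequence_prob(words):
    """Joint probability of a word sequence under MODEL."""
    prob = 1.0
    for i, word in enumerate(words):
        prob *= MODEL[tuple(words[:i])][word]
    return prob


def greedy_decode(length):
    """Pick the single most likely next word at every step."""
    prefix = ()
    for _ in range(length):
        dist = MODEL[prefix]
        prefix += (max(dist, key=dist.get),)
    return prefix


def all_sequences(length, prefix=()):
    """Enumerate every complete sequence of the given length."""
    if len(prefix) == length:
        yield prefix
        return
    for word in MODEL[prefix]:
        yield from all_sequences(length, prefix + (word,))


greedy = greedy_decode(4)
best = max(all_sequences(4), key=sequence_prob)
print(greedy, sequence_prob(greedy))
print(best, sequence_prob(best))
```

In this toy table, greedy commits to "going" (0.6 beats 0.4) and ends with a sentence of probability about 0.30, while the exhaustive search finds a "visiting" sentence with probability about 0.36. Exhaustive search is only feasible here because the table is tiny.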

 

 

 

Beam Search

In beam search, B denotes the beam width

 

 

 

B = 3

For each of the 3 kept words, all 10000 vocabulary words are scored as the next word

-> 30000 candidates searched

+ The joint probability can be factored into conditional probabilities
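Written out for the first two output words, the factoring looks as follows. The notation y<1>, y<2> for output words and x for the input sentence follows the course convention and is assumed here.

```latex
P\left(y^{\langle 1 \rangle}, y^{\langle 2 \rangle} \mid x\right)
  = P\left(y^{\langle 1 \rangle} \mid x\right)\,
    P\left(y^{\langle 2 \rangle} \mid x, y^{\langle 1 \rangle}\right)
```

With B = 3 and a 10000-word vocabulary, the second factor is evaluated for 3 x 10000 = 30000 candidate pairs, and only the 3 most probable pairs are kept.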

 

 

 

If, when predicting the second output word, candidates starting with "september" score lower than those starting with "in" or "jane" -> "september" is discarded.

 

 

 

The process of producing the whole sentence with beam search
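The whole procedure can be sketched roughly as below. This is a toy version under stated assumptions: `beam_search`, `toy_model`, and the table `TABLE` are hypothetical names, the probabilities are made up, and a real decoder would query the RNN instead of a lookup table.

```python
import math

# Hypothetical next-word table P(word | prefix); all numbers are made up.
TABLE = {
    (): {"in": 0.45, "jane": 0.35, "september": 0.20},
    ("in",): {"september": 0.5, "africa": 0.5},
    ("jane",): {"visits": 0.9, "is": 0.1},
    ("september",): {"jane": 1.0},
}


def toy_model(prefix):
    """Stand-in for the decoder: next-word distribution for a prefix."""
    return TABLE.get(prefix, {})


def beam_search(next_word_probs, beam_width, max_len, eos="<eos>"):
    """Keep the beam_width best partial sentences at every step.

    Returns (words, log_probability) of the best hypothesis found.
    """
    beams = [((), 0.0)]  # (prefix, summed log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, p in next_word_probs(prefix).items():
                if p > 0:
                    candidates.append((prefix + (word,), score + math.log(p)))
        # Keep only the top beam_width extensions across all beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])


narrow = beam_search(toy_model, beam_width=1, max_len=2)
wide = beam_search(toy_model, beam_width=2, max_len=2)
print(narrow)
print(wide)
```

With `beam_width=1` the search behaves like greedy decoding and commits to "in". With `beam_width=2` it also keeps "jane" alive and ends up with the more probable two-word prefix.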

 

 

 

Refinements to Beam Search

Use a sum of logs (Σ) instead of a product (Π) -> log scale

Repeatedly multiplying probabilities between 0 and 1 drives the value toward 0 (numerical underflow)
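A quick check of the underflow problem, assuming 200 steps that each have probability 0.01 (made-up numbers chosen to force the effect):

```python
import math

# 200 hypothetical per-word probabilities, each equal to 0.01.
probs = [0.01] * 200

# Multiplying them underflows double precision and collapses to zero.
product = 1.0
for p in probs:
    product *= p

# Summing logs keeps the same information in a representable range.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

The product is no longer usable for ranking hypotheses, while the log sum stays a finite negative number. Since log is monotonic, maximizing the log sum picks the same sentence as maximizing the product.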

 

 

 

If alpha is close to 1 -> full length normalization

If alpha is close to 0 -> no normalization

 

 

 

Ty -> output length (number of decoding steps)
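A small sketch of the length-normalized objective, i.e. the log-probability sum divided by Ty raised to alpha. The function name `normalized_log_prob`, the default `alpha=0.7`, and the example probabilities are assumptions for illustration.

```python
import math


def normalized_log_prob(step_probs, alpha=0.7):
    """Sum of per-step log probabilities divided by Ty ** alpha.

    alpha = 0 means no normalization, alpha = 1 means dividing by the full length.
    """
    t_y = len(step_probs)
    return sum(math.log(p) for p in step_probs) / (t_y ** alpha)


# Made-up per-step probabilities for a short and a long hypothesis.
short = [0.5, 0.5]
long_ = [0.7] * 5

for alpha in (0.0, 1.0):
    print(alpha, normalized_log_prob(short, alpha), normalized_log_prob(long_, alpha))
```

Without normalization (alpha = 0) the short hypothesis scores higher simply because it has fewer factors. With full normalization (alpha = 1) the long hypothesis wins because its average per-word log probability is better.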

 

 

 

B -> how many candidates to keep under consideration?

A large B considers more candidates -> better results, but slower

+ Beam search is a heuristic search method (it is not guaranteed to find the exact maximum)

 

 

 

Error Analysis in Beam Search

y* is the human translation

ŷ is the ML (model) translation

 

 

 

The RNN is used to compute and compare the probabilities of these two outputs, P(y*|x) and P(ŷ|x)

 

 

 

RNN gives the human translation the higher probability, P(y*|x) > P(ŷ|x) -> beam search is at fault

RNN gives the ML translation the higher (or equal) probability, P(y*|x) <= P(ŷ|x) -> RNN is at fault

 

 

 

By computing the fraction of errors attributed to B (beam search) and R (RNN), you can tell which one causes more of the errors.
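The bookkeeping can be sketched like this. The function name `attribute_errors` and the probability pairs are hypothetical; in practice each pair would come from scoring y* and ŷ with the trained RNN on a dev-set mistake.

```python
def attribute_errors(examples):
    """Attribute each dev-set error to beam search (B) or the RNN (R).

    examples: list of (p_star, p_hat) pairs, where p_star = P(y*|x) and
    p_hat = P(y_hat|x) as computed by the RNN.
    Returns the fraction of errors attributed to each component.
    """
    counts = {"B": 0, "R": 0}
    for p_star, p_hat in examples:
        if p_star > p_hat:
            counts["B"] += 1  # RNN preferred y*, but beam search did not find it
        else:
            counts["R"] += 1  # RNN itself preferred the worse translation
    total = len(examples)
    return {name: count / total for name, count in counts.items()}


# Made-up probabilities for four mistranslated dev examples.
errors = [(2e-10, 1e-10), (1e-10, 3e-10), (5e-9, 1e-9), (4e-10, 4e-11)]
fractions = attribute_errors(errors)
print(fractions)
```

If most errors land on B, a larger beam width is the natural thing to try. If most land on R, the effort is better spent on the model itself (more data, regularization, a different architecture).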

 

 

 

Bleu Score

 

 

 

 

 

 

 

 

 

BLEU score - used in many areas such as translation, text generation, and image captioning.
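As a rough illustration of the core ingredient of BLEU, here is a sketch of the modified (clipped) n-gram precision. The helper names `ngrams` and `modified_precision` are hypothetical, and the full BLEU score additionally combines several n-gram orders and applies a brevity penalty, which this sketch leaves out.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference translations.

    Each candidate n-gram is credited at most as many times as it appears
    in any single reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    ref_counts = [Counter(ngrams(ref, n)) for ref in references]
    clipped = 0
    for gram, count in cand_counts.items():
        max_ref = max(rc[gram] for rc in ref_counts)
        clipped += min(count, max_ref)
    return clipped / sum(cand_counts.values())


refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
p1 = modified_precision("the the the the the the the".split(), refs, 1)
p2 = modified_precision("the cat the cat on the mat".split(), refs, 2)
print(p1, p2)
```

Clipping is what stops a degenerate output such as "the the the ..." from getting a perfect unigram precision: "the" appears at most twice in any one reference, so only 2 of the 7 candidate words are credited.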

 

 

 

Attention Model Intuition

The longer the sequence, the harder it is for the network to memorize it -> performance drops

The attention mechanism addresses this (very important in deep learning)

 

 

 

What is needed to output the first word, "Jane"?

Introduce a weight for each of the input words X -> alpha: how much attention to pay to that word

S : hidden state

 

 

 

The word generated at the previous step, plus the inputs weighted by attention, are fed into the prediction

to predict the next word.

 

 

 

α<t, t'>

When predicting English word t,

how much should French word t' be reflected, i.e. attended to?

It is a weight. (How much does the context depend on that feature?)

t -> target, output

t' -> context, input

 

 

 

Attention Model

Unlike alpha, a denotes the encoder features (activations) computed from the input x<t'>!

Multiplying each of them by its attention alpha and summing gives c (the context).
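In symbols, using the course-style notation (assumed here, since the original figure is missing), the context at output step t is the attention-weighted sum of the encoder features:

```latex
c^{\langle t \rangle} = \sum_{t'=1}^{T_x} \alpha^{\langle t, t' \rangle} \, a^{\langle t' \rangle},
\qquad
\sum_{t'=1}^{T_x} \alpha^{\langle t, t' \rangle} = 1
```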

 

 

 

So how is this attention alpha<t, t'> computed?
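One common answer is a softmax over alignment scores e<t, t'>, so that the weights for a given output step are positive and sum to 1. The sketch below assumes the scores are already given; in the course they come from a small neural network that looks at the previous decoder state s<t-1> and the encoder feature a<t'>. The function names and all numbers are made up.

```python
import math


def attention_weights(scores):
    """Softmax over alignment scores e<t, t'> for one output step t."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, activations):
    """Attention-weighted sum of encoder feature vectors a<t'>."""
    dim = len(activations[0])
    return [sum(w * a[d] for w, a in zip(weights, activations)) for d in range(dim)]


# Hypothetical scores for Tx = 3 input positions and 2-dimensional features.
scores = [2.0, 0.5, -1.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

alphas = attention_weights(scores)
context = context_vector(alphas, features)
print(alphas)
print(context)
```

The input position with the highest score gets the largest weight, so its feature vector dominates the context that is passed to the decoder for this step.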

 

 

 

Downside: the cost is quadratic, on the order of Tx * Ty

+ A similar algorithm can be applied to image captioning.

 

 

 

 

 

 

Speech Recognition

phonemes -> units of sound, written the way they are heard (the -> de)

 

 

 

The attention mechanism can also be used for speech recognition.

 

 

 

 

 

 

Trigger Word Detection

This is the trigger detection behind things like "Hey Siri!" (so cool)

 

 

 

trigger word -> 1

otherwise 0
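A minimal sketch of how such a label sequence could be built for one audio clip. The function name `make_labels` and its parameters are hypothetical, and marking a few steps instead of a single one is an assumption borrowed from the common trick of reducing the imbalance between 0s and 1s.

```python
def make_labels(num_steps, trigger_end_steps, ones_after=1):
    """Build a 0/1 target sequence for trigger word detection.

    num_steps: number of output time steps for the clip.
    trigger_end_steps: time-step indices at which a trigger word finishes.
    ones_after: how many consecutive steps right after each end are set to 1.
    """
    labels = [0] * num_steps
    for end in trigger_end_steps:
        start = end + 1
        stop = min(start + ones_after, num_steps)  # do not run past the clip
        for t in range(start, stop):
            labels[t] = 1
    return labels


labels = make_labels(num_steps=10, trigger_end_steps=[3], ones_after=2)
print(labels)
```

Here the trigger word is assumed to end at step 3, so steps 4 and 5 are labeled 1 and everything else stays 0.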