[GoogleML] Natural Language Processing & Word Embeddings

2023. 10. 29. 16:52 · ArtificialIntelligence/2023GoogleMLBootcamp

 

 

 

Word Representation

๋‚ด์ ์„ ํ†ตํ•ด ์œ ์‚ฌ๋„๋ฅผ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

๋ชจ๋ธ์˜ ์ž…์žฅ์—์„œ ์‚ฌ๊ณผ์™€ ์˜ค๋ Œ์ง€ feature vector๋Š” ์œ ์‚ฌํ•˜๋‹ค 

ํ‚น - ์˜ค๋ Œ์ง€, ํ€ธ - ์˜ค๋ Œ์ง€ ๋ณด๋‹ค ์‚ฌ๊ณผ - ์˜ค๋ Œ์ง€๊ฐ€ ๋” ์œ ์‚ฌํ•œ ๊ด€๊ณ„ 
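
The comparison above can be sketched with a few toy feature vectors. The feature names (gender, royal, age, food) and all numeric values are assumptions made up for illustration; they are not taken from the post or from any trained model.

```python
# Toy feature vectors (hypothetical values, chosen only for illustration).
# Feature order: gender, royal, age, food.
features = {
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Compare every other word against "orange".
for word in ("apple", "king", "queen"):
    score = dot(features[word], features["orange"])
    print(word, "vs orange:", round(score, 3))
```

With these made-up values, apple scores far higher against orange than king or queen do, because only apple and orange share a large "food" component.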

 

 

 

์ž„๋ฒ ๋”ฉ์˜ ๊ฐœ๋… 

 

 

 

Using Word Embeddings

Through transfer learning, word embeddings can be applied to tasks that have only a small training set.

 

 

 

 

 

 

In this respect, word embedding is similar to face encoding! :)

 

 

 

Properties of Word Embeddings

man -> woman as king -> what? 

How can this be inferred algorithmically? (The feature on which the two difference vectors coincide -> gender)

 

 

 

In the 300-dimensional representation space, the two difference vectors that encode gender (e_man - e_woman and e_king - e_w) come out similar.

Therefore, to obtain the target word,

find the word w that maximizes the similarity between e_w (the target vector, i.e. the answer we want) and (e_king - e_man + e_woman).
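
A minimal sketch of this search, assuming toy 3-D embeddings with made-up values (feature order: gender, royal, food) and cosine similarity as the similarity function. The helper name `complete_analogy` is hypothetical, and skipping the three input words as candidates is a common convention assumed here rather than something stated in the post.

```python
import math

# Toy embeddings (hypothetical values). Feature order: gender, royal, food.
embeddings = {
    "man":    [-1.00, 0.01, 0.00],
    "woman":  [1.00, 0.02, 0.00],
    "king":   [-0.95, 0.93, 0.00],
    "queen":  [0.97, 0.95, 0.00],
    "apple":  [0.00, 0.00, 0.95],
    "orange": [0.01, 0.00, 0.97],
}


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def complete_analogy(a, b, c, embeddings):
    """Return the word w that best completes "a -> b as c -> w"."""
    # Target vector: e_c - e_a + e_b.
    target = [ec - ea + eb for ea, eb, ec in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    # Assumed convention: the input words themselves are not candidates.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(complete_analogy("man", "woman", "king", embeddings))
```

Over a real vocabulary the same argmax runs across every word, so the search is usually vectorized rather than written as a Python loop.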

 

 

 

t-SNE -> maps the 300-dimensional embeddings to 2D for visualization. (Pronounced "tee-snee".)

 

 

 

์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„๋ฅผ ์˜๋ฏธํ•˜๋Š” sim

๋‘ ๋ฒกํ„ฐ ์‚ฌ์ด์˜ ๊ฐ์˜ ์ฝ”์‚ฌ์ธ ๊ฐ’ -> ์œ ์‚ฌ๋„๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ 
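
Written out, the standard definition of cosine similarity for two vectors is shown below. The symbols u, v, and θ (the angle between them) are notation chosen here for illustration.

```latex
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert_2 \, \lVert v \rVert_2} = \cos\theta
```

The value is 1 when the vectors point in the same direction, 0 when they are orthogonal, and -1 when they point in opposite directions.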

 

 

 

Embedding Matrix

Multiplying the full embedding matrix by a one-hot vector extracts only the orange column == e_6257 (6257 being the index of orange in the vocabulary).
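
A small sketch of that column extraction, assuming a toy 3 x 5 embedding matrix with made-up values in place of the full-size one. The vocabulary, the matrix entries, and the helper names `one_hot` and `matvec` are all hypothetical.

```python
# Toy vocabulary and embedding matrix (hypothetical values).
# E has shape (embedding_dim, vocab_size): one column per word.
vocab = ["a", "apple", "king", "orange", "queen"]
E = [
    [0.1, 0.0, -0.9, 0.0, 1.0],
    [0.0, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.9, 0.0, 1.0, 0.0],
]


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(matrix, vector):
    """Matrix-vector product: one inner product per matrix row."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


j = vocab.index("orange")
e_orange = matvec(E, one_hot(j, len(vocab)))  # E times o_j
column = [row[j] for row in E]                # direct lookup of column j

print(e_orange == column)
```

Because almost every term in the product is multiplied by zero, implementations typically use the direct column lookup instead of the full multiplication.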