[GoogleML] Convolutional Neural Networks Case Studies

2023. 10. 6. 23:16 · ArtificialIntelligence/2023GoogleMLBootcamp


Why look at case studies?


Classic Networks

LeNet

- Modern networks have 1,000x or more the parameters

- As the layers progress, n_H and n_W shrink while n_C (the number of channels) grows (see the sketch after this list)

- Conv and pooling layers alternate (pooling preserves the number of channels)

- The number of channels == the number of filters in the previous layer

- A softmax activation at the end predicts the target ŷ
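
As a quick illustration of that shrink-and-grow pattern, a minimal PyTorch sketch of a LeNet-5-style stack (assuming a 32x32 grayscale input and 10 classes):

```python
import torch
import torch.nn as nn

# LeNet-5-style stack: n_H/n_W shrink (32 -> 28 -> 14 -> 10 -> 5)
# while n_C grows (1 -> 6 -> 16), with conv and pooling alternating.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # 32x32x1 -> 28x28x6
    nn.Tanh(),                         # the original used sigmoid/tanh, not ReLU
    nn.AvgPool2d(2),                   # 28x28x6 -> 14x14x6 (channels preserved)
    nn.Conv2d(6, 16, kernel_size=5),   # 14x14x6 -> 10x10x16
    nn.Tanh(),
    nn.AvgPool2d(2),                   # 10x10x16 -> 5x5x16
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),
    nn.Tanh(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),                 # softmax is applied inside the loss in practice
)
print(lenet(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```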


AlexNet

- Similar in structure to LeNet, but much larger (~1,000x more parameters)

- Uses the ReLU activation function

- Trained in parallel across multiple GPUs

- Uses Local Response Normalization (LRN): takes one pixel position and normalizes along the channel direction (e.g., across 256 channels)

  -> a method rarely used nowadays

+ AlexNet demonstrated the potential of deep learning in CV and many other fields


VGG

- A simple architecture, but with deeply stacked layers

- 138M parameters

- As the network gets deeper, H and W shrink while C grows: the typical conv structure!


ResNets

skip connections / shortcut connections

By adding the input unchanged at a later point in the network,

the vanishing-gradient problem is mitigated + far more layers can be stacked.

Here g is the activation function, namely ReLU.

In other words, the skip-connection input is added before it passes through the ReLU: a[l+2] = g(z[l+2] + a[l])


Adding a shortcut connection like this turns the layers into a residual block.

The example network in the lecture contains five such residual blocks.

In the case of ResNet, this made it possible to stack very many layers without degradation.

Deeper network -> still doing well
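
A minimal PyTorch sketch of one identity residual block (the channel count 64 is an arbitrary assumption; both convs keep the spatial size so the shortcut can be added directly):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Identity residual block: a[l+2] = g(z[l+2] + a[l])."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a_l):
        z = self.conv2(F.relu(self.conv1(a_l)))  # z[l+2]
        return F.relu(z + a_l)                   # shortcut added BEFORE the final ReLU

x = torch.randn(1, 64, 28, 28)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 28, 28]) -- dimensions unchanged
```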


Why ResNets Work?

๋งŽ์€ residual block์„ ์Œ“์•„๋„, ๋„คํŠธ์›Œํฌ์˜ ์„ฑ๋Šฅ์„ ํ•ด์น˜์ง€ ์•Š๋Š” ์ด์œ  

identity func์„ ๋”ํ•˜๋Š” ๊ฒƒ๊ณผ ์œ ์‚ฌํ•œ ๊ธฐ๋Šฅ์„ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— 

์ฆ‰ ์ด์ „์˜ a[l]๊ณผ ์œ ์‚ฌํ•œ a[l+1]์ด ๋˜์–ด, ๋„คํŠธ์›Œํฌ์— ๋ถ€๋‹ด์ด X 

(weight์™€ bias๊ฐ€ 0์— ๊ฐ€๊นŒ์šด ์ž‘์€ ๊ฐ’์ด๋ผ๋Š” ๊ฐ€์ • ํ•˜์—)
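
Spelling that step out in the lecture's notation (a short reconstruction of the standard argument):

a[l+2] = g(z[l+2] + a[l]) = g(W[l+2] a[l+1] + b[l+2] + a[l])

If W[l+2] ≈ 0 and b[l+2] ≈ 0, this reduces to a[l+2] = g(a[l]) = a[l], since ReLU leaves the non-negative a[l] unchanged.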


a[l] ์ฐจ์›๊ณผ, a[l+2]์˜ ์ฐจ์›์„ ๋งž์ถฐ์ฃผ๊ธฐ ์œ„ํ•˜์—ฌ 

๋งŒ์•ฝ์— ๋‘˜์˜ ์ฐจ์›์ด ๋‹ค๋ฅด๋‹ค๋ฉด, a[l]์— ์ ์ ˆํ•œ ์ฐจ์›์˜ W๋ฅผ ๊ณฑํ•˜์—ฌ 

๋‘˜์˜ ์ฐจ์›์„ ๋™์ผํ•˜๊ฒŒ ๋ฐ”๊พธ์–ด์ค€๋‹ค 


๋งˆ์ง€๋ง‰์— FC -> for softmax


Networks in Networks and 1x1 Convolutions

1 X 1์„ ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜๋ฉด, ํ•œ ํ”ฝ์…€์— ๋Œ€ํ•˜์—ฌ ์ฑ„๋„ ๋ฐฉํ–ฅ์œผ๋กœ, 

ํ•˜๋‚˜์˜ ๋‰ด๋Ÿฐ์œผ๋กœ ์—ฐ์‚ฐํ•˜์—ฌ, ๊ฐ’์„ ๋„์ถœํ•˜๋Š” ํšจ๊ณผ๋ฅผ ๊ฐ€์ง„๋‹ค. 

์ด 1 X 1์˜ ํ•„ํ„ฐ ์ˆ˜๊ฐ€ ์ถœ๋ ฅ์˜ ๋งˆ์ง€๋ง‰ ์ฐจ์›์ด ๋˜๋Š” ๊ฒƒ 


๋งŒ์•ฝ์— ๋‰ด๋Ÿฐ์ด ์—ฌ๋Ÿฌ๊ฐœ์˜€๋‹ค๋ฉด (1๊ฐœ๊ฐ€ ์•„๋‹ˆ๋ผ)

์—ฐ๋‘์ƒ‰ ์ฐจ์› ๋ฐฉํ–ฅ์œผ๋กœ ๋ ˆ์ด์–ด๊ฐ€ ์Œ“์ด๊ฒŒ ๋œ๋‹ค.

์ด๊ฒƒ์ด 1 X 1 conv (Network in network)์˜ concept


Shrinking the number of channels:

a 1 X 1 conv can serve to reduce the channel dimension;

the number of 1 X 1 filters controls how far the channels shrink.


Inception Network Motivation

Concatenate the outputs of several conv layers (of different filter sizes) into a single output.

Depending on each size's filter count, the branches produce 64, 128, 32, and 32 output channels,

and concatenating them all yields a next layer with 256 channels.

+ Height and width stay 28 * 28 / therefore every branch is a same convolution


How do we deal with this computational cost?

How to compute it:

output dimensions * filter size * number of input channels (192)

28 * 28 * 32 * 5 * 5 * 192 ≈ 120M multiplications
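
Checking that arithmetic in Python:

```python
# Cost of the direct 5x5 same conv: 28x28x192 -> 28x28x32
direct = 28 * 28 * 32 * 5 * 5 * 192
print(f"{direct:,}")  # 120,422,400 multiplications
```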


1 X 1 ์„ ํ™œ์šฉํ•˜์—ฌ bottle neck ๊ตฌ์กฐ 

์ค‘๊ฐ„์— ํ•„ํ„ฐ์ˆ˜๋กœ ์ฐจ์›์„ ํ™• ์ฃฝ์ธ๋‹ค 

parameter ์ˆ˜๋ฅผ ํš๊ธฐ์ ์œผ๋กœ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค 
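
The same mapping with the lecture's bottleneck (16 filters in the intermediate 1 X 1 layer):

```python
# Same 28x28x192 -> 28x28x32 mapping via a 1x1 bottleneck with 16 filters
step1 = 28 * 28 * 16 * 1 * 1 * 192   # 1x1 conv:  2,408,448
step2 = 28 * 28 * 32 * 5 * 5 * 16    # 5x5 conv: 10,035,200
print(f"{step1 + step2:,}")          # 12,443,648 -- roughly a 10x reduction
```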


Inception Network

max pooling์˜ ๊ฒฝ์šฐ, same -> 28 * 28 * 192 (channel์ˆ˜๋„ ์œ ์ง€๋จ) 

input๊ณผ ๋™์ผํ•œ ๋งค์šฐ ํฐ ์ฐจ์›์ด output 

์ด๋•Œ, 1 X 1 conv layer๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ฑ„๋„ ์ˆ˜๋ฅผ ์กฐ์ ˆํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ฐจ์›์„ ์ถ•์†Œ์‹œํ‚จ๋‹ค. 

์™œ ํ’€๋ง๋งŒ ๋‚˜์ค‘์— 1 X 1์„ ์ ์šฉ์‹œํ‚ค๋Š”๊ฑธ๊นŒ . . ? (conv๋Š” ์ „์ฒ˜๋ฆฌ์ฒ˜๋Ÿผ ์“ฐ์˜€๋Š”๋ฐ) 


A network in which this inception block structure repeats.


Auxiliary classifiers (side-branch learners) are attached along the way.


์™œ ์ด๋ฆ„์ด ์ธ์…‰์…˜ net์ธ๊ฐ€? 

์ธ์…‰์…˜ ์˜ํ™” ๋ฐˆ์„ ๊ฐ€์ ธ์™”๋‹ค! :)

์•„ ์ธ์…‰์…˜ ๋ณด๊ณ ์‹ถ๋‹ค. . . 


MobileNet

๋” ์ œํ•œ์ ์ธ ํ™˜๊ฒฝ์—์„œ๋„ ๋™์ž‘ ๊ฐ€๋Šฅํ•œ NN


์ผ๋ฐ˜์ ์ธ conv์˜ ๊ฒฝ์šฐ 

์–ด๋–ค ๋ฐฉ์‹์œผ๋กœ computational cost๊ฐ€ ๋ฐœ์ƒํ• ๊นŒ? ์— ๋Œ€ํ•œ ๊ณ ์ฐฐ 


์–ด๋–ป๊ฒŒ ์ด ๋‘ step์œผ๋กœ ๋‚˜๋‰˜๋Š”๊ฐ€? 


The depthwise step: each of the R, G, B channels is convolved with its own filter, separately.


How is the pointwise conv carried out?


The pink 1 X 1 filters are computed in the same way.

How does the dimension grow from 3 to 5?

By the number of pink filters!


ํ•‘ํฌ์ƒ‰์„ 5๊ฐœ ์“ฐ๋ฉด -> ๊ฒฐ๊ณผ ์ฑ„๋„ dim์ด ๋˜๋Š” ๊ฒƒ 


Why this yields better inference time:

splitting the conv into depthwise and pointwise steps, whose costs simply add,

cuts the total computational cost dramatically (to about 30% in this example).
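
Verifying the ~30% figure with the example's numbers:

```python
# 6x6x3 input, 3x3 filters, 5 output channels, 4x4 output
normal    = (3 * 3 * 3) * (4 * 4) * 5   # 2160 multiplications
depthwise = (3 * 3) * (4 * 4) * 3       #  432
pointwise = 3 * (4 * 4) * 5             #  240
print((depthwise + pointwise) / normal) # ~0.31 -> about 31% of the original cost
```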


MobileNet Architecture

Splitting convolutions into depthwise and pointwise steps,

thereby cutting the computational cost dramatically, is the key idea.


Advantages of the bottleneck block structure (MobileNetV2; see the sketch after this list)

- the expansion step lets the network learn a richer representation

- the projection back down keeps the memory footprint small
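
A hedged PyTorch sketch of such a bottleneck (inverted-residual) block, assuming stride 1 and matching channel counts so the residual connection applies; the expansion factor 6 follows the MobileNetV2 paper:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """MobileNetV2-style block: expand (1x1) -> depthwise (3x3) -> project (1x1)."""
    def __init__(self, ch, expand=6):
        super().__init__()
        mid = ch * expand
        self.expand = nn.Conv2d(ch, mid, 1)              # expansion: richer representation
        self.depthwise = nn.Conv2d(mid, mid, 3, padding=1, groups=mid)
        self.project = nn.Conv2d(mid, ch, 1)             # back down: small memory footprint
        self.act = nn.ReLU6()

    def forward(self, x):
        h = self.act(self.expand(x))
        h = self.act(self.depthwise(h))
        return x + self.project(h)   # residual connection (stride 1, same channels)

print(Bottleneck(32)(torch.randn(1, 32, 14, 14)).shape)  # torch.Size([1, 32, 14, 14])
```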


EfficientNet

1. ๋ณด๋‹ค ๋” ๋†’์€ ํ•ด์ƒ๋„ 

resolution์„ ๋†’์ด๊ธฐ


2. More depth (d)

A deeper network.


3. More width (w)

Make the blocks/layers bigger.


Scale all three up together: compound scaling.

Under a given computational budget,

tune r, d, and w to find the optimal combination.

