[Paper reading] DenseNet

2023. 8. 22. 12:49 · ArtificialIntelligence/PaperReading

 

 

 

DenseNet

DenseNet is connected in a way that is similar to, yet different from, ResNet.

 

 

  

๋™์ผํ•œ k์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ layer ์ˆ˜ ๋น„๊ต

 

 

 

๋‹ค๋ฅธ k ๊ฐ’๋„ ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค

 

 

 

Abstract

  • Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output.
  • In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections (one between each layer and its subsequent layer), our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
  • DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
    We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet).
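To make the dense connectivity described above concrete, here is a minimal sketch of a single dense block, assuming PyTorch. The class names, the growth rate k = 12, and the tensor sizes are illustrative choices of mine, and the composite function H_ℓ is kept to the paper's basic BN → ReLU → 3×3 Conv form (no bottleneck or compression).

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One composite function H_l: BN -> ReLU -> 3x3 Conv, producing k new feature maps."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.relu(self.norm(x)))

class DenseBlock(nn.Module):
    """Every layer receives the concatenation of all preceding feature maps in the block."""
    def __init__(self, num_layers: int, in_channels: int, growth_rate: int):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # input to layer l is [x_0, x_1, ..., x_{l-1}] concatenated along the channel axis
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(num_layers=4, in_channels=16, growth_rate=12)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32]) -> 16 + 4 * 12 channels
```

Because each layer adds only k feature maps, the block's output has in_channels + num_layers * k channels; in the full network, transition layers (1×1 conv plus pooling) sit between dense blocks, so concatenation only happens among feature maps of the same spatial size.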
     

 

 

 

Conclusion of the Paper

  • We proposed a new convolutional network architecture, which we refer to as Dense Convolutional Network (DenseNet). It introduces direct connections between any two layers with the same feature-map size. We showed that DenseNets scale naturally to hundreds of layers, while exhibiting no optimization difficulties.
  • In our experiments, DenseNets tend to yield consistent improvement in accuracy with growing number of parameters, without any signs of performance degradation or overfitting. Under multiple settings, it achieved state-of-the-art results across several highly competitive datasets.
  • Moreover, DenseNets require substantially fewer parameters and less computation to achieve state-of-the-art performances. Because we adopted hyperparameter settings optimized for residual networks in our study, we believe that further gains in accuracy of DenseNets may be obtained by more detailed tuning of hyperparameters and learning rate schedules.
  • Whilst following a simple connectivity rule, DenseNets naturally integrate the properties of identity mappings, deep supervision, and diversified depth. They allow feature reuse throughout the networks and can consequently learn more compact and, according to our experiments, more accurate models.
  • Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features.

 

 

 

My Conclusion

Strengths

  • In introducing dense connectivity, the paper proposes a way to improve on the shortcomings of ResNet's skip connections and explains logically why dense connectivity is an effective remedy. Moreover, the discussion develops by consistently comparing against ResNet from multiple angles, which made it easy to follow.
  • It presents concrete grounds for why the parameter count can be smaller.
  • The analysis of the datasets themselves was refreshing, e.g. the grounds for assuming that SVHN is a relatively easy task:
    However, the 250-layer DenseNet-BC doesn’t further improve the performance over its shorter counterpart. This may be explained by that SVHN is a relatively easy task, and extremely deep models may overfit to the training set.

 

  • The model makes more effective use of its parameters.
    + When comparing against other models, going beyond a simple error comparison to a detailed analysis of parameter counts makes the evaluation more realistic and more informative, I think (and, within a single architecture, shows which structure is more parameter-efficient; see the parameter-count sketch after this list).
  • It was also good that the paper explains why DenseNet is not prone to overfitting and, on top of that, suggests a remedy for situations where overfitting could occur:
    • The DenseNet-BC bottleneck and compression layers appear to be an effective way to counter this trend.
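As a rough illustration of the parameter argument in the list above, here is a back-of-the-envelope count of convolution weights in one dense block, with and without the DenseNet-B bottleneck. The block shape (24 layers, k = 32, 256 input channels) loosely mirrors the third block of DenseNet-121; biases, BatchNorm parameters, and transition layers are ignored, so the totals are only indicative.

```python
def basic_layer_params(c_in: int, k: int) -> int:
    """Basic DenseNet layer: one 3x3 conv mapping c_in channels to k feature maps."""
    return c_in * k * 3 * 3

def bottleneck_layer_params(c_in: int, k: int) -> int:
    """DenseNet-B layer: 1x1 conv down to 4k channels, then 3x3 conv to k feature maps."""
    return c_in * (4 * k) * 1 * 1 + (4 * k) * k * 3 * 3

def dense_block_params(num_layers: int, c_in: int, k: int, layer_fn) -> int:
    """Layer i inside a dense block sees c_in + i * k input channels."""
    return sum(layer_fn(c_in + i * k, k) for i in range(num_layers))

print(dense_block_params(24, 256, 32, basic_layer_params))       # 4313088
print(dense_block_params(24, 256, 32, bottleneck_layer_params))  # 2801664
# A DenseNet-C transition layer with compression factor 0.5 would additionally halve
# the 256 + 24 * 32 = 1024 channels handed to the next block.
```

Each layer contributes only k new feature maps, so the network stays narrow even when it is deep, and the bottleneck caps the 3×3 convolution's input at 4k channels, which is where most of the savings above come from.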

 

 

Weaknesses

  • This suggests that DenseNets can utilize the increased representational power of bigger and deeper models. It also indicates that they do not suffer from overfitting or the optimization difficulties of residual networks. 
    • Isn't it hard to conclude, from results on image data (CIFAR) alone, that DenseNets, unlike residual networks, will not suffer from overfitting or optimization problems? (The supporting evidence seems thin.)
  • DenseNets perform a similar deep supervision in an implicit fashion: a single classifier on top of the network provides direct supervision to all layers through at most two or three transition layers.
    However, the loss function and gradient of DenseNets are substantially less complicated, as the same loss function is shared between all layers.
    Beyond plain gradient descent, when more diverse architectures are proposed, how should the gradient and the loss function be designed?

 

  • The step-by-step analysis of feature reuse through heat-map plots was impressive and concrete (a sketch of how such a heat map can be computed follows this list):
    • All layers spread their weights over many inputs within the same block. This indicates that features extracted by very early layers are, indeed, directly used by deep layers throughout the same dense block.
    • The weights of the transition layers also spread their weight across all layers within the preceding dense block, indicating information flow from the first to the last layers of the DenseNet through few indirections.
    • The layers within the second and third dense block consistently assign the least weight to the outputs of the transition layer (the top row of the triangles), indicating that the transition layer outputs many redundant features (with low weight on average). This is in keeping with the strong results of DenseNet-BC where exactly these outputs are compressed.
    • Although the final classification layer, shown on the very right, also uses weights across the entire dense block, there seems to be a concentration towards final feature-maps, suggesting that there may be some more high-level features produced late in the network.
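A rough sketch of the heat-map bookkeeping mentioned in the list item above: for each layer in a block, average the absolute 3×3-conv weights over the slice of input channels contributed by each earlier source. It reuses the toy DenseBlock sketched earlier in this post; on a randomly initialized block the numbers are meaningless, and the paper of course runs this analysis on a trained DenseNet (and also includes transition-layer and classifier weights), so take this purely as an illustration of the channel slicing.

```python
import torch

def reuse_heatmap(block, in_channels: int, growth_rate: int) -> torch.Tensor:
    """Row l, column s: mean |weight| that layer l assigns to source s, where source 0
    is the block input and source s >= 1 is the output of layer s-1.
    Columns with s > l stay zero because no such connection exists."""
    num_layers = len(block.layers)
    heat = torch.zeros(num_layers, num_layers + 1)
    for l, layer in enumerate(block.layers):
        w = layer.conv.weight.detach().abs()        # shape: (k, in_channels + l * k, 3, 3)
        sizes = [in_channels] + [growth_rate] * l   # block input, then outputs of layers 0 .. l-1
        start = 0
        for s, size in enumerate(sizes):
            heat[l, s] = w[:, start:start + size].mean()
            start += size
    return heat

# Using the toy DenseBlock defined earlier (randomly initialized, so purely illustrative):
block = DenseBlock(num_layers=4, in_channels=16, growth_rate=12)
print(reuse_heatmap(block, in_channels=16, growth_rate=12))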
 

 

Points to Improve & Questions

  • It is worth noting that our experimental setup implies that we use hyperparameter settings that are optimized for ResNets but not for DenseNets. It is conceivable that more extensive hyper-parameter searches may further improve the performance of DenseNet on ImageNet.
    • It would have been even better if the paper had also suggested how this could be further improved.

 

  • Superficially, DenseNets are quite similar to ResNets: the dense update differs
    only in that the inputs to H_ℓ(·) are concatenated instead of summed.
    However, the implications of this seemingly small modification lead to substantially different behaviors of the two network architectures. (Both update rules are written out after this list.)
  • As a direct consequence of the input concatenation, the feature-maps learned by any of the DenseNet layers can be accessed by all subsequent layers. This encourages feature reuse throughout the network, and leads to more compact models. 
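For reference, the two update rules being contrasted here are the paper's Eq. (1) and Eq. (2):

    x_ℓ = H_ℓ(x_{ℓ-1}) + x_{ℓ-1}            (ResNet: the output of H_ℓ is summed with the identity shortcut)
    x_ℓ = H_ℓ([x_0, x_1, ..., x_{ℓ-1}])     (DenseNet: H_ℓ receives the concatenation of all preceding feature maps)

Because nothing is summed away, every subsequent layer (and the final classifier) sees each earlier x_s directly, which is what enables the feature reuse and compactness described above.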

 

 

 
