[Paper reading] Dataset Condensation

2023. 9. 11. 00:11 · ArtificialIntelligence/PaperReading


Dataset Condensation with Gradient Matching (ICLR 2021)


Abstract

  • As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
  • We formulate this goal as a gradient matching problem between the gradients of deep neural network weights that are trained on the original and our synthetic data (a minimal sketch of this matching loss follows below). We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
  • Finally, we explore the use of our method in continual learning and neural architecture search and report promising gains when limited memory and computations are available.
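To make the gradient-matching formulation concrete, here is a minimal PyTorch sketch of the matching loss. It is a simplification, not the authors' implementation: the official repo's match_loss computes cosine distances per output channel of each layer, whereas this version flattens each parameter tensor.

```python
import torch

def gradient_match_loss(net, loss_fn, x_real, y_real, x_syn, y_syn):
    params = [p for p in net.parameters() if p.requires_grad]

    # Gradient of the training loss w.r.t. the weights on a real batch;
    # detached because it only serves as the matching target.
    g_real = torch.autograd.grad(loss_fn(net(x_real), y_real), params)
    g_real = [g.detach() for g in g_real]

    # Same gradient on the synthetic batch; create_graph=True so the
    # matching loss can be backpropagated into the synthetic images.
    g_syn = torch.autograd.grad(loss_fn(net(x_syn), y_syn), params,
                                create_graph=True)

    # Sum of layer-wise cosine distances between the two gradient sets.
    dist = 0.0
    for gr, gs in zip(g_real, g_syn):
        gr, gs = gr.flatten(), gs.flatten()
        dist = dist + (1 - torch.dot(gr, gs) / (gr.norm() * gs.norm() + 1e-8))
    return dist
```

In use, the synthetic set would be a leaf tensor, e.g. x_syn = torch.randn(n, 3, 32, 32, requires_grad=True), updated by an optimizer on this distance; the paper alternates this with training the network on the synthetic set and repeats over freshly initialized networks so the images do not overfit a single initialization.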


Conclusion

  • In this paper, we propose a dataset condensation method that learns to synthesize a small set of informative images. We show that these images are significantly more data-efficient than the same number of original images and than those produced by the previous method, and that they are not architecture-dependent: they can be used to train different deep networks.
  • Once synthesized, they can be used to lower the memory footprint of datasets and to efficiently train numerous networks, which are crucial in continual learning and neural architecture search respectively.
  • For future work, we plan to explore the use of condensed images in more diverse and thus more challenging datasets like ImageNet (Deng et al., 2009), which contain higher-resolution images with larger variations in the appearance and pose of objects and in background.


data distillation vs data condensation


Question

  • Few-shot learning vs zero-shot learning
    • ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋ฐ์ดํ„ฐ์˜ ์–‘์„ ๋‚˜ํƒ€๋‚ด๋Š” ์šฉ์–ด

    • few-shot learning
      • Performs classification using only a limited, small number of labeled sample images per class.
      • A machine learning methodology that can learn from a limited number of samples, typically fewer than ten.
      • The idea is that, just as humans can quickly pick up new concepts from just a few examples, so can models; the aim is to bridge the gap to human-like learning capabilities.

    • zero-shot learning
      • Uses a previously trained model to recognize and classify new classes for which no labeled examples exist in the training data.
      • To do this, information about the new class must be supplied as input; see the sketch below.
      • ZSL is a machine learning methodology that allows a model to recognize a new class without ever seeing an example of it during training: the model classifies the new class accurately using information about its attributes or characteristics, without explicit training on it.
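As an illustration of "supplying information about the new class as input", here is a minimal attribute-based zero-shot sketch. The linear encoder, the 85-dimensional attribute space, and all tensors below are hypothetical placeholders; in practice the encoder would be trained on seen classes only.

```python
import torch
import torch.nn.functional as F

feat_dim, attr_dim = 512, 85
encoder = torch.nn.Linear(feat_dim, attr_dim)   # maps image features to attributes;
                                                # assume it was trained on seen classes

# Side information: attribute vectors for classes never seen in training.
unseen_class_attrs = torch.randn(5, attr_dim)   # 5 unseen classes (placeholder)

img_feat = torch.randn(1, feat_dim)             # feature of a test image
pred_attr = encoder(img_feat)                   # predicted attribute vector

# Zero-shot prediction: nearest unseen class in attribute space.
scores = F.cosine_similarity(pred_attr, unseen_class_attrs, dim=1)
print("predicted unseen class:", scores.argmax().item())
```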


  • Dataset distillation vs Knowledge distillation vs Dataset condensation
    • Knowledge distillation
      • A method that transfers the knowledge of a large ensemble model into a smaller model (compact network); see the loss sketch below.
    • Dataset distillation
      • Starts from the same idea, but compresses a large amount of data into a small number of data points.
      • Reduces to the problem of finding distilled data x̃ such that the loss on the full dataset and the loss on the distilled data x̃ both come out as low as possible; see the bilevel sketch below.
    • Dataset condensation (this paper)
      • Rather than matching losses directly, it matches the gradients of the network weights computed on the real data and on the synthetic data.
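For reference, a minimal PyTorch form of the classic knowledge-distillation loss (Hinton et al.); the temperature T and weight alpha here are typical but arbitrary choices, not values from this paper.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # The compact student mimics the large teacher's softened class
    # distribution (KL term) while still fitting the true labels (CE term).
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

And a toy bilevel sketch of the dataset distillation objective (Wang et al., 2018) on a linear model: the distilled data x_syn is optimized so that a model trained on it has low loss on the real data. All shapes, data, and step sizes are placeholders.

```python
import torch

torch.manual_seed(0)
x_real = torch.randn(128, 10)                    # "full" dataset (toy)
y_real = torch.randint(0, 2, (128,))
x_syn = torch.randn(4, 10, requires_grad=True)   # learnable distilled data
y_syn = torch.tensor([0, 0, 1, 1])
w = (0.01 * torch.randn(10, 2)).requires_grad_(True)
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD([x_syn], lr=0.1)

for step in range(200):
    # Inner step: one gradient step of model training on the distilled data.
    g = torch.autograd.grad(loss_fn(x_syn @ w, y_syn), w, create_graph=True)[0]
    w_new = w - 0.1 * g
    # Outer step: the one-step-trained model should have low loss on the
    # real data; this error backpropagates through w_new into x_syn.
    opt.zero_grad()
    loss_fn(x_real @ w_new, y_real).backward()
    opt.step()
```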


References and code

https://github.com/VICO-UoE/DatasetCondensation


https://velog.io/@nomaday/n-shot-learning

https://www.thedatahunt.com/en-insight/guide-for-few-shot-learning
