[Paper reading] Denoising Diffusion Probabilistic Models


 

 

 

Abstract

  • We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.
  • Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
  • On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion.

 

 

 

Diffusion Model

  • This paper presents progress in diffusion probabilistic models. A diffusion probabilistic model is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time.
  • Transitions of this chain are learned to reverse a diffusion process, which is a Markov chain that gradually adds noise to the data in the opposite direction of sampling until signal is destroyed. When the diffusion consists of small amounts of Gaussian noise, it is sufficient to set the sampling chain transitions to conditional Gaussians too, allowing for a particularly simple neural network parameterization.

 

  • Diffusion models are straightforward to define and efficient to train, but to the best of our knowledge, there has been no demonstration that they are capable of generating high quality samples. We show that diffusion models actually are capable of generating high quality samples, sometimes better than the published results on other types of generative models (Section 4).
  • In addition, we show that a certain parameterization of diffusion models reveals an equivalence with denoising score matching over multiple noise levels during training and with annealed Langevin dynamics during sampling. 
  • Despite their sample quality, our models do not have competitive log likelihoods compared to other likelihood-based models. We find that the majority of our models’ lossless codelengths are consumed to describe imperceptible image details.
  • We present a more refined analysis of this phenomenon in the language of lossy compression, and we show that the sampling procedure of diffusion models is a type of progressive decoding that resembles autoregressive decoding along a bit ordering that vastly generalizes what is normally possible with autoregressive models.

 

 

 

forward process or diffusion process / the log likelihood (loss) via the posterior q
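A quick LaTeX sketch of what this heading refers to, written out from the paper in my own notation (α_t = 1 − β_t, ᾱ_t = ∏_{s≤t} α_s):

```latex
% Forward (diffusion) process: a fixed Markov chain that gradually adds Gaussian noise
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)

% Sampling x_t at an arbitrary timestep has a closed form:
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\, I\right)

% Training minimizes a variational bound on negative log likelihood:
\mathbb{E}\left[-\log p_\theta(x_0)\right]
\le \mathbb{E}_q\!\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] =: L
```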

 

 

 

KL divergence (a measure of how different two probability distributions are)
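For reference, a one-line LaTeX definition (my own addition, not from the post):

```latex
D_{\mathrm{KL}}(q \,\|\, p) = \mathbb{E}_{x \sim q}\!\left[\log \frac{q(x)}{p(x)}\right] \ge 0
```

When both q and p are Gaussian, this expectation has a closed form, which is what the paper relies on (see the Question section below).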

 

 

 

Diffusion models and denoising autoencoders

  • Diffusion models might appear to be a restricted class of latent variable models, but they allow a large number of degrees of freedom in implementation. One must choose the variances β_t of the forward process and the model architecture and Gaussian distribution parameterization of the reverse process.
  • We ignore the fact that the forward process variances β_t are learnable by reparameterization and instead fix them to constants. Thus, in our implementation, the approximate posterior q has no learnable parameters, so L_T is a constant during training and can be ignored. (A sketch of the constant schedule used in the paper follows below.)
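For concreteness, a minimal sketch of the constant schedule the paper reports in its experiments (T = 1000, with β_t increasing linearly from 10^-4 to 0.02) and the quantities derived from it; the variable names are mine:

```python
import numpy as np

# Fixed (non-learned) forward-process variance schedule, as reported in the paper:
# T = 1000 steps, beta_t increasing linearly from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)           # beta_1 ... beta_T
alphas = 1.0 - betas                          # alpha_t = 1 - beta_t
alphas_cumprod = np.cumprod(alphas)           # bar(alpha)_t = prod_{s<=t} alpha_s

# q(x_t | x_0) = N(sqrt(bar_alpha_t) * x_0, (1 - bar_alpha_t) * I),
# so these two arrays are enough to noise data to any step t.
sqrt_alphas_cumprod = np.sqrt(alphas_cumprod)
sqrt_one_minus_alphas_cumprod = np.sqrt(1.0 - alphas_cumprod)
```

The later sketches in this post assume torch tensors holding these same precomputed arrays.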

 

 

  • Since our simplified objective discards the weighting in Eq. (12), it is a weighted variational bound that emphasizes different aspects of reconstruction compared to the standard variational bound.
  • In particular, our diffusion process causes the simplified objective to down-weight loss terms corresponding to small t.
  • These terms train the network to denoise data with very small amounts of noise, so it is beneficial to down-weight them so that the network can focus on more difficult denoising tasks at larger t. We will see in our experiments that this reweighting leads to better sample quality. (A training-step sketch of the simplified objective follows below.)
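A minimal training-step sketch of this simplified objective (the paper's Algorithm 1 / L_simple): sample a timestep and noise, form x_t in closed form, and regress the network's ε-prediction onto the true noise. `eps_model` is a stand-in for the paper's U-Net, and `sqrt_ac`, `sqrt_1m_ac` are assumed torch tensors holding √ᾱ_t and √(1−ᾱ_t):

```python
import torch

def simple_loss(eps_model, x0, sqrt_ac, sqrt_1m_ac, T=1000):
    """L_simple = E_{t, x0, eps} || eps - eps_theta(x_t, t) ||^2 (uniform weighting over t)."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)     # uniform timestep (0-indexed here)
    eps = torch.randn_like(x0)                           # eps ~ N(0, I)
    # x_t = sqrt(bar_alpha_t) * x0 + sqrt(1 - bar_alpha_t) * eps  (closed-form forward process)
    x_t = (sqrt_ac[t].view(b, 1, 1, 1) * x0
           + sqrt_1m_ac[t].view(b, 1, 1, 1) * eps)       # assumes NCHW image batches
    return ((eps - eps_model(x_t, t)) ** 2).mean()       # plain MSE: the bound's weighting is dropped
```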

 

 

 

 

  • Progressive generation We also run a progressive unconditional generation process given by progressive decompression from random bits. In other words, we predict the result of the reverse process, x̂_0, while sampling from the reverse process using Algorithm 2. Figures 6 and 10 show the resulting sample quality of x̂_0 over the course of the reverse process. Large scale image features appear first and details appear last. Figure 7 shows stochastic predictions x_0 ∼ p_θ(x_0 | x_t) with x_t frozen for various t. When t is small, all but fine details are preserved, and when t is large, only large scale features are preserved. Perhaps these are hints of conceptual compression. (A sketch of how x̂_0 is computed from x_t follows below.)
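The x̂_0 shown in those figures can be read off in closed form from the current x_t and the network's noise prediction; a minimal sketch under the same assumptions as above (`eps_model`, `sqrt_ac`, `sqrt_1m_ac` are my names, not the paper's code):

```python
import torch

def predict_x0(eps_model, x_t, t, sqrt_ac, sqrt_1m_ac):
    """x0_hat = (x_t - sqrt(1 - bar_alpha_t) * eps_theta(x_t, t)) / sqrt(bar_alpha_t)."""
    b = x_t.shape[0]
    eps = eps_model(x_t, t)                               # predicted noise at step t
    return (x_t - sqrt_1m_ac[t].view(b, 1, 1, 1) * eps) / sqrt_ac[t].view(b, 1, 1, 1)
```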

 

  •  We can therefore interpret the Gaussian diffusion model as a kind of autoregressive model with a generalized bit ordering that cannot be expressed by reordering data coordinates.
  • Prior work has shown that such reorderings introduce inductive biases that have an impact on sample quality, so we speculate that the Gaussian diffusion serves a similar purpose, perhaps to greater effect since Gaussian noise might be more natural to add to images compared to masking noise.
  • Moreover, the Gaussian diffusion length is not restricted to equal the data dimension; for instance, we use T = 1000, which is less than the dimension of the 32 × 32 × 3 or 256 × 256 × 3 images in our experiments. Gaussian diffusions can be made shorter for fast sampling or longer for model expressiveness.

 

 

 

Related Work

  • While diffusion models might resemble flows and VAEs, diffusion models are designed so that q has no parameters and the top-level latent x_T has nearly zero mutual information with the data x_0.
  • Our ε-prediction reverse process parameterization establishes a connection between diffusion models and denoising score matching over multiple noise levels with annealed Langevin dynamics for sampling.
  • Diffusion models, however, admit straightforward log likelihood evaluation, and the training procedure explicitly trains the Langevin dynamics sampler using variational inference. The connection also has the reverse implication that a certain weighted form of denoising score matching is the same as variational inference to train a Langevin-like sampler.
  • Other methods for learning transition operators of Markov chains include infusion training, variational walkback, generative stochastic networks, and others.

 

 

 

Conclusion

  • We have presented high quality image samples using diffusion models, and we have found connections among diffusion models and variational inference for training Markov chains, denoising score matching and annealed Langevin dynamics (and energy-based models by extension), autoregressive models, and progressive lossy compression.
  • Since diffusion models seem to have excellent inductive biases for image data, we look forward to investigating their utility in other data modalities and as components in other types of generative models and machine learning systems.

 

 

 

Question

  • I am not sure what the alternative way of computing things (the "Rao-Blackwellized fashion") mentioned near the end of the background section means; it is still unclear to me even after reading the Appendix. (A closed-form Gaussian KL sketch follows below.)
    •  Consequently, all KL divergences in Eq(5) are comparisons between Gaussians, so they can be calculated in a Rao-Blackwellized fashion with closed form expressions instead of high variance Monte Carlo estimates.
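As far as I can tell, "Rao-Blackwellized" here simply means that every KL term in Eq. (5) is a KL between two Gaussians, so it can be evaluated analytically instead of being estimated with high-variance Monte Carlo samples. A minimal sketch of the closed form for diagonal Gaussians (my own helper, not the paper's code):

```python
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ), summed over dimensions."""
    # Closed form per dimension: log(sigma2/sigma1) + (var1 + (mu1 - mu2)^2) / (2 var2) - 1/2
    return np.sum(0.5 * np.log(var2 / var1)
                  + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2)
                  - 0.5)
```

With the reverse-process variance fixed to σ_t², each L_{t−1} term collapses to a squared distance between the two means (scaled by 1/(2σ_t²)) plus a constant, which is how the paper gets from the KL form to a simple mean-squared error.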

 

  • To summarize, we can train the reverse process mean function approximator μ_θ to predict μ̃_t, or by modifying its parameterization, we can train it to predict ε. We have shown that the ε-prediction parameterization both resembles Langevin dynamics and simplifies the diffusion model’s variational bound to an objective that resembles denoising score matching.
    • I do not understand what it means for this to "resemble" Langevin dynamics.
Langevin equation
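For what it's worth, my reading of the "resembles Langevin dynamics" remark: the Langevin update moves a sample along the score ∇_x log p(x) plus injected Gaussian noise, and the ε-parameterized reverse step has exactly that shape, with ε_θ acting as a (scaled) learned score. Roughly, in LaTeX:

```latex
% Langevin dynamics: a small gradient step on log p plus injected noise
x_{i+1} = x_i + \frac{\delta}{2}\,\nabla_x \log p(x_i) + \sqrt{\delta}\, z_i,
\qquad z_i \sim \mathcal{N}(0, I)

% DDPM reverse/sampling step with the eps-parameterization (Algorithm 2):
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\!\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,
          \epsilon_\theta(x_t, t)\right) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I)

% The two look alike because eps_theta estimates a scaled score of the noised data:
\epsilon_\theta(x_t, t) \approx -\sqrt{1-\bar\alpha_t}\;\nabla_{x_t} \log q(x_t)
```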

 

  • Similar to the discretized continuous distributions used in VAE decoders and autoregressive models, our choice here ensures that the variational bound is a lossless codelength of discrete data, without need of adding noise to the data or incorporating the Jacobian of the scaling operation into the log likelihood. At the end of sampling, we display μ_θ(x_1) noiselessly.
    • I don't quite understand what it means here to add noise to the data or to incorporate the Jacobian of the scaling operation. (The note below is my attempt to unpack this.)

 

 

 

์ฐธ๊ณ  ์ž๋ฃŒ

 

[๊ฐœ๋… ์ •๋ฆฌ] Diffusion Model

GAN, VAE ์™€ ๊ฐ™์€ ์ƒ์„ฑ ๋ชจ๋ธ(Generative Model) ์ค‘ ํ•˜๋‚˜๋กœ์จ, 2022๋…„์— ์ด์Šˆ๊ฐ€ ๋˜์—ˆ๋˜ text-to-image ๋ชจ๋ธ์ธ Stable-Diffusion, DALL-E-2, Imagen์˜ ๊ธฐ๋ฐ˜์ด ๋˜๋Š” ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๋งŽ์€ ๋…ผ๋ฌธ์—์„œ Diffusion Model์ด ์ธ์šฉ๋˜์ง€๋งŒ ์ˆ˜

xoft.tistory.com

 

[๊ฐ•ํ™”ํ•™์Šต] ๋งˆ์ฝ”ํ”„ ํ”„๋กœ์„ธ์Šค(=๋งˆ์ฝ”ํ”„ ์ฒด์ธ) ์ œ๋Œ€๋กœ ์ดํ•ดํ•˜๊ธฐ

์ด ํฌ์ŠคํŒ…์€ ์–ด๋Š ์นดํ…Œ๊ณ ๋ฆฌ์— ๋„ฃ์–ด์•ผํ• ์ง€ ๊ณ ๋ฏผ์ด ๋œ๋‹ค. ํ™•๋ฅ ๊ณผ๋„ ๊ด€๋ จ์ด ์žˆ๊ณ , ๋”ฅ๋Ÿฌ๋‹์˜ ๊ฐ•ํ™”ํ•™์Šต๊ณผ๋„ ๊ด€๋ จ์ด ์žˆ๊ณ , ์˜์ƒ์ฒ˜๋ฆฌ์˜ ๋ช‡๋ช‡ ์•Œ๊ณ ๋ฆฌ์ฆ˜์—์„œ๋„ ์‚ฌ์šฉ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ์งง์€ ๊ณ ๋ฏผ ๋์— ๋จธ์‹ 

bskyvision.com

 

[๊ฐœ๋… ์ •๋ฆฌ] Diffusion Model ๊ณผ DDPM ์ˆ˜์‹ ์œ ๋„ ๊ณผ์ •

์ด์ „๊ธ€ ์—์„œ Diffusion Model๊ณผ DDPM(Denosing Diffusion Probabilistic Model)์˜ ๊ฐœ๋…์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ดค์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ๊ธ€์—์„œ๋Š” ์ˆ˜์‹ ์œ ๋„ ๊ณผ์ •์„ ๋‹ค๋ค„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด์ „๊ธ€์—์„œ Diffusion๋ชจ๋ธ์€ Noise๋ฅผ ์ฃผ์ž…์„ ์œ„ํ•ด ์‚ฌ

xoft.tistory.com