Diffusion models are evolutionary algorithms, claims a team of researchers from Tufts, Harvard, and TU Wien.
"At least two processes in the biosphere have been recognized as capable of generalizing and driving novelty: evolution, a slow variational process adapting organisms across generations to their environment through natural selection; and learning, a faster transformational process allowing individuals to acquire knowledge and generalize from subjective experience during their lifetime. These processes are intensively studied in distinct domains within artificial intelligence. Relatively recent work has started drawing parallels between the seemingly unrelated processes of evolution and learning. We here argue that in particular diffusion models, where generative models trained to sample data points through incremental stochastic denoising, can be understood through evolutionary processes, inherently performing natural selection, mutation, and reproductive isolation."
"Both evolutionary processes and diffusion models rely on iterative refinements that combine directed updates with undirected perturbations: in evolution, random genetic mutations introduce diversity while natural selection guides populations toward greater fitness, and in diffusion models, random noise is progressively transformed into meaningful data through learned denoising steps that steer samples toward the target distribution. This parallel raises fundamental questions: Are the mechanisms underlying evolution and diffusion models fundamentally connected? Is this similarity merely an analogy, or does it reflect a deeper mathematical duality between biological evolution and generative modeling?"
"To answer these questions, we first examine evolution from the perspective of generative models. By considering populations of species in the biosphere, the variational evolution process can also be viewed as a transformation of distributions: the distributions of genotypes and phenotypes. Over evolutionary time scales, mutation and selection collectively alter the shape of these distributions. Similarly, many biologically inspired evolutionary algorithms can be understood in the same way: they optimize an objective function by maintaining and iteratively changing a large population's distribution. In fact, this concept is central to most generative models: the transformation of distributions. Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and diffusion models are all trained to transform simple distributions, typically standard Gaussian distributions, into complex distributions, where the samples represent meaningful images, videos, or audio, etc."
"On the other hand, diffusion models can also be viewed from an evolutionary perspective. As a generative model, diffusion models transform Gaussian distributions in an iterative manner into complex, structured data-points that resemble the training data distribution. During the training phase, the data points are corrupted by adding noise, and the model is trained to predict this added noise to reverse the process. In the sampling phase, starting with Gaussiandistributed data points, the model iteratively denoises to incrementally refine the data point samples. By considering noise-free samples as the desired outcome, such a directed denoising can be interpreted as directed selection, with each step introducing slight noise, akin to mutations. Together, this resembles an evolutionary process, where evolution is formulated as a combination of deterministic dynamics and stochastic mutations within the framework of non-equilibrium thermodynamics. This aligns with recent ideas that interpret the genome as a latent space parameterization of a multi-scale generative morphogenetic process, rather than a direct blueprint of an organism. If one were to revert the time direction of an evolutionary process, the evolved population of potentially highly correlated high-fitness solutions will dissolve gradually, i.e., step by step and thus akin to the forward process in diffusion models, into the respectively chosen initial distribution, typically Gaussian noise."
The researchers proceed to present a mathematical representation of diffusion models. Then, "By substituting Equations 8 and 10 into Equation 5, we derive the Diffusion Evolution algorithm: an evolutionary optimization procedure based on iterative error correction akin to diffusion models but without relying on neural networks at all." They present pseudocode for an algorithm to demonstrate this.
Equations 1-3 describe the added noise; equations 4-5 describe reversing the process, using a neural network to estimate and remove the noise; equation 6 expresses the process via Bayes' Theorem and introduces a representation using functions (f() and g()); and equations 7-9 are some plugging and chugging, rearranging those representations into the form that can be substituted back into equation 5 as mentioned above.
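For orientation, here is my rendering of the two equations that matter most, using the standard DDPM/DDIM notation the paper builds on (consult the paper for its exact forms; I believe its equation 5 is the DDIM-style update that the next quoted passage dissects term by term):

```latex
% Forward process (equations 1-3, schematically): the data point x_0 is
% corrupted with Gaussian noise, where \alpha_t is the cumulative signal
% level (\alpha_0 \approx 1, \alpha_T \approx 0):
x_t = \sqrt{\alpha_t}\, x_0 + \sqrt{1 - \alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Reverse update (equation 5, in DDIM form): step from x_t to x_{t-1}
% using an estimate \hat{x}_0 of the clean sample, plus fresh noise w:
x_{t-1} = \sqrt{\alpha_{t-1}}\, \hat{x}_0
  + \sqrt{1 - \alpha_{t-1} - \sigma_t^2}\;
    \frac{x_t - \sqrt{\alpha_t}\, \hat{x}_0}{\sqrt{1 - \alpha_t}}
  + \sigma_t w
```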
"When inversely denoising, i.e., evolving from time T to 0, while increasing alpha-sub-t, the Gaussian term will initially have a high variance, allowing global exploration at first. As the evolution progresses, the variance decreases giving lower weight to distant populations, leads to local optimization (exploitation). This locality avoids global competition and thus allows the algorithm to maintain multiple solutions and balance exploration and exploitation. Hence, the denoising process of diffusion models can be understood in an evolutionary manner: x-hat-0 represents an estimated high fitness parameter target. In contrast, x-sub-t can be considered as diffused from high-fitness points. The first two parts in the Equation 5, ..., guide the individuals towards high fitness targets in small steps. The last part of Equation 5, sigma-sub-t-w, is an integral part of diffusion models, perturbing the parameters in our approach similarly to random mutations."
Obviously, consult the paper if you want the mathematical details.
"We conduct two sets of experiments to study Diffusion Evolution in terms of diversity and solving complex reinforcement learning tasks. Moreover, we utilize techniques from the diffusion models literature to improve Diffusion Evolution. In the first experiment, we adopt an accelerated sampling method to significantly reduce the number of iterations. In the second experiment, we propose Latent Space Diffusion Evolution, inspired by latent space diffusion models, allowing us to deploy our approach to complex problems with high-dimensional parameter spaces through exploring a lower-dimensional latent space."
"Our method consistently finds more diverse solutions without sacrificing fitness performance. While CMA-ES shows higher entropy on the Ackley and Rastrigin functions, it finds significantly lower fitness solutions compared to Diffusion Evolution, suggesting it is distracted by multiple solutions rather than finding diverse ones.
"We apply the Diffusion Evolution method to reinforcement learning tasks to train neural networks for controlling the cart-pole system. This system has a cart with a hinged pole, and the objective is to keep the pole vertical as long as possible by moving the cart sideways while not exceeding a certain range."
"Deploying our original Diffusion Evolution method to this problem results in poor performance and lack of diversity. To address this issue, we propose Latent Space Diffusion Evolution: inspired by the latent space diffusion model, we map individual parameters into a lower-dimensional latent space in which we perform the Diffusion Evolution Algorithm. However, this approach requires a decoder and a new fitness function f-prime for z, which can be challenging to obtain."
"We also found that this latent evolution can still operate in a much larger dimensional parameter space, utilizing a three-layer neural network with 17,410 parameters, while still achieving strong performance. Combined with accelerated sampling method, we can solve the cart pole task in only 10 generations, with 512 population size, one fitness evaluation per individual."
"This parallel we draw here between evolution and diffusion models gives rise to several challenges and open questions. While diffusion models, by design, have a finite number of sampling steps, evolution is inherently open-ended. How can Diffusion Evolution be adapted to support open-ended evolution? Could other diffusion model implementations yield different evolutionary methods with diverse and unique features? Can advancements in diffusion models help introduce inductive biases into evolutionary algorithms? How do latent diffusion models correlate with neutral genes? Additionally, can insights from the field of evolution enhance diffusion models?"