Which is best for Avatar generation: Stable Diffusion Models Vs GANs

Updated: Nov 5, 2022

Avatar generation is currently done using AI-based deep generative image models such as Stable Diffusion, DALL-E, Midjourney, and GANs. Generative adversarial networks (GANs) have been the subject of extensive research over the past few years because of the quality and accuracy of the output they produce. More recently, diffusion models have emerged as another promising approach to the same task. Both families have found wide use in image, video, and voice generation.

Stable diffusion generated images using text prompts

GAN generated images using input real image

Stable Diffusion:

Stable Diffusion is a deep-learning, latent text-to-image model released in 2022. The project grew out of academic research on high-resolution image synthesis with latent diffusion models. It is primarily used to generate detailed images from text descriptions (called prompts), though it can also be applied to other tasks such as inpainting, outpainting, and text-guided image-to-image translation.

Stable Diffusion is a latent diffusion model, a kind of deep generative neural network developed by the CompVis group at LMU Munich. The model was released through a collaboration of Stability AI, CompVis LMU, and Runway, with support from EleutherAI and LAION.
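A diffusion model works by gradually destroying data with noise and learning to reverse that process. The forward (noising) half has a simple closed form, sketched below in plain numpy on stand-in 1-D data; the schedule values are illustrative assumptions in the style of DDPM-type models, not Stable Diffusion's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule: beta_t rises from 1e-4 to 0.02 over T steps
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product of (1 - beta)

def q_sample(x0, t, eps):
    """Forward process in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(0.0, 1.0, 4096)         # stand-in for clean (latent) image values
eps = rng.normal(0.0, 1.0, 4096)        # Gaussian noise

early = q_sample(x0, 10, eps)           # barely noised: still close to x0
late = q_sample(x0, T - 1, eps)         # fully noised: essentially pure noise

print(np.corrcoef(x0, early)[0, 1])     # near 1: signal still present
print(np.corrcoef(x0, late)[0, 1])      # near 0: signal destroyed
```

The trained network's job is the reverse direction: given a noisy x_t (and, for text-to-image models, a prompt embedding), predict the noise so that sampling can walk from pure noise back to a clean image.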

Stable Diffusion's code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU with at least 10 GB of VRAM. This marked a departure from earlier proprietary text-to-image models such as DALL-E and Midjourney, which were accessible only via cloud services.


Generative Adversarial Network:

A generative adversarial network (GAN) is a class of machine learning frameworks introduced by Ian Goodfellow and his colleagues in 2014. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, with many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.

The core idea of a GAN is indirect training through the discriminator: a second neural network that judges how realistic an input looks, and which is itself updated dynamically. The generator is therefore not trained to minimize the distance to a specific image, but to fool the discriminator, which allows the model to learn in an unsupervised manner. The two networks contest with each other in a zero-sum game, where one agent's gain is the other agent's loss.
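The zero-sum game above can be shown end to end on 1-D data with nothing but numpy. In this toy sketch the "generator" is a linear map of noise and the "discriminator" is a logistic classifier; the small weight penalty on the discriminator is a common stabilizing trick added here for the toy to converge, not part of the original GAN formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_real(n):
    # "Real" data: 1-D samples from N(3, 0.5)
    return rng.normal(3.0, 0.5, n)

a, b = 1.0, 0.0     # generator G(z) = a*z + b
w, c = 0.0, 0.0     # discriminator D(x) = sigmoid(w*x + c)

lr, batch, decay = 0.05, 64, 0.1
for step in range(5000):
    # --- discriminator update: push D(real) -> 1, D(fake) -> 0 ---
    x_real = sample_real(batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean(-(1 - d_real) * x_real + d_fake * x_fake) + decay * w
    grad_c = np.mean(-(1 - d_real) + d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- generator update: push D(fake) -> 1, i.e. fool the discriminator ---
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    err = -(1 - d_fake) * w          # dL_G / dx_fake for loss -log D(G(z))
    a -= lr * np.mean(err * z)
    b -= lr * np.mean(err)

print(f"real mean ~ 3.0, generator offset b = {b:.2f}")
```

Note that the generator never sees a real sample directly: its only learning signal is the discriminator's gradient, yet its output mean drifts toward the data mean of 3. That is the "indirect" training the paragraph above describes.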

