When Did the First AI Image Generators Appear?

The earliest known AI image generator dates back to 1973, when the British painter Harold Cohen began working on his “computer assistant” dubbed AARON. The invention was tuned to understand the fundamentals of visual art and could autonomously generate artwork.
Further evolution of picture generators continued in the 2010s, when new generative models based on deep learning were introduced: GANs, diffusion models, and others.
Main Achievements That Influenced the Emergence of Image Generators
- AlexNet
AlexNet debuted in 2012 at the ImageNet challenge. It is a convolutional neural network (CNN) with 60 million parameters. Its architecture consists of 8 layers: 5 convolutional layers, some followed by max-pooling, and 3 fully connected layers.
It is considered a highly influential model: it was designed for image recognition, which became one of the building blocks of image generation. It was also among the first widely known models trained on GPUs.
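The conv-then-pool pattern that AlexNet popularized can be sketched in a few lines of NumPy. This is a toy single-channel convolution and pooling, not AlexNet's actual implementation, and the edge-detector kernel is just an illustrative choice:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling that shrinks each spatial dimension."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # crude vertical-edge detector
features = conv2d(image, edge_kernel)   # shape (5, 5)
pooled = max_pool(features)             # shape (2, 2)
```

Stacking several such conv/pool stages, followed by fully connected layers, gives the 8-layer shape described above.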
- Generative Adversarial Networks (GANs)
The GAN technology appeared in 2014. At the core of a GAN, two components are pitted against each other: a generator and a discriminator. They play a minimax game; with an optimal discriminator, training minimizes the Jensen–Shannon divergence between the data and model distributions. As a result, the generator is forced to produce highly realistic output to trick the discriminator into believing that it is real and not fake.
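The quantity the two networks fight over can be written as V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))]. A minimal sketch, assuming a toy 1-D setup with a hand-set logistic discriminator (no actual training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    """A toy logistic-regression discriminator D(x) in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def value_function(real, fake, w, b):
    """V(D, G) = E[log D(x_real)] + E[log(1 - D(x_fake))].
    The discriminator ascends this value; the generator descends it."""
    return (np.mean(np.log(discriminator(real, w, b))) +
            np.mean(np.log(1.0 - discriminator(fake, w, b))))

real = rng.normal(loc=4.0, scale=1.0, size=1000)  # "data" distribution
fake = rng.normal(loc=0.0, scale=1.0, size=1000)  # untrained generator output

# A discriminator that separates the two distributions scores higher
# than one that cannot tell them apart (D = 0.5 everywhere).
v_good = value_function(real, fake, w=2.0, b=-4.0)
v_blind = value_function(real, fake, w=0.0, b=0.0)  # log(0.5) + log(0.5)
```

As the generator improves, even a good discriminator is pushed back toward the blind D = 0.5 score, which is the equilibrium of the game.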
- Diffusion Models
Diffusion models were first designed in 2015 at Stanford University by Jascha Sohl-Dickstein and colleagues. Typically, such a model includes three main components: forward, reverse, and sampling processes. The forward process adds noise to the input samples step by step in a controlled manner using a Markov chain. This lets the model learn the intrinsic patterns and details of the target image, after which the procedure is reversed — recovering the initial image “destroyed” by the noise, or generating a unique one.
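The forward chain has a convenient closed form: any step t can be reached directly via x_t = √(ᾱ_t)·x₀ + √(1 − ᾱ_t)·ε. A minimal sketch, assuming a linear noise schedule (one common choice, not the only one):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (a common choice)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def forward_diffuse(x0, t):
    """Jump straight to step t of the forward Markov chain:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = np.ones(4096)                   # stand-in for a flattened image
x_early = forward_diffuse(x0, t=10)  # still close to the original
x_late = forward_diffuse(x0, t=999)  # almost pure Gaussian noise
```

The reverse (generative) process learns to undo these steps one at a time, which is the part a neural network is trained for.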
GAN-Based Methods of Generating Images
The GAN-based frameworks and accompanying solutions include:
- CelebA Database
CelebA is an extensive dataset of more than 200,000 celebrity images covering 10,177 identities, each annotated with 40 attributes. CelebA is widely used for training image generators, including those based on GANs.
- DRAW
Deep Recurrent Attentive Writer (DRAW) is a model that integrates a unique solution: it emulates human eye foveation with the help of spatial attention, while using a variational auto-encoding framework to synthesize images.
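DRAW's foveation reads a small glimpse of the canvas through a grid of Gaussian filters whose position, spacing, and width the network controls. A 1-D sketch of such a filterbank (the real model uses a 2-D N×N grid, and the parameter values here are illustrative):

```python
import numpy as np

def gaussian_filterbank(image_size, n, center, stride, sigma):
    """An n x image_size bank of 1-D Gaussian filters, in the spirit of
    DRAW's read attention. Each row samples one 'foveated' location."""
    mu = center + (np.arange(n) - n / 2.0 + 0.5) * stride  # filter centers
    a = np.arange(image_size)
    F = np.exp(-((a[None, :] - mu[:, None]) ** 2) / (2.0 * sigma ** 2))
    return F / F.sum(axis=1, keepdims=True)                # normalize each filter

row = np.linspace(0.0, 1.0, 28)  # one row of a 28-pixel image
F = gaussian_filterbank(28, n=5, center=14.0, stride=4.0, sigma=1.0)
glimpse = F @ row                # 5 attended samples of the 28 pixels
```

Shrinking `stride` and `sigma` zooms the glimpse in on a small patch, which is what lets the model "look" at one part of the image at a time.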
- AttnGAN
AttnGAN is a solution that, unlike many GAN-based models, attends to sub-regions of the image and to the words in a user’s prompt most relevant to them. Among other things, it also employs multi-stage refinement to produce fine-grained pictures.
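The word-region attention at the heart of this idea is a softmax over region-word similarities. A toy sketch with random feature vectors standing in for real image and word embeddings (dimensions are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
regions = rng.normal(size=(16, 32))  # 16 image sub-regions, 32-dim features
words = rng.normal(size=(5, 32))     # 5 word embeddings from the prompt

# Each sub-region attends over the words: the weights say which word
# matters most for which part of the image.
scores = regions @ words.T           # (16, 5) region-word similarities
attn = softmax(scores, axis=1)       # each row sums to 1 over the words
word_context = attn @ words          # (16, 32) word-conditioned context per region
```

The per-region context vectors then condition the next refinement stage, so a word like "red" mostly influences the regions it attends to.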
- StyleGAN
StyleGAN combines unsupervised separation of high-level attributes, such as a person’s identity, with stochastic variation in the pictures: facial expression, wrinkles, skin pigmentation, etc. In turn, this gives the generation process scale-specific control.
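Scale-specific control works by injecting a style at each resolution level through adaptive instance normalization (AdaIN): the feature map is normalized per channel, then rescaled and shifted by style-derived parameters. A minimal sketch with toy tensors:

```python
import numpy as np

def adain(features, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization: normalize each channel of the
    feature map, then rescale/shift it with style-derived parameters.
    This is how a style vector steers one resolution level."""
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

rng = np.random.default_rng(2)
fmap = rng.normal(loc=3.0, scale=5.0, size=(8, 16, 16))  # 8 channels, 16x16
scale = np.ones(8) * 2.0   # style-derived per-channel scale (toy values)
bias = np.ones(8) * 0.5    # style-derived per-channel shift (toy values)
styled = adain(fmap, scale, bias)
```

Because each resolution level gets its own scale/bias, coarse levels end up controlling attributes like pose and identity while fine levels control texture.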
- BigGAN
BigGAN is an enhanced model that builds on the SAGAN architecture for long-range dependency modeling, uses the two time-scale update rule (TTUR) for improved generator/discriminator training, and adopts the hinge loss, which also contributes to more effective training.
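The hinge loss is simple to state: the discriminator pushes real scores above +1 and fake scores below −1, while the generator just raises the discriminator's score on fakes. A sketch with hand-picked score values:

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: push real scores above +1, fake below -1."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real)) +
            np.mean(np.maximum(0.0, 1.0 + d_fake)))

def g_hinge_loss(d_fake):
    """Generator hinge loss: raise the discriminator's score on fakes."""
    return -np.mean(d_fake)

d_real = np.array([1.5, 0.2, 2.0])    # discriminator scores on real images
d_fake = np.array([-1.2, 0.5, -2.0])  # discriminator scores on generated images
loss_d = d_hinge_loss(d_real, d_fake)
loss_g = g_hinge_loss(d_fake)
```

Note that confidently classified samples (real above +1, fake below −1) contribute zero discriminator loss, which keeps gradients focused on the hard cases.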
- ImageVista
ImageVista is a complex solution that combines Contrastive Language-Image Pretraining (CLIP) with a BigGAN model. Its main purpose is semantic relevance optimization. It also mitigates data bias with the help of composite generation and improves image-text alignment with initialization/overparameterization methods.
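In CLIP-plus-GAN pipelines of this kind, the CLIP side typically scores generated candidates by the cosine similarity between their image embeddings and the prompt's text embedding. A toy sketch with random vectors standing in for real CLIP embeddings (not ImageVista's actual internals):

```python
import numpy as np

def normalize(v):
    """Project embeddings onto the unit sphere for cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(3)
text_embedding = normalize(rng.normal(size=64))    # prompt embedding (toy)
candidates = normalize(rng.normal(size=(10, 64)))  # 10 image embeddings (toy)

# CLIP-style guidance: rank candidate images by similarity to the prompt.
scores = candidates @ text_embedding
best = int(np.argmax(scores))
```

Optimizing the generator's latent input to maximize this score is what ties the image back to the text semantically.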
Some Diffusion Models and Text-to-Image Generators
Diffusion models include:
- DALL-E
DALL-E combines transformer and diffusion approaches across its versions. In the original model, each text-image pair is represented as a single, uninterrupted stream of up to 1280 tokens. The maximum likelihood approach is used during the training stage to achieve the best result possible.
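The 1280-token budget comes from concatenating text tokens with a grid of discrete image codes into one sequence the transformer models autoregressively. A sketch of that layout with toy token values (the 256/1024 split and vocabulary sizes follow the commonly cited original configuration):

```python
import numpy as np

# Single-stream layout: 256 text tokens followed by 32 x 32 = 1024 image
# tokens, modeled autoregressively as one sequence.
TEXT_LEN, IMAGE_LEN = 256, 1024

rng = np.random.default_rng(4)
text_tokens = rng.integers(0, 16384, size=TEXT_LEN)   # BPE text codes (toy values)
image_tokens = rng.integers(0, 8192, size=IMAGE_LEN)  # discrete VAE image codes (toy)

stream = np.concatenate([text_tokens, image_tokens])  # one 1280-token sequence
```

At generation time, the model is given only the text portion and samples the image tokens one by one, which a discrete VAE decoder then turns into pixels.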
- Stable Diffusion
Stable Diffusion is based on the diffusion approach. Among other things, it operates in a low-dimensional latent space, which shifts focus from imperceptible details to more essential semantic data and reduces the cost of computation.
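The computational saving is easy to quantify. Assuming the commonly cited 512×512 configuration with an 8× spatial downsampling into 4 latent channels (treat these numbers as illustrative):

```python
# Why the latent space is cheap: diffusion runs on a small latent tensor
# instead of full-resolution pixels.
pixel_elements = 512 * 512 * 3   # RGB image
latent_elements = 64 * 64 * 4    # 8x spatial downsampling, 4 latent channels
compression = pixel_elements / latent_elements  # elements per latent element
```

Every denoising step therefore touches roughly 48× fewer values than pixel-space diffusion would, and a separately trained autoencoder maps the final latent back to pixels.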
- MidJourney
Midjourney is also a diffusion-based tool, potentially designed with the help of v-diffusion in its early stages. It might have used open LAION datasets and progressive distillation or a similar procedure that speeds up sampling in diffusion models.
