Abstract

I use a Bidirectional Generative Adversarial Network (BiGAN) [1] to generate fake motorbike images for the Motorbike Generator Challenge [2]. Experimental results on the Motorbike Generator task demonstrate that this method not only stabilizes GAN training but also achieves significant improvements over several state-of-the-art GAN methods.

The main contributions of this paper are the following:

  • Apply BiGAN, examine its reconstructions, and confirm the usefulness of the learned representations by obtaining performance competitive with the state of the art on the Motorbike Generator task.
  • Improve BiGAN training so that it outperforms several state-of-the-art GAN methods on the motorbike dataset, using techniques such as data preprocessing, data augmentation, and hyper-parameter tuning.

Introduction

Motorbike Generator is a task of generating fake motorbike images. Producing diverse motorbike images is useful for many downstream tasks, such as motorbike recognition and detection. The goal is to find the method that achieves the best score on the available dataset. Such a method can then be applied to many other computer vision applications, such as generating realistic human faces, cartoon translation, face swapping, face aging, image super-resolution, and generating new, varied datasets.

Motorbike Generator uses a complex dataset, meaning that it contains many noisy images, for example:

  • Image backgrounds containing people, trees, towers, or text characters
  • Images of different sizes
  • Two image formats: PNG and JPEG

Because of the complexity of the dataset, some newer methods do not give good results. To preserve as much of this complex information as possible for the discriminator, I use the BiGAN method.

Background

BiGANs are related to autoencoders [3], which encode data samples and reconstruct data from a compact embedding. Donahue et al. [1] show a detailed mathematical relationship between the two frameworks. Makhzani et al. [4] introduced an adversarial variant of autoencoders (AAE) that constrains the latent embedding to be close to a simple prior distribution (e.g., a multivariate Gaussian). Their model consists of an encoder Enc, a decoder Dec, and a discriminator. While the encoder and the decoder are trained with the reconstruction loss ‖x − Dec(Enc(x))‖_2^2 (where x represents real data samples), the discriminator decides whether a latent vector comes from the prior distribution or from the encoder's output distribution.

Figure 1: The structure of BiGAN.

Bidirectional Generative Adversarial Network: The GAN framework provides a mapping from z to x, but not an inverse mapping from x to z. Such a mapping is highly useful as it provides an information-rich representation of x, which can be used as input for downstream tasks (such as classification) instead of the original data in simple yet effective ways [1], [5]. Donahue et al. [1] and Dumoulin et al. [5] independently developed the BiGAN (or ALI) model, which adds an encoder to the original generator-discriminator framework. In Figure 1, the generator models the same mapping as the original GAN generator, while the encoder is a mapping E(x; θE) from Pd to PE with the goal of bringing PE close to Pz. The discriminator is modified to take both z and G(z), or both x and E(x), and makes real/fake decisions as D(z, G(z); θD) or D(E(x), x; θD), respectively. Donahue et al. [1] provide a detailed proof that, at optimality, E and G must be inverses of each other to successfully fool the discriminator. The model is trained with the new minimax objective shown in Equation 1.
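
For reference, Equation 1 is the BiGAN minimax objective from Donahue et al. [1], written here in the notation above (𝔼 denotes expectation):

    min_{G,E} max_D V(D, E, G) = 𝔼_{x∼Pd}[log D(E(x), x)] + 𝔼_{z∼Pz}[log(1 − D(z, G(z)))]        (1)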

Solution

For unsupervised training of BiGANs, I use the Adam optimizer to compute parameter updates, with the hyperparameters: initial learning rate α = 1×10^(-4), momentum β1 = 0, and β2 = 0.909. The mini-batch size is 32 and the latent space size is 64. For the GAN objective, I enforce the Lipschitz constraint with a gradient penalty [6]. Evaluation is measured by the Fréchet Inception Distance [7]. By default, the only augmentation used is flipping the image horizontally.
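
To make the penalty term concrete, below is a minimal sketch of the gradient penalty of [6], assuming PyTorch and a joint discriminator D(z, x) as in Figure 1; the function name, the choice to interpolate both inputs, and the weight λ = 10 are illustrative assumptions, not the exact implementation used here.

    import torch

    def gradient_penalty(D, z_real, x_real, z_fake, x_fake, lambda_gp=10.0):
        # z_real = E(x) for real images x_real; x_fake = G(z) for sampled latents z_fake.
        # Interpolate between real and fake pairs and penalize gradients whose norm deviates from 1.
        eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
        x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)
        eps_z = eps.view(-1, 1)
        z_hat = (eps_z * z_real + (1 - eps_z) * z_fake).requires_grad_(True)
        d_out = D(z_hat, x_hat)
        grads = torch.autograd.grad(d_out.sum(), [z_hat, x_hat], create_graph=True)
        grad_norm = torch.cat([g.flatten(1) for g in grads], dim=1).norm(2, dim=1)
        return lambda_gp * ((grad_norm - 1) ** 2).mean()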

Averaging over samples that come from different distributions can produce worse results. So, I use an Exponential Moving Average (EMA) [8] of the generator weights, applied after every t = 10 iterations with update ratio β = 0.999.
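
Below is a minimal sketch of the EMA update of [8], assuming PyTorch; the helper name and the use of a deep copy for the averaged generator are illustrative.

    import copy
    import torch

    @torch.no_grad()
    def ema_update(ema_model, model, beta=0.999):
        # Blend current generator weights into an averaged copy: w_ema = beta * w_ema + (1 - beta) * w.
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(beta).add_(p, alpha=1 - beta)

    # Usage sketch: ema_generator = copy.deepcopy(generator); call ema_update(ema_generator, generator)
    # during training, and use ema_generator (not generator) to produce the final samples.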

The training data is the Motorbike dataset collected for the Zalo AI Challenge [2], which contains about 10,000 motorbike images. Each image is downsampled to 128 × 128 pixels.
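
A minimal sketch of this preprocessing and the default augmentation, assuming torchvision; the normalization to [-1, 1] is an assumption for a tanh-output generator, not a detail taken from this paper.

    import torchvision.transforms as T

    # Downsample every image to 128 x 128, apply the default augmentation
    # (random horizontal flip), and normalize to [-1, 1].
    preprocess = T.Compose([
        T.Resize((128, 128)),
        T.RandomHorizontalFlip(p=0.5),
        T.ToTensor(),
        T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ])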

I also implemented several newer methods developed from the GAN for image synthesis, for comparison:

  • BigGAN [3] splits the latent space into one chunk per resolution and concatenates each chunk to the conditional channel vector, which is projected to the BatchNorm gains and biases.
  • StyleGAN [9] controls the generator through adaptive instance normalization and injects noise images into the generator.
  • Self-Attention Generative Adversarial Networks (SAGANs) [11]: armed with self-attention, the generator can draw images in which fine details at every location are carefully coordinated with fine details in distant portions of the image (see the sketch after this list).
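
For illustration, a minimal sketch of a SAGAN-style self-attention block, assuming PyTorch; the class name and the channel-reduction factor of 8 follow common implementations and are not taken from this paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.query = nn.Conv2d(channels, channels // 8, 1)
            self.key = nn.Conv2d(channels, channels // 8, 1)
            self.value = nn.Conv2d(channels, channels, 1)
            self.gamma = nn.Parameter(torch.zeros(1))  # learned blend weight, starts at 0

        def forward(self, x):
            b, c, h, w = x.shape
            q = self.query(x).view(b, -1, h * w)
            k = self.key(x).view(b, -1, h * w)
            v = self.value(x).view(b, -1, h * w)
            attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, hw, hw) attention map
            out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
            return self.gamma * out + x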

I adopt the Fréchet Inception Distance (FID) (Heusel et al., 2017) [7] as the primary metric for quantitative evaluation, as FID has proved to be more consistent with human evaluation. I fix α = 10^(-15) and the cosine distance threshold at 0.05 for FID.
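
For context, a minimal sketch of the FID computation of [7] over precomputed Inception activations, assuming NumPy and SciPy; treating the α above as the numerical stabilizer eps is only an assumption, and this is not the challenge's exact evaluation script.

    import numpy as np
    from scipy import linalg

    def fid(act_real, act_fake, eps=1e-15):
        # act_real, act_fake: Inception activations of shape [N, d].
        mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
        sigma1 = np.cov(act_real, rowvar=False)
        sigma2 = np.cov(act_fake, rowvar=False)
        covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
        if not np.isfinite(covmean).all():
            # Add a small offset to the diagonals if the product is near-singular.
            offset = np.eye(sigma1.shape[0]) * eps
            covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
        if np.iscomplexobj(covmean):
            covmean = covmean.real
        diff = mu1 - mu2
        return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)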

Table 1 compares the FID scores of the different GAN models. From the table, we see that the horizontal-flip augmentation improves the FID score from 137.5412 to 136.0553.

Some of the newer methods, such as StyleGAN [9] and Alpha GAN, are not well suited to generating fake motorbike images; their evaluation results are worse than the original GAN. Compared with the newer CR-BigBiGAN [10], BiGAN with Flip Horizontal or with EMA + Augmentation∗ obtains a better score. Compared with BigBiGAN, BiGAN combining EMA and Flip Horizontal is better by about 4 FID points (28.4417 vs 32.6050). From the BiGAN column of the table, we also see that EMA improves the score significantly, but Augmentation∗ and removing images containing people do not. So, I use horizontal-flip augmentation and EMA to improve the performance of BiGAN.

Table 1: FID Score on motorbike dataset with different methods.

Note: Augmentation∗ = Flip horizontal + Sharpness 2.0 + Brightness 0.8 + Contrast 0.8.
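
A minimal sketch of the Augmentation∗ combination, assuming Pillow; the enhancement factors come from the note above, while the function name is illustrative.

    from PIL import Image, ImageEnhance, ImageOps

    def augment_star(img: Image.Image) -> Image.Image:
        # Augmentation*: horizontal flip, then sharpness 2.0, brightness 0.8, contrast 0.8.
        out = ImageOps.mirror(img)                       # flip horizontally
        out = ImageEnhance.Sharpness(out).enhance(2.0)   # sharpen
        out = ImageEnhance.Brightness(out).enhance(0.8)  # darken slightly
        out = ImageEnhance.Contrast(out).enhance(0.8)    # reduce contrast
        return out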

I train each model for more than 200,000 iterations and present samples in Figure 2.

Figure 2: Some motorbikes generated by our BiGAN.

Application

Over the past few years, a large number of interesting applications of the BiGAN have appeared. BiGAN has very specific use cases that are useful in many business contexts.

  1. Image datasets play an important role in model training. The technique has been successfully used to generate examples for image datasets, helping scientists prepare training data. (Data is in dire need today, as AI training problems often lack data; to solve this, we can generate more data for model training.)
  2. Generation of clearer images and new images at higher resolutions. GANs can be used to improve image quality from low to high resolution very effectively.
  3. In security, GANs can be used to detect fake data such as deepfakes, or to help catch criminals, especially those in disguise. The model can also be used to create face images that age year over year, and to find people based on photos taken when they were younger.
  4. In business, GANs can be used for digital product testing. For example, online shops can upload images of their clothes, and customers can send their height, weight, or body images to try on the clothes digitally.
  5. In education, GAN can be used to improve online learning by creating better and more formal characters, as well as more suitable backgrounds.
  6. In entertainment, GAN can be used to create new songs and more stimulating visuals.

The list goes on…

Conclusion

I have presented a Bidirectional Generative Adversarial Network for generating motorbike images based on a complex dataset. EMA and horizontal flipping are the two main techniques for improving the quantitative score. Experimental results on the Motorbike Generator task demonstrate that my proposed approach not only stabilizes GAN training but also leads to significant improvements. I also won 4th place on the Final Leaderboard of the Motorbike Generator Challenge in 2019 [2].

Spectral Normalization [11] is a weight normalization technique that takes a lot of time to train. I plan to extend my GAN with this method (a usage sketch follows the list below):

  • Experiment with the BiGAN method using Orthogonal Regularization, Self-Attention, and Spectral Normalization;
  • Experiment with the BigBiGAN method using Consistency Regularization, Self-Attention, and Spectral Normalization.
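
As a starting point for this future work, a minimal sketch of attaching spectral normalization [11] to discriminator layers, assuming PyTorch; the layer shapes are illustrative.

    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    # Wrap each weight layer of the discriminator with spectral normalization so that
    # its largest singular value is constrained, as in [11].
    discriminator_block = nn.Sequential(
        spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
        nn.LeakyReLU(0.2),
        spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)),
        nn.LeakyReLU(0.2),
    )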

Hoang Van Truong – Solution & Technology Unit, FPT Software

References

  1. J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. International Conference on Learning Representations, 2017.
  2. VNG. Zalo AI Challenge (https://challenge.zalo.ai). VNG Corporation, Vietnam, 2019.
  3. A. Brock, J. Donahue, and K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. International Conference on Learning Representations, 2019.
  4. A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv:1511.05644, 2015.
  5. V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. Courville. Adversarially learned inference. International Conference on Learning Representations, 2017.
  6. I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
  7. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30, 2017.
  8. Y. Yazıcı, C. Foo, S. Winkler, K. Yap, G. Piliouras, and V. Chandrasekhar. The unusual effectiveness of averaging in GAN training. International Conference on Learning Representations, 2019.
  9. T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. Conference on Computer Vision and Pattern Recognition, 2019.
  10. H. Zhang, Z. Zhang, A. Odena, and H. Lee. Consistency regularization for generative adversarial networks. International Conference on Learning Representations, 2019.
  11. T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. International Conference on Learning Representations, 2018.