ML PAPER: NEURAL STYLE TRANSFER — TL;DR

Sneha Ghantasala
2 min read · Dec 27, 2020

Research paper: “A Neural Algorithm of Artistic Style” by Leon A. Gatys et al., published on September 2, 2015.

Disclaimer: These are just notes, and a lot of the text is taken from the paper.

The paper describes and formulates Neural Style Transfer. Its main takeaway is that neural representations can capture the content and the style of an image independently of each other, and it describes a method for synthesizing images that mix the content of one source image with the style of another. This independence is formulated mathematically in the overall loss function, a weighted sum of a content loss and a style loss. The paper describes the style of an image as its general texture and references prior work for this definition.
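
In the paper’s notation, with p the content photograph, a the style artwork, and x the generated image, the overall loss being minimized is:

L_total(p, a, x) = α · L_content(p, x) + β · L_style(a, x)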

The paper elegantly combines the following ideas, each described in earlier papers:

1. Using VGG networks trained for object recognition to obtain feature encodings of images, which allow content comparisons between images.

2. Using existing visualization techniques to inspect the information carried by an intermediate layer, by reconstructing an image from that layer’s feature maps.

3. Using existing ideas and formulae for calculating the style of an image.

4. Relating the style computations to how neurons in the biological visual system work.

The way an image is generated, as described by the paper, is as follows:

1. Initialize the generated image to a white noise image.

2. Improve the generated image’s content: pass the content image and the generated image separately through a trained VGG-19 network (a CNN with very high accuracy on object recognition), compare their activations at a chosen layer, and update the generated image’s pixel values by gradient descent on this difference (the Content Loss function).

3. Improve the generated image’s style by minimizing the difference between the Gram matrices of the generated image and the style image. A Gram matrix captures the feature correlations between the different filter responses of a layer in the CNN. The paper takes a weighted sum of this loss over several layers to compute the Style Loss function, and the style is likewise improved in every iteration using gradient descent.

4. The overall cost function is regulated by two parameters, alpha (α) and beta (β); the ratio α/β controls the emphasis on matching the content of the content image versus the style of the style image. A code sketch of the full loop follows this list.
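
Below is a minimal sketch of the loop described above, assuming PyTorch and torchvision. The layer indices, optimizer, learning rate, and α/β values are illustrative assumptions rather than the paper’s exact settings, and the input images are assumed to be ImageNet-normalized (1, 3, H, W) tensors.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Layer indices into torchvision's vgg19().features are assumptions:
# index 21 is conv4_2 (the content layer in the paper); 0, 5, 10, 19, 28 are conv1_1..conv5_1.
CONTENT_LAYER = 21
STYLE_LAYERS = [0, 5, 10, 19, 28]

vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract(img, layers):
    """Pass img through VGG-19 and collect activations at the requested layer indices."""
    feats, x = {}, img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats[i] = x
    return feats

def gram(feat):
    """Gram matrix: correlations between the filter responses of one layer (batch size 1)."""
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)

def style_transfer(content_img, style_img, steps=300, alpha=1.0, beta=1e3):
    # 1. Initialize the generated image to white noise (the paper's choice).
    gen = torch.randn_like(content_img, requires_grad=True)
    opt = torch.optim.Adam([gen], lr=0.02)  # the paper optimizes the pixels by gradient descent

    content_feat = extract(content_img, {CONTENT_LAYER})[CONTENT_LAYER]
    style_grams = {i: gram(f) for i, f in extract(style_img, set(STYLE_LAYERS)).items()}

    for _ in range(steps):
        opt.zero_grad()
        feats = extract(gen, {CONTENT_LAYER, *STYLE_LAYERS})
        # 2. Content loss: squared error between activations at the chosen layer.
        l_content = F.mse_loss(feats[CONTENT_LAYER], content_feat)
        # 3. Style loss: (equally weighted) sum of Gram-matrix differences across the style layers.
        l_style = sum(F.mse_loss(gram(feats[i]), style_grams[i]) for i in STYLE_LAYERS) / len(STYLE_LAYERS)
        # 4. Total loss weighted by alpha and beta; the gradient updates the pixels of `gen`.
        loss = alpha * l_content + beta * l_style
        loss.backward()
        opt.step()
    return gen.detach()
```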

The paper states that for image synthesis, replacing the max-pooling operation with average pooling improved the gradient flow and gave slightly more appealing results, though no examples or metrics were presented to support this.
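
One possible way to make that swap, again assuming torchvision’s VGG-19 (a sketch, not the paper’s original setup), is to walk the feature extractor and replace each max-pooling layer with an average-pooling layer of the same geometry:

```python
import torch.nn as nn
from torchvision import models

vgg = models.vgg19(weights="IMAGENET1K_V1").features
# Replace every max-pooling layer with average pooling of the same kernel, stride, and padding.
for i, layer in enumerate(vgg):
    if isinstance(layer, nn.MaxPool2d):
        vgg[i] = nn.AvgPool2d(kernel_size=layer.kernel_size,
                              stride=layer.stride,
                              padding=layer.padding)
```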

Overall, the paper provides a fascinating tool for studying how the content and style of an image are perceived and represented neurally, and image appearance in general.
