
The Federal State Autonomous Educational Institution

of Higher Professional Education

National Research University Higher School of Economics

Faculty of Communication, Media and Design

Neural Networks applications in valuation of banner ad creative efficiency

Final qualifying research - Master's Thesis

Specialization 42.04.01 «Advertising and Public Relations»

Educational program «Data-driven Communication»

Belyaev Dmitry Dmitrievich

Moscow 2020

Abstract

This work explores the application of convolutional neural networks to advertisement banners, attempting to predict whether a banner's click-through rate is above or below average. The data used in this work, advertisement banners and their respective click-through rates, was sourced from the internet monitoring of Mediascope, a media research company. The topic was motivated by a particular problem encountered during the research phase of advertising campaign planning.

This work opens with a brief introduction to the topic, stating the problem addressed and laying the ground for the proposed solution.

It is followed by a literature review, which explores the application of neural networks to image classification, as well as techniques for improving model results.

The literature review grounds the theoretical framework, which briefly discusses the methods and techniques that are important for building an accurate model and avoiding overfitting.

Following these technical parts, the solution section discusses the real-life application that motivated the topic of this work. The problem the solution targets is briefly described, together with the advantages and disadvantages of the proposed approach.

Finally, the results of the experiment, which take the form of a convolutional neural network model, are discussed together with the applied techniques, assumptions, and limitations. Visualizations are created to show how the model decides on the classification of an image.

The conclusion sums up the results of the work and discusses the application of the proposed solution in a new business era.

Introduction

Just a few years ago Artificial Intelligence (AI) was mostly a topic discussed by academics in computer science departments, futurologists, or science fiction fans, and it seemed that we were still decades away from actually interacting with it. Today this could not be further from the truth: we may not even notice it, but we constantly interact with machine-made decisions. Take, for instance, recommendation systems telling us what is frequently bought together with the products we have been browsing, chatbots that help us with frequently asked questions, or even the masks people use for their Instagram posts - all of these utilize AI techniques. While it is highly unlikely you will find yourself as the main character of the movie “Her” (https://www.imdb.com/title/tt1798709/) by tomorrow, we live in a time when developments in AI progress exponentially fast.

Such rapid development is usually explained by the availability of technology, advances in the theoretical base, and, of course, acknowledgment of the benefits AI can bring to humanity. For example, AI in medicine can help a doctor with a diagnosis. Businesses acknowledge this benefit as well and are trying to capitalize on it: the financial industry uses AI for trading systems, while e-commerce uses it for recommendation systems, as mentioned above, and so on. It is only natural that such a large industry as advertising would want to utilize these technological advances for its benefit.

To date, there are already cases where advertising companies utilize AI for their business needs. Mostly, these use cases include CRM marketing automation, media planning, and digital advertising optimization. Particular industrial cases will be discussed as we progress through this work. However, there are many other domains of the advertising business that would benefit from technological progress.

“Advertising” in itself is a complex process, which includes a lot of planning, preparation, research, and most of all human opinion, expertise, and labor. Before we get to see a commercial brought to us by a large FMCG company, it usually starts with a particular goal the company is pursuing. For this need, the company hires an advertising agency to help with the whole process. The advertising agency would usually do everything from research to actually deploying the advertising campaign out into the world, and everything in between. Usually, the process consists of defining the objectives of the campaign and figuring out the available budget, thus evaluating the company's current standing. Then research is done on the audiences and other matters concerning the product or service being advertised; further, the means and ways of communicating the product or service are chosen; and finally, when the campaign is deployed, the results are evaluated. Because every part of this process is highly important, an increase in effectiveness at any stage could positively contribute to the whole experience of creating an advertising campaign, as well as to its results.

One should never underestimate the power of setting the right goals or measuring the effectiveness of the advertising campaign. However, the customer sees none of that and only receives the end result, the advertisement itself. It is highly desirable that the customer performs the intended action, such as acquiring the service or product advertised, and it is therefore of utmost importance that the advertisement helps the customer make such a decision. This is why advertisement banners were chosen as the topic of this work.

So far it is highly debatable what makes a great advertisement banner - is it a smiling man, showing borderline mad happiness when acquiring a new vacuum cleaner, or a particular color that makes the food so attractive we instantly want to take a bite? Opinions differ. What is certain, however, is that there are marketing experts who have the “feel” for advertising and can tell whether an advertisement will achieve the desired result. This, of course, comes from years of hard work and experience in the field, and this type of competence is very valuable for the business. However, this instinct usually cannot be formalized into a set of rules stating, for example, that if an advertisement has a smiling man holding a vacuum cleaner in his right hand with a grin on his face, it will be very effective. If the product being advertised is not a vacuum cleaner but a car, the same setting would probably not make much sense and would leave a customer wondering what is being advertised, the car or the vacuum cleaner. One can be certain that such disorientation is not a good tactic, because the message will clearly not be communicated to the customer, which defeats the purpose of the advertising campaign. But there could still be something in common between successful advertisement banners in visual terms. If so, advertising agencies would be able to improve the effectiveness of their advertisements.

Currently, there is a whole separate process organized to choose between advertisement banners for an advertising campaign. Research is done to determine the preferences of the audience; based on these preferences many banners are designed and then given to focus groups to decide upon the “winner”. In the end, the advertising agency advises the client on these creatives, and perhaps two or three on average are chosen from the initial hundred or so. The process is very time-consuming, costly, prone to human error, and very opinionated. There also arises a problem: focus group participants do not know which advertisement banner would work best for the client, because they have no marketing expertise, while marketing specialists have the expertise but do not share the preferences of their focus groups. This leads to a situation where people with no marketing expertise can choose inefficient banners, and marketing specialists will use them anyway, because this was the choice of the focus group, assumed to have picked the best banners. Thus it would be great if there were an algorithm able to distinguish between efficient and inefficient banners and give the focus groups only the most promising ones. This would not only save a lot of time for the agency and the focus group participants by decreasing the number of banners to inspect; it could also improve the effectiveness of the banners during the campaign, and thus the post-campaign results, which is the desirable outcome for the agency and its clients. As a potential solution to this problem, we propose a convolutional neural network model that predicts the effectiveness of a banner based on the image itself.

It should be noted that “banner” here means a static image, not a video or a GIF. The dataset used for the task was sourced from Mediascope internet monitoring and consists of the images, their IDs, and their click-through rates. Initially, the dataset also included videos and GIFs, so these were removed. The click-through rates were presented as float representations of percentages, meaning that 3% was represented as 0.03. Images came in a separate archive, named by their IDs, and each image ID had exactly one corresponding click-through rate value. So in the end, the data is as simple as it sounds: an advertisement banner with its click-through rate. In total, the dataset consists of approximately two thousand images.
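
To make the data handling concrete, below is a minimal sketch of how such a dataset could be assembled and binarized against the average click-through rate. The file and folder names ("banners.csv", "images/") and the image format are illustrative assumptions, not the actual structure of the Mediascope archive.

    import pandas as pd
    from PIL import Image

    df = pd.read_csv("banners.csv")                   # hypothetical columns: id, ctr (0.03 means 3%)
    mean_ctr = df["ctr"].mean()                       # average click-through rate over the dataset
    df["label"] = (df["ctr"] > mean_ctr).astype(int)  # 1 = above average, 0 = below

    def load_banner(banner_id, size=(224, 224)):
        # Load one banner image by its ID and resize it to a fixed model input size.
        img = Image.open(f"images/{banner_id}.png").convert("RGB")
        return img.resize(size)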

The model used for this task is a convolutional neural network. Various architectures were tried to see which one performs better on our data. The result was further improved by applying additional techniques, which will be discussed in the section devoted to the model. As a result, we obtained a model that can distinguish between more and less efficient advertisements.

In this work we explore not only convolutional neural networks for image classification, but also techniques that can improve model results, such as data augmentation. After that, we discuss in more detail the problem for which the model was proposed as a solution. Further, the training of the model is described: what was tried, what worked, what did not, and what we obtained in the end. Some examples are also visualized with highlighted areas that guided the model's classification of the advertisement. The assumptions, benefits, and limitations of the model are thoroughly discussed as well.

Literature review

Neural networks as an instrument became very popular as soon as computing power, data, and even programming packages for the purpose became available. Quite quickly, neural networks started showing great results in applications to finance, medicine, security, automation, and so on. This work explores an application to advertisement banners, and therefore quite a lot of literature on the topic was reviewed. However, before diving deep into research on convolutional neural networks for the advertising industry, it is worthwhile to take a step back and look at the techniques used to improve the output of neural networks for image classification, as well as to try to understand how a network classifies images. Thus the literature on convolutional neural networks, on techniques for improving and explaining their results, and on their application in the industry is explored below.

Model interpretability does not come easily with neural networks. A network contains too many parameters for anyone to grasp, so techniques were developed to explain how a model perceives the data and arrives at a certain output. One such technique is proposed in Interpretable Deep Convolutional Neural Networks via Meta-learning by Xuan Liu, Xiaoguang Wang, and Stan Matwin (2018). It is called CNN-INTE, a post-hoc interpretation method applied to an already trained model. The method tries to explain the model globally, in which it succeeds, and this is portrayed as one of its advantages. The other advantage is that it does not lose the accuracy obtained by the original convolutional neural network model, which makes for a faithful model and reliable interpretations. However, the method is quite complex, and explaining a convolutional neural network with it requires a person with field experience in data science, statistics, etc.

In Visualizing and Understanding Convolutional Networks by Matthew D. Zeiler and Rob Fergus (2013), the authors introduce a technique to visualize the intermediate layers of neural networks. For this purpose, they trained large convolutional neural network models to classify images. For most parts of these networks, the visualized features were not interpretable as whole objects, but rather as small image properties relevant to the classification. They also showed that the visualizations could be used to debug models, and that after such debugging earlier models could compete with state-of-the-art results.

While exploring visualization methods, one candidate for this work was SHAP, which stands for SHapley Additive exPlanations. This method is discussed in A Unified Approach to Interpreting Model Predictions by Scott M. Lundberg and Su-In Lee (2017). The authors point out that large and complex models can make very accurate predictions, and their complexity keeps rising with time. With that, machine learning experts begin to struggle with the interpretation of these models, which in turn adds a lot of tension to using them. That is why SHAP was introduced: it is designed to assign each feature a value identifying its importance to the prediction.

Do not trust additive explanations by Alicja Gosiewska and Przemyslaw Biecek (2019), as the title says, explores additive explanation models, such as LIME and SHAP, and outlines their drawbacks in explaining the results of machine learning models. The authors examined how these models performed on large-scale benchmarks and showed that additive explanations can be misleading, omitting some important parts of the model, which is a drawback of such explainers. They also introduced a new method, iBreakDown, which generates not only additive explanations.

In A New Method to Visualize Deep Neural Networks by Luisa M. Zintgraf, Taco S. Cohen, and Max Welling (2016), the authors present a method for visualizing the response of a neural network to a particular input. For images in particular, the method highlights the areas that contribute to or against a certain classification. In the paper, the authors show visualizations for pre-softmax layers as well as for hidden layers.

A Brief Introduction to Neural Networks by Richard D. De Veaux and Lyle H. Ungar is an introductory article explaining the intuition behind artificial neural networks, as well as their resemblance to the biological neural network of the human brain. The concepts provided in the paper served as a foundation for approaching convolutional neural networks, the model used in this work.

While researching the problems encountered in creating convolutional neural network models for image classification, one can stumble upon a shortage of data that cannot be easily overcome. Sometimes it takes financial resources, time, and human labor to acquire new data, and in fields like medicine new data can be impossible to come by without a case occurring. Thus, the authors of A survey on Image Data Augmentation for Deep Learning, Connor Shorten and Taghi M. Khoshgoftaar (2019), examine a wide range of data augmentation methods, some simple and some state-of-the-art, discussing their advantages and disadvantages.

In “Why Should I Trust You?” Explaining the Predictions of Any Classifier, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin (2016) discuss the “philosophy” behind human-model interaction. As with interactions between humans, there are things that cannot be quantified, such as trust and faith, and, as with humans, trust and faith should not be blind. Since models are becoming increasingly complex, the question of how to actually trust a model's decision arises. A model can give accurate predictions for very wrong reasons, dictated by unknown factors - for one, a bias in the dataset. A model trained and tested on a biased dataset may start performing badly in the production environment. Thus it is better to have a notion of how a model makes a prediction. For this purpose, techniques such as Local Interpretable Model-agnostic Explanations (LIME) were developed. This technique is good for layperson explanations: in some cases a developer is not well versed in the subject of the model's predictions, and a researcher is not well versed in deep learning, so explanations that satisfy both sides are useful.
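
As an illustration of how such an explanation can be requested in practice, the sketch below uses the publicly available lime package on an image classifier. The model and image variables are assumed to exist, and the parameter values are illustrative rather than recommended settings.

    from lime import lime_image
    from skimage.segmentation import mark_boundaries

    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(
        image.astype("double"),   # the image to explain, as a float array
        model.predict,            # function mapping a batch of images to class probabilities
        top_labels=2,             # explain the two most probable classes
        num_samples=1000,         # perturbed samples used to fit the local surrogate model
    )
    # Highlight the superpixels that pushed the prediction toward the top class.
    img, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True, num_features=5
    )
    overlay = mark_boundaries(img / 255.0, mask)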

An article from way back, Gradient-Based Learning Applied to Document Recognition by Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner (1998), examined the application of multilayer artificial neural networks to the recognition of handwritten digits. Specifically, they found that models trained using backpropagation turned out to be the most accurate, and that convolutional neural networks performed best.

In the article ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton (2012), the authors trained a deep convolutional neural network on 1.2 million high-quality images with a thousand labels. The model was created for the ImageNet LSVRC-2010 contest, and its error rates beat the state-of-the-art models of the time. In the paper the authors describe the architecture used in the model; in particular, they discuss how they decreased the training time with non-saturating neurons and reduced overfitting with dropout. The dropout technique “turns off” weights arbitrarily, and with that the model generalizes better. Their structure was inspected, and some of its elements, such as a dropout layer, were added to the model used in this work.

Advancements in Image Classification using Convolutional Neural Network by Farhana Sultana, Abu Sufian, and Paramartha Dutta (2019) focuses on convolutional neural networks for image classification tasks and their advancements. In particular, state-of-the-art models like AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet, DenseNet, and SENet are compared to each other. Testing on a conventional convolutional neural network model, ResNet, and GoogLeNet, the authors inferred that a combination of residual blocks and inception modules yields better accuracy than stacking the same blocks over and over.

An article by Amazon researchers, Bag of Tricks for Image Classification with Convolutional Neural Networks by Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li (2019), examined refinements implemented in the image classification process. These refinements include the data augmentations and optimizations introduced in recent years, to which much of the recent progress can be attributed. The purpose of their work was to inspect these advancements in combination, as most studies focus on one particular method or show the use of methods together only in source code. Their study showed that implementing these methods consistently improved results for MobileNet, ResNet-50, and Inception-V3, and stacking them all together worked even better. The paper examined model tweaks, such as replacing 7 by 7 convolutional layers with 3 by 3 ones, as well as training refinements, including label smoothing and knowledge distillation.

High-resolution image classification with convolutional networks by Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, and Pierre Alliez (2017) examines pixel-wise image classification. The imagery in their work is aerial, meaning photos taken from a flying object such as a drone. They raise the problem that convolutional neural networks, while popular in image classification, still struggle to produce quality classification maps because of a localization-recognition trade-off: convolutional neural networks are impressive at recognizing objects, but this comes at the cost of spatial precision. The authors proposed an architecture specifically for this purpose, a convolutional neural network model in which features are learned at different resolutions and then combined. Such a model yielded higher accuracy and required less computational power.

Deep learning for image classification on very small datasets using transfer learning by Mengying Shu (2019) examines the problem of dealing with small datasets. Usually, deep convolutional neural networks are trained on large datasets, but sometimes a large dataset is not attainable, so a researcher has to work with what is available; when a dataset is small, convolutional neural network models overfit on it. In this article, the author used pre-trained model weights of Inception V3, InceptionResNet V2, VGG16, and VGG19, with pre-training done on the ImageNet dataset. The author applied further techniques to battle overfitting, such as dropout layers and data augmentation. The initial dataset included 6,000 images: 3,000 used for training, with 2,000 and 1,000 reserved for validation and testing respectively.
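
A minimal sketch of this kind of transfer-learning setup in Keras is shown below, assuming a VGG16 base pre-trained on ImageNet with a frozen convolutional part; the head sizes and hyper-parameters are illustrative, not the article's exact settings.

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras import layers, models

    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                      # keep the pre-trained convolutional weights fixed

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                    # dropout layer to battle overfitting
        layers.Dense(1, activation="sigmoid"),  # binary classification head
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])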

Performance Comparison of Pretrained Convolutional Neural Networks on Crack Detection in Buildings by Ç.F. Özgenel and A. Gönenç Sorguç (2018) used convolutional neural networks to detect cracks in buildings. This is a very important application, as the health of a building directly affects the safety of those who use it. The problem, however, is not as straightforward as it seems, because cracks are low-level features that the model tends to confuse with textures, backgrounds, and irregular objects in the images; additional problems, such as lighting and occlusions, exist as well. Nevertheless, the authors successfully tackled this problem using pre-trained models such as AlexNet, VGG16, VGG19, GoogLeNet, ResNet50, ResNet101, and ResNet152, with AlexNet taking only a little over two minutes to train on a dataset of 28,000 images. They showed that pre-trained models can be fine-tuned to detect cracks and that expanding the training dataset does not necessarily contribute to model accuracy. Also, features learned during training proved to be applicable to other materials with high accuracy.

In Convolution Neural Network for Cooking State Recognition using VGG19 by Hemanth Potlabathini (2019), the author examines the challenge of teaching a robot, essentially, how to cook on its own. For this purpose, the robot needs to recognize the objects required for a recipe, and, as the title states, VGG19 was used. A modified implementation of VGG19 was employed, with different regularizations, optimizers, and data augmentation techniques tried in order to improve the accuracy of classifying eleven vegetables. The main difficulty in the paper was that objects change state: in particular, when a vegetable is cut, it may effectively lose its label for the model, and the robot will not be able to identify it as the initial pre-cut object. The accuracy achieved by the model was around 58%.

Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation by Sharif Amit Kamran and Ali Shihab Sabbir (2018) examines the problem that semantic segmentation with deep convolutional neural networks is very GPU-intensive. The authors recommend techniques to make training a convolutional neural network for semantic segmentation require less computing power. One of the proposed methods was using pre-trained weights, which can be trained further to improve model accuracy. The authors also used VGG19 and VGG16 as their model architectures.

The Effectiveness of Data Augmentation in Image Classification using Deep Learning by Jason Wang and Luis Perez (2017) investigated and compared multiple approaches to data augmentation for image classification. They artificially limited their dataset by taking a sample from ImageNet and compared data augmentation techniques on the resulting dataset. Their results showed that while more traditional augmentation techniques, like geometric transformations, work quite well, techniques enabled by CycleGAN also showed a lot of promise. Creating combinations of images also proved to be a great way to augment data, as it improved the accuracy of the model.

“Understanding data augmentation for classification: when to warp?” by Sebastien C. Wong, Adam Gatt, Victor Stamatescu, and Mark D. McDonnell (2016) explored how data augmentation benefits machine learning classification models. They used two approaches to data augmentation: data warping and synthetic over-sampling. The experiment was conducted on the MNIST dataset, where they discovered that it is better to perform data-space augmentation, provided the label is preserved in the process.

In Data Augmentation by Pairing Samples for Images Classification by Hiroshi Inoue (2018), the author introduces a simple yet effective data augmentation technique for image classification called SamplePairing. In SamplePairing, a new data point is synthesized by overlaying one image on another, so that N-squared images can be created, where N is the size of the dataset. The technique helps expand the dataset and avoid overfitting on limited data. Benchmark models and datasets were used in the paper, such as GoogLeNet on CIFAR-10 and the ILSVRC 2012 dataset, and accuracy increased in each case.
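
The core of SamplePairing is easy to express in code: two images are overlaid by averaging their pixel intensities, and the label of the first image is kept. The sketch below is a simplified reading of the paper, not its reference implementation.

    import numpy as np

    def sample_pairing(img_a, img_b, label_a):
        # Overlay two equally sized images by averaging their intensities;
        # with N images in the dataset, up to N^2 such pairs can be formed.
        mixed = (img_a.astype(np.float32) + img_b.astype(np.float32)) / 2.0
        return mixed.astype(img_a.dtype), label_a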

The paper Adaptive Data Augmentation for Image Classification by Alhussein Fawzi, Horst Samulowitz, Deepak Turaga, and Pascal Frossard (2016) introduces a new algorithm that automatically and adaptively chooses data augmentation transformations. The main idea behind the algorithm is to seek the smallest transformations that yield the highest classification loss. For this task, a trust-region optimization strategy is used, in which a sequence of linear programs has to be solved. The method was tested on two datasets, and the results proved to be at the level of state-of-the-art models.

Comparing Data Augmentation Strategies for Deep Image Classification by Sarah O'Gara and Kevin McGuinness (2019) examines data augmentation practices, from the most traditional to the latest, how their effectiveness depends on dataset size, and when introducing a data augmentation technique is actually required. One drawback of data augmentation is, of course, that it prolongs the training time of the model; this disadvantage is often overlooked in favor of better accuracy, but it can play a huge role when computing power is scarce. The dataset used in the paper is ImageNet ILSVRC. The authors examined random and Gaussian distortions, which yielded rather insignificant results, but achieved a significant improvement with random erasing; positive results were also achieved with traditional data augmentation techniques.

In the article Improve Image Classification Using Data Augmentation and Neural Networks, Shanqing Gu, Manisha Pednekar, and Robert Slater (2019) show how image classification can be improved using convolutional neural networks and data augmentation techniques. The most common problems in this space are, of course, poor performance and overfitting. The authors tackled them by adjusting filter sizes, adding convolutional, dropout, and max-pooling layers, optimizing the selection of hyper-parameters, and using data augmentation techniques. The model used for benchmarking is VGG16.

The articles listed above significantly influenced the research and the model training process. Some of them even document their own training process: what helped increase results, what was tried but did not help, and what the authors were looking forward to trying in the future. As a result, we familiarized ourselves with model training techniques, architectures, and visualization techniques, which will be used in our model.

Theoretical framework

In this section, we briefly discuss the technical aspects of building a solution to our problem of predicting advertisement banner efficiency. Unfortunately, building machine learning models can be quite a hassle for many reasons, which will be discussed below, along with methods that help us battle these problems.

Neural networks

The model of choice for this work is a neural network, as the title suggests. The choice is not arbitrary: currently, convolutional neural networks are the go-to model for image processing. Therefore, as part of the theory behind this work, neural networks are discussed, so as to fully understand the model used.

Artificial neural networks, or simply neural networks, are models used for machine learning problems such as the one discussed in this work, advertisement banner classification. The name “neural networks” was inspired by the biology of how the human brain works. The brain consists of a large number of interconnected neurons sending electric impulses to activate one another; once all the required neurons are activated, the brain produces some action. Artificial neural networks work approximately the same way, but instead of electrical impulses they send numeric inputs between neurons. The similarity with the brain does not end there: models also “learn” and can make inferences and predictions even without complete data, much like humans. Of course, this comparison to the human brain is not essential - it is usually drawn to introduce neural networks to the reader - but it is still a good and very easy representation to follow.

Formally speaking, a neural network is a network of neurons with connections between them. In the context of machine learning, each neuron represents a predictor variable or a combination of them. A neuron receives input data, transforms it - usually by adding the inputs together with some weights - and sends the output to the next neuron. These weights are the parameters mentioned previously: in the beginning they are unknown and are only determined during the training of the model.

So far, neural networks have been successful at many tasks, such as image, audio, and text recognition. They can also predict continuous values, which is solving a regression problem, and, interestingly enough, they can generate new data based on the data they learned from - this will be discussed in the data augmentation section of this work. Here, the application of neural networks to image recognition is the point of interest. One of the most famous cases of applying a neural network to image recognition is recognizing images of handwritten ZIP codes on letters in the post office. Computing power has increased dramatically in the 30 years since convolutional neural networks were used to recognize handwritten numbers (LeCun, Bottou, Bengio, Haffner, 1998), and today neural networks are used for far more complex image recognition tasks.

Artificial neural networks differ from most statistical models in many ways. For starters, neural networks can have hundreds of thousands of parameters or even more, while most statistical models would not have even a hundred. Because of that, it becomes hard to interpret the results produced by a neural network model, which will be discussed in the section devoted to the interpretation of deep learning models. Generally speaking, neural networks do not aid the researcher in understanding the nature of the data at hand. This can be fine for tasks such as recognizing handwritten digits, because the post office does not really care about the intrinsic details of why a model classifies a digit the way it does, but rather that it classifies it properly.

So far it was discussed that neural networks are “trained” and mimic human brains in this way. As with humans, neural networks can learn when someone tells them the right answer; this is called supervised learning. But, as with many things in life, sometimes no one gives the right answer, yet humans learn nevertheless - this different kind of problem is called unsupervised learning. An example of supervised learning is, as in this work, when there is data - images in this case - labeled with its respective class, and a researcher predicts the class of an image using a model trained on the labeled dataset. With unsupervised learning there are no labels, but one can still learn information from the dataset: for example, one can segment users into a finite number of groups based on their preferences, which serve as the inputs of the model, and thus determine clusters of users. It can thus be said that in unsupervised learning a neural network attempts to understand the data on its own. Unsupervised learning will not be discussed further, as the model in this work deals with classification, a supervised learning problem: a feedforward convolutional neural network is used to classify advertisement banners. Let us now discuss how a basic neural network looks in more statistical terms than before.

Biological neurons in an artificial neural network are simulated by an “activation function”. An activation function is what it sounds like: a function that activates. The activation occurs when the input value exceeds a certain threshold, as in the ReLU activation function depicted in Figure 1: if the weighted sum of values entering the neuron is greater than zero, ReLU returns that sum unchanged; otherwise it returns zero. There are many other activation functions as well, such as the sigmoid function in Figure 2.

Figure 1. Rectified Linear Units activation function

Figure 2. Sigmoid activation function

As can be seen, the sigmoid function is “softer” around negative values. It is also worth mentioning that the choice of activation function can dramatically affect the computing power required, and therefore the learning speed of the model.

As in the brain, neurons are connected in a hierarchical network and send their impulses on to the next neurons. Such networks can be represented as layers of nodes. A node represents weighted inputs, which are summed together and fed into an activation function that produces an output; a node is, in some sense, a house for the activation function, its inputs, and its output. The weights in this context are real numbers - the “parameters” of the model, which it “learns” during training - and they multiply the inputs coming from other neurons. There also exists one special parameter that is not a multiplier of any input but is simply added to the sum of weighted inputs: the bias. The bias is usually added to stimulate the activation function to turn on more often.
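
Numerically, the computation inside a single node is just a weighted sum of inputs plus the bias, passed through an activation function. The small sketch below illustrates this with the two activation functions from Figures 1 and 2 and made-up input values.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)        # Figure 1: outputs z if z > 0, else 0

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))  # Figure 2: squashes z into (0, 1)

    x = np.array([0.5, -1.2, 3.0])  # inputs coming from the previous neurons
    w = np.array([0.8, 0.1, -0.4])  # weights, the parameters learned during training
    b = 0.2                         # bias, added on top of the weighted sum

    output = relu(np.dot(w, x) + b)  # the value sent on to the next neuron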

Given the previous overview, the anatomy of a neural network should now be clear; all these elements are combined into layers. A simple neural network, as in Figure 3, consists of an input layer, a hidden layer, and an output layer. A neural network model with two or more hidden layers is called a deep neural network. This is not the only type of network out there: there are also liquid state machines and Boltzmann machines, among others.

Figure 3. A simple neural network architecture
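
Expressed in a deep learning library such as Keras, the architecture of Figure 3 takes just a few lines; the layer sizes below are illustrative assumptions.

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(4,)),               # input layer: four predictor variables
        layers.Dense(8, activation="relu"),     # hidden layer of eight nodes
        layers.Dense(1, activation="sigmoid"),  # output layer for a binary decision
    ])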

Since the architecture of a simple neural network is now clear, the training process should also be explained. As was previously mentioned, nodes contain weights multiplied by input values from other neurons, and a node links these weights and values together. In supervised learning, the model trains itself to reduce the error between its output and the desired output: if the model yields a result that differs from what it actually should be, it readjusts the weights in a way that brings the output closer to the target, thus diminishing the error. This is possible in supervised learning because pairs of input and output data already exist, and the weights are adjusted to minimize the error between them. But if the weights were varied arbitrarily, one might never achieve any result, as there are infinitely many possible weight values, and even simple models have quite a lot of weights. However, there are systematic ways to adjust these parameters.

One such method is called gradient descent. In this method, the gradient of the error with respect to the weights is calculated at a point in order to find a minimum of the function by iteratively moving in the steepest descent direction, which is given by the negative gradient. This iterative movement can occur at different speeds, known as the learning rate. The learning rate can be small, meaning it will take the model longer to learn, but a large learning rate can simply overshoot the minimum point, so that the minimum is never reached.
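
A bare-bones illustration of gradient descent on a one-parameter error function E(w) = (w - 3)^2, whose gradient is 2(w - 3), is given below; it is a toy example rather than the training procedure of the actual model.

    w = 0.0              # initial weight
    learning_rate = 0.1  # step size along the negative gradient

    for step in range(100):
        grad = 2.0 * (w - 3.0)     # gradient of the error at the current point
        w -= learning_rate * grad  # move in the steepest descent direction

    # w converges toward the minimum at w = 3; a learning rate of, say, 1.5
    # would overshoot the minimum and diverge instead.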

When talking about iteratively minimizing the error between the output of the neural network and the target, the weights were varied through gradient descent. But there is also a more general way to formulate the reduction of error, while also preventing overfitting: the optimization is usually expressed in terms of a cost function. As with activation functions, there are many cost functions, each with its application to certain problems. In the case of this work, binary cross-entropy is used.
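
For reference, binary cross-entropy over true labels y (0 or 1) and predicted probabilities p can be written directly as a short function; this is the standard textbook form rather than a library-specific implementation.

    import numpy as np

    def binary_cross_entropy(y, p, eps=1e-7):
        p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
        return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))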

In this work a convolutional neural network is used. It is very similar to the model discussed above, but there are, of course, differences. First of all, non-convolutional neural networks do not scale well with large images: a weight corresponds to each pixel value of the image, so a small color image of 300 by 300 pixels already amounts to a stunning 270,000 weights for a single neuron, and larger images amass these weights very fast. It is also worth noting that one would most likely have more than one such neuron. Such a large number of parameters will almost certainly lead to overfitting.

What makes a convolutional neural network differ from an ordinary neural network is that it usually has a convolutional layer and a pooling layer, as well as a fully connected layer. Let us take a look at each of these individually.

A convolutional layer is what solves this problem. An ordinary neural network takes each pixel in the image as a separate parameter, ignoring the fact that neighboring pixels are correlated with each other, so there has to be a way to account for this. To analyze this correlation, a filter is used: a matrix, usually 3 by 3, which moves across the image from the top left to the bottom right. A value is calculated at each point from this filter by the operation of convolution. These filters can associate with just about anything - from detecting human ears to counting how many times such ears appear in the image. Through convolution, the number of weights is reduced, and the location of a feature in the image does not affect its identification much.

After the filter has passed over the image, it generates a feature map. These feature maps are then put through an activation function, which determines whether the features are actually present in the image. More filters can be added to create more feature maps, making the features more and more abstract as a deeper convolutional neural network is created. After that, a pooling layer can be used to select the largest values (or, in some cases, average values) and pass them as inputs to the following layers. Average values are less typical than maximum values, because usually it is the outstanding values that are desired - these are the values through which a network identifies a feature. After the filters have been applied and feature maps created, the maps are stacked together and enter the next layer; thus, if a hundred filters were applied to the image, the third dimension of that layer's output will equal the number of filters, a hundred.

A fully connected layer is present near the end of the convolutional neural network, just before the classification result is output. In this sense convolutional networks are similar to the multilayer perceptron model: the fully connected layer is used to flatten the feature maps into classification results.
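
Putting these layers together, a small convolutional architecture of the kind described above might look as follows in Keras; the filter counts and layer sizes are illustrative and do not reproduce the exact model of this work.

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(32, (3, 3), activation="relu"),  # 32 filters of size 3 by 3
        layers.MaxPooling2D((2, 2)),                   # keep the largest value of each region
        layers.Conv2D(64, (3, 3), activation="relu"),  # deeper layer, more abstract features
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                              # the fully connected part begins here
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),         # efficient vs. inefficient banner
    ])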

Taking a step back: when a convolution is applied to an image, the result is downsampled from the original image size by an amount determined by the size of the applied filter. This leads to a loss of data - not a huge one, but still a loss - and it should be avoided somehow. Moreover, if many filters are applied in sequence in a convolutional neural network, one can be left with a greatly reduced output. For this, a technique called padding is used: pixels with value zero are introduced around the image so that the size of the image is not lost in the convolution. Full padding is not an obligation, however, as one can accept the loss of some image size in convolution.
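
In Keras terms, the two options map to padding="valid" (no padding, the output shrinks) and padding="same" (zero padding, the spatial size is preserved). For a W by W input, an F by F filter, padding P, and stride S, the output width is (W - F + 2P) / S + 1.

    from tensorflow.keras import layers

    conv_valid = layers.Conv2D(16, (3, 3), padding="valid")  # 128x128 input -> 126x126 output
    conv_same = layers.Conv2D(16, (3, 3), padding="same")    # 128x128 input -> 128x128 output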

With that, the convolutional neural network architecture is fully described, as well as the reasons it was chosen as the model for the task at hand. For the reasons given above, convolutional neural networks are the first-choice models for image classification problems.

Data augmentation

The upside of using convolutional neural networks is that they are very good at computer vision tasks, but this comes at the cost of heavy reliance on large amounts of data. And not only do we need more data than usual, we are also highly likely to encounter the problem of overfitting on our data. Overfitting is a phenomenon in which a neural network does not generalize well because it has learned to model the training data too perfectly. This becomes an additional problem in fields without a lot of data: not only do we lack the data, we also overfit on it. One such area, as it happens, is advertising banners. Advertising agencies would love to have a lot of banners for testing, but their variety is usually quite limited - typically due to brand safety reasons rather than time and money constraints. Clients would also love to test what works best, but creating too many advertisements would just confuse the customer and bring more hassle than benefit. Other domains include medical image analysis, which is surprising but has an explanation: for some illnesses you simply cannot produce a lot of imagery. For example, taking many X-rays of one person can be hazardous to their health, and some illnesses do not come by often.

With a combination of big data and high computing power, deep learning models can live up to their fullest potential, one aspect of which is being tremendous at discriminative tasks. Because of that, they are widely used in applications solving computer vision tasks such as object detection and image classification. As was previously discussed, convolutional neural networks are currently the best fit for such tasks.

As of now, there exists a range of benchmarks for particular problems, such as MNIST image recognition and many others. These benchmarks are consistently improved by new findings in the machine learning field, which will be discussed further. They are beaten by improving models' generalizability - the ability to generalize, considered one of the most difficult challenges in machine learning. Benchmark datasets also include CIFAR-10, ImageNet, MIT Places, Street View House Numbers (SVHN), and many more. Usually, however, articles and research papers focus on CIFAR-10 and ImageNet, because they are so big that they are easily classified as “big data”.

Generalizability usually means that the model should perform well on new data, given that it has been taught on relevant data before. That is, if a model has seen a bunch of golden retriever images labeled as a dog, it should identify another clear picture of a golden retriever as one. However, one should not expect the model to identify an image of a corgi as a corgi, because the model was trained to discriminate between golden retrievers and not golden retrievers, not to identify corgis.

If generalizability is poor, the model will not be able to identify a golden retriever even in a very clear picture. This happens due to overfitting, which can be discovered during training by comparing errors in predicting the test data: if during training the validation error consistently goes up compared to previous epochs, the model is overfitting. As with everything, moderation matters, and it is the same with model training: too much training leads to performing really well on the training data but badly on the test set, which contradicts the objective. The data used for training is already known; unless the same data comes up all the time - which, by the way, would defeat the whole purpose of a machine learning algorithm - it is highly desirable that the model performs well on unknown data. What is desired is that the test set loss goes down together with the training set loss. However, there will be a moment when the test loss plateaus, meaning it will not decrease beyond a certain value, and training is best stopped at that epoch.
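
Stopping at the right epoch can be automated; one common way, shown here as a hedged illustration, is the EarlyStopping callback in Keras, which watches the validation loss and rolls the weights back to the best epoch. The model and training arrays are assumed to be defined elsewhere.

    from tensorflow.keras.callbacks import EarlyStopping

    early_stop = EarlyStopping(
        monitor="val_loss",         # watch the validation loss
        patience=5,                 # tolerate 5 epochs without improvement
        restore_best_weights=True,  # roll back to the best epoch seen
    )
    model.fit(x_train, y_train, validation_split=0.2, epochs=100,
              callbacks=[early_stop])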

The validation error, once it plateaus, will not decrease by itself; some action must be taken to improve the situation. To solve both problems - scarce data and overfitting - we can use a technique called data augmentation. Data augmentation is really a suite of techniques that increase the size and quality of training datasets so that better deep learning models can be built on the newly acquired data. Several image augmentation algorithms will be discussed: color space augmentations, geometric transformations, mixing images, kernel filters, feature space augmentation, random erasing, generative adversarial networks, adversarial training, meta-learning, and neural style transfer.

This is where data augmentation comes into play: it introduces new data points and adds variation to the data, which should improve generalization. While many other techniques also serve this purpose, data augmentation is the primary focus of this work, because it was used in the final model. Not all of the techniques listed ended up being used in the model, but they still deserve mention; some were considered but rejected, and the motivation for these decisions will be explained further.

Data augmentation approaches the overfitting problem from the heart of it: the training dataset itself. Of course, it is implicitly assumed that additional data can be derived from the original set, as we artificially increase the size of the dataset by warping it or oversampling. Data warping means that even if an image is distorted or altered, its label is preserved; examples of such augmentations are color and geometric transformations, as well as neural style transfer, adversarial training, and random erasing. Oversampling augmentations include using generative adversarial networks to generate additional data directly, feature space augmentations, and mixing images. It should be noted that data warping and oversampling are not in any way mutually exclusive, meaning both can be applied if that serves the purpose of the modeling: it is absolutely fine to rotate an image produced by a generative adversarial network, change the color of a rotated image, or combine augmentation methods in other ways. The same method can also be used more than once if it produces unique data - for example, rotation by 90, 180, 270 degrees or any other angle, or different color palettes, as long as the label of the data is not altered.
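
As a sketch of label-preserving data warping, Keras' ImageDataGenerator can apply geometric and color-space transformations on the fly; the parameter values below are illustrative and are not the settings of the final model.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    augmenter = ImageDataGenerator(
        rotation_range=90,            # random rotations, as in the 90/180/270 degree example
        horizontal_flip=True,         # mirror images left to right
        brightness_range=(0.8, 1.2),  # mild color-space (brightness) variation
        channel_shift_range=30.0,     # shift color channels
    )
    # augmenter.flow(x_train, y_train, batch_size=32) then yields endless
    # batches of randomly transformed images during training.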

The use of image augmentation techniques depends on the context of the problem the model is solving. For example, assume we want to differentiate between a bike and a motorcycle. Bikes and motorcycles come in many colors and can be photographed under different lighting, and most likely a bike is still a bike even if it is standing on one wheel (rotated, for this example) - the same goes for motorcycles. So we can assume that color and geometric augmentations will not adversely affect the recognition model. But say we slim down the motorcycle: a slim motorcycle could easily be taken for a bike by a human, and the model may interpret it as such, yet it will carry the opposite label, adding unnecessary noise to the model's weights. Thus it pays to be careful about the augmentation techniques used and to check whether they actually help the model with the task.

