Probabilistic graphical models in customer analytics: comparison with classical predictive models


Category: Management and labour relations
Type: thesis
Language: English
Date added: 25.08.2020


Posted at http://www.allbest.ru/

FEDERAL STATE EDUCATIONAL INSTITUTION

OF HIGHER EDUCATION

NATIONAL RESEARCH UNIVERSITY

HIGHER SCHOOL OF ECONOMICS

Saint Petersburg School of Economics and Management

Probabilistic graphical models in customer analytics: comparison with classical predictive models

Master's thesis

In the field 38.04.02 `Management'

Educational programme

`MANAGEMENT AND ANALYTICS FOR BUSINESS'

churn prevention analyses

Gritsay Polina Aleksandrovna

Makkoveeva Arina Maksimovna

Samkina Alena Aleksandrovna

Saint Petersburg 2020

Abstract

In a highly competitive business environment, customer churn is an inevitable issue. Traditional methods of churn prediction are quite accurate. However, their utility for managerial decisions is rather poor, because they do not reveal the indirect connections between variables. Thus, there is a fundamental gap between the churn prediction and churn prevention problems. This paper applies different methods in order to obtain a comprehensive view of the customer churn problem, and it identifies which algorithms can be used not only for churn prediction but also for churn prevention. In our research, we used Logistic Regression as a representative of classical prediction methods, Random Forest and eXtreme Gradient Boosting as two examples of machine learning algorithms, and Bayesian Networks as an example of graphical models. All analyses were based on publicly available data from the telecom industry and were performed in R. We found that the prediction accuracy of the Bayesian Belief Network model is comparable to that of the other methods, while it additionally provides affordances that help in designing prevention strategies.

Key words: Customer churn, churn prediction, churn prevention, machine learning techniques, Bayesian Belief Network, Random Forest, eXtreme Gradient Boosting, Logistic regression, telecom

Table of contents

Introduction

1. Churn management in customer analytics

1.1 Churn prevention analysis

1.2 Churn prediction analysis

2. Methods of churn analysis

2.1 Traditional methods

2.2 Machine learning algorithms

2.3 Graphical models for predictive analysis: Bayesian Belief Networks

3. Research design and methodology

3.1 Methods

3.2 Data

4. Analysis and results

4.1 Case descriptive statistics

4.2 Logistic Regression, Random Forest and XGBoost

4.3 Bayesian Belief Network analysis

4.4 Customer profiles analysis

Conclusion and discussion

References

Appendices

Introduction

Today, almost every market is crowded with competitors, which gives consumers the opportunity to choose which company to buy goods and services from. As more and more companies are created, competition in markets is growing rapidly. This, in turn, increases the risk of customer churn: as soon as another company makes a slightly more advantageous offer, the consumer can easily switch to it. Churn means a loss of customers, expressed in the absence of purchases or payments for a certain period of time. For some areas of activity, such as the sale of real estate, the concept of churn is not applicable, since purchases are not regular. But for companies with subscription and transactional business models, which involve regular payments to the company, the churn indicator is extremely important (Knox & Oest, 2014). Examples include banks, telecom operators, and SaaS services. That is why our study is based on data from telecommunication companies, one of the areas where the churn indicator matters most. However, there is a fundamental gap between the churn prediction and churn prevention problems, which are viewed from different angles (Ascarza, 2018). The problem is that this gap is widening. Technical specialists now build extremely accurate churn prediction models using various algorithms and additional tools, but the usefulness of these models for resolving managerial issues has not yet been demonstrated. As a result, we have accurate models that are usually not applicable to decision making (Ascarza, Iyengar & Schleicher, 2018). Consequently, the churn rate does not actually decrease, and customers are not prevented from churning.

Still, in order to avoid losing customers, companies use churn prediction models. Existing customers of the company are divided into risk groups based on historical data about the users themselves, their purchase decisions, and the services they have used. This division helps companies understand how likely users in each group are to leave, and this knowledge, in turn, makes it possible to prevent churn. Once the company knows the probability of churn, it can make a personal offer and avoid losing the client. Churn forecasts allow firms to save millions, because the cost of retaining a customer is much lower than the cost of attracting a new one (Pfeifer, 2005; Calciu, 2008). Moreover, a 5% increase in customer retention leads to an increase in company profits of more than 25% (Reichheld & Kenny, 1990). Finally, personalized offers save companies money because they are precisely targeted. For example, there is no need to lower prices for everyone; it is enough to do so for the one client who is at risk. This allows the company, first, not to lose the “additional” income from customers who are satisfied with the generally available tariffs and, second, to keep the client who was close to leaving and continue to earn from him, possibly in smaller amounts, but still earn. Analysis and predictions are based on the socio-demographic characteristics of users, the specifics of their purchases, and their use of particular services. Moreover, the telecommunications sector allows building user networks, which provides even more data for analysis and for improving the accuracy of predictions. All these data help companies understand the importance of each user and find the “right” customers.

The topic of grouping customers and predicting their behaviour is not entirely new. More and more studies on predicting customer churn are emerging, which once again demonstrates the great relevance of this problem. In various works, researchers use completely different data from different areas and make predictions with one method or another (Huang et al., 2010; Yang et al., 2018; Li & Li, 2019). In total, three groups of methods can be distinguished:

- classical methods, which include regression and time series models;
- methods based on machine learning (ML) algorithms, which include Decision Trees, Support Vector Machines (SVM), Random Forest (RF), Boosting, and others;
- graphical models, namely Bayesian networks (BN).

Despite the popularity of the topic and the large number of articles comparing various methods (Vafeiadis et al., 2015; Borrotti, 2018), we found no studies in which classical and graphical models are compared.

Traditional machine learning models do not show us the full picture, and as a result it is difficult to design prevention strategies with them. First, we do not see the relations between the independent variables. Second, these models do not let us see what will happen to different customer groups (profiles) when one of the factors changes. It is no coincidence that Bayesian networks were added to the analysis. Unlike the other methods, they show the connections between all variables and highlight specific patterns, so we can see what happens if we move the price up or down or take other actions. When experimenting with the tariffs of offer programs for different customer groups (profiles), we could then build the model and obtain an approximate prediction of how the program would work in real life. This tells us at what moment we need to pay attention to a client at risk, namely when something has changed in the customer's behaviour (for example, he has started to use fewer services or has stopped using them). Thus, the importance of our work lies not in obtaining new knowledge about telecommunication services but in reviewing an approach that can be reused with new data. Researchers have used Bayesian Belief Networks in similar tasks, for example, to study customer satisfaction, but not primarily for churn prediction. Therefore, we decided to take a closer look and see whether Bayesian Networks give us a wider understanding of the churn problem, which would help us develop intervention policies for churn prevention. Our work is primarily methodological. Its main goal is to compare methods of churn prediction not only by their predictive power but also by the affordances they give for decision making and churn prevention policies.

The results of the Bayesian Network analysis show which variables affect the dependent variable. They also show which variables affect other independent variables that in turn directly affect the dependent variable (a mediated effect). This is important for understanding how to deal with customer churn. It often seems that the reason for churn is the price and that a company needs to give discounts to customers who are at risk. In reality, however, the cause may be completely different, for example, an intangible factor (Kisioglu & Topcu, 2011; Chakraborty et al., 2016), and Bayesian Belief Networks allow us to see this. Based on the literature, we selected the methods with the best prediction accuracy from each group and used them in our comparative analysis. Although we do not pose a specific research question, we set out the task we address in this paper: to compare methods and to explore the features of the Bayesian Networks tool for predicting customer churn. The work is aimed at comparing methods. The results obtained in the analysis can then serve as recommendation tools for telecommunication businesses, helping them to prevent customer churn or to improve retention and loyalty programs for their clients. These results (and, moreover, the workflow of the analysis) can be used not only in the telecommunication business but also by other service companies that offer clients Internet, TV, mobile, and other "online" services.
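To illustrate the mediated effect described above, the following minimal Python sketch encodes a hypothetical three-node network Price -> Satisfaction -> Churn and computes churn probabilities by exact enumeration. The variable names and all probabilities are illustrative assumptions, not estimates from our data, and the sketch is not the R workflow used in the analysis. In this chain Price has no parents, so conditioning on a price level is equivalent to setting it. The comparison can therefore be read as a simple "what if we change the price" query.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative values only)
P_PRICE_HIGH = 0.4                              # P(price = "high")
P_SAT_GIVEN_PRICE = {"high": 0.5, "low": 0.8}   # P(satisfied = True | price)
P_CHURN_GIVEN_SAT = {True: 0.1, False: 0.5}     # P(churn = True | satisfied)

def joint(price, sat, churn):
    """P(price, sat, churn), factorised along the chain Price -> Satisfaction -> Churn."""
    p = P_PRICE_HIGH if price == "high" else 1 - P_PRICE_HIGH
    p_sat = P_SAT_GIVEN_PRICE[price]
    p *= p_sat if sat else 1 - p_sat
    p_churn = P_CHURN_GIVEN_SAT[sat]
    p *= p_churn if churn else 1 - p_churn
    return p

def prob_churn(evidence):
    """P(churn = True | evidence), summing the joint over all worlds consistent with the evidence."""
    numerator = denominator = 0.0
    for price, sat, churn in product(["high", "low"], [True, False], [True, False]):
        world = {"price": price, "sat": sat}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts the evidence
        p = joint(price, sat, churn)
        denominator += p
        if churn:
            numerator += p
    return numerator / denominator

print(prob_churn({"price": "high"}))               # churn risk at a high price
print(prob_churn({"price": "low"}))                # churn risk at a low price
print(prob_churn({"price": "high", "sat": True}))  # once satisfaction is known...
print(prob_churn({"price": "low", "sat": True}))   # ...the price level adds nothing
```

With these made-up numbers, a high price raises the churn probability (0.30 versus 0.18), but only through satisfaction. Once satisfaction is observed, the price level no longer changes the churn probability. A realistic network with many variables would be learned from data with a dedicated package rather than specified by hand.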

Our work is structured as follows. The first chapter describes the role churn management plays in database marketing, in order to specify the field of our study. The second chapter, in three sections, is devoted to the groups of methods, from simple classical models to more complex graphical models, and to their application to predicting customer churn. The third chapter is methodological: it describes in detail the methods chosen for the practical part of our work and the data on which we built predictions and compared models. Finally, the fourth chapter presents the results of the work, and the conclusion contains a discussion and a comparison of our results with earlier studies.

In our paper we used several measures to describe the models, all of which were calculated from confusion matrices:

- Accuracy is the proportion of correctly classified observations among all observations.
- Sensitivity is the proportion of actual positive values that the model predicts correctly.
- Specificity is the proportion of actual negative values that the model predicts correctly.
- Precision is the proportion of values predicted as positive that are actually positive. Because false positives reduce it, it is related to the specificity of the model.
- Recall is the proportion of actual positive values correctly predicted by the model, that is, recall equals sensitivity.

The most important measure for identifying churners is the recall rate (“how many churners we can find”), so to increase the model's predictive power for this task we need to improve its recall.
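The measures above can be computed from a confusion matrix with a few lines of Python. The labels below are a made-up toy example (1 denotes a churner) and are not output from any of our models.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = churner."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_measures(tp, fp, tn, fn):
    """Accuracy, sensitivity/recall, specificity, and precision from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "recall": tp / (tp + fn),        # identical to sensitivity
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Hypothetical labels: actual churn vs. model prediction
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
measures = classification_measures(*confusion_counts(y_true, y_pred))
print(measures)
```

In this toy case the model finds only half of the churners (recall 0.5) even though its accuracy is 0.625. This gap is why recall is the measure to watch when the goal is to find churners.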

1. Churn management in customer analytics

The business environment is becoming more competitive and overloaded with information, and it imposes tough investment payback requirements. In such conditions, database marketing has turned out to be a crucial tool for accomplishing marketing's fundamental goal: to increase customer value (Blattberg, Malthouse & Neslin, 2009). Database marketing is defined as the use of customer databases to improve marketing effectiveness by attracting, retaining, and developing customers more effectively. Thus, database marketing comprises three main components: (1) the use of current or potential customer databases; (2) the marketing productivity concept, which helps identify whether the firm's marketing efforts pay off; and (3) customer management, which is about customer acquisition, retention, and development. In other words, database marketing uses client information to improve marketing productivity in three steps. It attracts the customer's attention to start mutual business interactions, makes sure the customer continues doing business with the organization, and extends the amount of business the customer does with the company (Ascarza, 2018). No matter how good database marketing is, it brings nothing until it is implemented. Here the concept of the “customer-centric” organization appears: the company should be structured around the customer in order for database marketing to be successfully embedded. Going deeper into this process, customer management is one of the core elements of a customer-centric organization. The chart below was created to visualize the theoretical framework of the research problem (Figure 1).

Figure 1. The chart showing the escalation of the theoretical context of the research field

The idea of customer management is to divide customers into groups (portfolios) by appropriate clustering, for example, by sales amount, and to manage them, that is, to adopt acquisition, retention, and development strategies for each customer group. Consequently, the customer manager has to maximize the lifetime value of the customers in the manager's portfolio. This is done by maintaining close connections, understanding customers' needs, and offering customized products or services, which makes the company increasingly valuable to the customer and enhances the company's margin per customer. There are several arguments for moving to a customer management system (Blattberg, Malthouse & Neslin, 2009). First, it increases customer satisfaction by meeting customers' true needs. Second, it builds a sustainable advantage, knowledge, and competences that are hard for rivals to duplicate. Third, it focuses on customer lifetime value, which is by definition a long-term measure. Finally, it has become feasible because firms are now able to collect, store, process, and analyse huge amounts of data.

Database marketing techniques such as customer acquisition programs, cross-selling and up-selling, and customer tier and frequency reward programs are oriented towards acquiring and developing customers. Alongside them, a company needs to make sure that all these efforts are not wasted because the customer has opted out of its services. This issue is the concern of churn management. Quantitative methods, for example, predictive models, serve proactive churn management. In that setting, the customer may intend to churn but has not yet explicitly stated his or her desire to leave the company; the opposite case is reactive churn management (Ascarza, Iyengar & Schleicher, 2018). The challenge of proactive churn management is to reach the customer in advance and offer a service or incentive, designed with the help of predictive models, to prevent the client from churning. The reasons why customers leave a company vary but can be classified into categories: customer satisfaction factors, factors related to the level of switching costs, customer characteristics, marketing, and within-category and between-category competition (Ascarza, 2018).
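A proactive campaign built on a predictive model can be sketched as a simple selection rule. Customers who have not yet left are ranked by their predicted churn probability, and those above a cut-off are contacted within the budget available for incentives. The threshold, the budget, the function name, and the customer scores below are all hypothetical assumptions for illustration.

```python
def select_targets(churn_scores, threshold, budget):
    """Hypothetical rule: ids with predicted churn probability >= threshold, highest risk first, at most `budget`."""
    at_risk = sorted(
        ((p, cid) for cid, p in churn_scores.items() if p >= threshold),
        reverse=True,
    )
    return [cid for _, cid in at_risk[:budget]]

# Hypothetical model scores for customers who have not (yet) churned
scores = {"c1": 0.90, "c2": 0.20, "c3": 0.60, "c4": 0.75}
targets = select_targets(scores, threshold=0.5, budget=2)
print(targets)
```

Customer c3 is above the threshold but falls outside the budget. In practice, the threshold and the budget would depend on the cost of the incentive and the value of the customer, which this sketch does not model.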

1.1 Churn prevention analysis

Several studies focus on the design of customer churn prevention and suggest relevant churn management policies. Recovery is one of the tools for resetting the relationship with a client after a complaint or a bad purchase experience, and it has proved to be efficient in preventing customer churn (Knox & Oest, 2014). The study used panel data on 20,000 customers in the retail industry and developed a customer base model to explore how effective recovery is in preventing churn. The results showed that recovery compensates for the negative effect of a complaint but does not cover it completely. Nevertheless, the authors recommend investing in reducing the probability of failure where possible and undertaking recovery measures when necessary, because in the majority of cases recovery is costly yet effective.

Another tool for preventing customer churn was discussed by Ascarza, Iyengar and Schleicher (2018). The authors show that recommending pricing plans to a company's clients does not necessarily reduce churn. This runs contrary to the expectation that customers who get more benefits from the company's offer will be less likely to churn. They split the sample into two groups for the experiment: one group of customers was offered the pricing plan, while the other was not. The results showed that the proactive approach of prompting clients to switch to a more cost-efficient tariff can even increase customer churn. After three months of observation, the churn rate was 6% in the group that did not receive the offer and 10% in the group that did. Two possible explanations are supported by the data. The offer reduced customers' inertia in switching plans, and it increased the significance of potential churners' previous usage habits. Finally, the authors advise using this approach more carefully, for example, targeting specific clients rather than all of them, such as those who are about to churn or have recently had bad purchasing experiences.

Heuristics and policies of churn prevention in managerial practices

There are other ways of identifying churners, which help to determine which management policies can be used to prevent churn in various groups of customers (such as loyal and non-loyal customers). Heuristics are how a company identifies groups of churners, and policies are what the company offers customers to keep them loyal or to retain those who are ready to churn.

There are different heuristics in management: the recency-sales matrix (Blattberg, Getz, & Thomas, 2001), cross-tabulation, the RFM-type approach, self-organizing maps (SOM), and genetic programming (GP) (Faris, Al-Shboul, & Ghatasheh, 2014), among others. Churners can be identified in different ways: by CLV, lifetime value, RFM value, or ROI. CLV (customer lifetime value) is based on the financial characteristics of the customer: it shows the value of the future profits obtained over the customer's relationship with the firm. Calculating CLV allows the firm to differentiate customers and segment them into groups based on this value (Abedzadeh & Nematbakhsh, 2012). There is also RFM (recency, frequency, monetary value) analysis, which similarly allows firms to divide customers into groups based on their relationships with the firm. Both methods allow the management team to highlight the common attributes of churning customers, because a customer's past behaviour is considered the best predictor of his or her future actions.
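A common textbook form of CLV assumes a constant margin m per period, a constant retention probability r, and a discount rate d. It sums the expected discounted margin m·r^t/(1+d)^t over t = 0, ..., T-1, and for an infinite horizon it equals m(1+d)/(1+d-r). The Python sketch below uses this simplified form. It is not necessarily the specification used by Abedzadeh and Nematbakhsh (2012), and the customer figures are invented.

```python
def clv(margin, retention, discount, periods):
    """Expected discounted margin: sum over t = 0..periods-1 of margin * retention**t / (1 + discount)**t."""
    return sum(margin * retention ** t / (1 + discount) ** t for t in range(periods))

def clv_infinite(margin, retention, discount):
    """Closed form of the same sum for an infinite horizon (requires retention < 1 + discount)."""
    return margin * (1 + discount) / (1 + discount - retention)

# Hypothetical customers: (margin per period, retention probability per period)
customers = {"A": (30.0, 0.9), "B": (50.0, 0.6)}
values = {cid: clv_infinite(m, r, 0.1) for cid, (m, r) in customers.items()}
ranking = sorted(values, key=values.get, reverse=True)  # segment or prioritise by CLV
print(values, ranking)
```

Customer A has the lower margin but the higher retention, and ends up with the higher CLV (about 165 versus 110). This illustrates why retention is central to customer value.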

All of these heuristics are driven by a few basic ideas:

- the best predictor of a customer's future behaviour is his or her past behaviour;
- customers want to win the “customer game” (people like to think they are smart and in control of their actions);
- data-driven programs help to allocate resources (ROI);
- service is about the relationship between business and customer and about making this relationship last as long as possible, and about the interactions between customer and business, which can later be analysed for possible business opportunities (Novo, 2004).

At the core of the customer-business relationship is a feedback loop. It consists of an action, then a reaction, then feedback that drives the customer to the next action, for example, continuing or stopping use of the service. The business in turn reacts to that action with feedback. This cycle continues as long as both the customer and the business see value in the relationship (Novo, 2004).

The results of the CLV and RFM analyses (the division into groups) can then be used to create retention strategies and churn prevention policies. Many policies are used for churn prevention and customer retention (Wübben & Wangenheim, 2008). For example, a company can simply send an offer designed to encourage a group of customers to stay. Even better, it can send a special offer that gives them discounts on different products. These policies can be implemented for different groups of customers based on their level of loyalty. If we see that we are likely to lose some customers soon, we need to focus our attention on them and offer them something they cannot refuse. Once we understand what drives customers to churn (the churning reasons), we need to take actions that address the needs of these customers, based on the reasons for their churn behaviour.
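As a minimal sketch of how RFM groups might feed into retention policies, the code below scores recency, frequency, and monetary value on a 1-3 scale by rank within the customer base. It then maps score combinations to policy labels. The scoring scheme, the cut-offs, the customers, and the policy names are all hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    cid: str
    recency_days: int   # days since the last purchase or payment (lower is better)
    frequency: int      # number of purchases or payments in the observation window
    monetary: float     # total spend in the observation window

def rank_score(values, value, higher_is_better=True, bins=3):
    """Score 1..bins according to how many customers `value` beats."""
    if higher_is_better:
        beaten = sum(1 for v in values if v < value)
    else:
        beaten = sum(1 for v in values if v > value)
    return 1 + min(bins - 1, beaten * bins // len(values))

def rfm_scores(customers):
    """Return {customer id: (R, F, M)} scores, each on a 1..3 scale."""
    rec = [c.recency_days for c in customers]
    freq = [c.frequency for c in customers]
    mon = [c.monetary for c in customers]
    return {
        c.cid: (
            rank_score(rec, c.recency_days, higher_is_better=False),
            rank_score(freq, c.frequency),
            rank_score(mon, c.monetary),
        )
        for c in customers
    }

def policy(r, f, m):
    """Hypothetical mapping from RFM scores to a retention action."""
    if r >= 2 and f + m >= 5:
        return "loyal: reward program"
    if r == 1 and f + m >= 4:
        return "at risk, valuable: personal retention offer"
    if r == 1:
        return "lapsed, low value: low-cost reminder"
    return "standard: no special action"

customers = [
    Customer("A", 5, 20, 500.0),
    Customer("B", 90, 15, 400.0),
    Customer("C", 30, 3, 50.0),
    Customer("D", 60, 1, 20.0),
]
scores = rfm_scores(customers)
actions = {cid: policy(*s) for cid, s in scores.items()}
print(actions)
```

Customer B has spent a lot but has not been active recently, so this rule flags B for a personal retention offer rather than a generic one.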

1.2 Churn prediction analysis

Churn is becoming an increasingly significant and pressing issue in companies' operations. It concerns businesses in completely different fields, which only confirms the severity of the problem. For example, Karapinar, Altay and Kayakutlu (2016) conducted research aimed at identifying churners and validating the applied methods in the automotive supply industry. They based their research on categorical and continuous variables from the Turkish automotive industry. These included total cash flows from a customer during 2013-2015, the yearly frequency of a customer's use of car maintenance services, availability of insurance, the year of purchase, and others. The authors compared two methods of analysis, Artificial Neural Networks and Decision Trees. They found that, although both methods showed strong results, the number of buckets remains a crucial factor for decision trees. Another industry that considers predicting churning behaviour an important issue is electronic banking. Keramati, Ghaneei and Mirmohammadi (2016) investigated the features of churning customers. They used decision trees to build a model based on three kinds of data:

- the level of customer dissatisfaction, taking into account the duration of communication with customers and customer complaints;
- the level of service use, taking into consideration the number and value of transactions made through electronic banking portals and other means of transaction;
- customers' demographic features, for instance, age, gender, education, and working experience.

The results define five groups of churners for electronic banking services depending on variable intervals.

Tambde and Motwani (2019) looked at churn not outside but inside the company. They investigated the employee churn rate as a tool that helps the organization anticipate the possible loss of valuable personnel, as part of its Human Resource Management strategy. Several machine learning algorithms were applied to a Kaggle dataset of 15,000 observations and 10 variables: Gradient Boosting, a dimensionality reduction algorithm, K-means, and Random Forest. A comparison of their confusion matrices showed the clear advantages of Random Forest over the others. A closer review of the literature on industries investigated in terms of churn identification revealed several more areas. For example, Venkatraman and Ragala (2017) focused on predicting churners for a streaming video service. They gathered the information needed for the research from three sources: Gigya, which contains customer information; Youbora, which contains information about video subscriptions; and a so-called enterprise service bus, where tracking data is stored. The results showed that models should always be built with high expectations and that a combination of the proposed systems should be used. The authors' combined approach proved to be applicable. It was based on Logistic regression, Naive Bayes, Decision trees, Artificial neural networks (ANN), Support vector machines, and K-nearest neighbours (KNN). Most of these methods will be considered in detail later in this paper.

Continuing the overview of the diverse areas in which academics study customer churn prevention, we should also mention a paper on churn prediction in the Chinese traditional broadcasting industry (Hou et al., 2018). A simple logistic regression showed that several variables have a significant influence on customer churn: the customer's viewing intensity, the amount the customer consumes, and his payment habits all play a significant role in predicting churning customers. Based on data from cable network enterprises, the results also showed that customers' preferences about what to watch moderate the impact of viewing intensity on the dependent variable. Another industry in which churning behaviour is studied is home-based care services (Manongdo & Xu, 2017). The authors built a binary client churn classifier by comparing three common methods of customer churn prediction: logistic regression, decision tree, and random forest. The independent variables included customer satisfaction, demographics, frequency of transactions, and several others. Only the decision tree method proved applicable in this industry.

2. Methods of churn analysis

2.1 Traditional methods

Regression analysis

There are different methods for predicting churners. One of the most common classical methods is logistic regression, which was among the first methods researchers used to predict which clients might leave. The method has several clear advantages: it is quite easy to use, it imposes relatively few requirements on the nature of the data, and it is fast. Dalvi et al. (2016) studied the reasons for customer churn on a test dataset using logistic regression and compared it with the decision tree method. They found that both methods are valuable in churn prediction and work even better together, complementing each other in finding valuable insights about the telecommunication industry. Li and Li (2019) used logistic regression to predict which customers of e-commerce platforms are likely to churn to competitors. Along with simple logistic regression, they built a model using the extreme gradient boosting (XGBoost) algorithm. Their sample included order information, customer profile, preferences, after-sales situation, adhesiveness, and churn state as factors, and on it the hybrid model proved better for customer churn prediction than logistic regression. This is not the only study whose results showed the weaknesses of applying logistic regression alone in this kind of prediction task. For example, De Caigny, Coussement and De Bock (2018) created a new hybrid method based on logistic regression and decision trees to predict customer churn in fourteen churn datasets from different industries. The data comprised mean monthly revenue, mean number of customer care calls, director-assisted calls, outbound voice calls, etc. On these data, the resulting logit leaf model gave more accurate results than either of its components, decision trees or logistic regression, applied independently.
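To make the mechanics of logistic regression concrete, the following Python sketch fits a churn model by batch gradient descent on the mean log-loss, using only the standard library. The two features (tenure in years and monthly charge in hundreds) and the ten toy customers are hypothetical. In practice such a model would be fitted with a statistical package, for example `glm(..., family = binomial)` in R.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def predict_proba(b, w, x):
    """Predicted churn probability for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: [tenure in years, monthly charge in hundreds], 1 = churned
X = [[0.5, 0.9], [1.0, 0.8], [0.3, 0.7], [0.8, 0.3], [4.5, 0.8],
     [4.0, 0.9], [5.0, 0.4], [6.0, 0.5], [3.5, 0.3], [1.2, 0.6]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

b, w = fit_logistic(X, y)
p_new = predict_proba(b, w, [0.5, 0.9])    # short tenure, high charge
p_loyal = predict_proba(b, w, [5.5, 0.4])  # long tenure, low charge
print(b, w, p_new, p_loyal)
```

The negative tenure weight says that, in this toy data, longer-tenured customers are less likely to churn. This kind of directly readable coefficient is one reason logistic regression remains popular despite its limits.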

Yanfang and Chen (2018) obtained opposite results. Their study on identifying key features of customer churn behaviour demonstrated that their logistic regression model predicts users' churning behaviour with a high level of reliability. This research was conducted on an e-commerce dataset consisting of user behaviour factors such as the user's online duration, number of logins, attentions, and others. Mandák (2017) came to the same conclusion in his research on churn prediction in the telecommunication industry: logistic regression can be applied as a consistent tool for identifying customers at risk of churning. The dataset consisted of observations of 50,000 customers on 16 variables. It was analysed not only with logistic regression but also with the decision tree method. Customer duration and contract duration proved to be the most influential variables in both models. The value-added services variable also had a large impact on the dependent variable in the logistic regression model.

Two years later, Mandák and Hančlová (2019) published a new study on the same topic. Its aim was to predict customer churn in a European telecommunication company by fitting demographic and consumer behaviour variables into a logistic regression model. To achieve this, the authors used training and testing datasets representing 50,000 customers randomly selected from over one million. The variables range from age, lifetime, and account type to mobile data consumption and bill payments. The results show that customers with family accounts are less likely to churn than any others. At the same time, the following customers are more vulnerable to churn:

- young clients;
- clients who have not been using the company's services for long;
- clients who use mobile data services more than traditional ones such as calls and SMS;
- clients who do not pay on time;
- clients with student accounts or soon-to-expire contracts.

The final model has quite high accuracy, correctly predicting almost 95% of churners.
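Statements such as "customers with family accounts are less likely to churn" come from the signs of logistic regression coefficients, and exponentiating a coefficient gives an odds ratio. The coefficients below are hypothetical and are not the estimates reported by Mandák and Hančlová (2019). The sketch only shows how such coefficients are read.

```python
import math

# Hypothetical coefficients on the log-odds scale (illustrative, not estimates from any cited study)
coefficients = {
    "intercept": -2.0,
    "family_account": -0.8,   # 1 if the customer has a family account
    "student_account": 0.5,   # 1 if the customer has a student account
    "late_payment": 1.1,      # 1 if the customer has paid late
}

def odds_ratios(coefs):
    """exp(beta): multiplicative change in the odds of churn when a dummy switches from 0 to 1."""
    return {name: math.exp(beta) for name, beta in coefs.items() if name != "intercept"}

def churn_probability(coefs, profile):
    """Predicted churn probability for a profile {variable: value}; omitted variables count as 0."""
    z = coefs["intercept"] + sum(coefs[name] * value for name, value in profile.items())
    return 1.0 / (1.0 + math.exp(-z))

ratios = odds_ratios(coefficients)
baseline = churn_probability(coefficients, {})
family = churn_probability(coefficients, {"family_account": 1})
late = churn_probability(coefficients, {"late_payment": 1})
print(ratios, baseline, family, late)
```

Here an odds ratio of about 0.45 for `family_account` means that having a family account multiplies the odds of churn by roughly 0.45. A late payment multiplies them by roughly 3.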

ARIMA

Along with regression models, there is another classical method of churn analysis: ARIMA (autoregressive integrated moving average), a model applied to time series analysis. The analysis starts by evaluating stationarity: tests are conducted to detect unit roots and to determine the order of integration of the series. If needed, the series is then differenced to the corresponding order (Bergmeir et al., 2018).
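Differencing can be illustrated with a short Python sketch. The series are hypothetical deterministic trends chosen so that the effect is easy to see: a linear trend becomes constant after one difference, and a quadratic trend after two. In practice, a unit-root test such as the augmented Dickey-Fuller test (for example, `adfuller` in the Python package statsmodels) would be used to decide whether a further difference is needed.

```python
def difference(series, order=1):
    """Apply the first-difference operator `order` times (y_t - y_{t-1} each time)."""
    for _ in range(order):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

# Hypothetical series: a linear trend needs one difference, a quadratic trend needs two
linear = [5 + 2 * t for t in range(8)]
quadratic = [t * t for t in range(8)]
print(difference(linear))
print(difference(quadratic, order=2))
```

Each difference shortens the series by one observation, which is one practical reason to difference no more often than necessary.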

The Box-Jenkins methodology for selecting an ARIMA model for a given series of observations consists of three stages. The first step is to obtain a stationary series. Stationarity is tested using visual analysis of the graph, visual analysis of the autocorrelation function (ACF) and partial autocorrelation function (PACF), and unit root tests. If the series is stationary, the next step is performed; if not, the series is differenced and the test is repeated. In practice, the series is usually differenced no more than twice. After a stationary time series is obtained, its sample ACF and PACF are constructed. They are a kind of "fingerprint" of the autoregressive moving average (ARMA) (p, q) process and allow us to formulate several hypotheses about the possible orders of autoregression (p) and moving average (q). It is usually recommended to use models of the lowest possible order, typically with p + q < 3 (if there is no seasonal component). The sample ACF and PACF, of course, need not follow their theoretical analogues exactly, but they must be "close enough" to them.
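The sample ACF and the PACF can be sketched in plain Python, with the PACF computed by the Durbin-Levinson recursion. The simulated AR(1) series with coefficient 0.7 is an assumption chosen because its fingerprint is known: the ACF decays geometrically, while the PACF cuts off after lag 1. Values outside roughly ±1.96/√n are usually treated as significant.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (autocovariances divided by the lag-0 value)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / c0 for h in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    result, prev = [], []  # prev[j - 1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            cur = [phi_kk]
        else:
            num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            phi_kk = num / den
            cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
        prev = cur
    return result

# Hypothetical AR(1) series: x_t = 0.7 * x_{t-1} + e_t
random.seed(0)
x = [0.0]
for _ in range(4999):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

r = acf(x, 5)
p = pacf(x, 5)                    # p[0] is lag 1, p[1] is lag 2, ...
bound = 1.96 / len(x) ** 0.5      # approximate 95% significance band
print([round(v, 3) for v in r[1:]], [round(v, 3) for v in p], round(bound, 3))
```

Reading the output as in the text above, a large lag-1 PACF followed by values inside the band suggests the hypothesis p = 1, q = 0.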

At the second stage, the parameters of each model selected at the first stage are estimated and the residuals are calculated. Each model is checked for adequacy to the data, and the simplest adequate model, i.e. the one with the fewest parameters, is selected. Once the model has been selected, it can be used to forecast one or more steps ahead and to estimate the confidence limits of the forecast values. Modern computer packages include various methods for estimating ARMA models, such as linear or nonlinear ordinary least squares and full or conditional maximum likelihood. There are several criteria for evaluating how well an ARMA model matches the data. First, the model's coefficient estimates must be statistically significantly different from zero. Second, the model assumes that the errors are white noise, so their estimates, the residuals, should also resemble white noise; in particular, the residuals must have zero autocorrelation.
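The estimation and residual check can be sketched for the simplest case. The code below assumes a zero-mean AR(1) model x_t = phi * x_{t-1} + e_t, estimates phi by conditional least squares, and computes the Ljung-Box statistic Q = n(n + 2) Σ r_k² / (n - k) on the residual autocorrelations. In practice Q is compared with a chi-squared distribution with h - p - q degrees of freedom, and a large Q suggests the residuals are not white noise. The series in the usage example is hypothetical.

```python
def fit_ar1(x):
    """Conditional least-squares estimate of phi in a zero-mean AR(1) model."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    phi = num / den
    residuals = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    return phi, residuals

def ljung_box(residuals, h):
    """Ljung-Box Q statistic on the first h residual autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

phi, res = fit_ar1([8, 4, 2, 1, 0.5])   # hypothetical, noise-free AR(1) path
print(phi)                              # 0.5
print(ljung_box([1, -1, 1, -1], 2))     # 7.5: strongly autocorrelated "residuals"
```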

The main purpose of using ARIMA models is out-of-sample forecasting. There are two sources of forecast inaccuracy: first, future random errors are ignored (they are replaced by their expected value of zero); second, the estimated model coefficients deviate from their true values. One example of research where ARIMA principles were implemented is the paper by Yang (2018). Based on large-scale Snapchat data on real user activities and ego-network structures, the authors created ClusChurn, a clustering technique that helps predict churning user types for online platforms. This approach is usually applied to econometric problems in which prediction relies on the dynamics of the frequency of use of some service, and it is a dynamic model. In most cases, the information contained in the analysed time series is sufficient to build an ARIMA model; this is not suitable for our case because the data needed to build such a model are not available. Another example is the paper by Safinejad, Noughabi, and Far (2018), which aimed to build a suitable model for predicting future customer behaviour by extracting time series patterns from customers' past behaviour. The authors tried to construct a dynamic model for predicting future churn behaviour so that the organisation could take preventive measures.
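The first source of forecast inaccuracy can be seen directly in a sketch of h-step forecasting for a zero-mean AR(1) model. Future shocks are replaced by zero, so the point forecast is phi^h * x_T, while the forecast error variance sigma² (1 + phi² + ... + phi^(2(h-1))) grows with the horizon. The sketch treats phi and sigma² as known, so it deliberately ignores the second source, coefficient estimation error. The values in the usage example are hypothetical.

```python
def ar1_forecast(phi, last_value, steps, sigma2):
    """h-step forecasts of a zero-mean AR(1) process and their error variances.

    Unknown future shocks are set to their expected value 0; the variance
    accumulates the contribution of each ignored shock.
    """
    forecasts, variances = [], []
    value, var = last_value, 0.0
    for h in range(1, steps + 1):
        value = phi * value
        var += sigma2 * phi ** (2 * (h - 1))
        forecasts.append(value)
        variances.append(var)
    return forecasts, variances

f, v = ar1_forecast(phi=0.5, last_value=8.0, steps=3, sigma2=1.0)
print(f)   # [4.0, 2.0, 1.0]
print(v)   # [1.0, 1.25, 1.3125]
# Approximate 95% limits: f[h] +/- 1.96 * v[h] ** 0.5
```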

2.2 Machine learning algorithms

The consequences of a high customer churn rate might be severe. In practice, businesses spend five times more on attracting a new customer than on keeping an existing one (Chiang et al., 2003). That is why researchers have continued to enrich studies of customer churning behaviour with new methods. Machine learning is a class of artificial intelligence methods whose characteristic feature is not solving a problem directly but learning from the solutions to many similar problems. Such methods draw on mathematical statistics, numerical methods, optimisation methods, probability theory, graph theory and various techniques for working with data in digital form.

Support Vector Machine

The first machine learning approach to discuss is the support vector machine (SVM). The acknowledged benefit of SVM is its high potential compared to traditional approaches owing to its scalability and faster learning and running times. The method is used to classify customers into those who are likely to churn and those who are not. However, it shows its weakness when the number of negative observations is too small. Schölkopf proposed a method that adapts the SVM to the one-class classification problem, and Li (2003) used it for anomaly detection. In this technique, the data are first mapped to a feature space, and the origin is treated as the only member of the second class, so that observations close to the origin are considered outliers. An input is regarded as normal if it falls within the region containing the selected training sample and as abnormal otherwise. Zhao (2005) studied customer churn using an advanced one-class SVM. They gathered data over three months on 100,000 clients for 17 different variables, including geographic and population data, fees, additional charges, the date, time, duration and location of calls, and whether the customer churned within five months. The training dataset comprised 2134 observations and the testing dataset 824. The results showed that the Gaussian kernel function had the highest accuracy rate, 87.15%, compared with 72.28% and 77.65% for the linear and polynomial functions, respectively, which indicates that the separating boundary is non-linear. Moreover, the one-class SVM also outperformed an artificial neural network (ANN), Decision Tree C4.5 and Naive Bayes.
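The kernels compared by Zhao (2005) can be written down in a few lines. The sketch shows the standard linear, polynomial and Gaussian (RBF) kernels that enter an SVM decision function of the form f(x) = Σ α_i y_i k(x_i, x) + b. The parameter defaults (`degree`, `c`, `gamma`) are illustrative assumptions, not the values used in the study.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=1.0):
    return (linear_kernel(x, z) + c) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2): equals 1 for identical points, decays with distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(linear_kernel([1, 2], [3, 4]))        # 11
print(polynomial_kernel([1, 2], [3, 4]))    # 144.0
print(gaussian_kernel([0, 0], [1, 1]))      # exp(-1), about 0.368
```

Because the Gaussian kernel depends only on distance, it can produce curved decision boundaries in the original feature space, which is consistent with the conclusion that the separating boundary in the churn data is non-linear.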

The SVM model has also been used to predict churning behaviour in the banking industry (He et al., 2014). The research was done on a sample from a Chinese commercial bank. The training dataset included monthly observations over one year, and the testing dataset monthly observations over six months. After excluding outliers, the dataset consisted of 46406 observations. The sample had 421 churners, that is, clients who cancelled their account during the observation period, giving a churners-to-non-churners ratio of approximately 1:109. A distinctive feature of the paper is that the authors compared linear SVM, SVM with a radial basis function kernel (RBF SVM) and logistic regression. There were five samples with different churners-to-non-churners ratios, and the accuracy of the results was estimated by 10-fold cross-validation. As a result, the RBF SVM model showed the best accuracy rate of 98.95%, a precision rate of 39.10% and a churner recall rate of 26.84%. The F-measures of the three models are 0.39, 0.30 and 0.25 for RBF SVM, logistic regression and linear SVM, respectively.
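The metrics above follow from a confusion matrix, and a short sketch helps show why accuracy alone is misleading on such imbalanced data. Using the counts reported above (421 churners out of 46406 customers), a hypothetical classifier that labels everyone as a non-churner already reaches about 99.1% accuracy with zero recall. Note also that the harmonic mean (F1) of the quoted precision and recall is about 0.32, so the F-measure reported in the source may be defined differently.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (churner) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def f1_from(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Trivial "nobody churns" baseline on 421 churners and 45985 non-churners.
baseline = classification_metrics(tp=0, fp=0, fn=421, tn=45985)
print(round(baseline["accuracy"], 4), baseline["recall"])   # 0.9909 0.0
print(round(f1_from(0.3910, 0.2684), 3))                    # 0.318
```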

Another study on customer churn prediction confirmed the viability of the SVM model in the telecommunication industry (Huang et al., 2010). The paper used two common modelling techniques, multilayer perceptron neural networks (MLP) and Decision Tree C4.5, and one innovative technique, support vector machines (SVM). The research was based on a dataset with information on more than 47,000 customers of an Irish telecom company (Eircom, 2008), covering customer demographics, account data, payment data, and order and call details. There were approximately 28,000 customers in the training dataset, 35% of them churners; the rest formed the testing dataset, with 5% churners. The results showed that the choice of model depends on which prediction rate is used for evaluation. If the models are evaluated by overall accuracy, the Multi-Class SVM (M-SVM) model demonstrates the best performance; if they are evaluated by the accuracy on true churners, the Decision Tree model remains the best prediction approach.

Xia and Jin (2008) also examined the potential of the SVM model for predicting customer churn. They used a database from the University of California and one from a domestic telecommunication carrier. The first dataset was divided into 3333 observations for the training set, with almost 15% churners, and 1667 observations for the testing set, with approximately 13% churners. The second dataset had 1474 observations in the training set and 966 in the testing set, with 42% and 44% churners, respectively. Among the kernel functions tested with the SVM model, the radial basis kernel function gave the best accuracy rate on the first dataset, while the Cauchy kernel function showed the highest accuracy on the second. The results were compared with those of logistic regression, Decision Tree C4.5, ANN and Naive Bayes classifiers.

Decision Trees

Decision trees are another machine learning method frequently used to estimate customer churn. The decision tree (also called a classification tree or regression tree) is a decision support tool used in machine learning, data analysis and statistics. Hadden (2006) was one of the first to predict churning behaviour with new, technologically advanced methods. The training set consisted of 202 customers with an almost equal number of churners and non-churners, whereas the testing set had observations on 700 customers, 30% of them churners. The regression tree and neural network models were built in Matlab, and SPSS was used to create a linear regression model. The best accuracy was observed for the regression tree model, with 82% correct predictions. However, accurate prediction with this model requires a lot of preliminary work on classifying the customers into two groups. At the same time, linear regression showed the best results in predicting non-churners, and the neural network was the most successful in predicting churners.
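A decision tree grows by repeatedly choosing the split that makes the resulting groups as "pure" as possible. The sketch below uses the Gini impurity of CART to find the best threshold on one numeric feature. C4.5 and C5.0, mentioned in the studies reviewed here, use information-gain-based criteria instead. The feature name `months_to_contract_end` and the data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a group of 0/1 churn labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n                 # share of churners
    return 1.0 - p ** 2 - (1 - p) ** 2

def best_threshold_split(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    best = (None, gini(labels))
    for thr in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best

months_to_contract_end = [1, 2, 3, 10, 11, 12]   # hypothetical feature
churned = [1, 1, 1, 0, 0, 0]
print(best_threshold_split(months_to_contract_end, churned))   # (3, 0.0)
```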

Vafeiadis (2015) also compared several machine learning techniques for churn prediction of telecom customers, including ANN, decision trees, SVM, Naive Bayes classifiers and Logistic Regression classifiers. Using open-source telecommunication data, they showed the superiority of the Decision Tree classifier and the two-layer Back-Propagation Network in terms of error value and total accuracy (94%). Support vector machine classifiers took second place with 93% total accuracy, followed by Naive Bayes and Logistic Regression classifiers with the lowest accuracy of approximately 86%. Another interesting study was conducted in the multimedia-on-demand industry in Taiwan (Tsai & Chen, 2010). This case study was based on a dataset with information on more than 37,000 customers gathered over 15 months. The back-propagation learning and C5.0 algorithms were used to create neural network and decision tree models, respectively. Both models were trained on datasets with 12 and 22 variables. On the training set, the models trained on 22 variables performed better than those trained on 12 variables. On the testing dataset, however, which is more representative and reflects reality, the results were the opposite. Moreover, in both cases the decision tree models performed better.

Random Forest

Another large group of machine learning techniques is the Random Forest method, which has become a helpful tool in data analysis, especially in customer analytics. Xie (2009) studied churning behaviour in a major Chinese bank using an improved balanced random forest, which combines balanced and weighted random forests. The data were gathered from the data warehouse and covered 14,000 customers and 27 variables. A random sample of 1524 observations was drawn and divided equally between the training and testing datasets. The overall principle was that the more trees classify a sample as negative, the larger the negative score the sample receives. The results showed that the improved balanced forest performs better than either of its components separately. Compared to other methods such as Artificial Neural Networks, Decision Trees and class-weighted core support vector machines (CWC-SVM), the improved balanced random forest (IBRF) had the highest accuracy and the highest top-decile lift, 93.2% and 7.1, respectively. It is also worth mentioning that the top decile captures about 88% of churners, while the top four deciles capture 100% of churners. Thus, IBRF proved more suitable for churn prediction than the classical models.
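The vote-based score and the top-decile lift can be sketched as follows. Each customer's score is taken as the share of trees that classify them as a churner. Customers are ranked by this score, and the churn rate among the top 10% is divided by the overall churn rate. The scoring rule of the improved balanced forest in Xie (2009) also involves class weights and balanced sampling, so this plain vote share is a simplifying assumption. The scores and labels below are hypothetical.

```python
def vote_score(tree_votes):
    """Share of trees that classify a customer as a churner (votes are 0/1)."""
    return sum(tree_votes) / len(tree_votes)

def top_decile_lift(scores, labels):
    """Churn rate among the 10% highest-scored customers / overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

print(vote_score([1, 1, 0, 1, 0]))    # 0.6: three of five trees vote "churn"

# 20 hypothetical customers; the two churners receive the two highest scores.
scores = [0.9, 0.8] + [0.1] * 18
labels = [1, 1] + [0] * 18
print(top_decile_lift(scores, labels))   # 10.0, the maximum for a 10% churn rate
```

A lift of 7.1, as reported for IBRF, therefore means that the top-scored decile contains churners about seven times more often than a randomly chosen decile would.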

To explore the possibilities of the Random Forest approach further, Ullah (2019) used two different datasets to analyse machine learning techniques for churn prediction tasks. One dataset was gathered from a telecom service provider in South Asia and contained more than 64,000 examples with almost 30 variables. The other was the churn-bigml dataset with more than 3,000 examples. The main idea of the method is to use a large ensemble of decision trees, each of which gives a very low-quality classification, but whose large number produces a good result. Furthermore, Random Forest can handle missing observations in the dataset used for training the model. As a result, Random Forest and Decision Tree C4.5 (J48) turned out to be the best models, with an accuracy rate close to 90% and a precision rate of 0.893. Finally, the main drivers of churning behaviour were identified and guidelines for management decisions were formulated.
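Why many weak trees can add up to a strong classifier can be illustrated with an idealised calculation. If n classifiers were independent and each correct with probability p > 0.5, the probability that their majority vote is correct follows from the binomial distribution and approaches 1 as n grows. Random forests only approximate this: bootstrap sampling and random feature subsets reduce the correlation between trees, but the trees are never fully independent, so real gains are smaller. The accuracy 0.6 and the ensemble sizes below are assumptions for illustration.

```python
from math import comb

def majority_vote_accuracy(p, n_models):
    """P(majority of n_models independent classifiers is correct), n_models odd."""
    need = n_models // 2 + 1
    return sum(comb(n_models, k) * p ** k * (1 - p) ** (n_models - k)
               for k in range(need, n_models + 1))

for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
# Accuracy rises from 0.6 for a single weak classifier to about 0.98 for 101 of them.
```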

A final example of implementing the Random Forest technique for churn prediction is the paper by Zhu (2019) on the airline industry. After all transformations, the dataset consisted of almost 5,000 customers of the airline company under consideration and 53 variables. The process of building the model based on Random Forest and the least absolute shrinkage and selection operator (LASSO) is sequential and has several steps. First, the data are pre-processed; then a LASSO model is built for variable screening. Next, a preliminary Random Forest model is constructed. After the parameters are tuned and the model is tested, the optimal model is obtained and used for prediction. In the research, single LASSO and Random Forest models were compared with their combination. The combined LASSO-RF model turned out to be computationally simpler, more accurate in churn prediction and more robust, which gives both the ability and a reason to apply the method in other industries.
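The LASSO screening step can be sketched with cyclic coordinate descent, which minimises (1/2n)·||y - Xb||² + λ·||b||₁. The soft-threshold operator sets the coefficients of weak predictors exactly to zero, and only features with non-zero coefficients would be passed on to the Random Forest. The sketch assumes centred and scaled features without an intercept. The data, the value of λ and the hand-off to a forest are hypothetical, not the setup used by Zhu (2019).

```python
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for LASSO; assumes centred X columns and y."""
    n, k = len(X), len(X[0])
    b = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # correlation of feature j with the partial residual that excludes j
            rho = sum(X[i][j] * (y[i] - sum(X[i][m] * b[m] for m in range(k) if m != j))
                      for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b

# Hypothetical standardised features: column 0 drives the target, column 1 barely does.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.1, 2.9, -2.9, -3.1]
b = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, bj in enumerate(b) if bj != 0.0]
print(b, selected)   # approximately [2.5, 0.0] and [0]
```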

Boosting

Among machine learning algorithms, there is one more method based on decision trees, called eXtreme Gradient Boosting (XGBoost). It has proved to be an efficient, agile and fast method. It is also suitable for large amounts of sparse data, which fits modern trends in information technology development. The method predicts the target feature with a number of decision trees, estimating a weight for each leaf. XGBoost has also become widely used in customer analytics, especially in churn prediction. Borrotti (2018) implemented this method with maximum tree depths of 2, 6 and 10 and with 100, 500 and 900 trees, with the maximum number of iterations equal to 100. To avoid early convergence, the learning rate was set to 0.01. The dataset consisted of three months of observations on more than 7,000 customers of a pet shop. The data were split into training, validation and test sets in proportions of 65%, 20% and 15%, respectively. The research showed that the Decision Tree had its best performance with 500 trees and a maximum depth of 10, with quite acceptable accuracy and log-loss values of 0.834 and 0.396. However, XGBoost demonstrated 0.895 accuracy and 0.317 log-loss with the same maximum depth but with 900 trees. Thus, XGBoost shows better results than the Decision Tree methodology, the only exception being when the number of trees is 100.
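The phrase "estimating a weight for each leaf" can be made concrete with the second-order objective that XGBoost uses. For the logistic loss, each customer contributes a gradient g = p - y and a Hessian h = p(1 - p). The optimal leaf weight is w* = -G / (H + λ), and the gain of a split compares the children's scores G²/(H + λ) with the parent's. Here `reg_lambda` and `gamma` play the roles of XGBoost's `lambda` (L2 regularisation) and `gamma` (minimum split loss) parameters. The labels and the starting prediction of 0.5 are hypothetical. A learning rate such as the 0.01 used by Borrotti (2018) would scale w* before it is added to the raw score.

```python
def grad_hess_logloss(y, p):
    """Gradient and Hessian of the log-loss w.r.t. the raw score, at probability p."""
    return p - y, p * (1.0 - p)

def leaf_weight(grads, hessians, reg_lambda=1.0):
    return -sum(grads) / (sum(hessians) + reg_lambda)

def split_gain(gl, hl, gr, hr, reg_lambda=1.0, gamma=0.0):
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma

labels = [1, 1, 0]                 # hypothetical customers falling into one leaf
gh = [grad_hess_logloss(y, 0.5) for y in labels]
w = leaf_weight([g for g, _ in gh], [h for _, h in gh])
print(round(w, 4))                 # 0.2857: pushes the leaf towards "churn"

# Splitting the two churners (G=-1, H=0.5) from the non-churner (G=0.5, H=0.25):
print(round(split_gain(-1.0, 0.5, 0.5, 0.25), 4))   # positive gain, so the split helps
```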

Gregory (2018) investigated how the same method works for churn prediction in the music streaming industry. As the data source, he chose the WSDM Cup 2018 Challenge dataset provided by KKBOX. Three years of data on user activities, user transactions and user personal data were split into training, validation and testing sets. 80% of the features were taken from the user activity and user transaction data; in total, 208 features were used, each of which was cross-validated on the testing set. The XGBoost library was used as the primary classifier, while the Light Gradient Boosting Machine (LightGBM) was implemented for the final submission, with weights of 88% and 12%, respectively. Taking both models together, the overall log loss was 0.07974, and the final accuracy rate showed the viability of LightGBM and XGBoost in predicting churn behaviour in the KKBOX environment. Machado, Karray and Sousa (2019) conducted another study on customer loyalty prediction in the finance industry. They used the LightGBM and XGBoost algorithms and compared them in predicting the loyalty score of each customer of Elo, a financial organisation in Brazil. The company published its dataset on Kaggle and launched a competition. It was the first time in the academic literature that XGBoost was used in the field of financial products marketing. Surprisingly, the authors found that LightGBM outperformed XGBoost in this research. It showed better accuracy than ordinary regression and other gradient boosting decision tree (GBDT) models and proved its applicability in financial products marketing.
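A weighted blend of two models and its log loss can be sketched in a few lines. The assignment of the 88% weight to XGBoost and the 12% weight to LightGBM follows the order in the text. Simple linear averaging of predicted probabilities is an assumption here, because the exact blending procedure of the original submission is not described. The probabilities below are hypothetical.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def blend(prob_xgb, prob_lgbm, w_xgb=0.88, w_lgbm=0.12):
    """Hypothetical linear blend of two models' churn probabilities."""
    return [w_xgb * a + w_lgbm * b for a, b in zip(prob_xgb, prob_lgbm)]

y_true = [1, 0, 1]
blended = blend([0.8, 0.2, 0.7], [0.6, 0.1, 0.9])
print([round(p, 3) for p in blended])     # [0.776, 0.188, 0.724]
print(round(log_loss(y_true, blended), 4))
```

Log loss penalises confident wrong predictions heavily, which is why it is preferred to plain accuracy when the predicted churn probabilities themselves are used.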

Neural Networks

Many studies explore customer churn with the help of Artificial Neural Networks (ANN). An ANN is a mathematical model, together with its software or hardware implementation, built on the principle of organisation and functioning of biological neural networks. For example, Tsai and Lu (2009) conducted research based on customer relationship management system data provided by an American telecommunication company; the dataset contained more than 51,000 customer observations, 68% of which were churners. The aim was to identify the best model among an ANN model, a SOM (self-organising map, a neural network with unsupervised learning that performs visualisation and clustering) and a hybrid neural network model. The results showed that the hybrid model outperforms the single ANN, although the latter's 88% accuracy is also high. Mena (2019) presented another comparative analysis, exploring churn behaviour with sequential data and deep neural networks.
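How an ANN turns customer features into a churn probability can be shown with the forward pass of a network with one hidden layer of sigmoid units and a sigmoid output neuron. In the studies reviewed, the weights are learned from data (for example, by back-propagation). Here they are hypothetical numbers chosen only to show the computation, and the layer sizes are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_churn_probability(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> sigmoid hidden layer -> sigmoid output (churn probability)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Hypothetical network: 2 input features, 1 hidden neuron, 1 output neuron.
p = mlp_churn_probability(x=[1.0, 2.0],
                          w_hidden=[[1.0, -1.0]], b_hidden=[0.0],
                          w_out=[2.0], b_out=-0.5)
print(round(p, 3))   # about 0.509
```

Changing the number of hidden neurons changes the length of `w_hidden` and `w_out`. This is the kind of architectural adjustment that, according to Kumar and Kumar (2019), affected the training accuracy.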

Neural networks have also become popular for predicting customer churn in the banking industry. Zorić (2016) mined a dataset of 1866 customers with features such as gender, age, monthly salary and usage of bank products. Using the Alyuda NeuroIntelligence software package, the dataset was divided into training, validation and testing sets, and a final working model was created. It was concluded that young clients, particularly students, are more prone to churn, even though they have the highest value and potential for the bank. Relevant recommendations were developed afterwards. The telecommunication industry has not been spared by studies using Artificial Neural Networks to predict potential churners either (Kumar & Kumar, 2019). The dataset comprised 21 features and 7043 observations on customers, in particular their account information, demographics, services used and churn information. The researchers obtained 85.53% accuracy on the training dataset and 76.5% accuracy on the validation set. They also found that the accuracy on the training dataset can be raised to 93.14% if the number of neurons in one of the hidden layers is changed. The results indicated Tech Support and Streaming Movies as the features with the most influential weights in terms of customer churn. Moreover, these factors have a direct positive influence on churn behaviour. The opposite relationship can be seen for the Contract and Total Charge variables, which negatively influence churn.
