Probabilistic graphical models in customer analytics: comparison with classical predictive models
This paper applies several methods to obtain a comprehensive view of the customer churn problem. It identifies which algorithms can be used not only for churn prediction but also for churn prevention analysis.
| Category | Management and labor relations | 
| Type | diploma thesis | 
| Language | English | 
| Date added | 25.08.2020 | 
| File size | 1.3 M | 
Turning to the new variables, the Internet Service variable deserves attention first. In the general model we used another Internet variable, which took only “yes” or “no” values; it turned out to be insignificant and was therefore not included in the final model. For the Internet dataset the results are very different: the odds of churning for clients with a Fiber optic Internet connection are about 135% higher than the odds for clients with DSL, at the 1% significance level. Why did we get such results? They can be explained by the differences between the two technologies. First, Fiber optic is much faster than DSL. Second, Fiber optic Internet is typically more reliable than DSL: factors such as distance from the Internet service provider can interfere with a DSL connection and reduce its speed. However, for all its advantages, Fiber optic costs more than DSL, which is almost always the more economical option because it uses existing technology and infrastructure (telephone lines). Thus, if a person uses the Internet more for surfing the web and sending emails than for streaming TV and movies, DSL is more cost-effective. Finally, a region can have access to fiber-optic Internet only if fiber-optic cables have been installed there, which is a further obstacle. The cost of Fiber optic may play the major part in the resulting odds: for most customers DSL speed is probably enough for daily needs, and they do not want to pay more for Internet capacity they do not fully use. The same explanation applies to the Streaming TV and Streaming Movies variables. The odds of churning for clients who have the streaming TV and movies options are about 31% higher than the odds for clients who do not, perhaps because people do not really need these options.
Finally, the Tech Support and Online Security variables turned out to be significant at the 1% level. The ratio of the odds of leaving the company for clients with the tech support option to the odds for those without it is 0.611. In terms of percent change, the odds of churning for clients who have the tech support service are around 39% lower than the odds for those who do not. The odds of churning for those who have the online security option are almost 41% lower than the odds for those who do not. These results may be explained by the usefulness of the options. Customers with the online security option are shielded from harmful and phishing websites by blocking, which makes their time online more peaceful and pleasant. The same goes for tech support: a service that provides technical help and solutions to hardware and software problems makes life easier. Of course, this raises the question of why clients without these options do not simply turn them on. It may be about money, with clients looking for a telecommunication company that provides these services for free or at lower cost; but, more likely, some clients do not know that these options exist at all. In both cases the company can reduce churn by lowering the price of these options for the risk group or by notifying clients about them.
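The conversion between an odds ratio and the "percent higher or lower" wording used above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the value 2.35 is not reported in the text but is the odds ratio implied by "about 135% higher".

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds implied by an odds ratio: (OR - 1) * 100."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_from_coefficient(beta: float) -> float:
    """A logistic regression coefficient is a log odds ratio, so OR = exp(beta)."""
    return math.exp(beta)


# Odds ratio for Tech Support as reported in the text.
tech_support_change = percent_change_in_odds(0.611)

# Assumption: 2.35 is the odds ratio implied by "about 135% higher" for Fiber optic.
fiber_optic_change = percent_change_in_odds(2.35)

print(f"Tech Support: {tech_support_change:.1f}% change in odds")
print(f"Fiber optic:  {fiber_optic_change:.1f}% change in odds")
```

A negative result corresponds to lower odds of churn and a positive result to higher odds.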
After the logistic regression analysis, a Random Forest model was built. The best result was shown by the model with 100 trees and 2 variables randomly sampled as candidates at each split. Unlike regression, Random Forest does not show how a change in each independent variable affects the dependent variable, but it gives something no less important: importance measures for each variable in the model, which help in understanding the effects of different variables on churn. Mean decrease accuracy is calculated by permuting out-of-bag (OOB) data: for each tree, the prediction error rate on the OOB portion of the data is recorded (Liaw & Wiener, 2002). Then the same is done after permuting each predictor variable. The difference between the two is averaged over all trees and normalized by the standard deviation of the differences. Mean decrease Gini shows the total decrease in node impurity from splitting on the variable, averaged over all trees. For classification, node impurity is measured by the Gini index. Gini importance is overall inferior to accuracy importance, as it is relatively more biased and unstable (Liaw & Wiener, 2002; Breiman, 2002).
According to the model output (Figure 10), the most important variable is Tenure, with a mean decrease accuracy of around 32. Because this measure is normalized by the standard deviation of the differences, it is a relative score rather than a count of misclassified observations: permuting Tenure degrades out-of-bag accuracy more than permuting any other variable. This means that the number of months the customer has stayed with the company plays the major role in churn prediction. Next come Total Charges, with a mean decrease accuracy of 31, and Contract, with almost 27. Other important variables are Monthly Charges, Internet Service, Online Security and Tech Support. The mean decrease Gini measures give almost the same picture; at least the top seven variables are the same. Strong similarities with the regression results can again be seen. For example, the tenure and contract variables were significant in both regression models, all three Internet variables were significant in the Internet model, and both charges variables were significant as well, although Total Charges was removed from the model due to multicollinearity. Summing up the logistic regression and Random Forest results, we can highlight these seven variables as the main variables influencing churn.
Figure 10. The importance of each variable in the Random Forest model
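The idea behind mean decrease accuracy can be illustrated with a simplified permutation test. The sketch below is not the Random Forest implementation used in the analysis: it shuffles one column for a single stand-in model on one dataset, with no per-tree out-of-bag averaging and no normalization. The toy data, the rule-based `toy_model` and all function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label equals the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, column, seed=0):
    """Drop in accuracy after shuffling the values of one column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted_rows = []
    for row, value in zip(rows, shuffled):
        new_row = list(row)
        new_row[column] = value
        permuted_rows.append(new_row)
    return baseline - accuracy(predict, permuted_rows, labels)


# Hypothetical toy data: column 0 is tenure in months, column 1 is a noise feature.
toy_rows = [[1, 5], [2, 3], [4, 9], [30, 1], [48, 7], [70, 2]]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = churned


def toy_model(row):
    """Hypothetical rule standing in for a fitted model: short tenure means churn."""
    return 1 if row[0] < 12 else 0


tenure_importance = permutation_importance(toy_model, toy_rows, toy_labels, column=0)
noise_importance = permutation_importance(toy_model, toy_rows, toy_labels, column=1)
print(tenure_importance, noise_importance)
```

A variable the model never uses gets an importance of exactly zero, while shuffling a variable the model relies on can only lower the accuracy of this toy rule.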
Churn prediction results can be seen in Table 5. According to this table, the best prediction accuracy is shown by the XGBoost model. The eXtreme Gradient Boosting model correctly predicts 0.8124 of the observations in the test data, which is a fairly good result. The accuracy of the Random Forest model is slightly lower: it correctly predicts 0.8098. Comparing these two decision-tree-based methods, we can see that the Random Forest model has better prediction accuracy on the train dataset than eXtreme Gradient Boosting (0.8697 versus 0.8216). On the test set, however, Random Forest loses much more: its accuracy falls by about 6 percentage points, compared with about 1 point for boosting, while the accuracy of the logistic regression does not fall at all. This can be explained by some overfitting of the Random Forest model, which gives very accurate results on the train set but works much worse on the test dataset. This may happen because Random Forest grows its trees independently of one another, each on a bootstrap sample, and aggregates their votes; the individual trees can fit the training data very closely. As a result, its accuracy on the train set can be very high, while the model performs noticeably worse on test data.
The general logistic regression model also showed a fairly high level of accuracy on both the train and test datasets: it correctly predicts almost 0.805 of the test data. The accuracy of the logistic regression model based on the Internet dataset is a little lower than that of the other models: it correctly predicts only 0.756 of the test sample observations. This can be explained by the lower total number of observations in the Internet dataset and by the removal of one of the variables from the final model due to multicollinearity, which is why this model cannot be compared directly with the others. The Bayesian Network's accuracy on the test data is only 0.7661, meaning that this method works worse in prediction than the others. Perhaps this happens because for the Bayesian Belief Network all numeric variables are manually converted to categorical ones, that is, each variable is divided into several intervals, and this discretization reduces accuracy.
Table 5 Churn prediction results
| Metric | Log Regression | Bayesian Net | Random Forest | XGBoost |
| Train data | | | | |
| Accuracy | 0.7938 | 0.7681 | 0.8697 | 0.8216 |
| Sensitivity | 0.5220 | 0.5149 | 0.6384 | 0.5472 |
| Specificity | 0.8923 | 0.8598 | 0.9536 | 0.9211 |
| Test data | | | | |
| Accuracy | 0.8049 | 0.7661 | 0.8098 | 0.8124 |
| Sensitivity | 0.5260 | 0.5159 | 0.4791 | 0.5176 |
| Specificity | 0.9056 | 0.8565 | 0.9292 | 0.9189 |
| Precision | 0.6681 | 0.5651 | 0.7097 | 0.6975 |
To understand the predictive accuracy of our models in more detail, we can look at the confusion matrix results (Table 6) and at the sensitivity and specificity figures. The specificity level is much higher than the sensitivity level. Specificity is the ability of a test to identify non-churners correctly, so our models recognize non-churners among actual non-churners better than churners among actual churners. For example, for the most accurate model (XGBoost) specificity is almost 0.92 on the test set, telling us that this model wrongly predicts churn for a client who actually stayed in only 8% of cases (134 out of 1653). In contrast, sensitivity is only around 0.52, meaning that eXtreme Gradient Boosting wrongly predicts that a client stayed when he actually left in almost 50% of cases (288 out of 597). Precision measures how good the model is at assigning positive events to the positive class. For our XGBoost model it equals 0.698: almost 70% (309 out of 443) of the clients predicted as churners were actually churners. Sensitivity (or recall) and precision reflect the model's errors from two perspectives: type II error and type I error, respectively. In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis, while a type II error is the non-rejection of a false null hypothesis. In our prediction, the number of type I errors is 134 (false positive predictions) and the number of type II errors is 288 (false negative predictions). In other words, the eXtreme Gradient Boosting model tells us that 288 clients of the Telco company did not churn in the previous month, whereas in reality they did.
Table 6 Confusion matrix results
| Outcome | Log Regression | Bayesian Net | Random Forest | XGBoost |
| True positives | 314 | 308 | 286 | 309 |
| True negatives | 1497 | 1415 | 1536 | 1519 |
| False positives | 156 | 237 | 117 | 134 |
| False negatives | 283 | 289 | 311 | 288 |
At the same time, the model showed us that 134 clients left the company in the previous month, while in reality they stayed. Looking at these numbers, we might conclude that in general our model makes mistakes more often when it predicts that a client will not churn. However, this would be incorrect, because such results can be a consequence of the uneven distribution of churners and non-churners in the dataset. Because the data contain many more non-churners, the model predicts the non-churn class far more often, so in absolute terms more of its errors fall among the predicted non-churners than among the predicted churners. In reality, going back to the sensitivity and specificity measures, our model is better at predicting non-churners, but again probably only due to the distribution: the more observations of a class, the better the model learns to predict it. Interestingly, despite the higher accuracy of the XGBoost model, its specificity is a bit lower than that of the Random Forest model, which suggests that Random Forest recognizes non-churners better than any other method on this particular dataset. In contrast, sensitivity is highest for logistic regression, meaning that regression recognizes churners better than any other method on this particular dataset. Moreover, the accuracy of the general logistic model on the test set is higher than on the train set. This happened because in our test set there are fewer churners than in the train data: since our model predicts non-churners better than churners, fewer churn observations in the data mean better overall prediction.
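The test-set metrics in Table 5 follow directly from the confusion matrix counts in Table 6. As a minimal sketch, the Python function below (its name and dictionary keys are illustrative choices, not part of the original analysis) recomputes them for the XGBoost column.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), specificity and precision from counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual churners that were found
        "specificity": tn / (tn + fp),  # share of actual non-churners that were found
        "precision": tp / (tp + fp),    # share of predicted churners that really churned
    }


# XGBoost counts on the test data, copied from Table 6.
xgboost_test = classification_metrics(tp=309, tn=1519, fp=134, fn=288)
rounded = {name: round(value, 4) for name, value in xgboost_test.items()}
print(rounded)
```

Rounded to four decimals, the results match the XGBoost test-data column of Table 5.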
Recall and precision are closely connected. If we use a stricter churn filter (that is, assume that more people will churn), we will reduce the number of churners, because we will have time to make personal offers and those clients will not leave the company; but we will also increase the number of ordinary clients who receive these special offers even though they never thought of leaving. The opposite, a less strict churn filter (fewer people in the at-risk group), would lead to a higher churn rate, because it will miss real churners. At first glance the second variant may seem more harmful than the first, but in reality the company loses a lot of money in both scenarios: in the first it gives discounts to people who were ready to pay more, and in the second it loses clients and the money they would still be paying for its services had they not left. However, the first scenario may lead to a less dramatic result if the “personal offers”, or churn prevention measures, are not monetary. Of course, in this case the company will process more clients on the assumption that they are about to leave, so employees will spend time on people who were happy without these offers and may have less time for other important things; but the company will not lose money, or at least will lose much less than in the second scenario. Note that the BN leans toward the first scenario while keeping competitive overall prediction accuracy. In any case, these findings about churn prevention factors can help companies decrease their churn rate, which is why it is very useful to look at the Bayesian Belief Network results.
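The two scenarios can be made concrete with a small sketch in which the "churn filter" is a threshold on a predicted churn score. Everything in it is hypothetical: the scores, the outcomes, the two thresholds and the function names are invented for illustration and do not come from the models in this paper.

```python
def flag_counts(scores, outcomes, threshold):
    """Count outcomes when every client with score >= threshold is flagged as at risk."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)  # churners reached
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)  # loyal clients given offers
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)   # churners missed
    return {"tp": tp, "fp": fp, "fn": fn}


def recall_and_precision(counts):
    """Recall and precision for one set of flag counts."""
    recall = counts["tp"] / (counts["tp"] + counts["fn"])
    flagged = counts["tp"] + counts["fp"]
    precision = counts["tp"] / flagged if flagged else 0.0
    return recall, precision


# Hypothetical churn scores and true outcomes (1 = churned).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
example_outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

# First scenario: flag many clients (low threshold).
wide_net = flag_counts(example_scores, example_outcomes, threshold=0.35)
# Second scenario: flag few clients (high threshold).
narrow_net = flag_counts(example_scores, example_outcomes, threshold=0.75)

print(wide_net, recall_and_precision(wide_net))
print(narrow_net, recall_and_precision(narrow_net))
```

In this toy example the low threshold reaches every churner at the price of offers sent to two loyal clients, while the high threshold sends no unnecessary offers but misses half of the churners.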
4.3 Bayesian Belief Network analysis
As with the logistic regression analysis, we decided to build two Bayesian networks: one with only the general variables (General network) and another with all the variables (Internet model). We needed the general network to see how the main factors connect with each other and how all these connections affect our studied variable. Before building the networks, we converted all numeric variables into factors by recoding them. The Monthly Charges and Total Charges variables were divided into 4 almost equal intervals, and Tenure into 5 intervals. The final Bayesian network with only the general variables can be seen in Figure 11. The first thing that catches the eye is the Gender variable. It has no connections with other variables, which once again confirms the results of the analysis above: gender does not affect churn either directly or through other variables.
Figure 11. Bayesian network with only general variables
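The recoding of numeric variables into interval factors can be sketched as follows. The interval edges are the ones that appear in the tables of this section (for example (0;30] to (90;119] for Monthly Charges and (0,3] to (60,72] for Tenure); the function name and the use of Python's `bisect` module are assumptions made for illustration and say nothing about the tool actually used for the recoding.

```python
from bisect import bisect_left

# Interval edges as they appear in the tables of this section.
MONTHLY_CHARGES_EDGES = [0, 30, 60, 90, 119]
TENURE_EDGES = [0, 3, 12, 36, 60, 72]


def to_interval(value, edges):
    """Return the label of the right-closed interval (low;high] that contains value."""
    if not edges[0] < value <= edges[-1]:
        raise ValueError(f"{value} is outside ({edges[0]};{edges[-1]}]")
    upper = bisect_left(edges, value)  # index of the first edge that is >= value
    return f"({edges[upper - 1]};{edges[upper]}]"


print(to_interval(70.35, MONTHLY_CHARGES_EDGES))
print(to_interval(1, TENURE_EDGES))
```

Because the intervals are right-closed, a value equal to an edge, such as a monthly charge of exactly 30 dollars, falls into the lower interval.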
Socio-demographic variables
Next, let us look at the three other client variables: Senior Citizen, Partner and Dependents. Both Senior Citizen and Partner directly affect the Dependents variable, which is unsurprising. Younger people are more likely to have dependents than seniors (34% versus 8%), because younger people have minor children. At the same time, clients with partners are about 5 times more likely to have dependents than single clients (52% versus 10%), because couples usually have children while single people often do not (Table 7). In general, senior clients without partners have only a 1% probability of having dependents, versus 11% for younger single clients. Senior clients with partners have a 14% probability of having dependents, versus 59% for younger clients in a relationship.
Table 7 Probability to have Dependents according to Partner and Senior Citizen variables
| Partner | Dependents = No | Dependents = Yes | Senior Citizen | Dependents = No | Dependents = Yes |
| No | 0.90 | 0.10 | No | 0.66 | 0.34 |
| Yes | 0.48 | 0.52 | Yes | 0.92 | 0.08 |
| Dependents | Partner = No, not senior | Partner = No, senior | Dependents | Partner = Yes, not senior | Partner = Yes, senior |
| No | 0.89 | 0.99 | No | 0.41 | 0.86 |
| Yes | 0.11 | 0.01 | Yes | 0.59 | 0.14 |
Phone variables
Having dependents in turn affects the Internet variable. Clients who have dependents have Internet with a probability of 70%, while clients without dependents have it with a probability of 82% (Table 8). The Senior Citizen variable also has an effect on the Internet variable: the probability of having Internet is around 95% for a senior citizen and 75% for clients who are not senior. One more factor, Phone Service, affects the Internet variable. According to the results, the probability of having Internet service is 76% for clients with phone service and 100% for clients without it.
Table 8 Probability of Internet Service according to the Dependents, Senior Citizen and Phone Service variables
| Dependents | Internet = No | Internet = Yes | Senior Citizen | Internet = No | Internet = Yes | Phone Service | Internet = No | Internet = Yes |
| No | 0.18 | 0.82 | No | 0.25 | 0.75 | No | 0.00 | 1.00 |
| Yes | 0.30 | 0.70 | Yes | 0.05 | 0.95 | Yes | 0.24 | 0.76 |
The Internet variable has a direct effect on the Paperless Billing factor according to the model. The probability of having the paperless billing option switched on is 29% if the client does not have Internet service and 68% if he does (Table 9). This is logical: with an Internet connection you can check your electronic bill, which is more convenient than receiving paper bills. The Senior Citizen variable also affects Paperless Billing. Paperless billing is more likely if the client is senior (77%) than if not (56%), probably because senior clients are more likely to have Internet (Table 8). The probability of having paperless billing for clients with two-year contracts is 21 percentage points lower than for clients with month-to-month contracts (46% versus 67%).
Table 9 Probability of Paperless Billing according to the Senior Citizen, Internet and Contract variables
| Senior Citizen | Paperless Billing = No | Paperless Billing = Yes | Internet | Paperless Billing = No | Paperless Billing = Yes | Contract | Paperless Billing = No | Paperless Billing = Yes |
| No | 0.44 | 0.56 | No | 0.71 | 0.29 | M-t-m | 0.33 | 0.67 |
| Yes | 0.23 | 0.77 | Yes | 0.32 | 0.68 | 1 year | 0.46 | 0.54 |
| | | | | | | 2 years | 0.54 | 0.46 |
Monetary variables
The Paperless Billing variable, together with three other variables, influences Monthly Charges. Table 10 shows a clear tendency: the more options a client has (Paperless Billing, Internet, Phone Service), the higher the monthly charges. The probability of paying more than 90 dollars per month is 31% for clients with paperless billing versus 14% for clients without it, while for paying less than 30 dollars the picture is reversed: 13% for those who have paperless billing versus 39% for those who do not. The situation is almost the same for the Internet and Phone Service variables: clients with Internet have a 31% probability of paying more than 90 dollars per month, whereas for clients without Internet it is 0%; clients with phone service have a 27% probability of paying more than 90 dollars per month, whereas without this option it is again 0%. At the same time, the probability of paying between 60 and 90 dollars is 8% for clients without phone service but 0% for clients without Internet, because charges vary more among clients without phone service. Clients without a phone service option most likely (74%) pay between 30 and 60 dollars.
Table 10 Probability of Monthly Charges according to the Paperless Billing, Internet, Phone Service and Tenure variables
| Monthly Charges | Paperless Billing = No | Paperless Billing = Yes | Internet = No | Internet = Yes | Phone Service = No | Phone Service = Yes |
| (0;30] | 0.39 | 0.13 | 1.00 | 0.02 | 0.18 | 0.24 |
| (30;60] | 0.19 | 0.18 | 0.00 | 0.23 | 0.74 | 0.12 |
| (60;90] | 0.28 | 0.38 | 0.00 | 0.44 | 0.08 | 0.37 |
| (90;119] | 0.14 | 0.31 | 0.00 | 0.31 | 0.00 | 0.27 |
| Monthly Charges | Tenure (0,3] | Tenure (3,12] | Tenure (12,36] | Tenure (36,60] | Tenure (60,72] |
| (0;30] | 0.25 | 0.24 | 0.23 | 0.23 | 0.23 |
| (30;60] | 0.27 | 0.23 | 0.20 | 0.15 | 0.09 |
| (60;90] | 0.41 | 0.40 | 0.36 | 0.29 | 0.29 |
| (90;119] | 0.07 | 0.13 | 0.21 | 0.33 | 0.39 |
For the Tenure and Monthly Charges variables the relationship is very interesting. For the company's loyal clients (those using its services for more than 3 years) the probability of paying at the higher monthly rate (more than 90 dollars) is higher than for new clients. For example, people who have been clients for more than 3 but no more than 5 years have a 33% probability of paying more than 90 dollars per month, while for those who have used the company's services for up to 3 months the probability is only 7%. We cannot say the same about the lowest monthly payment rate (less than 30 dollars), because the difference is negligible: the probability equals 25% for new clients, 24% for clients who have stayed with the company from 3 to 12 months, and 23% for clients who have used its services for more than one year. This probably happens because customers' trust grows with continued use of the company's services, and people become ready to use more services from this particular company and to pay more. On the other hand, when you just start to use the services of a new company, you are more likely to start small or medium and then, if you like the company, to try other services and options.
Table 11 Probability of Payment Method according to the Contract and Monthly Charges variables
| Contract | Bank transfer (automatic) | Credit card (automatic) | Electronic check | Mailed check |
| Month-to-month | 0.15 | 0.14 | 0.48 | 0.23 |
| One year | 0.27 | 0.27 | 0.23 | 0.23 |
| Two years | 0.33 | 0.34 | 0.10 | 0.23 |
| Monthly Charges | Bank transfer (automatic) | Credit card (automatic) | Electronic check | Mailed check |
| (0;30] | 0.21 | 0.22 | 0.10 | 0.47 |
| (30;60] | 0.17 | 0.19 | 0.34 | 0.30 |
| (60;90] | 0.22 | 0.21 | 0.42 | 0.15 |
| (90;119] | 0.25 | 0.24 | 0.45 | 0.06 |
The Payment Method variable does not influence any other variable, but two variables affect it (Table 11). First of all, clients with month-to-month, one-year and two-year contracts have the same probability (23%) of paying by mailed check. The probability that a client with a month-to-month contract pays by electronic check is 48%, and the probability of each automatic payment method is around 15% (15% for bank transfer and 14% for credit card). For clients with two-year contracts the probability of paying by electronic check is 38 percentage points lower (only 10%), and the probability of each automatic method is about 18 to 20 points higher (33% for bank transfer and 34% for credit card). Here we return to the importance of loyalty: clients with longer contracts have been using the company's services for a longer time, and their level of trust and loyalty can be much higher than that of new clients, which is why they are more likely to use automatic payment. People who pay less than 30 dollars per month are more likely to pay by mailed check than those paying more than 90 dollars (47% versus 6%), probably because people paying less than 30 dollars are new clients of the company. The probability of paying by automatic bank transfer is higher for clients who pay more than 90 dollars than for clients paying less than 30, but the difference is small (only 4 percentage points). For those who pay more than 90 dollars the probability of paying by electronic check is 45%, while for those who pay less than 30 dollars it is only 10%.
Time variables
The results in Table 12 tell us that clients who have partners are more likely to be loyal to the company than those who do not. The probability of staying with the same company for more than five years is only 9% for single people but 31% for people with partners. The probability of staying for no more than 3 months is 24% for single people and only 6% for people with partners. A possible explanation is that the clients' partners are also clients of the company, and it is convenient for a couple to get services from the same company, whereas single people can change company easily. We will not dwell on the Total Charges variable, because its relationship with Monthly Charges and Tenure is obvious given the strong correlation: the higher your monthly costs and the longer you use the company's services, the larger the total amount charged to you.
Table 12 Probability of Tenure according to the Partner variable
| Partner | Tenure (0,3] | Tenure (3,12] | Tenure (12,36] | Tenure (36,60] | Tenure (60,72] |
| No | 0.24 | 0.21 | 0.28 | 0.18 | 0.09 |
| Yes | 0.06 | 0.11 | 0.25 | 0.27 | 0.31 |
Looking at our last independent variable, Contract, we see that it is affected by the Senior Citizen, Internet Service and Tenure variables (Figure 11). In general, senior clients are more likely to have shorter contracts than non-senior clients. In more detail, the probability of having a month-to-month contract is 71% for senior people and 52% for younger people (Table 13). At the same time, younger clients have a two-year contract with a probability of 26%, and seniors with a probability of only 12%. The probability of having a two-year contract is only 18% for clients with Internet service, versus 45% for clients without Internet. Conversely, people without Internet service have a 31% probability of having a month-to-month contract, whereas for those who have Internet it is 62%. For the Tenure variable the picture is as expected: the longer a client has stayed with the company, the higher the probability of a longer contract. The probability of having a two-year contract is less than 1% for clients who have used the company's services for no more than 3 months and 70% for the most loyal group of customers (tenure from 60 to 72 months). On the other hand, the probability of a month-to-month contract is 8% for the most loyal clients and almost 98% for new clients, because people do not yet trust a new company and are therefore not ready to buy long-term contracts from the beginning.
Table 13 Probability of Contract according to the Senior Citizen, Internet and Tenure variables
| Senior Citizen | Contract: Month-to-month | Contract: One year | Contract: Two years |
| No | 0.52 | 0.22 | 0.26 |
| Yes | 0.71 | 0.17 | 0.12 |
| Internet | Contract: Month-to-month | Contract: One year | Contract: Two years |
| No | 0.31 | 0.24 | 0.45 |
| Yes | 0.62 | 0.20 | 0.18 |
| Tenure | Contract: Month-to-month | Contract: One year | Contract: Two years |
| (0,3] | 0.978 | 0.017 | 0.005 |
| (3,12] | 0.87 | 0.09 | 0.04 |
| (12,36] | 0.66 | 0.24 | 0.10 |
| (36,60] | 0.34 | 0.37 | 0.29 |
| (60,72] | 0.08 | 0.22 | 0.70 |
Churn variable
Finally, let us take a close look at the Churn node. Only three variables, Contract, Total Charges and Monthly Charges, directly affect Churn (Figure 11). The results in Table 14 show that the probability of churn is higher for people with shorter contracts. The probability of churn for clients with month-to-month contracts is 43%, while for clients with two-year contracts it is only 3%. The difference between two-year and one-year contracts is not large, only 8 percentage points. The relationship between Monthly Charges and Churn is more interesting. The probability of churn is 33% for clients paying more than 90 dollars per month (the highest-paying group), 34% for clients paying between 60 and 90 dollars, and 27% for clients paying between 30 and 60 dollars. These numbers are close to one another, meaning that the Monthly Charges variable alone cannot explain the connection with Churn. Only by combining the output of Table 10 and Table 14 do we get the whole picture. The longer a client stays with the company, the more likely he is to pay more per month, because his level of trust is growing and he wants to use more options and services. However, the probability of leaving the company is almost the same for customers who pay at the highest rate and for those who pay average and below-average amounts. It turns out that for loyal customers large monthly fees are not such an important indicator of churn. The probability of churn for clients who were charged less than 435 dollars over the whole period of service is 44%, while for those who were charged more than 3476 dollars it is only 15%. These results are quite predictable: the higher the total amount charged to a client, the longer he has stayed with the company, and the longer a client uses the company's services, the smaller the chance that he will decide to leave.
For clients with a month-to-month contract, the higher the monthly amount charged, the higher the probability of churn. For example, if a client has a month-to-month contract and pays less than 30 dollars per month, the probability of churn is 23%, whereas if he pays from 30 to 60 dollars per month the probability is 1.5 times higher, at 35%. Going further, for clients with month-to-month contracts and monthly payments of 60 to 90 dollars the probability of churn is 50%, and for clients who are charged more than 90 dollars it reaches 52%. At the same time, a client who has a two-year contract and is charged more than 90 dollars will leave the company with a probability of only 7%. The probability of churn is lowest (only 1%) if the client pays less than 30 dollars per month and has a two-year contract. To sum up, monthly charges matter for new clients but do not have the same effect for loyal customers.
Table 14 Probability of Churn according to the Contract, Total and Monthly Charges variables
| Contract | Churn = No | Churn = Yes | Monthly Charges | Churn = No | Churn = Yes | Total Charges | Churn = No | Churn = Yes |
| M-t-m | 0.57 | 0.43 | (0;30] | 0.91 | 0.09 | (0;435] | 0.56 | 0.44 |
| 1 year | 0.89 | 0.11 | (30;60] | 0.73 | 0.27 | (435;1738] | 0.77 | 0.23 |
| 2 years | 0.97 | 0.03 | (60;90] | 0.66 | 0.34 | (1738;3476] | 0.75 | 0.25 |
| | | | (90;119] | 0.67 | 0.33 | (3476;8690] | 0.85 | 0.15 |
| Monthly Charges | Churn | Contract: Month-to-month | Contract: One year | Contract: Two years |
| (0;30] | No | 0.77 | 0.97 | 0.99 |
| | Yes | 0.23 | 0.03 | 0.01 |
| (30;60] | No | 0.65 | 0.93 | 0.97 |
| | Yes | 0.35 | 0.07 | 0.03 |
| (60;90] | No | 0.50 | 0.89 | 0.98 |
| | Yes | 0.50 | 0.11 | 0.02 |
| (90;119] | No | 0.48 | 0.79 | 0.93 |
| | Yes | 0.52 | 0.21 | 0.07 |
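The joint reading of Contract and Monthly Charges can be reproduced by treating the lower part of Table 14 as a lookup table. The sketch below copies the churn probabilities from the table; the dictionary layout and the helper names are illustrative assumptions, not output of the Bayesian network software.

```python
# P(Churn = Yes | Monthly Charges, Contract), copied from the lower part of Table 14.
P_CHURN = {
    "(0;30]": {"Month-to-month": 0.23, "One year": 0.03, "Two years": 0.01},
    "(30;60]": {"Month-to-month": 0.35, "One year": 0.07, "Two years": 0.03},
    "(60;90]": {"Month-to-month": 0.50, "One year": 0.11, "Two years": 0.02},
    "(90;119]": {"Month-to-month": 0.52, "One year": 0.21, "Two years": 0.07},
}


def churn_probability(monthly_charges, contract):
    """Look up the churn probability for one combination of parent values."""
    return P_CHURN[monthly_charges][contract]


def risk_ratio(cell_a, cell_b):
    """How many times more likely churn is in cell_a than in cell_b."""
    return churn_probability(*cell_a) / churn_probability(*cell_b)


def highest_risk_cell():
    """Combination of Monthly Charges and Contract with the highest churn probability."""
    cells = [(charges, contract) for charges, row in P_CHURN.items() for contract in row]
    return max(cells, key=lambda cell: churn_probability(*cell))


ratio = risk_ratio(("(30;60]", "Month-to-month"), ("(0;30]", "Month-to-month"))
print(round(ratio, 2), highest_risk_cell())
```

The ratio for month-to-month clients paying 30 to 60 dollars versus those paying up to 30 dollars corresponds to the "1.5 times higher" figure quoted in the text.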
The Bayesian Belief Network with Internet variables can be seen in Figure 12. For the clients' personal variables nothing changed, because here the connections mostly reflect natural causes: seniors are less likely to have minor children, and clients with a partner are more likely to have them. The Senior Citizen variable still affects the Internet Service and Paperless Billing variables, but it no longer has a direct connection with Contract. This is probably because the other Internet variables have a greater effect on Contract, and the effect of Senior Citizen is only minor. The Internet variable now plays a more important role in the network, because it directly affects six variables and indirectly even more (starting with the six variables correlated with Internet that were deleted from the previous network). Besides influencing many other independent variables, Internet Service has a direct effect on our dependent variable, Churn. The influence of the Tenure variable has increased even more in the new network. It still directly affects the Total Charges and Contract variables, as well as eight other variables. Tenure has connections with all the new Internet variables but still has no direct connection with the Internet Service variable itself. It has no direct effect on the Monthly Charges variable, but it can affect it indirectly through the Tech Support and Streaming Movies variables. The biggest difference from the general Bayesian network is that no monetary variable has even an indirect effect on Churn. Both the Monthly Charges and Total Charges variables are affected by other variables but do not themselves affect any other variable. Contract is the only variable that directly affects Churn in both networks. The Internet Service and Paperless Billing variables also have a direct connection with Churn.
Figure 12. Bayesian network with all variables
According to the results of the Bayesian Belief Network with all variables, the probability of churn is higher for clients with a Fiber optic Internet provider than for clients with DSL (38% versus 22%), which confirms the regression results (Table 15). The probability of churn for clients with no Internet service at all is only 8%, but, as we know from the logistic regression analysis, this level is insignificant. This is probably why the general network shows no direct connection between the Internet and Churn variables: there the Internet variable has only two levels (yes and no). Thus, churn occurs mostly among Fiber optic clients, and the company should focus on them in order to prevent churn. Another new variable affecting Churn is Paperless Billing. Both the logistic regression and the Bayesian network indicate that clients with paperless billing are more likely to churn: their probability is 31%, while for clients without this option it is almost two times lower, at 18%. The explanation probably lies again in the indirect connection between the Churn and Internet Service variables. The probability of having paperless billing is higher for clients with a Fiber optic provider (77%) than for clients with DSL (55%) (Table 16). Since clients with paperless billing more often have a Fiber optic provider, which itself increases the probability of churn, this may be why clients with paperless billing have a higher probability of churn. In the general network there was no such connection between Churn and Paperless Billing, because, once again, the Internet variable there had only two levels.
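The indirect path described in this paragraph can be written down with the law of total probability. This is a generic identity rather than a formula taken from the thesis, and the symbols $PB$ (Paperless Billing), $IS$ (Internet Service) and $C$ (Contract) are shorthand introduced here for illustration.

```latex
P(\text{Churn}=\text{yes} \mid PB=b)
  = \sum_{s}\sum_{c}
    P(\text{Churn}=\text{yes} \mid PB=b,\ IS=s,\ C=c)\;
    P(IS=s,\ C=c \mid PB=b)
```

If Fiber optic clients make up a larger share of the paperless billing group, the weights $P(IS=s,\ C=c \mid PB=\text{yes})$ shift toward the Fiber optic terms, which have the higher conditional churn probabilities. This raises the marginal probability of churn for that group even where the direct effect of Paperless Billing is small.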
Table 15 Probability of Churn according to the Internet Service, Paperless Billing and Contract variables
| Churn | Internet Service: DSL | Internet Service: Fiber optic | Internet Service: No | Paperless Billing: No | Paperless Billing: Yes | Contract: Month-to-month | Contract: One year | Contract: Two years |
| No | 0.78 | 0.62 | 0.92 | 0.82 | 0.69 | 0.59 | 0.88 | 0.96 |
| Yes | 0.22 | 0.38 | 0.08 | 0.18 | 0.31 | 0.41 | 0.12 | 0.04 |

| Paperless Billing | Internet Service | Month-to-month: No churn | Month-to-month: Churn | One year: No churn | One year: Churn | Two years: No churn | Two years: Churn |
| Yes | DSL | 0.66 | 0.34 | 0.88 | 0.12 | 0.98 | 0.02 |
| Yes | Fiber optic | 0.43 | 0.57 | 0.80 | 0.20 | 0.92 | 0.08 |
| Yes | No | 0.78 | 0.22 | 0.98 | 0.02 | 0.99 | 0.01 |
| No | DSL | 0.71 | 0.29 | 0.94 | 0.06 | 0.98 | 0.02 |
| No | Fiber optic | 0.54 | 0.46 | 0.83 | 0.17 | 0.96 | 0.04 |
| No | No | 0.83 | 0.17 | 0.97 | 0.03 | 0.99 | 0.01 |
For the relationship between Contract and Churn the situation stayed the same. However, with the addition of connections to the Internet variables, namely Device Protection and Tech Support, the probability of churn for clients with one-year and two-year contracts increased by 1 percentage point, while for clients with a month-to-month contract it decreased by 1 percentage point. Combining all three variables gives the following results. The probability of churn for clients with paperless billing and a month-to-month contract is 34% if the client has a DSL Internet provider, 57% with a Fiber optic provider and 22% if the client has no Internet service. At the same time, the probability of churn for clients with paperless billing and a two-year contract is only 2% with a DSL provider, 8% with a Fiber optic provider and less than 1% with no Internet service. For clients with a month-to-month contract, not having paperless billing lowers the probability of churn by 5 percentage points with a DSL provider, by 11 percentage points with a Fiber optic provider and by 5 percentage points with no Internet service. For a two-year contract, a difference between clients with and without paperless billing can be seen only for the Fiber optic provider (8% versus 4%). These results tell us that the Paperless Billing variable matters mainly for month-to-month contracts and Fiber optic Internet; in other cases its effect on the probability of churn is almost invisible.
Table 16 Probability of Paperless Billing according to the Internet Service variable
| Paperless Billing | Internet Service: DSL | Internet Service: Fiber optic | Internet Service: No |
| No | 0.45 | 0.23 | 0.71 |
| Yes | 0.55 | 0.77 | 0.29 |
The effect of the Contract and Internet Service variables is greater than that of Paperless Billing, especially the effect of Contract. For clients with paperless billing and a DSL Internet provider, changing the contract from month-to-month to one year decreases the probability of churn by 22 percentage points, from month-to-month to two years by 32 percentage points and from one year to two years by 10 percentage points (for Fiber optic: 37, 49 and 12 percentage points). At the same time, for clients with paperless billing and no Internet service, the values in Table 15 imply a decrease of 20 percentage points for a change from month-to-month to one year (from 22% to 2%), 21 percentage points from month-to-month to two years and 1 percentage point from one year to two years. These results again highlight that the Internet variable matters mainly when we compare different Internet providers, while the lack of Internet has little effect on Churn. Moreover, the differences produced by a contract change show that Contract has a major effect on churn prediction, especially when short-term and long-term contracts are compared (even without Internet service, a change of contract affects the probability of churn). The effect of the Internet Service variable is also greatly reduced for clients with a two-year contract. For example, the probability of churn for clients without paperless billing and with a two-year contract is 4% for a Fiber optic provider, 2% for DSL and 1% for no Internet service, which gives a difference of 2 percentage points between Fiber optic and DSL and 3 percentage points between Fiber optic and no Internet. The same comparison for a one-year contract gives differences of 11 and 14 percentage points. For a month-to-month contract the difference is even more dramatic: 17 and 29 percentage points, respectively.
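The comparisons in this paragraph are plain differences between cells of Table 15. A minimal Python sketch of that arithmetic is given below; the churn probabilities are transcribed from the table, while the names `CHURN_PROB`, `contract_effect` and `provider_gap` are hypothetical helpers introduced only for this illustration and are not part of the original analysis.

```python
# P(Churn = yes | Paperless Billing, Internet Service, Contract),
# transcribed from the second part of Table 15.
CHURN_PROB = {
    ("Yes", "DSL"):         {"Month-to-month": 0.34, "One year": 0.12, "Two years": 0.02},
    ("Yes", "Fiber optic"): {"Month-to-month": 0.57, "One year": 0.20, "Two years": 0.08},
    ("Yes", "No"):          {"Month-to-month": 0.22, "One year": 0.02, "Two years": 0.01},
    ("No", "DSL"):          {"Month-to-month": 0.29, "One year": 0.06, "Two years": 0.02},
    ("No", "Fiber optic"):  {"Month-to-month": 0.46, "One year": 0.17, "Two years": 0.04},
    ("No", "No"):           {"Month-to-month": 0.17, "One year": 0.03, "Two years": 0.01},
}


def contract_effect(paperless, internet, old, new):
    """Drop in churn probability (percentage points) when the contract changes."""
    row = CHURN_PROB[(paperless, internet)]
    return round((row[old] - row[new]) * 100)


def provider_gap(paperless, contract, provider_a, provider_b):
    """Difference in churn probability (percentage points) between two providers."""
    a = CHURN_PROB[(paperless, provider_a)][contract]
    b = CHURN_PROB[(paperless, provider_b)][contract]
    return round((a - b) * 100)


if __name__ == "__main__":
    # Effect of each contract change for clients with paperless billing.
    for internet in ("DSL", "Fiber optic", "No"):
        for old, new in (("Month-to-month", "One year"),
                         ("Month-to-month", "Two years"),
                         ("One year", "Two years")):
            drop = contract_effect("Yes", internet, old, new)
            print(f"{internet:12s} {old} -> {new}: -{drop} pp")

    # Gap between Fiber optic and DSL for clients without paperless billing.
    for contract in ("Month-to-month", "One year", "Two years"):
        gap = provider_gap("No", contract, "Fiber optic", "DSL")
        print(f"{contract}: Fiber optic vs DSL = {gap} pp")
```

Keeping the table as a lookup makes it easy to check each quoted difference against the published cells, although the rounded two-decimal values limit the precision of any derived figure.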
4.4 Customer profiles analysis
As mentioned above, the Bayesian Belief Network makes it possible to look not only at direct connections but also at indirect ones. This is what makes the network distinctive: it allows us to identify the probability of churn depending on customers' profiles. A profile here means a certain set of demographic characteristics of a user combined with the particular way he uses the company's services. Of course, churn prediction itself is extremely important, but understanding which profile is at risk gives the company the opportunity to respond in time. Moreover, knowledge of profiles can be used as a first stage of data screening: profiles with an extremely small probability of leaving can be removed from the analysis. Finally, the Bayesian network can serve as a model of customers' reactions to a change in any of the company's services, which means it is useful not only for prediction but for prevention as well. That is, if a company wants to check how an increase in the monthly tariff affects the pace of customer churn, it does not need to raise the price and observe the reaction: the situation can be simulated on the network. Of course, exact figures cannot be obtained this way, but a picture close to reality is possible. This already gives the company an understanding of whether it is worth changing something and, if so, in which direction, and it avoids the cost of a live experiment, which can run into millions.
Example 1: profile of a recently joined client
For our data we can also create several profiles. For example, consider a new client who joined the company only 2 months ago and has a month-to-month contract because he is just testing our services. His monthly payment is around 65 dollars, and he pays via electronic check. His probability of leaving is very high: 51% (Figure 13). The question is what the company can do to prevent the client from leaving, or at least to decrease the probability of leaving. Let us vary only the significant variables and variable levels from our logistic regression output. First, the company can give him a personal discount, so that the client pays, for example, 60 dollars per month. The probability of churn for this client will decrease by 19 percentage points (changing the level of the Monthly Charges variable from “(60;90]” to “(30;60]” with Tenure fixed at the “(0;3]” level, Payment Method at “Electronic check” and Contract at “Month-to-month”), which is quite a good improvement. Alternatively, the company can try to convince the client that automatic payment via bank transfer is easier and more comfortable for him; if he agrees, his probability of leaving will decrease by 3 percentage points (changing the level of the Payment Method variable from “Electronic check” to “Bank transfer (automatic)” with Tenure fixed at the “(0;3]” level, Monthly Charges at “(60;90]” and Contract at “Month-to-month”). Another option is to sell this client a one-year contract, which is attractive to him because the price per month is lower (changing the level of the Contract variable from “Month-to-month” to “One year” with Tenure fixed at the “(0;3]” level, Monthly Charges at “(60;90]” and Payment Method at “Electronic check”). In this case the probability of churn for this client will be only 17% (a decrease of 34 percentage points), and for at least a year the company can be reasonably confident that the client will stay.
Figure 13. Churn simulations for a recently joined client's profile
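To compare the single interventions from Example 1 side by side, the reported figures can be tabulated and ranked. The short Python sketch below only restates the numbers quoted in the text, under the assumption that each reported decrease is in percentage points relative to the 51% baseline; the names `BASELINE`, `REDUCTIONS` and `rank_interventions` are hypothetical and exist only for this illustration.

```python
# Baseline churn probability (percent) for the recently joined client in Example 1.
BASELINE = 51

# Reported decrease (percentage points) for each single change, as quoted in the text.
REDUCTIONS = {
    "personal discount to the (30;60] Monthly Charges level": 19,
    "switch to automatic bank transfer": 3,
    "sell a one-year contract": 34,
}


def rank_interventions(baseline, reductions):
    """Return (action, resulting churn probability) pairs, lowest probability first."""
    results = [(action, baseline - drop) for action, drop in reductions.items()]
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    for action, probability in rank_interventions(BASELINE, REDUCTIONS):
        print(f"{probability:3d}%  {action}")
```

The ranking puts the contract change first and the payment-method change last, which matches the order of effects described for this profile.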
We can complicate the task further and add two specific Internet variables to the analysis: Online Security and Tech Support. Let us first assume that the client has neither option; his probability of churn is then 52%. Changing only the level of the Online Security variable to “Yes” (with Tenure fixed at the “(0;3]” level, Monthly Charges at “(60;90]”, Payment Method at “Electronic check”, Contract at “Month-to-month” and Tech Support at “No”) decreases the probability of leaving by 4 percentage points. Changing only the level of the Tech Support variable (with Tenure fixed at the “(0;3]” level, Monthly Charges at “(60;90]”, Payment Method at “Electronic check”, Contract at “Month-to-month” and Online Security at “No”) decreases it by 6 percentage points, and changing both variables decreases it by 11 percentage points (Figure 14). These two examples show how non-monetary factors, such as automatic payment, can noticeably affect the probability of churn. Furthermore, they show that non-monetary factors can sometimes be even more effective than monetary ones.
Figure 14. Churn simulations with specific Internet variables for a recently joined client's profile
Example 2: profile of a long-time client
Now let us look at a completely different profile: a long-term loyal client. This client has been using our services for 5 years, pays the highest rate of 95 dollars per month, still pays via electronic check and has neither the tech support nor the online security option. He has a two-year contract, because he is a loyal client of the company. The probability of churn for such a profile is low, only 7.5% (Figure 15). First, this information lets us remove such profiles from the analysis and concentrate on other profiles with higher probabilities of leaving. When a company has millions of clients, it may be very important to focus on the more urgent cases first and return to the others later. Second, we can still think about how to decrease this 7.5% probability of churn, so that the company can be almost certain that these loyal clients will not leave.
If the client changes his payment method to automatic or agrees to try the online security option, his probability of churn will not change (with Tenure fixed at the “(36;60]” level, Contract at “Two year”, Monthly Charges at “(90;119]” and Tech Support at “No”). It is surprising that the same changes decreased the probability by 3-4 percentage points for one customer profile and produce no observable change for another. If the client activates the tech support option, his probability will decrease by 0.07 percentage points, which is at least something (with Tenure fixed at the “(36;60]” level, Contract at “Two year”, Monthly Charges at “(90;119]”, Payment Method at “Electronic check” and Online Security at “No”). If the company gives this loyal client a personal discount so that he pays 90 dollars per month, the probability will decrease by only 0.9 percentage points, which is still not enough (with Tenure fixed at the “(36;60]” level, Contract at “Two year”, Payment Method at “Electronic check”, and Online Security and Tech Support at “No”). But what happens if the company works comprehensively, that is, makes several offers to the client at once? In that case the client will pay 90 dollars per month via automatic transfer from his bank account and use both the online security and tech support options (with Tenure fixed at the “(36;60]” level and Contract at “Two year”). The probability of churn will decrease by 4.5 percentage points, so the probability of leaving for this loyal client will be only 3%.
Figure 15. Churn simulations for a long-time client's profile
If we look at our network, we will see that all these variables are affected by the Tenure variable, which makes it very important. However, only one variable in our data has an effect on Tenure: the Partner variable. From our analysis we know that the longer a client uses the company's services, the lower his probability of churn. The company can actually influence this probability because, according to the Bayesian Belief Network, having a partner increases the probability of staying longer with the company. This knowledge gives the company another method of churn prevention. The company can suggest that the client invite his partner to join the company's services with a discount for the first month, or it can target a marketing campaign at couples with the slogan “together cheaper”. The company will lose some money on the discount but will in the end gain two clients who are connected to each other and thereby to the company. All the examples above show how changes in different factors influence different profiles. For a new client, both monetary and various non-monetary factors can successfully decrease the probability of leaving, but for loyal clients changing one factor will not be enough: here the company should act on several factors at once.
Conclusion and discussion
The results of this research on customer churn prediction coincide with the results of earlier papers. As in (Borrotti, 2018; Machado et al., 2019), XGBoost works best in predicting customer churn for telecommunications companies. Through the sequential construction of many decision trees, this machine learning method achieves the most accurate prediction results while avoiding overfitting with the help of regularization. Logistic regression works slightly worse than the machine learning algorithms in predicting customer churn. Similarly to us, Li and Li (2019) compared logistic regression with XGBoost in order to predict churners on an e-commerce platform, and XGBoost likewise showed better results. In addition, Mandák (2017) concluded that even if logistic regression is not the best algorithm for churn prediction, it can still serve as a rational tool for identifying customers who are at risk of churning. In our work logistic regression likewise shows a good result, although XGBoost outperforms it. This can be partially explained by the fact that, because many variables are highly correlated, not all of them are included in the regression model. As a result, the model is built on only part of the variables that could have an effect on churn, which gives less accurate prediction results.
This work is unique in combining several prediction methods based on completely different algorithms. It is the combined set of results from all the methods that gives the fullest picture of customer churn in telecommunication companies. XGBoost alone gives the most accurate prediction of churners, but it does not show which variables have the greatest impact on churn or how the variables are related to each other. Random Forest, in turn, reflects the importance of each variable for churn prediction but does not explain how exactly the different levels of each variable affect churn. By adding logistic regression we finally get results for each variable, but we still cannot see the relationships among the independent variables or the causality between them. Dalvi (2016) also found that logistic regression and other machine learning methods (a decision tree) complement each other and give "valuable insights of the telecommunication industry". Here the Bayesian network comes to our aid: it reveals how the effect of one variable on churn changes with a change in another variable, that is, we see both direct and indirect connections. This tool helps to close the gap between building extremely accurate models and making them useful for resolving managerial issues. Bayesian Belief Networks achieve an accuracy comparable with that of models built with machine learning algorithms, while also giving results that can show causal relations between the variables. Therefore, Bayesian networks complement the results obtained using classical methods and machine learning algorithms. In addition, this algorithm lets us see how a change in one factor can affect the probability of an event happening.
For the Telco dataset, the results obtained with the Bayesian Belief Network indicate that the most important general factors influencing churn are time and money. In other words, the longer the client uses the services of the company and, accordingly, the longer the term of the contract concluded with the company, and the lower the monthly payments for services, the smaller the chance that the client leaves. However, when specific variables are added, such as the various options related to Internet service, the effect of the monetary variables is nullified. It turns out that, to prevent customer churn, companies do not have to lose millions on personal discounts. Monetary factors certainly have a place in churn prevention; however, before lowering prices, companies should look not only at how the probability of leaving changes with monthly payments but also at other factors. Thus, according to the results obtained in this paper, the monetary factor will not always be the best option for churn prevention. For example, for new customers the company can consider non-monetary ways of customer retention, because they will be more useful. The company can offer them a free one-month trial of a device protection option or a technical support service. After the trial month, the client may decide to keep these services because of their convenience. Both of these factors reduce the probability of churn, and for the company the cost of a one-month trial will be insignificant and will even pay off if the client decides to keep the services for good. As a result, customers at risk are more likely to remain with the company, and their probability of leaving will decrease.
To remove the monetary component completely, the company can offer the client automatic payment through a bank account or credit card as a payment method. This will cost the company nothing, apart from slightly increasing the workload of one of the customer service managers; however, it will reduce the probability of this client leaving and, as a result, the probability of the company's cash losses in the following period. For long-term loyal clients it will be more difficult to decrease the probability of churn, but it is already small enough that no action is needed.
