Probabilistic graphical models in customer analytics: comparison with classical predictive models
This paper applies several methods to obtain a comprehensive view of the customer churn problem. It identifies which algorithms can be used not only for churn prediction but also for churn prevention.
Category | Management and labor relations |
Type | diploma thesis |
Language | English |
Date added | 25.08.2020 |
The effect of contract duration on churn does not change depending on whether the additional Internet factors are included or ignored. That is, in either case a short-term contract is one of the key predictors of leaving. Mandák (2017) obtained the same results: customer duration and contract duration turned out to be the most influential variables in both of his models, while the value-added services variable also had a large impact on the dependent variable in the logistic regression model. Contract duration is also an important predictor of churn in Mandák and Hančlová (2019). Similarly to our paper, clients with soon-to-expire contracts (short contracts in our case) are more likely to churn. This conclusion should help companies understand that whether the client has paperless billing, technical support, Internet or streaming TV services matters less than the duration of the contract and how long the client has been with the company. Short tenure is an important predictor in Kisioglu and Topcu (2011) and Kumar and Kumar (2019), as in our paper: the less time customers have spent with the company, the higher their chances of churning.
It is important to mention that in some earlier works Internet services were among the most important features of churners, carried more weight in explaining churn, and were positively related to churn behaviour, so that using these services increases the probability that a customer churns (Kumar & Kumar, 2019). In Hou et al. (2018), the characteristics of the content a customer "consumes" play a significant part in churn prediction.
Time with the company can be influenced only gradually, by providing service at the highest level, which affects customer loyalty and continued use of the services. Contract duration, by contrast, can be affected immediately, for example by making the conditions of one- and two-year contracts much more attractive than those of a monthly one. This may bring some monetary loss per client, but the total revenue of the company will increase, because the churn rate will decrease and the company will not have to spend money on retaining a large group of customers.
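The trade-off between a per-client discount and lower churn can be illustrated with a small calculation. The sketch below is not based on the thesis data: the prices, churn rates and the helper `expected_revenue` are hypothetical, and a constant monthly churn rate is assumed for simplicity.

```python
# Hypothetical illustration: all prices and churn rates below are assumed
# numbers, not estimates from the thesis data.

def expected_revenue(monthly_price: float, monthly_churn: float, horizon_months: int) -> float:
    """Expected revenue from one customer over the horizon.

    The customer pays in month t only if still active, which happens with
    probability (1 - monthly_churn) ** t under a constant churn rate.
    """
    return sum(
        monthly_price * (1.0 - monthly_churn) ** t
        for t in range(horizon_months)
    )


# Month-to-month plan: full price, higher assumed churn.
monthly_plan = expected_revenue(monthly_price=70.0, monthly_churn=0.04, horizon_months=24)

# Longer contract: 10% cheaper per month, lower assumed churn.
discounted_plan = expected_revenue(monthly_price=63.0, monthly_churn=0.01, horizon_months=24)

print(f"month-to-month:      {monthly_plan:.2f}")
print(f"discounted contract: {discounted_plan:.2f}")
print("discount pays off:", discounted_plan > monthly_plan)
```

Under these assumed numbers the discounted contract yields more expected revenue per customer, because the lower price is more than offset by longer retention. With a smaller churn reduction or a deeper discount the comparison can reverse, so the inputs would have to be estimated from real data.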
At the same time, our paper has some limitations, because of which not all the results can be clearly explained and not all the dependencies can be seen from the Bayesian network (BN) analysis. As described earlier, to extract valuable information from a BN we need to look at all the nodes and edges: a middle node between two other nodes can act as a "mediator", which helps to determine the direction and type of the relation and makes the two outer nodes conditionally independent given the mediator. Some relations cannot be explained, because we might be missing variables that are needed to understand the type and direction of the relationship. Thus, to explain all the results correctly and clearly, we would need to have all the relevant variables under control, with no latent variables left out. However, the data we used was taken from the Kaggle platform, where it exists in a shortened form: not all the variables available in the full data are included in the Kaggle version. We discovered this late and decided not to add the variables from the original dataset and to focus only on the Kaggle data. Additional variables might clarify some unexplained results or relations.
Originally, IBM offers five datasets, which contain different information about customers. We describe only the variables that differ from our dataset. The first dataset has information about customers' demographics: exact age in years, whether the customer is married, and the number of dependents. The second and third datasets contain information about customers' location: country, state, city, zip code, combined latitude and longitude, latitude, longitude, and a population estimate for the area. The fourth dataset has similar information to our data, and additionally includes the number of referrals the customer has made (or their absence), the last marketing offer the customer accepted, average charges for calls outside the area, download volume in gigabytes, whether the person uses the Internet for streaming music, whether the customer paid extra for unlimited downloads, and finally total refunds, extra data charges, and roaming charges. The fifth dataset contains information about churn status and everything related to it. The additional variables not included in the dataset we analysed are the customer's satisfaction score, customer status (stayed, churned or just joined), churn score (predicted with SPSS), predicted CLTV, reason for churn, and churn category, where the customer's reason for churning can be found.
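The role of a "mediator" node mentioned in the limitations can be checked numerically on a toy chain. The sketch below assumes a hypothetical structure Contract -> Tenure -> Churn with made-up probabilities; it is not the network learned in this work. It only shows that the two outer nodes are dependent marginally but become independent once the middle node is fixed.

```python
from itertools import product

# Hypothetical chain Contract -> Tenure -> Churn. All probabilities are
# invented for illustration and are not estimated from the thesis data.
p_contract = {"short": 0.55, "long": 0.45}
p_tenure_given_contract = {
    "short": {"low": 0.7, "high": 0.3},
    "long": {"low": 0.2, "high": 0.8},
}
p_churn_given_tenure = {
    "low": {"yes": 0.45, "no": 0.55},
    "high": {"yes": 0.10, "no": 0.90},
}

# Joint distribution implied by the chain factorisation
# P(contract, tenure, churn) = P(contract) P(tenure | contract) P(churn | tenure).
joint = {
    (a, m, c): p_contract[a] * p_tenure_given_contract[a][m] * p_churn_given_tenure[m][c]
    for a, m, c in product(p_contract, ("low", "high"), ("yes", "no"))
}


def prob(contract=None, tenure=None, churn=None):
    """Sum the joint over every variable that is left as None."""
    return sum(
        p
        for (a, m, c), p in joint.items()
        if contract in (None, a) and tenure in (None, m) and churn in (None, c)
    )


def churn_given(contract, tenure=None):
    """P(churn = yes | contract [, tenure])."""
    return prob(contract, tenure, "yes") / prob(contract, tenure)


# Without the mediator, contract type changes the churn probability.
print("P(churn | short)      =", round(churn_given("short"), 3))
print("P(churn | long)       =", round(churn_given("long"), 3))

# Given the mediator, contract type carries no extra information.
print("P(churn | short, low) =", round(churn_given("short", "low"), 3))
print("P(churn | long, low)  =", round(churn_given("long", "low"), 3))
```

If a variable that actually plays the mediator role is missing from the data, the outer nodes appear directly connected, which is one reason why omitted variables can leave some edges hard to interpret.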
Of course, the goal of our work is not to create the ideal prediction model, but to look closely at what additional useful information we can get from adding graphical models, and what the basic models can give us when the two are used as complements. As the variables listed above show, additional information about customers can be used in future work, where the methods and models from our work can be checked with the newly added variables. These variables can help to form more concrete customer profiles. Further, the approach from this paper could be used not only in churn prediction but also in churn prevention, or for any marketing needs of the company.
References
Abedzadeh, N., & Nematbakhsh, M. (2012). Using CLV for modelling churn and customer retention. International Journal of Electronic Marketing and Retailing, 5(2), 128-146.
Ascarza, E. (2018). Retention Futility: Targeting High-Risk Customers Might be Ineffective. Journal of Marketing Research, 55 (1), 80-98.
Ascarza, E., Iyengar, R. & Schleicher, M. (2018). The Perils of Proactive Churn Prevention Using Plan Recommendations: Evidence from a Field Experiment. Journal of Marketing Research, 53 (1), 46-60.
Bergmeir, C., Hyndman, R. J., & Koo, B. (2018). A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics and Data Analysis, 120, 70-83.
Bewick, V., Cheek, L., & Ball, J. (2005). Statistics review 14: Logistic regression. Critical care, 9(1), 112.
Bharadwaj, S., Anil, B.S., Pahargarh, A., Pahargarh, A., Gowra, P.S., & Kumar, S. (2018). Customer Churn Prediction in Mobile Networks using Logistic Regression and Multilayer Perceptron (MLP). 2nd International Conference on Green Computing and Internet of Things, ICGCIoT 2018, 8752982, pp. 436-438.
Bilal Zorić, A. (2016). Predicting Customer Churn in Banking Industry Using Neural Networks. Interdisciplinary Description of Complex Systems: INDECS, 14(2), 116-124.
Blattberg, R., Getz, G., & Thomas, J.S. (2001). Customer Equity: Building and Managing Relationships as Valuable Assets. Harvard Business School Press.
Blattberg, R., Malthouse, E. & Neslin, S. (2009). Customer Lifetime Value: Empirical Generalizations and Some Conceptual Questions. Journal of Interactive Marketing, 23, 157-168.
Borrotti, M. (2018, April). Customer Churn prediction based on eXtreme Gradient Boosting classifier. 49th Scientific meeting of the Italian Statistical Society.
Breiman, L. (2002). Manual on setting up, using, and understanding random forests v3.1. Statistics Department University of California Berkeley, CA, USA, 1, 58.
Bruce, P., & Bruce, A. (2017). Practical statistics for data scientists: 50 essential concepts. O'Reilly Media, Inc.
Cai, Z., Sun, S., Si, S., & Yannou, B. (2011). Identifying product failure rate based on a conditional Bayesian network classifier. Expert Systems with Applications, 38 (5), pp. 5036-5043
Calciu, M. (2008). Numeric decision support to find optimal balance between customer acquisition and retention spending. Journal of Targeting, Measurement and Analysis for Marketing, 16(3), 214-227.
Chakraborty, S., Mengersen, K., Fidge, C., Ma, L., & Lassen, D. (2016). A Bayesian Network-based customer satisfaction model: a tool for management decisions in railway transport. Decision Analytics, 3(1), 4.
Chen, S., Huang, W., Chen, M., Zhong, J., & Cheng, J. (2017). Airlines Content Recommendations Based on Passengers' Choice Using Bayesian Belief Networks. In Bayesian Inference. IntechOpen.
Chen, X., Yang, B., & Lin, Z. (2018). A random forest learning assisted “divide and conquer” approach for peptide conformation search. Scientific reports, 8(1), 1-8.
Chiang, D., Wang, Y., Lee, S. & Lin, C. (2003). Goal-oriented sequential pattern for network banking churn, Expert Systems with Applications 25, pp. 293-302.
Dalvi, P.K., Khandge, S.K., Deomore, A., Bankar, A., & Kanade, V.A. (2016). Analysis of customer churn prediction in telecom industry using decision trees and logistic regression. 2016 Symposium on Colossal Data Analysis and Networking, CDAN 2016 7570883.
De Caigny, A., Coussement, K., & De Bock, K.W. (2018). A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. European Journal of Operational Research 269(2), pp. 760-772.
Dogan, I. (2012). Analysis of facility location model using Bayesian Networks. Expert Systems with Applications, 39, pp. 1092-1104
Eircom (2008). <http://www.eircom.ie/cgi-bin/bvsm/bveircom/mainPage.jsp>.
Faris, H., Al-Shboul, B., & Ghatasheh, N. (2014, September). A genetic programming based framework for churn prediction in telecommunication industry. In International Conference on Computational Collective Intelligence (pp. 353-362). Springer, Cham.
Ferreiro, S., Arnaiz, A., Sierra, B., & Irigoien I. (2012). Application of Bayesian networks in prognostics for a new Integrated Vehicle Health Management concept. Expert Systems with Applications, 39, pp. 6402-6418
Gregory, B. (2018). Predicting customer churn: Extreme gradient boosting with temporal data. arXiv preprint arXiv:1802.03396.
Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2006). Churn prediction: Does technology matter. International Journal of Intelligent Technology, 1(2), 104-110.
He, B., Shi, Y., Wan, Q., & Zhao, X. (2014). Prediction of customer attrition of commercial banks based on SVM model. Procedia Computer Science, 31, 423-430.
Hou, B.-Z., Wu, Y., Zheng, L.-M., Zhao, D.-L., & Xie, A.-R. (2018). Customer churn prediction in Chinese traditional broadcasting industry: A positive analysis. International Conference on Management Science and Engineering - Annual Conference Proceedings, 2017-August, 8574436, pp. 596-605.
Hu, H.-Y. (2019). Research on customer churn prediction using logistic regression model. Advances in Intelligent Systems and Computing, vol. 885, pp. 344-350.
Huang, B. Q., Kechadi, T. M., Buckley, B., Kiernan, G., Keogh, E., & Rashid, T. (2010). A new feature set with new window techniques for customer churn prediction in land-line telecommunications. Expert Systems with Applications, 37(5), 3657-3665.
Huang, Y., & Kechadi, T. (2013). An effective hybrid learning system for telecommunication churn prediction. Expert Systems with Applications, 40(14), 5635-5647.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112, pp. 3-7). New York: Springer.
Karapinar, H.C., Altay, A., & Kayakutlu, G. (2016). Churn detection and prediction in automotive supply industry. Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016 7733421, pp. 1349-1354.
Keramati, A., Ghaneei, H., & Mirmohammadi, S.M. (2016). Developing a prediction model for customer churn from electronic banking services using data mining. Financial Innovation 2(1),10.
Kisioglu, P., & Topcu, Y. I. (2011). Applying Bayesian Belief Network approach to customer churn analysis: A case study on the telecom industry of Turkey. Expert Systems with Applications, 38(6), 7151-7157.
Knox, G. & Oest, R. (2014). Customer Complaints and Recovery Effectiveness: A Customer Base Approach. Journal of Marketing, 78, 42-57.
Kumar, S., & Kumar, M. (2019, May). Predicting Customer Churn Using Artificial Neural Network. In International Conference on Engineering Applications of Neural Networks (pp. 299-306). Springer, Cham.
Lee, K. C., & Jo, N. Y. (2010). Bayesian Network Approach to Predict Mobile Churn Motivations: Emphasis on General Bayesian Network, Markov Blanket, and What-If Simulation. Lecture Notes in Computer Science, 304-313.
Li, X., & Li, Z. (2019). A hybrid prediction model for e-commerce customer churn based on logistic regression and extreme gradient boosting algorithm. Ingenierie des Systemes d'Information 24(5), pp. 525-530.
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R news, 2(3), 18-22.
Machado, M. R., Karray, S., & de Sousa, I. T. (2019, August). LightGBM: an Effective Decision Tree Gradient Boosting Method to Predict Customer Loyalty in the Finance Industry. In 2019 14th International Conference on Computer Science & Education (ICCSE) (pp. 1111-1116). IEEE.
Mandák, J. (2017). Comparison of logistic regression and decision tree for customer churn prediction in Telecommunications. SMSIS 2017 - Proceedings of the 12th International Conference on Strategic Management and its Support by Information Systems 2017, pp. 282-292.
Mandák, J., & Hančlová, J. (2019). Use of logistic regression for understanding and prediction of customer churn in telecommunications. Statistika, 99 (2), pp. 129-141.
Manongdo, R., & Xu, G. (2017). Applying client churn prediction modeling on home-based care services industry. IEEE/ACM BESC 2016 - Proceedings of 2016 International Conference on Behavioral, Economic, Socio - Cultural Computing 7804503.
McNeish, D. M. (2015). Using lasso for predictor selection and to assuage overfitting: A method long overlooked in behavioral sciences. Multivariate Behavioral Research, 50(5), 471-484.
Mena, C. G., De Caigny, A., Coussement, K., De Bock, K. W., & Lessmann, S. (2019). Churn Prediction with Sequential Data and Deep Neural Networks. A Comparative Analysis. arXiv preprint arXiv:1909.11114.
Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European sociological review, 26(1), 67-82.
Novo, J. (2004). Drilling down: turning customer data into profits with a spreadsheet. Jim Novo.
Peng, C. Y. J., Lee, K. L., & Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The journal of educational research, 96(1), 3-14.
Perucca, G., & Salini, S. (2014). Travellers' Satisfaction with Railway Transport: A Bayesian Network Approach. Quality Technology and Quantitative Management, vol. 11, no. 1, pp. 71-84
Pfeifer, P. E. (2005). The optimal ratio of acquisition and retention costs. Journal of Targeting, Measurement and Analysis for Marketing, 13(2), 179-188.
Reichheld, F. F., & Kenny, D. W. (1990). The hidden advantages of customer retention. Journal of Retail Banking, 12(4), 19-24.
Safinejad, F., Noughabi, E. A. Z., & Far, B. H. (2018). A Fuzzy Dynamic Model for Customer Churn Prediction in Retail Banking Industry. In Applications of Data Management and Analysis (pp. 85-101). Springer, Cham.
Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (1999). Estimating the support of a high-dimensional distribution. Technical Report MSR-TR-99-87, Microsoft Research.
Si, S., Zhang, H., Keerthi, S. S., Mahajan, D., Dhillon, I. S., & Hsieh, C. J. (2017, August). Gradient boosted decision trees for high dimensional sparse output. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 3182-3190). JMLR.org.
Stripling, E., Vanden Broucke, S., Antonio, K., Baesens, B., & Snoeck, M. (2015). Profit maximizing logistic regression modeling for customer churn prediction. 2015 IEEE International Conference on Data Science and Advanced Analytics, 7344874.
Tambde, A., & Motwani, D. (2019). Employee churn rate prediction and performance using machine learning. International Journal of Recent Technology and Engineering 8(2 Special Issue 11), pp. 824-826.
Tsai, C. F., & Chen, M. Y. (2010). Variable selection by association rules for customer churn prediction of multimedia on demand. Expert Systems with Applications, 37(3), 2006-2015.
Tsai, C. F., & Lu, Y. H. (2009). Customer churn prediction by hybrid neural networks. Expert Systems with Applications, 36(10), 12547-12553.
Ullah, I., Raza, B., Malik, A. K., Imran, M., Islam, S. U., & Kim, S. W. (2019). A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access, 7, 60134-60149.
Vafeiadis, T., Diamantaras, K. I., Sarigiannidis, G., & Chatzisavvas, K. C. (2015). A comparison of machine learning techniques for customer churn prediction. Simulation Modelling Practice and Theory, 55, 1-9.
Venkatraman, R., & Ragala, R. (2017). A survey on churn analysis and prediction in video on demand. Asian Journal of Pharmaceutical and Clinical Research 10, pp. 158-161.
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., & Baesens, B. (2012). New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research, 218(1), 211-229.
Wübben, M., & Wangenheim, F. V. (2008). Instant customer base analysis: Managerial heuristics often “get it right”. Journal of Marketing, 72(3), 82-93.
Xia, G. E., & Jin, W. D. (2008). Model of customer churn prediction on support vector machine. Systems Engineering-Theory & Practice, 28(1), 71-77.
Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445-5449.
Yanfang, Q., & Chen, L. (2018). Research on E-commerce user churn prediction based on logistic regression. Proceedings of the 2017 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference, ITNEC 2017, 2018-January, pp. 87-91.
Yang, C., Shi, X., Luo, J., & Han, J. (2018). I know you'll be back: Interpretable new user clustering and churn prediction on a mobile social application. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining pp. 914-922.
Zhang, Y., Qi, J., Shu, H., & Cao, J. (2007, October). A hybrid KNN-LR classifier and its application in customer churn prediction. In 2007 IEEE International Conference on Systems, Man and Cybernetics (pp. 3265-3269). IEEE.
Zhao, Y., Li, B., Li, X., Liu, W., & Ren, S. (2005, July). Customer churn prediction using improved one-class support vector machine. In International Conference on Advanced Data Mining and Applications (pp. 300-306). Springer, Berlin, Heidelberg.
Zhu, Q., Yu, X., Zhao, Y., & Li, D. (2019, October). Customer churn prediction based on LASSO and Random Forest models. In IOP Conference Series: Materials Science and Engineering (Vol. 631, No. 5, p. 052008). IOP Publishing.
Appendix
Table 1 Description of each variable from the dataset
Variable | Description | Scale | Examples |
customerID | Unique customer number | | |
gender | Represents if the customer is a male or a female | Categorical | (Male, Female) |
SeniorCitizen | Shows if the customer is a senior citizen or not | Categorical | (Yes, No) |
Partner | Represents if the customer has a partner or not | Categorical | (Yes, No) |
Dependents | Shows if the customer has dependents or not | Categorical | (Yes, No) |
tenure | Shows the number of months the customer has stayed with the company | Numeric | Range: 0-72 |
PhoneService | Represents if the customer uses phone service or not | Categorical | (Yes, No) |
MultipleLines | Indicates if the customer has multiple lines or not | Categorical | (Yes, No, No phone service) |
InternetService | Shows whether the customer has Internet service and, if so, which type | Categorical | (DSL, Fiber optic, No) |
OnlineSecurity | Represents if the customer has online security or not (online security protects the customer online by instantly blocking harmful and phishing websites) | Categorical | (Yes, No, No internet service) |
OnlineBackup | Shows if the customer has online backup or not (online backup: whether the data from the customer's phone is stored in the cloud) | Categorical | (Yes, No, No internet service) |
DeviceProtection | Represents if the customer has device protection or not (device protection: a range of security measures, from anti-malware protection and VPN to physical theft countermeasures, including remote wiping, locating a stolen device and blocking access to it) | Categorical | (Yes, No, No internet service) |
TechSupport | Represents if the customer has tech support or not (tech support is a service that provides technical help and solutions to hardware and software problems) | Categorical | (Yes, No, No internet service) |
StreamingTV | Shows if the customer has streaming TV or not | Categorical | (Yes, No, No internet service) |
StreamingMovies | Represents if the customer has streaming movies or not | Categorical | (Yes, No, No internet service) |
Contract | The contract term of the customer (the duration of the contract) | Categorical | (Month-to-month, One year, Two year) |
PaperlessBilling | Shows if the customer has paperless billing or not | Categorical | (Yes, No) |
PaymentMethod | How the customer pays for the service | Categorical | (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)) |
MonthlyCharges | The amount of money charged to the customer monthly | Numeric | Range: 18.3-119 |
TotalCharges | The total amount of money charged to the customer for the whole duration of service usage | Numeric | Range: 18.8-8680 |
Churn | Shows if the customer churned or not | Categorical | (Yes, No) |
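The variable descriptions in Table 1 can be turned into a simple record check before modelling. The sketch below is only an illustration: the category levels are copied from the table, while the names `CATEGORICAL_LEVELS`, `validate` and the example record are hypothetical. The raw file may encode some fields differently from the table, so the levels should be treated as an assumption. The rule that customers without Internet service must carry the level "No internet service" in all add-on fields is likewise an assumption implied by the level names.

```python
# Levels copied from Table 1; helper names and the example record are hypothetical.
YES_NO = {"Yes", "No"}
YES_NO_NO_INTERNET = {"Yes", "No", "No internet service"}
INTERNET_ADDONS = [
    "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies",
]

CATEGORICAL_LEVELS = {
    "gender": {"Male", "Female"},
    "SeniorCitizen": YES_NO,
    "Partner": YES_NO,
    "Dependents": YES_NO,
    "PhoneService": YES_NO,
    "MultipleLines": {"Yes", "No", "No phone service"},
    "InternetService": {"DSL", "Fiber optic", "No"},
    **{name: YES_NO_NO_INTERNET for name in INTERNET_ADDONS},
    "Contract": {"Month-to-month", "One year", "Two year"},
    "PaperlessBilling": YES_NO,
    "PaymentMethod": {
        "Electronic check", "Mailed check",
        "Bank transfer (automatic)", "Credit card (automatic)",
    },
    "Churn": YES_NO,
}
NUMERIC_FIELDS = ["tenure", "MonthlyCharges", "TotalCharges"]


def validate(record: dict) -> list:
    """Return a list of human-readable problems found in one customer record."""
    problems = []
    for name, levels in CATEGORICAL_LEVELS.items():
        if record.get(name) not in levels:
            problems.append(f"{name}: unexpected value {record.get(name)!r}")
    for name in NUMERIC_FIELDS:
        try:
            float(record.get(name))
        except (TypeError, ValueError):
            problems.append(f"{name}: not numeric")
    # Assumed consistency rule: add-on services are only defined for
    # customers who have an Internet service at all.
    if record.get("InternetService") == "No":
        for name in INTERNET_ADDONS:
            if record.get(name) != "No internet service":
                problems.append(f"{name}: should be 'No internet service'")
    return problems


# Hypothetical record used only to exercise the check.
example = {
    "gender": "Female", "SeniorCitizen": "No", "Partner": "Yes", "Dependents": "No",
    "tenure": 1, "PhoneService": "No", "MultipleLines": "No phone service",
    "InternetService": "DSL", "OnlineSecurity": "No", "OnlineBackup": "Yes",
    "DeviceProtection": "No", "TechSupport": "No", "StreamingTV": "No",
    "StreamingMovies": "No", "Contract": "Month-to-month", "PaperlessBilling": "Yes",
    "PaymentMethod": "Electronic check", "MonthlyCharges": 29.85,
    "TotalCharges": 29.85, "Churn": "No",
}
print(validate(example))
print(validate(dict(example, Contract="Weekly")))
```

The consistency rule also matters for the graphical models: the "No internet service" level makes the add-on variables partly deterministic functions of InternetService, which can produce strong edges that reflect the coding of the data rather than customer behaviour.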