Predicting Customer Lifetime Value through Data Mining Technique in a Direct Selling Company

Arsie P. Mauricio, John Michael M. Payawal, Maida A. Dela Cueva, Venusmar C. Quevedo
Industrial Engineering Department
Adamson University
Manila, Philippines 1000

Abstract: The direct selling industry in the Philippines is continuously growing as more people become direct sellers. As it grows, managing those sellers becomes a challenge for direct selling companies. Customer Lifetime Value (CLV), the monetary value a customer is expected to contribute to the company before churning, is one measure that can serve as a basis for managing customers, so its computation must be accurate enough to be used effectively. However, a CLV computation specific to direct selling companies has not yet been established. This research used Data Mining Techniques, specifically Binomial Logistic Regression Analysis and Multilayer Perceptron Neural Network, to develop a model that can predict CLV based on historical customer transaction and demographic data. Through Binomial Logistic Regression Analysis, the direct sellers' average Service Lifetime was found to be 12 years, and the significant factors that affect customer churn were determined as well. Through Multiple Linear Regression Analysis, the significant factors that affect customer profit contribution were identified. Markov Chain Analysis was then used to establish possible customer states and a state transition probability matrix. Finally, a Multilayer Perceptron Neural Network with 1 hidden layer was used to establish a neural network predictive model. The results were used to develop the final model, which is based on the Present Worth formula. The resulting model has a holdout relative error of 0.018, which indicates good predictive accuracy of the equation.
The model can be used by direct selling companies to help them manage their customers more effectively.

Keywords: Selling, Customer Lifetime Value, Data Mining, Binomial Logistic Regression, Multilayer Perceptron Neural Network, Multiple Linear Regression, Markov Chain

I. INTRODUCTION

Customer Lifetime Value (CLV) is one of the core tools in Customer Relationship Management. It is the monetary net present value of the profit contribution of a customer throughout his or her lifetime, or service length, with the company. It can be used to determine which customer segments are the most profitable (high value) and which are not (low value) by predicting how much profit the customer will contribute to the company in the future.

The CLV model differs from one business to another because customer behaviour varies across businesses. Several researchers have proposed CLV models for specific industries in past years, but the Direct Selling Industry has not yet been addressed. This study created a CLV model, based on Data Mining Models, to be applied to the Direct Selling Industry with the direct sellers as its customers. The researchers conducted the study on a leading direct selling company in the Philippines that sells fashion items. Three (3) years of customer transaction and demographic data were collected and subjected to Data Mining Techniques.

In this study, we aim (a) to identify the significant factors that affect customer churn in a direct selling company, (b) to identify the significant factors that affect customer profit contribution in a direct selling company, and (c) to develop a model that can be used to predict customer lifetime value in a direct selling company.

II. REVIEW OF RELATED LITERATURE AND STUDIES

The Direct Selling Association of the Philippines (DSAP) defined Direct Selling as the “face to face selling” of any product to customers via independent distributors or sellers [1].
Customer lifetime value (CLV) prediction is considered the “touchstone for customer relationship management” [2]. In a direct selling sense, CLV is the measure of how much a direct seller will contribute to the company, in monetary terms, before he or she stops transacting with it, for example by ceasing to be a direct seller or by switching to another direct selling company.

Through the years, CLV prediction models have been established by several researchers. In the context of the banking sector, Khajvand and Jafar proposed an RFM model for segmenting customers and a time series method for CLV prediction [3]. A customer-pyramid segmentation technique and a Markov decision process for determining the maximum customer lifetime value were also studied [4]. In a department store and supermarket setting, a weighted RFM by AHP method and an Artificial Neural Network SOM method for sorting customers and enhancing customer lifetime value prediction were presented [5]. In the car manufacturing and maintenance sectors, data mining models, specifically Decision Tree, Logistic Regression, Markov Chain, and Neural Network, were compared to determine the most applicable method for customer lifetime value prediction [6]. Based on the available literature, Data Mining is the most appropriate technique to use in CLV prediction.

978-1-5090-1671-6/16/$31.00 ©2016 IEEE

Data Mining is a means of “explaining the past and predicting the future” through the use of data analysis [7]. It uses mathematical models and algorithms to segment data and predict values by searching large collections of data to detect patterns and trends [8]. It should be noted that the field where Data Mining is used the most is CRM [9].

In this paper, the researchers used one type of Classification Data Mining Technique: Logistic Regression. Logistic Regression attempts to predict the probability that a certain binary event will happen.
Logistic Regression enables a statistical analysis to overcome restrictions of linear regression. For example, in Logistic Regression the dependent and independent variables need not have a linear relationship, and the dependent variable and the error terms need not be normally distributed [10]. Logistic Regression is similar to linear regression except that it predicts a dichotomous dependent variable instead of a continuous one. In this study, Logistic Regression was used to predict the service life, or the time before churn, of each subject customer.

The researchers also used one type of Regression Data Mining Technique: Artificial Neural Network (ANN). A Neural Network, more properly known as an Artificial Neural Network, is based on the structure of the mammalian cerebral cortex, where a neuron cell is stimulated by several inputs and can be activated by an outside process [11]. The overall process creates a predictive model that can be used even if linear relationships between variables are not present. ANN also has fewer assumptions and restrictions on data compared to other Data Mining Models. In this paper's context, the ANN, specifically the Multilayer Perceptron, was used to predict future customer contribution.

Multilayer Perceptron (MLP) is one of the best-known ANN architectures because, given enough hidden units, it can approximate a wide range of functions. MLP has three distinct kinds of layers: the input layer, the hidden layers, and the output layer. Its neurons process data using weights and activation functions, send the result to the succeeding neurons, and propagate the error back to the preceding neurons (back propagation) [12]. MLP is an example of supervised training and its most common training algorithm is back propagation [13].

III. METHODOLOGY

This research used a quantitative research design because answering the research questions required the application of statistical techniques.
The researchers used a correlational research design to determine the factors that affect customer churn and profit contribution.

The first sampling design the researchers used was convenience sampling, to find the company in which the study would be conducted. The researchers sought the help of a leading direct selling company (Company A) in the Philippines to gather the data used in the study. The researchers then used stratified random sampling, in which only the direct sellers transacting with Company A who had at least 3 years in the business were considered for the study. The computed sample size is 51 participants. The 3 years of transaction and demographic data of the 51 randomly chosen direct sellers, totalling 4,661 transactions, were then provided to the researchers.

The researchers used software applications, specifically Microsoft Excel and SPSS (Statistical Product and Service Solutions), to analyze and interpret the data. The research procedure that led to answering the research questions is as follows.

A. Data Gathering

The researchers collected 3 years of transaction and demographic data of the randomly selected direct sellers.

B. Estimation of Service Length through Churn Prediction

Possible factors were extracted from the transaction and demographic data collected and were translated into variables. These variables were used as independent variables in the logistic regression analysis. A churn criterion, established by the subject company, was used to determine the churn status of each customer. The churn status was used as the dependent variable in the logistic regression analysis.

Let Cs be the churn status of the customer, where
Cs = 0 indicates a not-churn customer and
Cs = 1 indicates a churn customer.

The factors to be tested were divided into transaction and demographic data. The factors were the independent variables for the test, while the Churn Status (Cs) was the dependent variable.
The statistical software SPSS was used to aid the analysis. The researchers used the following parameters for the Logistic Regression Analysis in this study: CI for exp(B) = 95%, Probability for Stepwise Entry = 0.05 and Removal = 0.10, Classification cutoff = 0.5, and maximum iterations = 20.

The logistic regression equation based on the analysis was then created and the model's accuracy was tested. Then, the probability Pci that the churn status is equal to 1 was computed for each of the selected direct sellers using the formula

\[ P_{ci} = \frac{e^{Y_i}}{1 + e^{Y_i}} \tag{1} \]

where Yi is the Y score computed using the logistic regression equation generated. The mean churn probability Pc of all the customers was then computed. If churn in each period is treated as an independent trial with probability Pc, the number of periods until the first churn follows a geometric distribution. The expected length L of observations before the first churn occurs is then expressed by the formula

\[ L = \frac{1}{P_c} \tag{2} \]

C. Estimation of State Transition Probability

From the transaction and demographic data, factors that can affect customer profit contribution were identified and translated into variables. The variables were tested for significance in predicting profit contribution using multiple linear regression analysis, with Net Sales as the dependent variable. The confidence interval level under Regression Coefficients was 95%, and the Probability of F under Stepping Method Criteria was used with Entry = .05 and Removal = .10.

Using the significant variables identified, the possible customer states and next states were determined. Based on past transactions, the transition probabilities were computed and a transition probability matrix P was created. An Identity matrix It, to be used as a matrix multiplier that indicates the current state of the direct seller, was also created.

D. Prediction of Customer Contribution

A Multilayer Perceptron Neural Network was then run in SPSS using Net Sales as the dependent variable and the identified significant factors as covariates. Cases were randomly assigned to partitions based on relative numbers of cases, with 70%, 20%, and 10% used as the training, test, and holdout sets, respectively. One hidden layer with a Hyperbolic Tangent activation function was used. The output layer used an Identity activation function, and the scale dependent variable was rescaled by standardization. The type of training was online and the optimization algorithm was gradient descent. Under the SPSS MLP options, user-missing values were excluded, the maximum number of steps without a decrease in error was 1, the data used for computing prediction error and the maximum training epochs were set to automatic, the maximum training time was 15 minutes, the minimum relative change in training error was 0.0001, the minimum relative change in training error ratio was 0.001, and the maximum number of cases to store in memory was 1000.

Based on the resulting network diagram and synaptic weights, the Neural Network model equation Zt was established. The model's accuracy was also tested.

E. CLV Prediction Model

The prevailing discount factor d in the market was determined. Using the Service Lifetime (L) computed in equation (2), the Initial State Vector (It), the Initial State Transition Probability (P), and the Neural Network Model Equation (Zt), a model that computes Customer Lifetime Value was then created.

IV. DATA AND RESULTS

A. Data Gathering

The transaction data used are Dealer Code, Transaction Number, Transaction Month, Transaction Day of the Month, Transaction Day of the Week, Gross Sales per Transaction, Discount, Returns, and Net Sales per Transaction.
The demographic data gathered include the Position, Age, and Gender of the direct sellers who are the subjects of this study.

B. Estimation of Service Length through Churn Prediction

The subject company considers a dealer as “churn” once the dealer fails to pay his or her debts within the allotted period (within 40 days of the specified due date). Based on the Binomial Logistic Regression Analysis, the Average Expense per Visit and Average Return per Visit are significant in determining the likelihood that a customer will churn. The Nagelkerke R² is 0.62, that is, the model explains about 62% of the variation in the dependent variable. Overall, the full model correctly predicts 94.1% of the cases.

Based on the average Pci, the model predicted that the churn probability Pc of a customer is 0.0892. By equation (2), L = 1/0.0892 ≈ 11.2, and the Estimated Service Length L of the customers is reported as 12 years. This means that the average length of time before a customer stops transacting is 12 years.

C. Estimation of State Transition Probability

Through Multiple Linear Regression Analysis, the regression model has R² = .995 and an adjusted R² of .995. This implies that the independent variables explain 99.5% of the variation in the dependent variable, Net Sales. Position, Frequency of Visit per Month, Expense per Visit, Discount per Visit, and Return per Visit are statistically significant since their p-values are below the alpha level of .05.

These five factors were considered for the state transition probabilities. Since Position depends on the accumulated gross sales a seller made during 2 consecutive months, only the variables Average Expense per Visit, Average Discount per Visit, Average Return per Visit, and Frequency of Visit per Month were used to define customer states. This means that there are 4 dimensions for each state. To reduce the number of possible states, the continuous variables were further divided into classes.
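The binning step just described can be sketched in Python. This is only a minimal illustration under stated assumptions: the class boundaries, the field names, and the helper names `to_state` and `count_states` are hypothetical, because the actual cut points are not listed in this paper. The state count it produces (72) therefore differs from the number of states obtained in the study.

```python
from bisect import bisect_right
from math import prod

# Hypothetical class boundaries for the four state dimensions.
# k boundaries split a continuous variable into k + 1 classes.
BOUNDARIES = {
    "expense_per_visit": [500.0, 1000.0, 2000.0],  # 4 classes
    "discount_per_visit": [50.0, 150.0],           # 3 classes
    "return_per_visit": [100.0],                   # 2 classes
    "visits_per_month": [2, 4],                    # 3 classes
}


def to_state(record):
    """Map a record of continuous values to a tuple of class indices."""
    return tuple(
        bisect_right(cuts, record[name]) for name, cuts in BOUNDARIES.items()
    )


def count_states():
    """Number of possible states = product of the class counts per dimension."""
    return prod(len(cuts) + 1 for cuts in BOUNDARIES.values())


# Hypothetical seller profile (illustrative values only).
seller = {
    "expense_per_visit": 750.0,
    "discount_per_visit": 20.0,
    "return_per_visit": 100.0,
    "visits_per_month": 5,
}

print(to_state(seller))  # -> (1, 0, 1, 2)
print(count_states())    # -> 72
```

Each state is a 4-tuple of class indices, so the number of possible states grows multiplicatively with the number of classes per dimension.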
Dividing the continuous variables into classes results in a total of 672 possible customer states. An Initial State Vector It needs to be set to indicate the state of the customer at time t. The Estimated Service Lifetime of the customers is 12 years, and there are 672 different It, one for each possible state. The customer states identified are then subjected to Markov Chain analysis to determine the customer state transition probability P.

D. Prediction of Customer Contribution

SPSS was also used to aid the Multilayer Perceptron Neural Network analysis. The significant factors that affect customer profit contribution are the independent variables, while Net Sales is the dependent variable.

A network diagram was generated by the Neural Network analysis. There are 6 sources of input (Bias, Position, Frequency of Visit, Ave. Expense per Visit, Ave. Discount per Visit, and Ave. Return per Visit), which feed the 4 hidden layer nodes (a bias node and the three hidden units H(1:1) to H(1:3)) through a Hyperbolic Tangent activation function along with the assigned synaptic weights. The results of the 4 hidden nodes are subjected to an identity function, and the output (Net Sales) is generated as the sum of the resulting values. Table 1 shows the synaptic weights of each of the nodes from the input layer to the hidden layer and from the hidden layer to the output layer.

From the network diagram, the formula shown in equation (3) is generated,

(3)

where ri is the input of factor i, wi is the synaptic weight from input i to the hidden layer, and vj is the synaptic weight from hidden layer node j to the output node.

The model has a Training Relative Error of 0.151, a Testing Relative Error of 0.014, and a Holdout Relative Error of 0.018. These values are a good indicator of model accuracy.

E. CLV Prediction

The customer lifetime value is the present worth of all the predicted future customer contributions during the customer's service lifetime.
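The present-worth idea in the sentence above can be illustrated with a minimal Python sketch. It only discounts a stream of yearly contributions back to the present and is not a reconstruction of the paper's equation (4). The function name `present_worth`, the flat contribution of 1,000 per year, and the 5% discount rate are hypothetical values chosen for illustration; the 12-year horizon mirrors the estimated service lifetime.

```python
def present_worth(contributions, discount_rate):
    """Discount a stream of end-of-year contributions back to time 0.

    contributions[k] is the contribution received at the end of year k + 1.
    """
    return sum(
        z / (1.0 + discount_rate) ** t
        for t, z in enumerate(contributions, start=1)
    )


# Hypothetical inputs: 12 yearly contributions of 1,000 each, 5% discount rate.
yearly_contributions = [1000.0] * 12
pw = present_worth(yearly_contributions, 0.05)
print(round(pw, 2))
```

Because later contributions are divided by a larger power of (1 + d), a longer service lifetime adds progressively less to the total, which is why both the lifetime estimate and the discount rate matter for the final value.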
The predicted Customer Lifetime Value (CLV) is computed using equation (4),

(4)

where It is the Identity matrix used in time t, P is the transition probability matrix, Zt is the predicted customer contribution at time t, and d is the discount rate in the market.

Table 1. Synaptic Weights for Artificial Neural Network

                                      Predicted
                                      Hidden Layer 1               Output Layer
Predictor                             H(1:1)   H(1:2)   H(1:3)     Net Sales
Input Layer     (Bias)                -.121    -.534    -.690
                Position               .133    -.043     .024
                Frequency of Visit     .085     .037    -.057
                Expense per Visit      .695     .606     .635
                Discount per Visit    1.212    -.246    -.226
                Return per Visit      -.205    -.132    -.173
Hidden Layer 1  (Bias)                                              1.608
                H(1:1)                                               .262
                H(1:2)                                              1.423
                H(1:3)                                              1.548

V. CONCLUSION

With the use of Binomial Logistic Regression, the researchers identified the Average Expense per Visit and the Average Return per Visit as the significant factors that affect customer churn in a direct selling company. These are the only factors found to be associated with the likelihood of churn. The mean probability that a customer will churn, based on the Binomial Logistic Regression, is 0.0892. The Binomial Logistic Regression Analysis was found to have good prediction accuracy, correctly classifying 94.1% of churn and not-churn customers. The model generated is therefore considered accurate and reliable.

Based on the computed churn probability, the expected Service Lifetime L of the customer is found to be 12 years. This means that, on average, a customer will churn within 12 years of transacting with the company.

With the use of Multiple Linear Regression, the researchers identified Position, Frequency of Visit per Month, Expense per Visit, Discount per Visit, and Return per Visit as the significant factors that affect customer profit contribution. The Multiple Linear Regression model generated from the analysis was found to fit the data well.

Based on the significant factors determined, a Neural Network was created to predict customer profit contribution.
Neural Networks were used because they are easy to use yet can yield accurate and reliable predictions. The resulting Neural Network model is based on a feed-forward Multilayer Perceptron analysis with 1 hidden layer, a hyperbolic tangent hidden layer function, and an identity output function. The model has a Training Relative Error of 0.151, a Testing Relative Error of 0.014, and a Holdout Relative Error of 0.018, which indicates a good prediction model.

The Identity matrix is a 1 x 672 matrix that denotes the state of a customer at time t. The transition probability matrix is a 672 x 672 square matrix that shows the probability that a customer will shift from one state to another in the next year t. The discount factor is the prevailing interest rate of money. The predictive model can be used by the company to compute the Customer Lifetime Value of each customer.

VI. RECOMMENDATION

The CLV computed using the developed model should be used and interpreted correctly by the company. Otherwise, the CLV will not be useful at all. The company can also use the results of the significance tests to reduce the likelihood of customer churn and to improve customers' profit contribution. The company's marketing strategy can be based on the computed CLV. The company should also be able to improve the model through the acquisition of more data, the use of other techniques, a larger sample size, and a longer time scope.

For further studies, product details such as the type of product purchased, the quantity purchased, and product specifications can be included as additional factors to be tested. A deeper look at the factors might also yield data that the Data Mining techniques find significant for both customer churn status and profit contribution.

Determining the possible customer states became extensive with the use of Markov Chain analysis because the method accepts only one-dimensional variables.
Multiple variables are likely to be involved in representing a customer state when this model is applied to other companies (four, in this research's case), which can make the Markov chain model unmanageable. To resolve this problem, System Dynamics or Petri nets might be used to model a customer's future purchasing behaviour, as proposed by Cheng et al.

The Data Mining techniques used in this study, namely Binomial Logistic Regression, Multiple Linear Regression, and Multilayer Perceptron Neural Network, can also be improved for more accurate results by integrating other data mining techniques such as Support Vector Machine (SVM). According to the literature, SVM is a complex and time-consuming algorithm, but if used correctly and given enough time it can also give good prediction accuracy.

A bigger sample size and a longer time scope can be used as input for the study to increase the accuracy of the model's results. The study can also include different companies in a similar line of business to generalize the result to the whole direct selling industry under the garments line of products.

REFERENCES

[1] DSAP. (n.d.). Definition of Direct Selling and MLM vs Pyramiding. DSAP. Retrieved from http://www.dsap.ph/definition-of-direct-sellingand-mlm-vs-pyramiding.html
[2] Chen, X. (2006). Customer Lifetime Value: An Integrated Data Mining Approach. Lingnan University.
[3] Khajvand, M., & Jafar, M. (2011). Estimating customer future value of different customer segments based on adapted RFM model in retail banking context. Procedia Computer Science, 3, 1327–1332. doi:10.1016/j.procs.2011.01.011
[4] Ekinci, Y., Ülengin, F., Uray, N., & Ülengin, B. (2014). Analysis of customer lifetime value and marketing expenditure decisions through a Markovian-based model. European Journal of Operational Research, 237(1), 278–288. doi:10.1016/j.ejor.2014.01.014
[5] Ekinci, Y., Ülengin, F., Uray, N., & Ülengin, B. (2014). Analysis of customer lifetime value and marketing expenditure decisions through a Markovian-based model. European Journal of Operational Research, 237(1), 278–288. doi:10.1016/j.ejor.2014.01.014
[6] Cheng, C.-J. C.-B., Chiu, S. W. W., & Wu, J.-Y. (2012). Customer lifetime value prediction by a Markov chain based data mining model: Application to an auto repair and maintenance company in Taiwan. Scientia Iranica, 19(3), 849–855. doi:10.1016/j.scient.2011.11.045
[7] Sayad, S. (2010). Data Mining (pp. 1–25).
[8] Oracle ® Data Mining. (2008), 1(May).
[9] Poll Data Mining Applications in 2008. (2008). KDNuggets.
[10] Fadlalla, A. (2005). An experimental investigation of the impact of aggregation on the performance of data mining with logistic regression. Information & Management, 42(5), 695–707. doi:10.1016/j.im.2004.04.005
[11] Dolhansky, B. (2013). Artificial Neural Networks: Linear Regression (Part 1). Retrieved February 04, 2015, from http://briandolhansky.com/blog/artificial-neural-networks-linearregression-part-1
[12] Fragkaki, A. G., Farmaki, E., Thomaidis, N., Tsantili-Kakoulidou, A., Angelis, Y. S., Koupparis, M., & Georgakopoulos, C. (2012). Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. Journal of Chromatography. A, 1256, 232–9. doi:10.1016/j.chroma.2012.07.064
[13] Chen, X. (2006). Customer Lifetime Value: An Integrated Data Mining Approach. Lingnan University.
[14] Cheng, C.-J. C.-B., Chiu, S. W. W., & Wu, J.-Y. (2012). Customer lifetime value prediction by a Markov chain based data mining model: Application to an auto repair and maintenance company in Taiwan. Scientia Iranica, 19(3), 849–855. doi:10.1016/j.scient.2011.11.045
[15] Dolhansky, B. (2013). Artificial Neural Networks: Linear Regression (Part 1). Retrieved February 04, 2015, from http://briandolhansky.com/blog/artificial-neural-networks-linearregression-part-1
[16] DSAP. (n.d.). Definition of Direct Selling and MLM vs Pyramiding. DSAP. Retrieved from http://www.dsap.ph/definition-of-direct-sellingand-mlm-vs-pyramiding.html
[17] Ekinci, Y., Ülengin, F., Uray, N., & Ülengin, B. (2014). Analysis of customer lifetime value and marketing expenditure decisions through a Markovian-based model. European Journal of Operational Research, 237(1), 278–288. doi:10.1016/j.ejor.2014.01.014
[18] Fadlalla, A. (2005). An experimental investigation of the impact of aggregation on the performance of data mining with logistic regression. Information & Management, 42(5), 695–707. doi:10.1016/j.im.2004.04.005
[19] Fragkaki, A. G., Farmaki, E., Thomaidis, N., Tsantili-Kakoulidou, A., Angelis, Y. S., Koupparis, M., & Georgakopoulos, C. (2012). Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. Journal of Chromatography. A, 1256, 232–9. doi:10.1016/j.chroma.2012.07.064
[20] García Nieto, P. J., Martínez Torres, J., de Cos Juez, F. J., & Sánchez Lasheras, F. (2012). Using multivariate adaptive regression splines and multilayer perceptron networks to evaluate paper manufactured using Eucalyptus globulus. Applied Mathematics and Computation, 219(2), 755–763. doi:10.1016/j.amc.2012.07.001