
I have built a churn prediction model on data from an e-commerce company. In the model, the churn criterion is being inactive for 12 months, counted from the last available date in the data. While building the model, I created some calculated features to take customer activity into account. I added two binary features indicating whether each customer was active in the last 3 months and in the last 6 months. Their correlations with churn are 0.5 and 0.7 respectively. When I looked at other churn prediction models on the web, I saw that some projects use similar features and others do not.

My model's accuracy is around 90%, and I am concerned that I am doing something wrong by using the customers' last 3- and/or 6-month activity as inputs to the model. Moreover, should I be worried about the correlation between the 3-month and 6-month activity features? I used PCA for feature extraction, keeping 95% of the variance, but is that enough to avoid the correlation problem?
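
To make the setup concrete, here is a minimal sketch of how the features and label described above might be derived. Everything in it is an assumption for illustration: the toy `orders` data is made up, `snapshot` stands in for the last available date, the windows are approximated as 90, 180 and 365 days, and the helper names `active_within` and `pearson` are hypothetical. It is not the actual pipeline, only one reading of it.

```python
from datetime import date, timedelta
from math import sqrt

# Hypothetical toy data: customer id -> order dates (made up for illustration).
orders = {
    "a": [date(2022, 11, 5), date(2023, 5, 20)],
    "b": [date(2021, 3, 1)],
    "c": [date(2022, 8, 15)],
    "d": [date(2021, 12, 30), date(2023, 1, 10)],
    "e": [date(2022, 1, 2)],
}
snapshot = date(2023, 6, 30)  # assumed "last available date" in the data


def active_within(dates, days):
    """Return 1 if any order falls in the `days` before the snapshot, else 0."""
    start = snapshot - timedelta(days=days)
    return int(any(start < d <= snapshot for d in dates))


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


ids = sorted(orders)
active_3m = [active_within(orders[i], 90) for i in ids]   # binary 3-month flag
active_6m = [active_within(orders[i], 180) for i in ids]  # binary 6-month flag
# Churn label: no activity in the 12 months before the snapshot.
churn = [1 - active_within(orders[i], 365) for i in ids]

print("3m vs churn:", pearson(active_3m, churn))
print("6m vs churn:", pearson(active_6m, churn))
print("3m vs 6m:   ", pearson(active_3m, active_6m))
```

Under this reading, all three windows end at the same snapshot date and the 3-month window sits inside the 6-month window, so a customer flagged in `active_3m` is always flagged in `active_6m` as well. Whether the real features are measured against the same date as the label is the detail worth checking.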
