Daily Churn Prediction Pipeline

Key Outcomes

  • Designed and deployed a system that reliably produces daily churn predictions from the latest customer activity data.
  • Implemented regular model evaluation, comparing predictions to observed churn events, to give the client confidence in the results.

More Information

Businesses often want to understand customer behaviour and track usage patterns so they can entice, engage, and retain customers. A core part of this is churn modelling: predicting when customers are about to leave and, where possible, understanding why and what could be done to keep them engaged. However, for a churn model to be truly effective, it needs to be fully integrated into a business’s processes. A business should be able to act on churn predictions as needed, and identify how different interventions have positively or negatively affected a customer’s chances of churning.

At Qrious, to provide value to a client beyond a one-off churn report, I designed and built a churn prediction system that delivered results daily and integrated with the client’s data warehouse. The system was developed as a Kubeflow pipeline running on Kubernetes. A database query was set up to connect to the client’s data warehouse daily and retrieve the latest usage patterns for all customers on the platform. This data was saved to Amazon S3, in a bucket whose policies restricted access so that only the running churn prediction pipeline could read the data. The prediction pipeline processed the data into a format compatible with the churn model developed by the team, and then ran inference on it. The results were also saved to Amazon S3 and retrieved by a daily job in the client’s data warehouse. This made it easy for the client to integrate the churn predictions into their own business processes if they wished to do so.
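The daily flow described above (extract, store, preprocess, predict, store) can be illustrated with a minimal Python sketch. This is not the production Kubeflow pipeline: the function names, feature names, and object key layout are all assumptions made for illustration, and a plain dictionary stands in for the Amazon S3 bucket.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical stand-in for an S3 bucket: maps object key -> stored rows.
Store = Dict[str, List[dict]]


def dated_key(prefix: str, run_date: date) -> str:
    """Build one object key per day, e.g. 'usage/2024-01-31.json'."""
    return f"{prefix}/{run_date.isoformat()}.json"


def preprocess(rows: List[dict]) -> List[dict]:
    """Turn raw usage rows into model features (feature names are assumed)."""
    return [
        {
            "customer_id": row["customer_id"],
            "sessions_per_day": row["sessions_30d"] / 30.0,
            "days_inactive": row["days_since_last_login"],
        }
        for row in rows
    ]


def run_daily_pipeline(
    run_date: date,
    fetch_usage: Callable[[date], List[dict]],
    score: Callable[[dict], float],
    store: Store,
) -> str:
    """Run one day's extract -> preprocess -> predict -> store cycle.

    `fetch_usage` represents the warehouse query and `score` represents the
    churn model; both are passed in so the sketch stays self-contained.
    Returns the key under which the predictions were stored.
    """
    raw = fetch_usage(run_date)
    store[dated_key("usage", run_date)] = raw  # keep the raw daily extract

    features = preprocess(raw)
    predictions = [
        {"customer_id": f["customer_id"], "churn_probability": score(f)}
        for f in features
    ]

    key = dated_key("predictions", run_date)
    store[key] = predictions  # the warehouse job would read this object
    return key
```

Keying both the extract and the predictions by run date is one way to let a downstream job pick up exactly one day's results, and to let a failed day be re-run without touching earlier output.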

Because the system ran daily, it was also possible to evaluate the model regularly by comparing predicted results to actual churn events. To track the usefulness of the model and understand when it needed to be retrained, a model evaluation component was added to the daily pipeline. These results gave us a constant view of how the model was performing. Alongside this, an alerting system (using Prometheus) monitored the daily pipeline for failed runs. Together, these gave us confidence that the system could run without intervention, so we only needed to analyse and debug issues when alerted.
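As a rough illustration of what such an evaluation step might compute, the sketch below compares one day's churn probabilities with the customers who were later observed to churn. The choice of precision and recall, the 0.5 decision threshold, and the retraining cut-offs are all assumptions for the example; the metrics actually used in the pipeline are not specified here.

```python
from typing import Dict, Set


def evaluate_predictions(
    predicted: Dict[str, float],
    churned: Set[str],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Compare predicted churn probabilities with observed churn events.

    `predicted` maps customer id -> churn probability; `churned` holds the
    ids of customers who actually churned. Only customers that were scored
    are counted.
    """
    flagged = {cid for cid, prob in predicted.items() if prob >= threshold}
    actual = churned & set(predicted)
    true_positives = len(flagged & actual)

    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return {"precision": precision, "recall": recall}


def needs_retraining(
    metrics: Dict[str, float],
    min_precision: float = 0.6,
    min_recall: float = 0.6,
) -> bool:
    """Flag the model for retraining when either metric drops too low."""
    return metrics["precision"] < min_precision or metrics["recall"] < min_recall
```

A check like `needs_retraining` could feed the same alerting channel as pipeline failures, so that a gradual drop in model quality is surfaced in the same way as a failed run.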