Customer Churn

"Churn Rate" is a business term describing the rate at which customers leave or cease paying for a product or service. It's a critical figure in many businesses, as it's often the case that acquiring new customers is a lot more costly than retaining existing ones (in some cases, 5 to 20 times more expensive).

Understanding what keeps customers engaged is therefore incredibly valuable: it's a logical foundation from which to develop retention strategies and roll out operational practices aimed at keeping customers from walking out the door. Consequently, there's growing interest among companies in better churn-detection techniques, leading many to look to data mining and machine learning for new and creative approaches.

Predicting churn is particularly important for businesses with subscription models such as cell phone, cable, or merchant credit card processing plans. But modeling churn has wide-reaching applications in many domains. For example, casinos have used predictive models to identify ideal room conditions for keeping patrons at the blackjack table and to decide when to reward unlucky gamblers with front row seats to Celine Dion. Similarly, airlines may offer first class upgrades to complaining customers. The list goes on.

This is a post about modeling customer churn using Python.

As a motivating example, we'll consider what churn modelling looks like in a telecoms setting.

In [1]:
from __future__ import division
import pandas as pd
import numpy as np

churn_df = pd.read_csv('data/churn.csv')
col_names = churn_df.columns.tolist()

print("Column names:")
print(col_names)

to_show = col_names[:6] + col_names[-6:]

print("\nSample data:")
churn_df[to_show].head(6)
Column names:
['State', 'Account Length', 'Area Code', 'Phone', "Int'l Plan", 'VMail Plan', 'VMail Message', 'Day Mins', 'Day Calls', 'Day Charge', 'Eve Mins', 'Eve Calls', 'Eve Charge', 'Night Mins', 'Night Calls', 'Night Charge', 'Intl Mins', 'Intl Calls', 'Intl Charge', 'CustServ Calls', 'Churn?']

Sample data:

Out[1]:
State Account Length Area Code Phone Int'l Plan VMail Plan Night Charge Intl Mins Intl Calls Intl Charge CustServ Calls Churn?
0 KS 128 415 382-4657 no yes 11.01 10.0 3 2.70 1 False.
1 OH 107 415 371-7191 no yes 11.45 13.7 3 3.70 1 False.
2 NJ 137 415 358-1921 no no 7.32 12.2 5 3.29 0 False.
3 OH 84 408 375-9999 yes no 8.86 6.6 7 1.78 2 False.
4 OK 75 415 330-6626 yes no 8.41 10.1 3 2.73 3 False.
5 AL 118 510 391-8027 yes no 9.18 6.3 6 1.70 0 False.

Survival Analysis

One interesting approach is survival analysis. The basic idea of survival analysis is to estimate, from possibly incomplete data, the 'survival curve': the probability that a subject is still 'alive' at a given time. In the case of telecommunications, the event of interest is when a subscriber leaves the service. In the case of HR, the question might be 'what characteristics do employees who leave our company have?'. There are many interesting applications of survival analysis.

In [2]:
# Firstly, let's look at the column headings.
churn_df.columns.tolist()
Out[2]:
['State', 'Account Length', 'Area Code', 'Phone', "Int'l Plan", 'VMail Plan', 'VMail Message', 'Day Mins', 'Day Calls', 'Day Charge', 'Eve Mins', 'Eve Calls', 'Eve Charge', 'Night Mins', 'Night Calls', 'Night Charge', 'Intl Mins', 'Intl Calls', 'Intl Charge', 'CustServ Calls', 'Churn?']
In [3]:
# We noted above that 'Churn?' holds string values. The lifelines library
# that we'll use prefers 1s and 0s, so we map the strings to integers.
d = {'True.': 1, 'False.': 0}
churn_df['Churn?'] = churn_df['Churn?'].map(d)
In [4]:
#Let us look again at the head of the data. 
churn_df.head()
Out[4]:
State Account Length Area Code Phone Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge ... Eve Calls Eve Charge Night Mins Night Calls Night Charge Intl Mins Intl Calls Intl Charge CustServ Calls Churn?
0 KS 128 415 382-4657 no yes 25 265.1 110 45.07 ... 99 16.78 244.7 91 11.01 10.0 3 2.70 1 0
1 OH 107 415 371-7191 no yes 26 161.6 123 27.47 ... 103 16.62 254.4 103 11.45 13.7 3 3.70 1 0
2 NJ 137 415 358-1921 no no 0 243.4 114 41.38 ... 110 10.30 162.6 104 7.32 12.2 5 3.29 0 0
3 OH 84 408 375-9999 yes no 0 299.4 71 50.90 ... 88 5.26 196.9 89 8.86 6.6 7 1.78 2 0
4 OK 75 415 330-6626 yes no 0 166.7 113 28.34 ... 122 12.61 186.9 121 8.41 10.1 3 2.73 3 0

5 rows × 21 columns

We can see that some subscribers in this list have not (yet) churned -- this is censoring!

Before we dig into the data, I need to introduce the first mathematical creature in survival analysis: the survival function!

Survival function

Define a subscriber's lifetime, the time between when they first purchase the subscription and when they churn, as capital $T$. Let small $t$ represent the number of days since they first subscribed, that is, since their "birth". Then the survival function, $S(t)$, is defined as:

$$ S(t) = P(T > t ) $$ In words: what is the probability that a randomly chosen individual from the population lasts longer than time $t$?

The survival curve gives us a complete description of the lifespans of a population. But it's never handed to us directly; we need to estimate it from the data at hand.
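Before turning to estimation, here's a tiny sketch of what $S(t)$ means on fully observed (uncensored) data. The lifetimes below are made-up numbers purely for illustration:

```python
import numpy as np

# Toy example: lifetimes (in days) of five hypothetical subscribers,
# all of whom have already churned (no censoring yet).
lifetimes = np.array([5, 12, 30, 30, 90])

def empirical_survival(t, lifetimes):
    """Fraction of the population that survives strictly longer than t."""
    return np.mean(lifetimes > t)

print(empirical_survival(10, lifetimes))  # 4 of 5 last longer than 10 days -> 0.8
print(empirical_survival(30, lifetimes))  # only 1 of 5 lasts longer than 30 -> 0.2
```

With censored subscribers in the mix this simple fraction is biased, which is exactly why we need the estimator below.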

Kaplan-Meier estimate

IMO, the best way to estimate the survival function is using the Kaplan-Meier estimate. It's nonparametric, which means we don't assume the data follows any particular form:

$$\hat{S}(t) = \prod_{i=0}^t \left(1 - \frac{d_i}{n_i}\right), \;\; \text{for all $t$}$$

where $d_i$ is the number of deaths at time $i$, and $n_i$ is the number of individuals in the population still at risk of dying. Note that the above formula is for a specific $t$: if we compute this estimate over all $t$, then we get a curve -- we'll see this later.

This formula can be derived from the following logic:

$$P( T = 0 ) \approx \frac{d_0}{n_0}$$

$$ \Rightarrow P( T > 0 ) \approx \left(1 - \frac{d_0}{n_0} \right) $$

$$ P( T > 1 ) = P( T > 1 \;|\; T > 0 )P( T > 0 ) \\ \approx \left(1 - \frac{d_1}{n_1}\right)\left(1 - \frac{d_0}{n_0}\right)$$

and so on...

How are censored individuals dealt with in the Kaplan-Meier estimate? They still count in the denominator (they remain at risk of dying), but never in the numerator (they haven't been observed to die).
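To make the formula and the censoring rule concrete, here is a from-scratch sketch of the Kaplan-Meier estimator (lifelines, which we use below, does all of this for us; the toy durations here are invented):

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate: at each observed event time, multiply the
    running survival probability by (1 - d_i / n_i).  Censored subjects
    count in n_i (still at risk) but never in d_i (no death observed)."""
    durations = np.asarray(durations)
    observed = np.asarray(observed, dtype=bool)
    times = np.unique(durations[observed])          # distinct death times
    surv = {}
    s = 1.0
    for t in times:
        n_i = np.sum(durations >= t)                # at risk just before t
        d_i = np.sum((durations == t) & observed)   # deaths at t
        s *= 1.0 - d_i / n_i
        surv[t] = s
    return surv

# Three churned at days 2, 3 and 5; two censored (still subscribed) at 4 and 6.
print(kaplan_meier([2, 3, 4, 5, 6], [1, 1, 0, 1, 0]))
```

Note how the subscriber censored at day 4 inflates $n_i$ at times 2 and 3 but never contributes a death, which is exactly the bookkeeping described above.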

Estimating the Survival Curve

  • I am inspired here by Cam Davidson-Pilon and his work on lifelines. In the language of survival analysis, the event of interest ('death') occurs when 'Churn?' is observed; subscribers who haven't churned yet are censored. Churn can be defined in different ways for different industries, but let us assume we have a well-defined definition. In the case of telecommunications data it is probably 'stopping paying for the service'. In other industries, like cloud services, it might be harder to pin down, but you could define churn as, say, 'not using the service for 30 days'.
In [5]:
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
In [6]:
T = churn_df["Account Length"]
C = churn_df["Churn?"]
kmf.fit(T, event_observed=C )
Out[6]:
<lifelines.KaplanMeierFitter: fitted with 3333 observations, 2850 censored>
In [7]:
C
Out[7]:
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    0
13    0
14    0
...
3318    0
3319    0
3320    1
3321    0
3322    1
3323    1
3324    0
3325    0
3326    0
3327    0
3328    0
3329    0
3330    0
3331    0
3332    0
Name: Churn?, Length: 3333, dtype: int64
In [8]:
kmf.survival_function_
Out[8]:
KM-estimate
timeline
0 1.000000
1 0.999700
2 0.999399
3 0.999399
4 0.999399
5 0.999399
6 0.999399
7 0.999399
8 0.999399
9 0.999399
10 0.999399
11 0.999399
12 0.999097
13 0.998794
15 0.998794
16 0.998490
17 0.997881
18 0.997881
19 0.997576
20 0.997576
21 0.997270
22 0.997270
23 0.996657
24 0.995736
25 0.995429
26 0.995429
27 0.995121
28 0.994812
29 0.994503
30 0.994503
... ...
184 0.555625
185 0.555625
186 0.555625
188 0.546365
189 0.546365
190 0.546365
191 0.546365
192 0.546365
193 0.533356
194 0.533356
195 0.533356
196 0.533356
197 0.516151
199 0.516151
200 0.516151
201 0.496299
202 0.496299
204 0.496299
205 0.496299
208 0.468727
209 0.441155
210 0.441155
212 0.404392
215 0.404392
217 0.404392
221 0.404392
224 0.336993
225 0.252745
232 0.252745
243 0.252745

213 rows × 1 columns

So we can say that after about 200 days roughly half of our customers have churned, and after 243 days about 75% have. I don't have any experience with the telecoms industry, but this corroborates my preconceptions.
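To read numbers like these off programmatically, remember that the Kaplan-Meier curve is a step function: the estimate at any day is the most recent value at or before it. A small helper makes the lookup explicit, using a few values copied from the fitted table above (lifelines also offers `kmf.predict` for this):

```python
import pandas as pd

# A few (timeline -> KM survival estimate) points from the fitted curve above.
surv = pd.Series({0: 1.0, 184: 0.555625, 200: 0.516151, 243: 0.252745})

def churned_fraction_by(day, surv):
    """Step-function lookup: take the most recent estimate at or before
    `day`; the churned fraction is then 1 - S(day)."""
    s = surv[surv.index <= day].iloc[-1]
    return 1.0 - s

print(round(churned_fraction_by(200, surv), 2))  # ~0.48: roughly half churned
print(round(churned_fraction_by(243, surv), 2))  # ~0.75
```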

Plots

The next nice thing the lifelines library can do is plot the estimate for us, complete with confidence intervals. So we'll do that.

In [9]:
from matplotlib import pyplot as plt
import seaborn as sns

kmf.survival_function_.plot()
plt.title('Survival function of Telecommunication Customers');
In [10]:
%matplotlib inline
kmf.plot()
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0xaf0d25ac>

Are New Yorkers fickle?

As our final piece of analysis, we can ask: 'do customers in different states have different churn behaviour?'. This could feed into a customer retention strategy!

In [11]:
ax = plt.subplot(111)

NJ = (churn_df["State"] == "NJ")
kmf.fit(T[NJ], event_observed=C[NJ], label="New Jersey")
kmf.plot(ax=ax, ci_force_lines=True)
KS = (churn_df["State"] == "KS")
kmf.fit(T[KS], event_observed=C[KS], label="Kansas")
kmf.plot(ax=ax, ci_force_lines=True)

NY = (churn_df["State"] == "NY")
kmf.fit(T[NY], event_observed=C[NY], label="New York")
kmf.plot(ax=ax, ci_force_lines=True)

plt.ylim(0,1);
plt.title("Lifespans of customers in different states");
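Eyeballing the curves is suggestive, but we can also test whether two groups' churn behaviour genuinely differs using a log-rank test; lifelines ships one as `lifelines.statistics.logrank_test`. As a from-scratch sketch of the statistic it computes (on invented toy data), under the null hypothesis the two groups share a survival curve, so observed deaths in one group should match their expected share:

```python
import numpy as np

def logrank_statistic(t1, e1, t2, e2):
    """Two-sample log-rank chi-square statistic.  t*, e* are durations and
    event indicators (1 = churned) for each group."""
    t1, e1 = np.asarray(t1), np.asarray(e1, dtype=bool)
    t2, e2 = np.asarray(t2), np.asarray(e2, dtype=bool)
    all_t = np.concatenate([t1, t2])
    all_e = np.concatenate([e1, e2])
    O1 = E1 = V = 0.0
    for t in np.unique(all_t[all_e]):             # pooled death times
        n = np.sum(all_t >= t)                    # at risk overall
        n1 = np.sum(t1 >= t)                      # at risk in group 1
        d = np.sum((all_t == t) & all_e)          # pooled deaths at t
        O1 += np.sum((t1 == t) & e1)              # observed deaths, group 1
        E1 += d * n1 / n                          # expected under the null
        if n > 1:                                 # hypergeometric variance
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (O1 - E1) ** 2 / V

# Identical groups give 0; very different lifetimes give a large statistic.
print(logrank_statistic([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1]))
```

In practice I'd reach for the lifelines implementation, which also returns a p-value.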
In [12]:
kmf.median_
Out[12]:
168.0

So we see a median lifetime of 168 days, which seems realistic. (Note that `kmf` holds the most recent fit, which was for the New York customers.)
