
Bank Customer Churn Prediction

Comparative analysis of machine learning models to identify at-risk customers and optimize retention strategies.

TL;DR: LightGBM outperformed Logistic Regression, Random Forest, XGBoost, and Neural Networks with 86.54% accuracy. Threshold tuning boosted recall from 55% to 75% to capture high-value at-risk customers.
Figure 1: Histograms of key numerical variables (CreditScore, Age, Tenure, Balance), showing skewness in Age and Salary.

Executive Summary

Customer retention is a critical priority for financial institutions, as churn directly impacts long-term growth and revenue. This project evaluated five supervised machine learning models to predict customer churn using a dataset of 165,000 banking records.

Key Results: LightGBM emerged as the strongest model, achieving the highest Test Accuracy (86.54%) and AUC (0.8896). While the default model prioritized precision, a strategic adjustment to the decision threshold (0.25) significantly improved the identification of churners (Recall increased to 75%), allowing for more effective resource allocation in retention campaigns.

Exploratory Data Analysis (EDA)

The dataset showed a moderate class imbalance, with 21% of customers having churned and 79% remaining. Several critical patterns emerged during the analysis, most notably the distinct behavior of zero-balance versus non-zero-balance customers (Figure 3).

Figure 3: Distinct behavioral patterns observed between Zero Balance and Non-Zero Balance customers.
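
For reference, a minimal sketch of how the class balance and the zero-balance segment can be checked, assuming the data is loaded into a pandas DataFrame with the Kaggle-style Exited and Balance columns (the file name below is hypothetical):

# Check the churn class balance and the zero-balance segment
import pandas as pd

df = pd.read_csv("bank_churn.csv")  # hypothetical file name

print(df["Exited"].value_counts(normalize=True))        # expected: ~79% retained vs. ~21% churned
print(df.groupby(df["Balance"] == 0)["Exited"].mean())  # churn rate for zero- vs. non-zero-balance customers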

Model Performance & Comparisons

We implemented and optimized five distinct models: Logistic Regression, Random Forest, XGBoost, LightGBM, and a Neural Network. All numerical features were standardized, and categorical features were one-hot encoded.
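
A minimal preprocessing sketch using scikit-learn's ColumnTransformer is shown below; the exact column lists, encoder settings, and split parameters are assumptions based on the standard fields in this dataset, not the project's precise configuration:

# Standardize numerical features and one-hot encode categorical features
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

numeric_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]
categorical_cols = ["Geography", "Gender"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocessor.fit_transform(df[numeric_cols + categorical_cols])
y = df["Exited"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)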

1. Logistic Regression (Baseline)

The baseline model achieved a test accuracy of 85.51%. Feature selection attempts using ANOVA and Lasso did not yield significant improvements, confirming that the baseline configuration was robust. Feature importance analysis highlighted Age and Location (Germany) as top predictors.
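
The baseline and the ANOVA-based variant can be sketched as follows; the value of k and the solver choices are illustrative rather than the exact settings used:

# Baseline Logistic Regression with 5-fold cross-validation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif

baseline = LogisticRegression(max_iter=1000)
print(cross_val_score(baseline, X_train, y_train, cv=5, scoring="accuracy").mean())

# ANOVA feature selection; the Lasso variant would instead fit
# LogisticRegression(penalty="l1", solver="liblinear") and keep features with non-zero coefficients.
anova_model = Pipeline([
    ("select", SelectKBest(f_classif, k=8)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(anova_model, X_train, y_train, cv=5, scoring="accuracy").mean())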

2. Random Forest & XGBoost

Random Forest optimization revealed that performance stabilized after 100 trees. The optimized model (400 trees, max depth 10) achieved 86.51% accuracy. XGBoost delivered the highest F1-score (63.54%) despite a lower cross-validation accuracy, demonstrating strong capability in handling class imbalance.
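
The tuned configurations can be sketched roughly as below; the XGBoost hyperparameters and the scale_pos_weight imbalance adjustment are assumptions, since only the tree count and depth for Random Forest are stated above:

# Optimized Random Forest (400 trees, max depth 10) and an XGBoost counterpart
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

rf = RandomForestClassifier(n_estimators=400, max_depth=10, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)

# scale_pos_weight ~ negatives / positives is one common way to address the 79/21 imbalance
xgb = XGBClassifier(
    n_estimators=400,
    learning_rate=0.05,
    max_depth=6,
    scale_pos_weight=79 / 21,
    eval_metric="logloss",
    random_state=42,
)
xgb.fit(X_train, y_train)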

3. LightGBM (Best Performer)

LightGBM was the overall strongest performer. Optimized via 5-fold cross-validation, it achieved the highest AUC (0.8896) and Test Accuracy (86.54%).
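
A sketch of the 5-fold tuning setup is shown below; the parameter grid is illustrative, not the grid actually searched:

# LightGBM tuned with 5-fold cross-validation
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

param_grid = {
    "n_estimators": [200, 400],
    "learning_rate": [0.05, 0.1],
    "num_leaves": [31, 63],
}
search = GridSearchCV(
    LGBMClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X_train, y_train)
best_lgbm = search.best_estimator_
print(search.best_params_, round(search.best_score_, 4))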

4. Neural Network

The Neural Network utilized a sequential architecture with 3 hidden layers (64-32-16 nodes), incorporating batch normalization and dropout to prevent overfitting. It achieved competitive accuracy (86.45%) but slightly lower recall than tree-based methods. Training was monitored with early stopping to ensure optimal convergence.

# Neural Network architecture used for classification (the 0.3 dropout rate is an assumption)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential([
    Input(shape=(X_train.shape[1],)),

    Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dropout(0.3),

    Dense(32, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dropout(0.3),

    Dense(16, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dropout(0.3),

    Dense(1, activation='sigmoid')
])
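
The compilation and training loop is sketched below; the optimizer, batch size, and patience values are assumptions, as the write-up only states that early stopping was used:

# Compile and train with early stopping (hyperparameter values are assumptions)
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=100,
    batch_size=256,
    callbacks=[early_stop],
    verbose=0,
)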
Figure 7: Training vs. Validation Loss/Accuracy curves demonstrating model stability.

Performance Summary

Model               | Test Accuracy | F1-Score | AUC
Logistic Regression | 85.51%        | 59.96%   | 0.8697
Random Forest       | 86.51%        | 62.07%   | 0.8892
Neural Network      | 86.45%        | 62.93%   | 0.8879
LightGBM            | 86.54%        | 63.44%   | 0.8896
XGBoost             | 84.28%        | 63.54%   | 0.8842

Table 4: Summary of Model Performance.

Feature Importance & Key Drivers

Understanding why customers leave is just as important as predicting who will leave. We analyzed feature importance across Logistic Regression, Random Forest, and LightGBM models, finding consistent patterns in customer behavior.

Key Findings:

  • Age is Critical: Age consistently emerged as the dominant predictor across all models, suggesting older customers are more likely to churn.
  • Financial Metrics Matter: Balance and Estimated Salary were top predictors in the Random Forest and LightGBM models.
  • Product Usage: The number of bank products held was a strong influence in the Logistic Regression and LightGBM models.
Figure 9: Feature importance from the best-performing LightGBM model, showing Age, Balance, and Salary as top predictors; Age is the most significant factor.
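
The importances in Figure 9 can be reproduced roughly as follows, assuming the fitted LightGBM estimator and the ColumnTransformer from the sketches above:

# Rank LightGBM feature importances by name
import pandas as pd

feature_names = preprocessor.get_feature_names_out()
importances = pd.Series(best_lgbm.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))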

Business Insights & Retention Strategy

While LightGBM had high accuracy, its initial recall was only 55.21%, meaning it missed nearly half of the actual churners. From a business perspective, missing a churner is more costly than falsely flagging a loyal customer.

Strategic Threshold Tuning

We lowered the classification threshold to 0.25. This strategic adjustment improved recall from 55% to 75%. While precision dropped, this trade-off allows the bank to capture a significantly larger portion of at-risk customers for intervention.
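
Applying the tuned threshold is straightforward; the sketch below assumes the fitted LightGBM model and test split from the earlier sketches:

# Classify churners using a 0.25 decision threshold instead of the default 0.50
from sklearn.metrics import precision_score, recall_score

churn_proba = best_lgbm.predict_proba(X_test)[:, 1]
y_pred_tuned = (churn_proba >= 0.25).astype(int)

print("Recall:   ", round(recall_score(y_test, y_pred_tuned), 3))
print("Precision:", round(precision_score(y_test, y_pred_tuned), 3))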

Recommendations

  • Target Zero-Balance Accounts: Implement re-engagement offers for zero-balance customers in France, a high-risk segment.
  • Age-Based Retention: Leverage the model's finding that Age is a key predictor by creating generationally tailored retention programs.
  • Tiered Intervention: Use the probability scores to offer premium retention packages to high-probability churners and lighter-touch engagement for moderate risks (see the segmentation sketch after Figure 8).
Figure 8: LightGBM Precision-Recall Curve. Lowering the threshold prioritizes recall.
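
As an illustration of the tiered approach, customers can be bucketed by predicted churn probability; the tier boundaries below are assumptions, not the bank's actual cut-offs:

# Segment customers into intervention tiers by predicted churn probability
import pandas as pd

tiers = pd.cut(
    churn_proba,
    bins=[0.0, 0.25, 0.5, 1.0],
    labels=["monitor", "light-touch engagement", "premium retention offer"],
    include_lowest=True,
)
print(pd.Series(tiers).value_counts())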

Additional Details

Tech Stack

  • Language: Python
  • Libraries: Pandas, NumPy, Scikit-Learn
  • ML Frameworks: XGBoost, LightGBM, TensorFlow/Keras
  • Visualization: Matplotlib, Seaborn

Credits

Dataset: Bank Churn Dataset (Kaggle).

This was a collaborative team project; I contributed primarily to building the Neural Network model, aggregating and interpreting model performance, and developing the retention strategy recommendations.