Recently I have been working on a credit-overdue project using an xgboost model, and I am roughly recording the process here; the details of the data will not be covered. The project uses user data to predict whether a user will become overdue. This is essentially a binary classification problem, and the xgboost model is used here for prediction.
1. Data cleaning
Clean the tables according to the characteristics of the data: for example, remove null values, remove duplicate rows, or fill missing values with the median.
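The cleaning operations above can be sketched with pandas on a toy table. This is only an illustration: the column names (age, income, overdue) and the values are hypothetical and are not the project's data.

```python
import numpy as np
import pandas as pd

# Toy table; the column names and values are hypothetical.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 40, 31],
    "income": [3000.0, 5200.0, 4100.0, 5200.0, np.nan],
    "overdue": [0, 1, 0, 1, 0],
})

# Remove exact duplicate rows (the 4th row repeats the 2nd).
df = df.drop_duplicates()

# Fill missing values in the numeric feature columns with the column median.
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# Alternatively, rows that contain nulls could simply be dropped:
# df = df.dropna()
print(df)
```

Whether to drop or fill depends on how much data is missing; filling with the median keeps the rows and is less sensitive to outliers than the mean.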
2. Split the data into X and Y
This is supervised learning. X is the data features, namely feature; Y is the target, that is, the result of whether the user is overdue.
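A minimal sketch of separating feature and target from a cleaned table, assuming the label is stored in a column called overdue (a hypothetical name used only for this example):

```python
import pandas as pd

# Toy cleaned table; the column names are hypothetical.
df = pd.DataFrame({
    "age": [25, 40, 31, 52],
    "income": [3000.0, 5200.0, 4100.0, 6100.0],
    "overdue": [0, 1, 0, 1],
})

# Y: the label column (1 = overdue, 0 = not overdue).
target = df["overdue"]

# X: every remaining column is treated as a feature.
feature = df.drop(columns=["overdue"])

print(feature.shape, target.shape)
```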
3. Divide training set and test set
# Import the required package
from sklearn.model_selection import train_test_split
# Split into training set and test set
X_train, X_test, y_train, y_test = train_test_split(feature, target,
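For reference, here is a self-contained sketch of a split on synthetic data. The values test_size=0.3 and random_state=42 and the use of stratify are assumptions made for this illustration, not the settings used in this project. Stratifying can be useful for overdue data because the positive class is often the minority.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 3 features, 20% positive labels (all made up).
rng = np.random.default_rng(0)
feature = rng.normal(size=(100, 3))
target = np.array([0] * 80 + [1] * 20)

# test_size, random_state and stratify are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    feature, target, test_size=0.3, random_state=42, stratify=target
)

# stratify=target keeps the share of positives the same in both parts.
print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```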
4. Build and train the xgboost model
import xgboost as xgb
# The xgboost model parameters are passed to XGBClassifier(); set them as needed.
xgb_model = xgb.XGBClassifier(learning_rate=0.001,
                              n_estimators=1000, max_depth=6)
# Training arguments are passed to fit(); set them as needed.
xgb_model.fit(X_train, y_train)
5. Model prediction
Step 4 produces the trained model. Any X (namely feature) in the same format as the training data can now be passed to the model for prediction. Take X_test as an example.
xgb_pre = xgb_model.predict(X_test)  # the input must have the same format as the training data
6. Result evaluation
Compare the predicted results with the true results to evaluate the quality of the model.
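To make clear what an AUC score measures, here is a small pure-Python sketch: the AUC is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The helper auc_from_scores and the numbers are hypothetical and written only for this illustration. Because the metric is based on ranking, it is usually computed from scores or probabilities (with an XGBClassifier these could come from predict_proba(X_test)[:, 1]); hard 0/1 labels from predict() create many ties and can lower the value.

```python
def auc_from_scores(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.8]  # hypothetical predicted probabilities
labels = [0, 0, 0, 1]          # the same scores thresholded at 0.5

auc_scores = auc_from_scores(y_true, scores)  # every positive outranks every negative
auc_labels = auc_from_scores(y_true, labels)  # thresholding introduces ties
print(auc_scores, auc_labels)
```

In this toy case the probabilities rank all positives above all negatives, while the thresholded labels tie one positive with both negatives, so the second value is lower.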
from sklearn.metrics import roc_auc_score
auc_score = roc_auc_score(y_test,