Preface

I have recently been working on a credit-overdue project using an xgboost model, and this post roughly records the process. The details of the data are not covered. The project uses user data to predict whether a user will be overdue, which is essentially a binary classification problem, and xgboost is the model used here for prediction.

1. Data cleaning

Clean the tabular data according to its characteristics, for example by removing null values, removing duplicate rows, or filling missing values with the median.
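
The cleaning steps above can be sketched with pandas. This is only a minimal sketch on made-up data: the column names `age`, `income`, and `overdue` are hypothetical and are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical toy table; the real column names are not given in this post.
df = pd.DataFrame({
    "age": [25, 25, 40, None, 31],
    "income": [3000.0, 3000.0, None, 5200.0, 4100.0],
    "overdue": [0, 0, 1, 0, None],
})

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Drop rows whose label is missing, since they cannot be used for supervised training.
df = df.dropna(subset=["overdue"])

# Fill missing feature values with each column's median.
feature_cols = ["age", "income"]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
```

Whether to drop or fill a missing value depends on the column, so the choices here are illustrative rather than a rule.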

2. Split the data into X and Y

This is supervised learning. X is the data's features (feature), and Y is the target (target), that is, whether the user is overdue.
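
Assuming the cleaned data sits in a pandas DataFrame, separating the features from the label might look like the sketch below. The label column name `overdue` and the feature columns are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned table; "overdue" stands in for the real label column.
df = pd.DataFrame({
    "age": [25, 40, 31, 52],
    "income": [3000.0, 4100.0, 5200.0, 2800.0],
    "overdue": [0, 1, 0, 1],
})

# X: every column except the label. Y: the label column itself.
feature = df.drop(columns=["overdue"])
target = df["overdue"].astype(int)
```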

3. Divide training set and test set
# Import the required package
from sklearn.model_selection import train_test_split

# Divide the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(feature, target, test_size=0.2)
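
If the overdue class is much rarer than the non-overdue class (an assumption, since the class balance is not described here), a reproducible, stratified split may be preferable. The sketch below uses synthetic stand-in data; `random_state` and `stratify` are standard `train_test_split` parameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 3 features, 20% positive (overdue) labels.
rng = np.random.default_rng(0)
feature = rng.normal(size=(100, 3))
target = np.array([1] * 20 + [0] * 80)

# random_state makes the split reproducible; stratify keeps the class ratio
# the same in the training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    feature, target, test_size=0.2, random_state=42, stratify=target
)
```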
4. Train the xgboost model
import xgboost as xgb

# The xgboost model parameters go in the XGBClassifier() parentheses; set them as needed.
xgb_model = xgb.XGBClassifier(learning_rate=0.001, n_estimators=1000, max_depth=6)

# Training options go in the fit() parentheses; set them as needed.
xgb_model.fit(X_train, y_train)
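
One parameter that may be worth setting for overdue prediction is `scale_pos_weight`, which xgboost provides for imbalanced binary problems. The sketch below assumes the positive (overdue) class is the rare one and computes the commonly suggested value, the ratio of negative to positive samples. The labels are synthetic, and the `params` dictionary is just one illustrative way to collect the arguments.

```python
import numpy as np

# Synthetic stand-in for y_train: 1 = overdue, 0 = not overdue.
y_train = np.array([0] * 90 + [1] * 10)

# Ratio of negative to positive samples.
n_pos = int((y_train == 1).sum())
n_neg = int((y_train == 0).sum())
scale_pos_weight = n_neg / n_pos

# Keyword arguments that could be passed as xgb.XGBClassifier(**params).
params = {
    "learning_rate": 0.001,
    "n_estimators": 1000,
    "max_depth": 6,
    "scale_pos_weight": scale_pos_weight,
}
```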
5. Model prediction

Step 4 produces the trained model. Any input X (the features) in the same format as the training data can now be passed to the model for prediction. Take X_test as an example.
# The input must have the same data format as the training data
xgb_pre = xgb_model.predict(X_test)
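
When new data arrives as a pandas DataFrame, "the same format" usually means the same columns in the same order. A minimal sketch of aligning new data to the training layout follows; all column names are hypothetical.

```python
import pandas as pd

# Hypothetical training columns, in the order the model saw them.
train_columns = ["age", "income", "loan_amount"]

# New data with columns in a different order, one extra column, and one missing.
new_data = pd.DataFrame({
    "income": [3500.0, 4800.0],
    "age": [29, 45],
    "user_id": [101, 102],
})

# Reorder to the training layout, drop unknown columns, and add any missing
# column as NaN (xgboost treats NaN as a missing value by default).
X_new = new_data.reindex(columns=train_columns)
```

The aligned `X_new` could then be passed to the trained model's `predict()` in place of `X_test`.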
6. Result evaluation

Compare the predicted results with the true results to evaluate the quality of the model.
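from sklearn.metrics import roc_auc_score

# AUC is a ranking metric, so score it on the predicted probability of the
# positive (overdue) class rather than on hard 0/1 labels.
xgb_proba = xgb_model.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, xgb_proba)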
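
To illustrate what the AUC measures, the sketch below computes it directly from its definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The helper `pairwise_auc` is a hypothetical illustration, not scikit-learn's implementation, and the labels and scores are made up.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities of the positive class.
y_true = [0, 0, 0, 1, 1, 1]
proba = [0.10, 0.30, 0.60, 0.40, 0.70, 0.90]

# Hard 0/1 labels obtained by thresholding the probabilities at 0.5.
labels = [1 if p >= 0.5 else 0 for p in proba]

auc_from_proba = pairwise_auc(y_true, proba)    # 8 of 9 pairs ranked correctly
auc_from_labels = pairwise_auc(y_true, labels)  # ties from thresholding lower the score
```

In this toy case the probabilities give an AUC of 8/9, while the thresholded labels give 2/3, because thresholding throws away the ranking information inside each predicted class.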
