Preface:

Previously I worked on a credit-overdue prediction project using an XGBoost model; the details are in earlier articles.

Now I am working on a telecom fraud project: given user data, the task is to determine whether fraud is present. It is similar to the credit-overdue project, essentially a binary classification problem, with only some differences in how the data is processed. I made predictions with an XGBoost model and a LightGBM model separately; in my experiments, LightGBM performed better than XGBoost, so the LightGBM workflow is recorded here.

Experience:

   
With parameters kept within a normal range, tuning the model will not change the predictions significantly. In my opinion there are roughly two remedies: 1. Replace the model: the model currently in use may not be the best fit for the dataset, so try other model types, such as random forests. 2. Select better data features for training: good feature selection can noticeably improve the prediction results.

In short, good data and a good model together yield the best prediction results.

1. Data cleaning

Clean the table according to the characteristics of the data: for example, remove null values, remove duplicate rows, or fill missing values with the median.
Note that the data also needs to be normalized; after normalization the prediction results improve.
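The post does not show the cleaning code itself; below is a minimal sketch, assuming the raw data sits in a pandas DataFrame loaded from a placeholder CSV path, with a hypothetical label column named is_fraud.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the raw data (the file name is a placeholder)
df = pd.read_csv('telecom_data.csv')

# Remove rows with null values and duplicate rows
df = df.dropna()
df = df.drop_duplicates()

# Alternatively, fill missing values with each column's median instead of dropping:
# df = df.fillna(df.median(numeric_only=True))

# Normalize the numeric feature columns ('is_fraud' is a hypothetical label column)
feature_cols = df.columns.drop('is_fraud')
scaler = MinMaxScaler()
df[feature_cols] = scaler.fit_transform(df[feature_cols])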

2. Partition the data into X and Y

This is supervised learning: X is the data features (feature), and Y is the target (target), i.e., whether the record is fraudulent. Fraud is labeled 1, otherwise 0.
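Continuing the sketch above (df and the hypothetical is_fraud column), the split is just column selection:

# Y (target): 1 = fraud, 0 = not fraud
target = df['is_fraud']

# X (feature): every column except the label
feature = df.drop(columns=['is_fraud'])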

3. Split into training and test sets
# Import the required package
from sklearn.model_selection import train_test_split

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(feature, target, test_size=0.2)
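One optional refinement not in the original post: fraud data is usually heavily imbalanced, so a stratified split keeps the fraud ratio identical in both sets (stratify is a standard train_test_split parameter):

X_train, X_test, y_train, y_test = train_test_split(
    feature, target, test_size=0.2, stratify=target, random_state=42)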
4. Use the lightgbm model for prediction
import lightgbm as lgb

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# lightgbm model parameters; adjust to your own needs
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'l2', 'auc', 'binary_logloss'},  # 'l2' (the original '12' was a typo)
    'num_leaves': 40,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0,
    'is_unbalance': True,  # compensate for the imbalanced fraud/non-fraud classes
}

# Train for up to 1000 rounds, stopping early if the validation
# metric does not improve for 100 consecutive rounds
gbm = lgb.train(params, lgb_train, num_boost_round=1000,
                valid_sets=lgb_eval, early_stopping_rounds=100)
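A version caveat: in LightGBM 4.x the early_stopping_rounds argument was removed from lgb.train in favor of callbacks. If the call above raises a TypeError on a newer install, the equivalent should look roughly like this:

# Newer LightGBM API: early stopping as a callback
gbm = lgb.train(params, lgb_train, num_boost_round=1000,
                valid_sets=[lgb_eval],
                callbacks=[lgb.early_stopping(stopping_rounds=100)])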
5. Model prediction

Step 4 produced the trained model. You can now feed in X in the same format as the training features (feature) and use the model to predict. Take X_test as an example.
# The input must have the same format as the training data
lgb_pre = gbm.predict(X_test)
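With the objective set to 'binary', predict returns fraud probabilities in [0, 1] rather than hard labels. To get 0/1 labels, threshold the scores; the 0.5 below is a common default, not a value from the original post:

import numpy as np

# Turn predicted probabilities into 0/1 fraud labels
lgb_label = np.where(lgb_pre > 0.5, 1, 0)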
6. Result evaluation

Compare the predicted results with the true results to evaluate the quality of the model.
from sklearn.metrics import roc_auc_score

auc_score = roc_auc_score(y_test, lgb_pre)
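Note that roc_auc_score is computed directly from the predicted probabilities. With the thresholded labels from the sketch in step 5, other standard metrics can be reported as well:

from sklearn.metrics import accuracy_score, confusion_matrix

print('AUC:', auc_score)
print('Accuracy:', accuracy_score(y_test, lgb_label))
print('Confusion matrix:')
print(confusion_matrix(y_test, lgb_label))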
7. Model saving and loading

Save the trained model so it can be loaded directly wherever needed, with no retraining.
# Save the model
gbm.save_model('model.txt')

# Load the model
import lightgbm as lgb
gbm = lgb.Booster(model_file='model.txt')
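When training used early stopping, one optional variant (not in the original post) is to save only the trees up to the best validation iteration; the loaded Booster then predicts exactly like the original model:

# Save only up to the best iteration found by early stopping
gbm.save_model('model.txt', num_iteration=gbm.best_iteration)

# The loaded model is used the same way as in step 5
lgb_pre = gbm.predict(X_test)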
