There are three main methods of model evaluation: the hold-out method, cross-validation, and the bootstrap method. All three divide the data set into a training set and a test set, and use the test error to approximate the model's generalization error. The training set and test set should be kept mutually exclusive; only then does the test error give a reliable estimate of generalization performance.

1. Hold-out method

The hold-out method splits the data set D directly into two mutually exclusive sets: a training set S and a test set T. The test set T is then used to estimate the model's generalization error.

Typically, 2/3 to 4/5 of the samples are used for training and the remaining samples are used for testing.
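For a one-off hold-out split, sklearn's train_test_split covers the common case; a minimal sketch (the toy arrays here are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# toy data, made up for illustration: 10 samples, 5 per class
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# hold out one third for testing (so roughly 2/3 is used for training);
# stratify=y preserves the class proportions in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0)
print(len(X_train), len(X_test))  # 6 4
```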

When splitting the samples, keep the data distribution as consistent as possible between the two sets; at a minimum, preserve the class proportions to avoid introducing bias. Sampling that preserves the class proportions is called "stratified sampling".

Even with the split proportion fixed, the data set D can be divided in many different ways. To make the estimate more stable and reliable, the split and evaluation should be repeated several times and the average of the evaluation results taken.

This can be done with the StratifiedShuffleSplit class in sklearn, or easily written by hand.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.array([[0, 1], [0, 2], [0, 3], [1, 8], [1, 9], [1, 10]])
y = np.array([0, 0, 0, 1, 1, 1])

# test set proportion 0.333, stratified sampling, 5 random splits
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.333)
for train_index, test_index in sss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

The output prints the TRAIN and TEST index arrays for each of the five random splits.

2. Cross-validation

Cross-validation divides the data set D into k mutually exclusive subsets of roughly equal size, again keeping the data distribution in each subset as consistent as possible (stratified sampling).

Each round, k-1 subsets are used for training and the remaining subset is used for testing, so k rounds of training and testing can be performed; the final result is the average of the k test results. This is called "k-fold cross-validation". 10-fold cross-validation is the most common choice; a special case is leave-one-out cross-validation (LOOCV).

(1) 10-fold cross-validation

Divide the data set into 10 parts; each round, train on 9 parts and test on the remaining one, and finally average the ten test results (the outer cross-validation).
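When no parameter tuning is involved, this outer loop is available directly as sklearn's cross_val_score (a sketch; the toy data below is made up, and with a classifier and an integer cv the folds are stratified by default):

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

# toy two-class data, made up for illustration
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 3])
y = np.array([0] * 50 + [1] * 50)

clf = svm.SVC(kernel='linear')
scores = cross_val_score(clf, X, y, cv=10)  # one test accuracy per fold
print(len(scores), scores.mean())  # 10 results and their average
```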

When training on the 9 parts, if the model needs parameter tuning, a further cross-validation can be run inside the training process (the inner layer): of the 9 training parts, train on 8 and validate on the remaining one, using the 9 validation results to score the current parameter value. (The original post illustrates this with a diagram of 3-fold cross-validation, where C is the parameter to be tuned.)

sklearn's StratifiedKFold can be used to partition the data set and implement 10-fold cross-validation. Below, an SVM classifier is used as the example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import copy
import numpy as np
from sklearn import svm
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
import loadData  # self-written module that loads the data

feature, label = loadData.loadData()
smps = len(label)  # number of samples
foldsList = []  # sample indices of each fold
ss = StratifiedKFold(n_splits=10, shuffle=True)
for train_index, test_index in ss.split(feature, label):
    print("TEST:", test_index)  # indices of the current fold
    foldsList.append(test_index)
test_accur_list = []
for i in range(10):  # outer loop: 9 folds for training, 1 fold for testing
    train_index = list(set(range(0, smps)) - set(foldsList[i]))
    test_index = foldsList[i]
    train_fea, test_fea = feature[train_index], feature[test_index]
    train_lab, test_lab = label[train_index], label[test_index]
    foldLi = copy.deepcopy(foldsList)
    del foldLi[i]  # drop the test fold
    foldL = [x for p in foldLi for x in p]  # merge the remaining folds
    print('for %s time gridSearch process:' % i)
    c_can = np.logspace(-15, 15, 10, base=2)  # candidate values of the SVM parameter C
    n_search = len(c_can)
    bestC = 0      # best C for the current 9 training folds, chosen by the inner CV
    bestAccur = 0  # best inner validation accuracy, reached when C == bestC
    for j in range(n_search):  # try each candidate C in turn
        Accur = 0
        for n in range(9):  # inner cross-validation with C = c_can[j]
            train_i = list(set(foldL) - set(foldLi[n]))
            test_i = foldLi[n]
            train_f, test_f = feature[train_i], feature[test_i]
            train_l, test_l = label[train_i], label[test_i]
            clf = svm.SVC(C=c_can[j], kernel='linear')
            clf.fit(train_f, train_l)
            y_hat = clf.predict(test_f)
            Accur += accuracy_score(test_l, y_hat) / 9
        print('Accur:%s' % Accur)
        if Accur > bestAccur:  # keep the best C found by the inner CV
            bestAccur = Accur
            bestC = c_can[j]
    print('Best validation accuracy on current dataset split:', bestAccur)
    print('Best para C:', bestC)
    # with bestC fixed, run the outer step: train and test
    clf = svm.SVC(C=bestC, kernel='linear')
    clf.fit(train_fea, train_lab)
    y_hat = clf.predict(test_fea)
    test_accur_list.append(accuracy_score(test_lab, y_hat))
    print('test accur:', test_accur_list[i])
# final result of the 10-fold cross-validation
print('average test accur:', sum(test_accur_list) / len(test_accur_list))

The above implements a single 10-fold cross-validation. To avoid extra error introduced by one particular sample split, the whole process should be repeated and the mean of the test results taken, e.g. 10 times 10-fold cross-validation.
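sklearn also provides RepeatedStratifiedKFold, which re-randomizes the k-fold split on every repetition, so 10 times 10-fold cross-validation need not be hand-rolled (a sketch with made-up toy data):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# toy data, made up for illustration: 20 samples, 10 per class
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# 10 repetitions of stratified 10-fold: 100 train/test splits in total
rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
n_splits = sum(1 for _ in rskf.split(X, y))
print(n_splits)  # 100
```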

(2) Leave-one-out (LOOCV)

Leave-one-out is a special case of cross-validation. If the data set D contains m samples, leave-one-out takes k = m. There is then only one way to split the samples: each subset contains exactly one sample, so the result is unaffected by random sample splitting. Moreover, the training set differs from the original data set D by only one sample, so the trained model is very close to the model trained on all of D, and the evaluation is often considered quite accurate. The drawback is the high computational cost when D is large. In practice, each round takes one sample as the test set and the rest as the training set (the outer layer). If parameters need tuning, a 10-fold cross-validation on the training samples can determine them (the inner layer). Again with an SVM classifier as the example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
from sklearn import svm
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
import loadData  # self-written module that loads the data

feature, label = loadData.loadData()
smps = len(label)  # number of samples
test_accur_list = []
for i in range(smps):  # outer loop, LOOCV: one sample for testing, the rest for training
    train_index = list(set(range(0, smps)) - set([i]))
    test_index = [i]
    train_fea, test_fea = feature[train_index], feature[test_index]
    train_lab, test_lab = label[train_index], label[test_index]
    foldLi = []  # inner 10-fold split of the m-1 training samples
    ss = StratifiedKFold(n_splits=10, shuffle=True)
    for train_i, test_i in ss.split(train_fea, train_lab):
        print("TEST:", test_i)  # indices (into the training set) of the current fold
        foldLi.append(test_i)
    foldL = [x for p in foldLi for x in p]  # merge the folds
    print('for %s time gridSearch process:' % i)
    c_can = np.logspace(-15, 15, 10, base=2)  # candidate values of the SVM parameter C
    n_search = len(c_can)
    bestC = 0      # best C for the current training data, chosen by the inner CV
    bestAccur = 0  # best inner validation accuracy, reached when C == bestC
    for j in range(n_search):  # try each candidate C in turn
        Accur = 0
        for n in range(10):  # inner cross-validation with C = c_can[j]
            train_i = list(set(foldL) - set(foldLi[n]))
            test_i = foldLi[n]
            # the fold indices point into the training set, not the full data set
            train_f, test_f = train_fea[train_i], train_fea[test_i]
            train_l, test_l = train_lab[train_i], train_lab[test_i]
            clf = svm.SVC(C=c_can[j], kernel='linear')
            clf.fit(train_f, train_l)
            y_hat = clf.predict(test_f)
            Accur += accuracy_score(test_l, y_hat) / 10
        print('Accur:%s' % Accur)
        if Accur > bestAccur:  # keep the best C found by the inner CV
            bestAccur = Accur
            bestC = c_can[j]
    print('Best validation accuracy on current dataset split:', bestAccur)
    print('Best para C:', bestC)
    # with bestC fixed, run the outer step: train and test
    clf = svm.SVC(C=bestC, kernel='linear')
    clf.fit(train_fea, train_lab)
    y_hat = clf.predict(test_fea)
    test_accur_list.append(accuracy_score(test_lab, y_hat))
    print('test accur:', test_accur_list[i])
# final LOOCV result
print('average test accur:', sum(test_accur_list) / len(test_accur_list))
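When no inner tuning is needed, the outer LOOCV loop is available directly as sklearn's LeaveOneOut (a sketch; the toy data is made up for illustration):

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score

# toy two-class data, made up for illustration
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(10, 2), rng.randn(10, 2) + 3])
y = np.array([0] * 10 + [1] * 10)

loo = LeaveOneOut()  # m rounds: one test sample per round
accs = []
for train_idx, test_idx in loo.split(X):
    clf = svm.SVC(kernel='linear')
    clf.fit(X[train_idx], y[train_idx])
    accs.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print(len(accs), sum(accs) / len(accs))  # 20 rounds and their average accuracy
```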
3. Bootstrap method

On the one hand, leave-one-out is computationally expensive; on the other hand, holding out more samples for testing makes the trained model deviate further from the model trained on all of D. The bootstrap method is a good compromise. Suppose the data set D has m samples. Each draw picks one sample from D at random, with replacement; after m draws we obtain a data set D' of m samples. Clearly, some samples of D appear in D' several times while others never appear. The probability that a given sample is never picked in the m draws is about 0.368, so about 36.8% of the initial data set D does not appear in D'. This way the evaluated model is trained on m samples, just like the expected model, while about 1/3 of the samples remain available for testing.
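The 0.368 figure is the limit of (1 - 1/m)^m as m grows, which tends to 1/e ≈ 0.368; a quick numeric check:

```python
import math

m = 1000  # sample size
# probability that a given sample is never drawn in m draws with replacement
p_missed = (1 - 1 / m) ** m
print(round(p_missed, 3), round(1 / math.e, 3))  # 0.368 0.368
```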

The bootstrap is most useful for small data sets, where an effective training/test split is hard to make. Since each bootstrap run generates a different training set from the original data, and these training sets overlap without being identical, the method is also useful in ensemble learning algorithms. However, bootstrap sampling changes the distribution of the initial data set and introduces estimation bias, so when there is enough data, prefer the hold-out method or cross-validation.

Sample code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
from sklearn import svm
from sklearn.metrics import accuracy_score
import loadData  # self-written module that loads the data

feature, label = loadData.loadData()
samps = len(label)  # number of samples
testAccur_list = []
for i in range(10):  # bootstrap sampling, repeated 10 times
    train_index = []
    for _ in range(samps):  # draw samps samples at random, with replacement
        train_index.append(random.randint(0, samps - 1))
    test_index = list(set(range(0, samps)) - set(train_index))
    print("TRAIN:", train_index, "TEST:", test_index)
    train_fea, test_fea = feature[train_index], feature[test_index]
    train_lab, test_lab = label[train_index], label[test_index]
    '''
    cross-validate on the training samples (as in the previous sections)
    to determine the parameter: C = bestC
    '''
    clf = svm.SVC(C=bestC, kernel='linear')
    clf.fit(train_fea, train_lab)
    y_hat = clf.predict(test_fea)
    accur = accuracy_score(test_lab, y_hat)
    print(i, 'time run testAccuracy:', accur)
    testAccur_list.append(accur)
print('10 times run testAccur_list:', testAccur_list)
print('average testAccur value:', sum(testAccur_list) / 10)
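A bootstrap split can also be built with sklearn's resample utility, with the never-drawn (out-of-bag) samples serving as the test set; a sketch (the index-array setup is for illustration):

```python
import numpy as np
from sklearn.utils import resample

m = 1000
indices = np.arange(m)
# draw m indices with replacement: the bootstrap training set D'
train_idx = resample(indices, replace=True, n_samples=m, random_state=0)
# samples never drawn form the out-of-bag test set (roughly 36.8% of D)
test_idx = np.setdiff1d(indices, train_idx)
print(len(test_idx) / m)
```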

References:

1. Zhou Zhihua. Machine Learning (《机器学习》).

2. Statnikov A, Aliferis C F, Tsamardinos I, et al. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 2004, 21(5): 631-643.
