Another half day and another evening gone...
I studied TensorFlow for a short while, but it was of no use at all in today's practice match... The first stage was finished a long time ago. Today was the second stage, and what on earth are these questions about?
Skewness, then coefficients, then fitting KNN. It was really frustrating. I was basically learning each thing as I needed it: I looked up information for half a day and went through CSDN countless times before I found a clue.
Eight questions on structured data (the third stage only unlocks once all eight are solved)
Four of them I got by guessing... Those four questions basically have an answer of 0 or 1, so even though the number of submissions is limited, I stubbornly tried them out anyway...
The four I actually worked out: 1, 2, 3 and 5
Both skewness and the boxcox1p transformation have ready-made functions, so they can be applied directly
Second question: calculate the skewness of body weight
import scipy.stats as st
import pandas as pd

pd.options.display.max_columns = None
pd.options.display.max_rows = None

path1 = "/home/kesci/input/liver_df9751/ Structured data training camp .csv"  # chipotle.tsv
df = pd.read_csv(path1)
df.head(30)

# Median of the weight column, used to fill missing values
aveTime = df['Weight\n weight '].median()
df['Weight\n weight '].nunique()  # was chipo[...], which is not defined here
df2 = df.fillna(aveTime)

# Column index 3 is taken to be the weight column
col = df2.iloc[:, 3]
arrs = col.values
## print(arrs)
w = st.skew(arrs)  # Calculation of skewness
## 0.7565543738808015
print('%.4f' % w)
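To see what st.skew is doing under the hood, here is a minimal sketch that computes the skewness by hand and compares it with the library call. The numbers are made up for illustration and are not from the competition data. As far as I understand it, the default (bias=True) is the third central moment divided by the second central moment raised to the power 1.5.

```python
import numpy as np
import scipy.stats as st

# Hypothetical sample values, not taken from the competition data
x = np.array([48.0, 52.0, 55.0, 60.0, 61.0, 70.0, 95.0])

# Biased sample skewness: m3 / m2**1.5, with m2 and m3 the central moments
dev = x - x.mean()
m2 = np.mean(dev ** 2)
m3 = np.mean(dev ** 3)
manual_skew = m3 / m2 ** 1.5

# scipy's default (bias=True) uses the same definition
library_skew = st.skew(x)

print('%.4f %.4f' % (manual_skew, library_skew))
```

A positive value means a longer tail on the right, which is what the single large value in this toy sample produces.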
boxcox1p transformation: boxcox1p()
Use boxcox1p to transform the weight with lambda=0.1. What is the skewness of the transformed data?
import scipy.stats as st
import pandas as pd
from scipy.special import boxcox1p

pd.options.display.max_columns = None
pd.options.display.max_rows = None

path1 = "/home/kesci/input/liver_df9751/ Structured data training camp .csv"  # chipotle.tsv
df = pd.read_csv(path1)

# Fill missing weights with the median, then transform
aveTime = df['Weight\n weight '].median()
wt = df['Weight\n weight '].fillna(aveTime)
lam = 0.1
wt = boxcox1p(wt, lam)

w = st.skew(wt.values)  # Calculation of skewness
print('%.4f' % w)
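For anyone wondering what boxcox1p actually computes, here is a small sketch on made-up values (not the competition data). It checks the library call against the formula ((1 + x)^lambda - 1) / lambda, and against log(1 + x) for the lambda = 0 case.

```python
import numpy as np
from scipy.special import boxcox1p

# Hypothetical values, chosen only to illustrate the formula
x = np.array([0.0, 1.0, 10.0, 100.0])
lam = 0.1

# Box-Cox transform of (1 + x) for lambda != 0
manual = ((1.0 + x) ** lam - 1.0) / lam
library = boxcox1p(x, lam)

# For lambda = 0 the transform is log(1 + x)
log_case = boxcox1p(x, 0.0)

print(np.allclose(manual, library), np.allclose(log_case, np.log1p(x)))
```

The transform compresses large values much more than small ones, which is why it tends to pull the skewness of a right-skewed column toward zero.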
I really want to talk about the fifth question. It's unbelievable. The question as posted:
Using the same data as above, run KNN (K=5). How many of the classification results are inconsistent with the real results?
What "the same data" means: only age, weight and ALF were selected (missing values were filled with the median, with no extra processing)
I went through the data over and over and read countless KNN articles before I finally found how to use KNN in one of them. KNN also has a ready-made function, KNeighborsClassifier(n_neighbors=5), where n_neighbors is the K in the question
We got the classification results, but that wasn't enough: what is the "real result"? We didn't have one, because we had used all the data as the training set for KNN. KNN needs data to fit on and then data to test on, but the question gives only one data set. Later I wondered whether we were meant to test KNN on the very data it was fitted on, that is, to compare the model's predictions for the training set against the training set's own labels
So my classmates and I output the predicted classes next to the labels in the original data and found that some of them differ. The number of differences is the answer
import scipy.stats as st
import pandas as pd
import regex as re
from sklearn.neighbors import KNeighborsClassifier

pd.options.display.max_columns = None
pd.options.display.max_rows = None

path1 = "/home/kesci/input/liver_df9751/ Structured data training camp .csv"  # chipotle.tsv
# path2="/home/kesci/inputver_df9751/ Structured data training camp test suite .csv"
data = pd.read_csv(path1)
# data_test=pd.read_csv(path1)

# Keep only the part of each column name after the line separator or newline
col_names = list(data.columns)
col = []
for i in range(len(col_names)):
    if re.findall(r"\u2028(.+)", col_names[i]) != []:
        col.append(re.findall(r"\u2028(.+)", col_names[i])[0])
    elif re.findall(r"\n(.+)", col_names[i]) != []:
        col.append(re.findall(r"\n(.+)", col_names[i])[0])
    else:
        col.append(col_names[i])
## modify dataframe column names
data.columns = col

# Fill missing values with the median of each selected column
feature1 = [' weight ', ' Age ', 'ALF']
for i in feature1:
    ave = data[i].median()
    data[i] = data[i].fillna(ave)
    print(data[i].values)

# Feature rows: [weight, age]
a_zi = []
for i in range(len(data)):
    c = [data[' weight '][i], data[' Age '][i]]
    a_zi.append(c)

neigh = KNeighborsClassifier(n_neighbors=5)
neigh.fit(a_zi, data['ALF'])

# Count predictions on the training rows that match the original label
cnt = 0
for i in range(len(a_zi)):
    if neigh.predict([a_zi[i]]) == data['ALF'][i]:
        cnt += 1
# The original line is cut off after "print(len"; printing the number of
# mismatches is an assumed completion based on the question being asked
print(len(a_zi) - cnt)
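The same "fit on everything, then predict on the same rows" idea can be sketched more compactly by predicting all rows in one call. This is only an illustration on randomly generated data: the two features stand in for weight and age, the 0/1 label stands in for ALF, and none of the numbers come from the competition file.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: 200 rows, two features, noisy 0/1 label)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Fit KNN with K=5 on all rows
neigh = KNeighborsClassifier(n_neighbors=5)
neigh.fit(X, y)

# Predict on the very same rows the model was fitted on
pred = neigh.predict(X)

# Rows where the predicted class differs from the original label
mismatches = int((pred != y).sum())
matches = int((pred == y).sum())
print(mismatches)
```

Even on its own training data KNN with K=5 does not have to reproduce every label, because each row is outvoted whenever most of its five nearest neighbours (itself included) carry the other class. That is why the count of mismatches is not simply zero.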
It really is unbelievable...
We'll continue tomorrow