Create a new Excel workbook (table1.xlsx) to use for the worked example:
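The workbook itself is not attached, but its contents can be reconstructed from the tables shown throughout this article. A minimal sketch that recreates the sample data and writes table1.xlsx (assuming the openpyxl engine is installed for Excel output):

import pandas as pd
import numpy as np

# Sample data taken from the tables shown below
sample = pd.DataFrame({
    'Name':  ['Tom', 'Jack', 'Alan', 'Tony', 'Tim', 'Bob'],
    'Age':   [19, 20, 18, 18, 19, 22],
    'Sex':   [1, 1, 1, 0, 0, 1],
    'Class': [162061, 152051, 170461, 170462, 162051, 132022],
    'Score': [80.0, 90.0, 100.0, 90.0, np.nan, 80.0],
})
sample.to_excel('table1.xlsx', index=False)   # write the example workbook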

One. Import the libraries
import pandas as pd
import numpy as np
Two. Read the data
df = pd.read_excel('table1.xlsx')  # Relative path
# df = pd.read_excel(r'E:\Anaconda\hc\dataScience\table1.xlsx')  # Absolute path
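Note that pd.read_excel reads Excel workbooks (.xls/.xlsx). If the same table were saved as a plain-text CSV file instead, pd.read_csv would be the reader to use; a hypothetical example:
# df = pd.read_csv('table1.csv')   # CSV version of the same table, if one existed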
Three. Display the data

1. Display the number of rows and columns (shape)
df.shape
(6, 5)
2. Display the data types (dtypes)
df.dtypes
Name      object
Age        int64
Sex        int64
Class      int64
Score    float64
dtype: object
3. Show the column names
df.columns
Index(['Name', 'Age', 'Sex', 'Class', 'Score'], dtype='object')
4. Display the first 2 rows of the data (head function)
df.head(2)
   Name  Age  Sex   Class  Score
0   Tom   19    1  162061   80.0
1  Jack   20    1  152051   90.0
5. Display the last 3 rows of the data (tail function)
df.tail(3)
   Name  Age  Sex   Class  Score
3  Tony   18    0  170462   90.0
4   Tim   19    0  162051    NaN
5   Bob   22    1  132022   80.0
6. Display the unique values of a column (unique function)
df['Score'].unique()
array([ 80.,  90., 100.,  nan])
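A closely related call, not part of the original example, counts how often each value occurs, including the missing one:
df['Score'].value_counts(dropna=False)   # here: 80.0 and 90.0 twice each, 100.0 once, NaN once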
7. Skip rows when reading (skiprows parameter)
# Do not read the 2nd data row
df1 = pd.read_excel('table1.xlsx', skiprows=[2])
df1
   Name  Age  Sex   Class  Score
0   Tom   19    1  162061   80.0
1  Alan   18    1  170461  100.0
2  Tony   18    0  170462   90.0
3   Tim   19    0  162051    NaN
4   Bob   22    1  132022   80.0
8. Identify missing values
# All missing values are displayed as True
df.isnull()
    Name    Age    Sex  Class  Score
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False   True
5  False  False  False  False  False
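A common follow-up, not shown above, is to count the missing values per column by summing this boolean frame:
df.isnull().sum()   # here: 1 missing value in Score, 0 in every other column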
Four. Clean the data

1. Drop rows with null values (dropna function)
df2 = df.dropna(how='any')
df2
   Name  Age  Sex   Class  Score
0   Tom   19    1  162061   80.0
1  Jack   20    1  152051   90.0
2  Alan   18    1  170461  100.0
3  Tony   18    0  170462   90.0
5   Bob   22    1  132022   80.0
2. Fill in null values (fillna function)
df3 = df.fillna(value=0)
df3
   Name  Age  Sex   Class  Score
0   Tom   19    1  162061   80.0
1  Jack   20    1  152051   90.0
2  Alan   18    1  170461  100.0
3  Tony   18    0  170462   90.0
4   Tim   19    0  162051    0.0
5   Bob   22    1  132022   80.0
3. Fill null values with the mean
df4 = df['Score'].fillna(df['Score'].mean())
df4
0     80.0
1     90.0
2    100.0
3     90.0
4     88.0
5     80.0
Name: Score, dtype: float64
4. Change the data type (astype function)
df2['Score'].astype('int64')
0     80
1     90
2    100
3     90
5     80
Name: Score, dtype: int64
(Note: if a null value is present, converting the type this way raises an error, which is why the cleaned df2 is used here.)
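If the column still contains null values, a couple of work-arounds are possible; a sketch (the nullable 'Int64' dtype assumes a reasonably recent pandas version):
df['Score'].fillna(0).astype('int64')   # fill the NaN first, then convert
df['Score'].astype('Int64')             # nullable integer dtype, keeps the missing value as <NA>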

5. Rename a column (rename function)
df5 = df.rename(columns={'Score': 'score'})
df5
   Name  Age  Sex   Class  score
0   Tom   19    1  162061   80.0
1  Jack   20    1  152051   90.0
2  Alan   18    1  170461  100.0
3  Tony   18    0  170462   90.0
4   Tim   19    0  162051    NaN
5   Bob   22    1  132022   80.0
6. Replace values in a column (replace function)
df6 = df['Name'].replace('Bob', 'bob')
df6
0     Tom
1    Jack
2    Alan
3    Tony
4     Tim
5     bob
Name: Name, dtype: object
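replace also accepts a dictionary, which maps several values at once; a small illustrative sketch using the coded Sex column (the labels are assumptions, not from the original data):
df['Sex'].replace({1: 'male', 0: 'female'})   # returns a new Series; df itself is unchanged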
Five. Preprocess the data

1. Sort the data (sort_values function)
df.sort_values(by=['Score'])
   Name  Age  Sex   Class  Score
0   Tom   19    1  162061   80.0
5   Bob   22    1  132022   80.0
1  Jack   20    1  152051   90.0
3  Tony   18    0  170462   90.0
2  Alan   18    1  170461  100.0
4   Tim   19    0  162051    NaN
(Note: the default is ascending order, and null values are placed last.)
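Both defaults can be overridden with keyword arguments, for example:
df.sort_values(by=['Score'], ascending=False, na_position='first')   # descending, NaN rows first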

2. Data grouping

① Single conditional grouping
# If the value in the Score column is greater than 85, label it 'high', otherwise 'low'
# Add a new 'group' column
df['group'] = np.where(df['Score'] > 85, 'high', 'low')
df
   Name  Age  Sex   Class  Score group
0   Tom   19    1  162061   80.0   low
1  Jack   20    1  152051   90.0  high
2  Alan   18    1  170461  100.0  high
3  Tony   18    0  170462   90.0  high
4   Tim   19    0  162051    NaN   low
5   Bob   22    1  132022   80.0   low
② Multiple conditional grouping
# Use the loc function to query several columns at once
# Add a new 'sign' column
df.loc[(df['Sex'] == 1) & (df['Age'] >= 19), 'sign'] = 1
df
   Name  Age  Sex   Class  Score group  sign
0   Tom   19    1  162061   80.0   low   1.0
1  Jack   20    1  152051   90.0  high   1.0
2  Alan   18    1  170461  100.0  high   NaN
3  Tony   18    0  170462   90.0  high   NaN
4   Tim   19    0  162051    NaN   low   NaN
5   Bob   22    1  132022   80.0   low   1.0
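For more than two labels, np.where calls can be nested, or np.select can be used instead; a sketch with illustrative score bands (not assigned back to df here, so the later tables are unchanged):
conditions = [df['Score'] >= 95, df['Score'] >= 85]
np.select(conditions, ['top', 'high'], default='low')   # one label per row; NaN scores fall into the default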
Six. Extract the data

1. Extract by label (loc function)
df.loc[0:3]
   Name  Age  Sex   Class  Score group  sign
0   Tom   19    1  162061   80.0   low   1.0
1  Jack   20    1  152051   90.0  high   1.0
2  Alan   18    1  170461  100.0  high   NaN
3  Tony   18    0  170462   90.0  high   NaN
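loc can select columns at the same time by passing a list of column names as the second argument, for example:
df.loc[0:3, ['Name', 'Score']]   # note that label slicing with loc includes the end label 3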
2. Extract by location (iloc function )

① Extract by Region
df.iloc[:4, :5]
   Name  Age  Sex   Class  Score
0   Tom   19    1  162061   80.0
1  Jack   20    1  152051   90.0
2  Alan   18    1  170461  100.0
3  Tony   18    0  170462   90.0
② Extract specified rows and columns by position
# [0, 2, 5] selects the rows, [0, 1, 5] selects the columns
df.iloc[[0, 2, 5], [0, 1, 5]]
   Name  Age group
0   Tom   19   low
2  Alan   18  high
5   Bob   22   low
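iloc also accepts negative positions counted from the end, for example:
df.iloc[-2:, :2]   # last two rows, first two columns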
3. Extract by condition (isin and loc functions)

① Use the isin function to test the condition
# Check whether Sex is 1
df['Sex'].isin([1])
0     True
1     True
2     True
3    False
4    False
5     True
Name: Sex, dtype: bool

The matching rows, showing only the Name, Age and Class columns:
   Name  Age   Class
0   Tom   19  162061
1  Jack   20  152051
2  Alan   18  170461
5   Bob   22  132022
② Use the loc function to test the condition
# Sex is 1 and Score is greater than 85
df1.loc[(df1['Sex'] == 1) & (df1['Score'] > 85), ['Name', 'Age', 'Class']]
③ Test the condition first, then extract the rows where it is True
# First check whether the Score column contains 80 or 90, then extract the matching rows.
df.loc[df['Score'].isin([80, 90])]
   Name  Age  Sex   Class  Score group  sign
0   Tom   19    1  162061   80.0   low   1.0
1  Jack   20    1  152051   90.0  high   1.0
3  Tony   18    0  170462   90.0  high   NaN
5   Bob   22    1  132022   80.0   low   1.0
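The condition can also be inverted with the ~ operator to keep the rows that do not match; a brief sketch:
df.loc[~df['Score'].isin([80, 90])]   # rows whose Score is not 80 or 90 (here Alan and Tim)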