# Common Pandas Methods for Data Analysis

## 1. Reading data (pd.read_excel and related functions)

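* Specify the index

eg1: when a DataFrame is built directly, `index` sets the row labels and `columns` sets the column names:

```python
import numpy as np
import pandas as pd

# index: row labels, columns: column names
df = pd.DataFrame(np.arange(12, 24).reshape(3, 4), index=['a', 'b', 'c'], columns=['w', 'x', 'y', 'z'])
```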

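eg2: when reading a file, `index_col` chooses which column becomes the row index:

```python
import pandas as pd

catering_sale = "catering_sale.xls"
# Read the data and use the "date" column as the row index
# (every column of a DataFrame, i.e. every Series in it, shares this one row index)
data = pd.read_excel(catering_sale, index_col='date')
```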

* `names`: a list of column names, default `None`; sets the column names used for the data that is read.

eg1:

```python
import pandas as pd

catering_sale = "catering_sale.xls"
data = pd.read_excel(catering_sale, names=['date', 'sale'])
```
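The examples above need a real `catering_sale.xls` file and an Excel reader engine. As a self-contained sketch, the same two parameters can be tried with `pd.read_csv` on an in-memory string, since `read_csv` also accepts `index_col` and `names`. The sample rows below are hypothetical. One difference to keep in mind: `read_excel` treats the first row as a header by default, whereas `read_csv` assumes there is no header row once `names` is given, so `header=0` is passed here to replace the existing header instead of reading it as data.

```python
import io

import pandas as pd

# Hypothetical stand-in for catering_sale.xls: a tiny CSV held in memory
raw = "date,sale\n2015-03-01,100\n2015-03-02,200\n2015-03-03,300\n"

# index_col: use the 'date' column as the row index
by_date = pd.read_csv(io.StringIO(raw), index_col="date")

# names: replace the header row's column names with our own;
# header=0 marks the first line as a header to be replaced, not as data
renamed = pd.read_csv(io.StringIO(raw), names=["day", "amount"], header=0)

print(by_date)
print(renamed.columns.tolist())
```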

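## 2. Row indexing with iloc and loc

```python
import numpy as np
import pandas as pd

# data1 is not shown in the original post; this array reproduces the output printed below
data1 = np.arange(0, 30, 2).reshape(5, 3)
data2 = pd.DataFrame(data1, columns=('a', 'b', 'c'))  # columns defines the column (field) names
print(data2)
#     a   b   c
# 0   0   2   4
# 1   6   8  10
# 2  12  14  16
# 3  18  20  22
# 4  24  26  28
```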
### 2.1 iloc

* Select a single row by position; the result is a Series holding all of its columns

```python
data2.iloc[2]
# a    12
# b    14
# c    16
# Name: 2, dtype: int32
```
* `iloc[start:end]` selects positions start to end, left-closed and right-open (the end position is excluded)

```python
data2.iloc[1:4]
#     a   b   c
# 1   6   8  10
# 2  12  14  16
# 3  18  20  22
```
* Give a row position and a column position to get a single value

```python
data2.iloc[2, 2]  # [row, column]
# 16
```
* A column label cannot be used directly; this raises a ValueError

```python
data2.iloc[2, 'c']  # [row, column]
# ValueError: Location based indexing can only have [integer, integer slice
# (START point is INCLUDED, END point is EXCLUDED), listlike of integers,
# boolean array] types
```
* Slices can be used to select the specified rows and columns

```python
data2.iloc[2:3, 0:]
#     a   b   c
# 2  12  14  16
```
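Because `data2` uses the default 0..n-1 row labels, positions and labels look the same in the examples above. The sketch below repeats the iloc calls on a copy of the same values with hypothetical string row labels, which makes it visible that iloc only ever counts positions. It also shows one way to combine a column name with iloc: translate the name to a position with `Index.get_loc` first.

```python
import numpy as np
import pandas as pd

# Same values as data2, but with (hypothetical) string row labels
df = pd.DataFrame(np.arange(0, 30, 2).reshape(5, 3),
                  index=["r0", "r1", "r2", "r3", "r4"],
                  columns=["a", "b", "c"])

row = df.iloc[2]            # third row, returned as a Series
block = df.iloc[1:4]        # positions 1, 2, 3 (end position excluded)
cell = df.iloc[2, 2]        # row position 2, column position 2
sub = df.iloc[[0, 4], 0:2]  # a list of row positions and a slice of column positions

# To use a column *name* with iloc, convert it to a position first
col_pos = df.columns.get_loc("c")
cell_by_name = df.iloc[2, col_pos]

print(block)
print(sub)
```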
### 2.2 loc

* Select a single row, as with iloc (here the row label 2 happens to coincide with position 2)

```python
data2.loc[2]
# a    12
# b    14
# c    16
# Name: 2, dtype: int32
```
* `loc[start:end]` selects labels start to end, left-closed and right-closed (both ends included), unlike iloc

```python
data2.loc[1:4]
#     a   b   c
# 1   6   8  10
# 2  12  14  16
# 3  18  20  22
# 4  24  26  28
```
* Unlike iloc, loc does not accept a column position, so passing an integer for the column raises an error

```python
data2.loc[2, 3]  # [row, column] -> error
# TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'>
# with these indexers [2] of <class 'int'>
```

The exception type and message shown here come from the pandas version used in this post and differ between versions.
* Give a row label and a column label to get a single value

```python
data2.loc[2, 'c']
# 16
```
* Select specific rows, then specific columns: selecting columns with a list keeps the column headers, and the result is a DataFrame

```python
data3 = data2.loc[1:4]
data4 = data3[['a', 'b']]
print(data4)
print(type(data4))
#     a   b
# 1   6   8
# 2  12  14
# 3  18  20
# 4  24  26
# <class 'pandas.core.frame.DataFrame'>
```
* A list with a single column name also keeps the header, so the result is still a DataFrame

```python
data5 = data3[['a']]
print(data5)
print(type(data5))
#     a
# 1   6
# 2  12
# 3  18
# 4  24
# <class 'pandas.core.frame.DataFrame'>
```
* A single column name without a list drops the header, and the result is a Series

```python
data6 = data3['a']
print(data6)
print(type(data6))
# 1     6
# 2    12
# 3    18
# 4    24
# Name: a, dtype: int32
# <class 'pandas.core.series.Series'>
```
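The row-then-column selections above can also be written as a single loc call with both a row part and a column part. The sketch below rebuilds `data2` from the printed values and checks three points from this section: label slices include both ends, a list of column labels returns a DataFrame while a single label returns a Series, and with the default RangeIndex `iloc` needs an end one larger to return the same rows. The single-call form is an alternative shown for comparison, not the original post's code.

```python
import numpy as np
import pandas as pd

data2 = pd.DataFrame(np.arange(0, 30, 2).reshape(5, 3), columns=["a", "b", "c"])

rows = data2.loc[1:4]                 # labels 1..4, both ends included
cell = data2.loc[2, "c"]              # row label 2, column label 'c'
cols = data2.loc[1:4, ["a", "b"]]     # rows and columns in one call
one_col_df = data2.loc[1:4, ["a"]]    # list with one label -> DataFrame
one_col_s = data2.loc[1:4, "a"]       # single label -> Series

# With the default RangeIndex, labels equal positions, but iloc excludes the end,
# so it needs end=5 to return the same four rows
same_rows = data2.iloc[1:5]
```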
### 2.3 Summary

1. loc

Selects rows by row label. For example, `loc[n]` selects the row whose label is n (integer index), and `loc['d']` selects the row whose label is 'd' (string index).
Rows can be selected without naming any columns, but to select columns you must also give the rows first, e.g. `loc[2, 'c']` or `loc[:, 'c']`.
Rows and columns must be addressed by label, not by position. Label slices are allowed, and they include both endpoints (left-closed, right-closed).

2. iloc

Selects rows by integer position. Labels (such as strings) are not accepted, which is the key difference from loc. Positions can be single integers, lists of integers, or slices, and slices are left-closed, right-open. iloc can also select columns, but only by position; column labels cannot be used directly.

3. Recommendation

When selecting by position, use iloc; when selecting by label, use loc.
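The difference between the two only becomes visible when labels and positions disagree. The sketch below uses a hypothetical integer index that does not start at 0, plus a small string-indexed frame for the `loc['d']` case, to show the same call meaning different rows under loc and iloc.

```python
import pandas as pd

# Integer labels that are NOT 0..n-1, so "label" and "position" differ
frame = pd.DataFrame({"v": [1.5, 2.5, 3.5, 4.5]}, index=[10, 20, 30, 40])

by_label = frame.loc[20, "v"]     # row labelled 20
by_position = frame.iloc[1, 0]    # row at position 1, column at position 0 (same cell)
label_slice = frame.loc[20:30]    # labels 20 and 30 (end included)
pos_slice = frame.iloc[1:3]       # positions 1 and 2 (end excluded)

# loc[1] looks for the *label* 1, which does not exist in this index
try:
    frame.loc[1]
    loc_one_failed = False
except KeyError:
    loc_one_failed = True

# String labels work the same way: loc['d'] selects the row labelled 'd'
letters = pd.DataFrame({"v": [1, 2, 3, 4]}, index=list("abcd"))
row_d = letters.loc["d"]
```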

## 3. Plotting with Matplotlib

### 3.1 plot: line chart

plt.plot(x, y, S)

* Draws a line chart of y against x.
* The format string S sets the marker, line style, and color.
* Common codes: "b" is blue, "r" is red, "o" draws circle markers, "-" is a solid line, "--" is a dashed line.

eg1:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
plt.plot(x, y, "b--")  # blue dashed line
plt.show()
```
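To see how a format string maps to color, line style, and marker, the sketch below draws three lines and inspects the `Line2D` objects that `plot` returns. It assumes the non-interactive Agg backend so it runs without a display, and closes the figure instead of calling `plt.show()`; the cosine and negated sine curves are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()
(dashed,) = ax.plot(x, np.sin(x), "b--")  # blue dashed line
(dots,) = ax.plot(x, np.cos(x), "ro")     # red circle markers, no connecting line
(solid,) = ax.plot(x, -np.sin(x), "g-")   # green solid line
plt.close(fig)  # in an interactive session, call plt.show() instead
```

A format string that names only a marker (such as "ro") produces markers with no line between them.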

### 3.2 pie: pie chart

plt.pie(sizes)

* sizes is a list giving the relative size of each wedge; pie accepts many other parameters.

eg1:

```python
import matplotlib.pyplot as plt

labels = ['Frogs', 'Hogs', 'Dogs', 'Logs']  # wedge labels
sizes = [15, 30, 45, 10]  # share of each wedge
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']  # color of each wedge
explode = [0, 0.1, 0, 0]  # offset only the second wedge to highlight it
plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle, not an ellipse
plt.show()
```

### 3.3 hist: histogram

plt.hist(x, y)

* x is the one-dimensional array of data to plot.
* y (the bins argument) is either a positive integer, in which case the data range is split into y equal-width bins, or a list whose values are the bin edges (i.e., the boundaries are specified manually).

eg1:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)  # 1000 normally distributed random numbers
plt.hist(x, 10)
plt.show()
```
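The two forms of the bins argument can be compared through what `hist` returns: the counts per bin, the bin edges, and the drawn patches. This sketch assumes the Agg backend and a seeded generator for reproducibility; the manual edges `[-3, -1, 0, 1, 3]` are an arbitrary illustration. Values outside the manual edges are simply not counted.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
x = rng.standard_normal(1000)   # 1000 normally distributed samples

fig, (ax1, ax2) = plt.subplots(1, 2)
# bins as an integer: 10 equal-width bins spanning min(x)..max(x)
counts10, edges10, _ = ax1.hist(x, 10)
# bins as a list: explicit bin edges (samples outside [-3, 3] are dropped)
counts_manual, edges_manual, _ = ax2.hist(x, [-3, -1, 0, 1, 3])
plt.close(fig)
```

With an integer, n bins always have n + 1 edges, and the last bin includes its right edge, so every sample is counted.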

### 3.4 boxplot: box plot

D.boxplot() or D.plot(kind='box')

* One way is to call the DataFrame's boxplot() method directly.
* The other is to call the plot() method of a Series or DataFrame with kind='box'.

eg1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.random.randn(1000)  # 1000 normally distributed random numbers
D = pd.DataFrame([x, x + 1]).T  # build a two-column DataFrame
D.plot(kind='box')  # built-in DataFrame plotting method; kind='box' selects a box plot
plt.show()
```
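Both ways of drawing the box plot work on the same DataFrame. The sketch below (assuming the Agg backend and a seeded generator) calls each in turn and computes the quartiles with `DataFrame.quantile`, since the box spans the 25th to 75th percentiles with a line at the median. Because the second column is the first plus 1, its quartiles are shifted by exactly 1.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
D = pd.DataFrame([x, x + 1]).T   # 1000 rows x 2 columns, named 0 and 1

ax_a = D.boxplot()               # way 1: DataFrame.boxplot()
plt.close("all")
ax_b = D.plot(kind='box')        # way 2: DataFrame.plot(kind='box')
plt.close("all")

# Lower box edge, median line, upper box edge for each column
quartiles = D.quantile([0.25, 0.5, 0.75])
print(quartiles)
```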

©2019-2020 Toolsou All rights reserved,