# Common Pandas methods for data analysis

## 1. Reading data: pd.read_*()

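* Specify the index column

eg1: when building a DataFrame by hand, `index` sets the row labels and `columns` sets the column names:

```python
import numpy as np
import pandas as pd

# index: row labels; columns: column names
pd.DataFrame(np.arange(12, 24).reshape(3, 4), index=['a', 'b', 'c'], columns=['w', 'x', 'y', 'z'])
```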

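eg2: when reading a file, `index_col` turns one of its columns into the row index:

```python
import pandas as pd

catering_sale = "catering_sale.xls"
# Read the data and use the 'date' column as the index
# (every Series, i.e. column, in a DataFrame shares this row index)
data = pd.read_excel(catering_sale, index_col='date')
```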

* names: array of column names, default None; sets the column names of the data being read

eg1:

```python
import pandas as pd

catering_sale = "catering_sale.xls"
data = pd.read_excel(catering_sale, names=['date', 'sale'])
```
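Since `catering_sale.xls` is not included here, the sketch below assumes an in-memory CSV with made-up values and uses `pd.read_csv`, which takes the same `index_col` and `names` parameters as `pd.read_excel`. One difference worth knowing: `read_csv` treats the first line as data when only `names` is passed, so `header=0` is added to replace the existing header row (`read_excel` already defaults to `header=0`).

```python
import io

import pandas as pd

# Hypothetical data standing in for catering_sale.xls (values are invented)
csv_text = "date,sale\n2015-03-01,100\n2015-03-02,200\n2015-03-03,300\n"

# index_col: use the 'date' column as the row index
by_date = pd.read_csv(io.StringIO(csv_text), index_col="date")
print(by_date)

# names: supply our own column names; header=0 says the first line is a
# header row that these names replace
renamed = pd.read_csv(io.StringIO(csv_text), names=["day", "amount"], header=0)
print(renamed)
```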

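## 2. Row indexing with iloc and loc

```python
import numpy as np
import pandas as pd

data1 = np.arange(0, 30, 2).reshape(5, 3)  # 5x3 array of even numbers (reconstructed from the output below)
data2 = pd.DataFrame(data1, columns=('a', 'b', 'c'))  # columns defines the field (column) names
data2
#     a   b   c
# 0   0   2   4
# 1   6   8  10
# 2  12  14  16
# 3  18  20  22
# 4  24  26  28
```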
### 2.1 iloc

* Select a single row by position (all columns)

```python
data2.iloc[2]
# a    12
# b    14
# c    16
# Name: 2, dtype: int32
```
* iloc[start:end] selects positions start to end - 1: the slice is closed on the left and open on the right

```python
data2.iloc[1:4]
#     a   b   c
# 1   6   8  10
# 2  12  14  16
# 3  18  20  22
```
* Give a row position and a column position to get a single value

```python
data2.iloc[2, 2]  # [row, column]
# 16
```
* A column label cannot be used directly; this raises a ValueError

```python
data2.iloc[2, 'c']  # [row, column]
# ValueError: Location based indexing can only have [integer, integer slice
# (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
```
* Slices select a range of rows and columns

```python
data2.iloc[2:3, 0:]
#     a   b   c
# 2  12  14  16
```
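The ValueError message above lists what iloc accepts besides single integers and slices: lists of integers and boolean arrays. A short sketch of those forms on the same 5x3 frame as data2 (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Same 5x3 frame as data2
data2 = pd.DataFrame(np.arange(0, 30, 2).reshape(5, 3), columns=("a", "b", "c"))

rows_0_and_3 = data2.iloc[[0, 3]]       # list of integer positions
last_row = data2.iloc[-1]               # negative position counts from the end
mask = (data2["a"] > 10).to_numpy()     # iloc needs a plain boolean array, not a boolean Series
big_a = data2.iloc[mask]
col_c = data2.iloc[:, 2]                # all rows, third column by position
print(rows_0_and_3, last_row, big_a, col_c, sep="\n\n")
```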
### 2.2 loc

* Select a single row (all columns); the result matches iloc here only because the row labels 0 to 4 coincide with the positions

```python
data2.loc[2]
# a    12
# b    14
# c    16
# Name: 2, dtype: int32
```
* loc[start:end] selects labels start to end inclusive: the slice is closed on both ends, unlike iloc

```python
data2.loc[1:4]
#     a   b   c
# 1   6   8  10
# 2  12  14  16
# 3  18  20  22
# 4  24  26  28
```
* Unlike iloc, loc does not accept an integer position for the column; the column labels here are 'a', 'b' and 'c', so this raises an error

```python
data2.loc[2, 3]  # [row, column]
# TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'>
# with these indexers [2] of <class 'int'>
# (recent pandas versions raise KeyError: 3 instead)
```
* Give a row label and a column label to get a single value

```python
data2.loc[2, 'c']
# 16
```
* Select rows 1 to 4 into data3, then columns a and b; the result keeps its column headers and is a DataFrame

```python
data3 = data2.loc[1:4]
data4 = data3[['a', 'b']]
print(data4)
print(type(data4))
#     a   b
# 1   6   8
# 2  12  14
# 3  18  20
# 4  24  26
# <class 'pandas.core.frame.DataFrame'>
```
* Select a single column of data3 with a list; the result keeps its column header and is still a DataFrame

```python
data5 = data3[['a']]
print(data5)
print(type(data5))
#     a
# 1   6
# 2  12
# 3  18
# 4  24
# <class 'pandas.core.frame.DataFrame'>
```
* Select a single column of data3 by its bare label; the result has no column header and is a Series

```python
data6 = data3['a']
print(data6)
print(type(data6))
# 1     6
# 2    12
# 3    18
# 4    24
# Name: a, dtype: int32
# <class 'pandas.core.series.Series'>
```
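The two-step selection above (rows first, then columns) can also be written as a single loc call with a row indexer and a column indexer. A minimal sketch, rebuilding the same 5x3 frame, that checks both spellings agree and that a bare label versus a one-element list decides Series versus DataFrame:

```python
import numpy as np
import pandas as pd

data2 = pd.DataFrame(np.arange(0, 30, 2).reshape(5, 3), columns=("a", "b", "c"))

# Two steps: rows 1..4 first, then columns a and b
chained = data2.loc[1:4][["a", "b"]]
# One step: row labels and column labels in a single loc call
combined = data2.loc[1:4, ["a", "b"]]
print(combined)

# A bare column label gives a Series; a one-element list gives a DataFrame
print(type(data2.loc[1:4, "a"]), type(data2.loc[1:4, ["a"]]))
```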
### 2.3 Summary

1. loc

Selects data by label. For example, loc[n] selects the row whose label is n (integer index), and loc['d'] selects the row whose label is 'd' (string index).
A row can be selected without naming any columns, but a column can only be selected together with a row indexer (for example loc[:, 'c']).
Both the row and the column indexers must be labels, not integer positions. Label slices are allowed, and they are closed on both ends.

2. iloc

Selects data by integer position, so strings (labels) cannot be used; this is the key difference from loc. It accepts single integers, integer slices, lists of integers and boolean arrays, and slices are closed on the left and open on the right. iloc can also select columns, but only by position, never by column label.

3. Recommendation

Use iloc when selecting by position and loc when selecting by label.
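In data2 the row labels 0 to 4 equal the positions, which hides most of the difference. A small sketch with letter labels (a made-up frame) makes it visible: the same rows need different slice bounds under loc and iloc.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with letter row labels
frame = pd.DataFrame(np.arange(12).reshape(4, 3), index=["a", "b", "c", "d"], columns=["x", "y", "z"])

by_label = frame.loc["b":"d"]     # label slice: both ends included -> b, c, d
by_position = frame.iloc[1:3]     # position slice: end excluded -> b, c
col_y = frame.loc[:, "y"]         # column by label (a row indexer ':' comes first)
col_y_pos = frame.iloc[:, 1]      # the same column by position
print(by_label, by_position, sep="\n\n")
```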

## 3. Plotting with Matplotlib

### 1. plot: line chart
`plt.plot(x, y, S)`

* Draws a line chart of y against x.
* The format string S sets the marker, line style and color.
* Common values: "b" is blue, "r" is red, "o" is a circle marker, "-" is a solid line and "--" is a dashed line.
eg1:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
plt.plot(x, y, "b--")  # blue dashed line
plt.show()
```
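To see how the format string is interpreted, the sketch below draws two series and inspects the `Line2D` objects that `plt.plot` returns. It assumes the non-interactive "Agg" backend so it runs without a display, and saves to a hypothetical `line_chart.png` instead of calling `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
# "b--": blue dashed line; "ro": red circle markers with no connecting line
(sin_line,) = plt.plot(x, np.sin(x), "b--")
(cos_pts,) = plt.plot(x, np.cos(x), "ro")
plt.savefig("line_chart.png")  # or plt.show() with an interactive backend

print(sin_line.get_linestyle(), cos_pts.get_marker(), cos_pts.get_linestyle())
```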

### 2. pie: pie chart
`plt.pie(size)`

* size is a list giving the relative area of each wedge; pie also offers many other parameters.

eg1:

```python
import matplotlib.pyplot as plt

labels = ['Frogs', 'Hogs', 'Dogs', 'Logs']  # wedge labels
sizes = [15, 30, 45, 10]  # proportion of each wedge
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']  # color of each wedge
explode = [0, 0.1, 0, 0]  # pull out (highlight) only the second wedge
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')  # equal aspect ratio so the pie is a circle, not an ellipse
plt.show()
```
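Because size holds proportions, it does not have to sum to 100: pie divides each value by the total. The sketch below (Agg backend assumed so it runs headless) passes `[3, 6, 9, 2]`, which has the same ratios as the example, and reads back the percentage labels that `plt.pie` returns when `autopct` is set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

sizes = [3, 6, 9, 2]  # same ratios as [15, 30, 45, 10]
# With autopct set, pie returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = plt.pie(sizes, labels=["Frogs", "Hogs", "Dogs", "Logs"], autopct="%1.1f%%")
percentages = [t.get_text() for t in autotexts]
print(percentages)
```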

### 3. hist: histogram
`plt.hist(x, y)`

* x is the one-dimensional array whose histogram is drawn.
* y is either a positive integer, in which case the data range is split into y equal-width bins, or a list, in which case each number is a bin edge (the boundaries are set manually).

eg1:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)  # 1000 normally distributed random numbers
plt.hist(x, 10)  # 10 equal-width bins
plt.show()
```
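The two forms of y can be compared through the values `plt.hist` returns (counts, bin edges, patches). A sketch with a seeded generator and the Agg backend assumed; the edge list `[-3, -1, 0, 1, 3]` is an arbitrary illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Integer: 10 equal-width bins spanning min(x)..max(x), so 11 edges
counts, edges, _ = plt.hist(x, 10)

# List: explicit bin edges; values outside [-3, 3] are not counted
edge_list = [-3, -1, 0, 1, 3]
counts2, edges2, _ = plt.hist(x, edge_list)
print(len(edges), counts.sum(), counts2)
```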

### 4. boxplot: box plot
`D.boxplot()` or `D.plot(kind='box')`

* One way is to call the DataFrame's boxplot() method directly;
* The other is to call the plot() method of a Series or DataFrame and pass kind='box' to choose a box plot.

eg1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.random.randn(1000)  # 1000 normally distributed random numbers
D = pd.DataFrame([x, x + 1]).T  # build a two-column DataFrame
D.plot(kind='box')  # DataFrame's built-in plot method; kind='box' selects a box plot
plt.show()
```
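Both ways listed above return a Matplotlib Axes. The sketch below (Agg backend assumed) draws the same two-column frame with each method and checks the column medians, which the line inside each box marks; since the second column is the first shifted by 1, its box sits one unit higher.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.random.randn(1000)
D = pd.DataFrame([x, x + 1]).T  # columns 0 and 1; column 1 is column 0 shifted by 1

ax1 = D.boxplot()          # way 1: DataFrame.boxplot()
ax2 = D.plot(kind="box")   # way 2: plot() with kind="box"
plt.savefig("box_plot.png")

med = D.median()  # the line inside each box is the column median
print(med)
```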

©2019-2020 Toolsou All rights reserved,