<>Pandas: reshaping

Prepare the environment
import numpy as np
import pandas as pd
df = pd.read_csv('data/table.csv')
df.head()
<> One: Pivot tables


By default, data in a DataFrame is stored in a compressed (stacked) state. For example, the Gender column above holds both categories in a single column. The pivot function turns the values of such a column into new columns:
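pd.pivot_table(df,index='ID',columns='Gender',values='Height').head()
# pivot_table offers more functionality than pivot; compare the running time of the two
%timeit df.pivot(index='ID',columns='Gender',values='Height')
%timeit pd.pivot_table(df,index='ID',columns='Gender',values='Height')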
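Besides speed, the two functions differ in how they treat repeated index/column pairs. The following is a minimal sketch on a small made-up frame (`toy` is hypothetical and is not the tutorial's `data/table.csv`): `pivot` raises a `ValueError` when an (ID, Gender) pair occurs more than once, while `pivot_table` aggregates the duplicates with its default `'mean'`.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's table.csv):
# ID 1 appears twice with Gender 'M'
toy = pd.DataFrame({
    'ID': [1, 1, 2],
    'Gender': ['M', 'M', 'F'],
    'Height': [170.0, 174.0, 160.0],
})

# pivot cannot reshape when an (index, column) pair is duplicated
try:
    toy.pivot(index='ID', columns='Gender', values='Height')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (default aggfunc is 'mean')
table = pd.pivot_table(toy, index='ID', columns='Gender', values='Height')

print(pivot_failed)
print(table)
```

On data where every pair is unique, both calls should produce the same layout, which is why the choice between them usually comes down to duplicates and aggregation needs.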
Common pivot_table parameters
# aggfunc: the aggregation applied within each group; various functions can be passed in, and the default is 'mean'
pd.pivot_table(df,index='School',columns='Gender',values='Height',aggfunc=['mean','sum']).head()
# margins: adds the marginal (row and column) aggregates
pd.pivot_table(df,index='School',columns='Gender',values='Height',aggfunc=['mean','sum'],margins=True).head()
# rows, columns, and values can all be multi-level
pd.pivot_table(df,index=['School','Class'],columns=['Gender','Address'],values=['Height','Weight'])
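To make the effect of `aggfunc` and `margins` concrete, here is a small sketch on made-up data (the frame `toy` and its values are assumptions for illustration only). It shows that a list of aggregations produces a two-level column index, and that `margins=True` appends an `All` row and column.

```python
import pandas as pd

# Hypothetical toy data, chosen so the totals are easy to check by hand
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_2', 'S_2'],
    'Gender': ['M', 'F', 'M', 'F'],
    'Height': [170.0, 160.0, 180.0, 150.0],
})

# A list of aggregations yields a two-level column index: (function, Gender)
multi = pd.pivot_table(toy, index='School', columns='Gender',
                       values='Height', aggfunc=['mean', 'sum'])

# margins=True appends an 'All' row and column computed with the same aggfunc
totals = pd.pivot_table(toy, index='School', columns='Gender',
                        values='Height', aggfunc='sum', margins=True)

print(multi)
print(totals)
```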

A crosstab is a special kind of pivot table, typically used for grouped counts. For example, to count the frequency of each combination of street (Address) and gender:
pd.crosstab(index=df['Address'],columns=df['Gender'])
# crosstab parameters
# values and aggfunc: aggregate some data within each group; these two parameters must be passed together
pd.crosstab(index=df['Address'],columns=df['Gender'],values=np.random.randint(1,20,df.shape[0]),aggfunc='min')
# Besides the margins parameter, crosstab also has a normalize parameter, which accepts 'all', 'index', or 'columns'
pd.crosstab(index=df ...
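The three uses of `crosstab` described above can be sketched on a tiny made-up frame (`toy` and its `Weight` values are hypothetical): plain counts, a `values`/`aggfunc` pair, and `normalize='index'`, which turns each row into proportions that sum to 1.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's table.csv)
toy = pd.DataFrame({
    'Address': ['street_1', 'street_1', 'street_2', 'street_2', 'street_2'],
    'Gender': ['M', 'F', 'M', 'M', 'F'],
    'Weight': [70, 55, 80, 65, 50],
})

# Frequency of each (Address, Gender) combination
counts = pd.crosstab(index=toy['Address'], columns=toy['Gender'])

# values and aggfunc must be given together: minimum Weight per cell
lightest = pd.crosstab(index=toy['Address'], columns=toy['Gender'],
                       values=toy['Weight'], aggfunc='min')

# normalize='index' divides each row by its row total
row_share = pd.crosstab(index=toy['Address'], columns=toy['Gender'],
                        normalize='index')

print(counts)
print(lightest)
print(row_share)
```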
<> Two: Other reshaping methods


The melt function can be regarded as the inverse of pivot: it takes data in the unstacked state and compresses it back into the stacked state, turning a "wide" DataFrame into a "narrow" one.
df_m = df[['ID','Gender','Math']]
df_m.head()
df.pivot(index='ID',columns='Gender',values='Math').head()
# In melt, id_vars names the columns to keep, and value_vars names the set of columns to stack
pivoted = df.pivot(index='ID',columns='Gender',values='Math')
result = pivoted. ... ().set_index('ID').sort_index()
# Check whether result is identical to df before pivoting. The intermediate steps of this method chain can be run one at a time to see what each one produces
result
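A pivot-then-melt round trip can be sketched as follows. This is one possible chain on made-up data, not necessarily the chain the tutorial used: the frame `toy`, the `value_vars` list `['F', 'M']`, and the `dropna()` step are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; Math is float so that the NaNs introduced by
# pivot do not change the column's dtype
toy = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Gender': ['M', 'F', 'M', 'F'],
    'Math': [34.0, 32.5, 87.2, 80.4],
})

# Wide form: one column per Gender, NaN where an ID has no value
pivoted = toy.pivot(index='ID', columns='Gender', values='Math')

# Back to narrow form: keep ID, stack the F and M columns, drop the NaNs
result = (
    pivoted.reset_index()
    .melt(id_vars='ID', value_vars=['F', 'M'],
          var_name='Gender', value_name='Math')
    .dropna()
    .set_index('ID')
    .sort_index()
)

expected = toy.set_index('ID').sort_index()
print(result)
print(result['Math'].tolist() == expected['Math'].tolist())
```

Running each step of the chain separately (for example, stopping after `melt`) shows where the NaN rows appear and why `dropna()` is needed here.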

stack is the most basic reshaping function. It takes only two parameters: level and dropna.
df_s = pd.pivot_table(df,index=['Class','ID'],columns='Gender',values=['Height','Weight'])
df_s.groupby('Class').head(2)
# stack can be seen as moving the horizontal (column) index to the vertical (row) direction, so its effect is similar to melt. The level parameter specifies which level of the column index to move (or which levels, given as a list)
df_stacked = df_s.stack(0)
df_stacked.groupby('Class').head(2)
# unstack is the inverse of stack, and is similar in function to pivot_table
df_stacked.head()
result = df_stacked.unstack().swaplevel(1,0,axis=1).sort_index(axis=1)
result.equals(df_s)
# A level parameter can also be specified in unstack
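The stack/unstack round trip above can be tried on a frame small enough to read at a glance. In this sketch the frame `toy_s`, the level names `Measure` and `Gender`, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame with a two-level column index: (Measure, Gender)
cols = pd.MultiIndex.from_product(
    [['Height', 'Weight'], ['F', 'M']], names=['Measure', 'Gender'])
toy_s = pd.DataFrame(
    [[160.0, 175.0, 50.0, 70.0],
     [158.0, 180.0, 48.0, 75.0]],
    index=pd.Index(['C_1', 'C_2'], name='Class'),
    columns=cols,
)

# Move column level 0 (Measure) into the row index
stacked = toy_s.stack(level=0)

# unstack moves the innermost row level back to the columns, but as the
# inner column level, so swap the levels and sort to restore the layout
restored = stacked.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)

print(stacked)
print(restored)
```

Passing `level=1` (or `level='Gender'`) to `stack` would move the gender level instead, leaving `Height` and `Weight` as columns.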
<> Three: Dummy variables and factorization

Dummy Variable

df_d = df[['Class','Gender','Weight']]
df_d.head()
# Now convert the first two columns of the table above into dummy variables and append the third column, Weight
pd.get_dummies(df_d[['Class','Gender']]).join(df_d['Weight']).head()
# The optional prefix parameter adds a prefix, and prefix_sep sets the separator
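The `prefix` and `prefix_sep` options mentioned above can be sketched like this. The frame `toy_d` and the prefix strings `'cls'` and `'sex'` are made up for illustration; the dtype of the dummy columns differs between pandas versions, so the sketch only inspects the column names.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's table.csv)
toy_d = pd.DataFrame({
    'Class': ['C_1', 'C_2', 'C_1'],
    'Gender': ['M', 'F', 'F'],
    'Weight': [63, 73, 82],
})

# One prefix per encoded column; prefix_sep joins prefix and category
dummies = pd.get_dummies(toy_d[['Class', 'Gender']],
                         prefix=['cls', 'sex'], prefix_sep='-')

# Append the numeric column, as in the tutorial's join
encoded = dummies.join(toy_d['Weight'])

print(encoded.columns.tolist())
print(encoded)
```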

The factorize method is mainly used for natural-number encoding. Missing values are coded as -1, and the sort parameter indicates whether codes are assigned after sorting the values.
codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'], sort=True)
display(codes)
display(uniques)
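To see what `sort` changes, the sketch below encodes the same values twice. Wrapping the values in a `pd.Series` and the variable names are choices made for this illustration rather than part of the tutorial's code.

```python
import pandas as pd

values = pd.Series(['b', None, 'a', 'c', 'b'])

# sort=False (the default): codes follow order of first appearance
codes_seen, uniques_seen = pd.factorize(values)

# sort=True: the unique values are sorted first, then numbered
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)

print(codes_seen.tolist(), list(uniques_seen))
print(codes_sorted.tolist(), list(uniques_sorted))
```

In both cases the missing value gets the code -1 and does not appear among the unique values.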

©2019-2020 Toolsou All rights reserved,