Pandas is a well-known Python data analysis package that makes it easy to read and transform data. In Pandas, reshaping data means changing the structure of a table or vector (that is, a DataFrame or Series) so that it is better suited to further analysis. This article illustrates some of the most common Pandas reshaping functions.

 

1. Pivot

The pivot function creates a new, derived table from a given table. pivot takes three parameters: index, columns, and values. In older pandas source code, a helper named pivot_simple documented them as follows:

    def pivot_simple(index, columns, values):
        """
        Produce 'pivot' table based on 3 columns of this DataFrame.
        Uses unique values from index / columns and fills with values.

        Parameters
        ----------
        index : ndarray
            Labels to use to make new frame's index
        columns : ndarray
            Labels to use to make new frame's columns
        values : ndarray
            Values to use for populating new frame's values
        """
     
The arguments for these parameters are column names from the original table. pivot then creates a new table whose row and column labels are the unique values of the columns passed as index and columns, and fills its cells with the entries of the values column. Consider the following example.

Suppose we have the following data, which we read into a DataFrame:

    from collections import OrderedDict
    from pandas import DataFrame
    import pandas as pd
    import numpy as np

    data = OrderedDict((
        ("item", ['Item1', 'Item1', 'Item2', 'Item2']),
        ("color", ['red', 'blue', 'red', 'black']),
        ("user", ['1', '2', '3', '4']),
        ("bm", ['1', '2', '3', '4']),
    ))
    data = DataFrame(data)
    print(data)

The result is:

        item  color  user  bm
    0  Item1    red     1   1
    1  Item1   blue     2   2
    2  Item2    red     3   3
    3  Item2  black     4   4

Next, we reshape this data:

    df = data.pivot(index='item', columns='color', values='user')
    print(df)

The result is:

    color black  blue  red
    item
    Item1  None     2    1
    Item2     4  None    3

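To see how pivot relates to the index operations discussed in the Stack/Unstack section, the sketch below rebuilds the same table by moving item and color into a MultiIndex with set_index and then unstacking the color level. It uses only standard pandas calls on the sample data above (built with a plain dict instead of OrderedDict); treat it as an illustration of the idea, not as a description of pivot's exact internals.

```python
import pandas as pd

# Same sample data as above (a plain dict keeps column order in Python 3.7+)
data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item2", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": ["1", "2", "3", "4"],
    "bm": ["1", "2", "3", "4"],
})

# pivot: unique 'item' values become row labels, unique 'color' values become column labels
pivoted = data.pivot(index="item", columns="color", values="user")

# Manual version: put both key columns into the row index, keep only 'user',
# then move the 'color' level from the rows into the columns
manual = data.set_index(["item", "color"])["user"].unstack("color")

print(manual)
print(pivoted.equals(manual))
```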
Note: the same value can be looked up in the original data and in the reshaped data as follows:

    # Original data set
    print(data[(data.item == 'Item1') & (data.color == 'red')].user.values)
    # Transformed data set
    print(df[df.index == 'Item1'].red.values)

Both lookups print the same value:

    ['1']
    ['1']
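Boolean masks work, but for a single cell a label-based lookup is shorter. A minimal sketch, assuming the same sample data, that reads the (Item1, red) value from both tables with .loc:

```python
import pandas as pd

data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item2", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": ["1", "2", "3", "4"],
    "bm": ["1", "2", "3", "4"],
})
df = data.pivot(index="item", columns="color", values="user")

# Long (original) table: select the row matching both keys, then the 'user' column
mask = (data["item"] == "Item1") & (data["color"] == "red")
orig_value = data.loc[mask, "user"].iloc[0]

# Wide (pivoted) table: the row label and the column label address the cell directly
pivot_value = df.loc["Item1", "red"]

print(orig_value, pivot_value)  # prints: 1 1
```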
     
In the example above, the reshaped data contains no bm information; it only contains the column we passed to pivot as values. Let's extend the example so that the result contains the bm information as well as the user information, by omitting values:

    df2 = data.pivot(index='item', columns='color')
    print(df2)

The result is:

           user                bm
    color black  blue  red  black  blue  red
    item
    Item1  None     2    1   None     2    1
    Item2     4  None    3      4  None    3
     
As the result shows, Pandas creates hierarchical (MultiIndex) column labels for the new table. We can use this hierarchy to select the values that came from a single original column; for example, df2.user (or equivalently df2['user']) returns the part of the table holding the user values.
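A short sketch of working with these hierarchical columns, assuming the same sample data: selecting a top-level key returns a sub-table, and a (top, sub) tuple selects a single column.

```python
import pandas as pd

data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item2", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": ["1", "2", "3", "4"],
    "bm": ["1", "2", "3", "4"],
})

# Without values=, every remaining column (user and bm) is pivoted
df2 = data.pivot(index="item", columns="color")

user_part = df2["user"]       # sub-table with columns black / blue / red
bm_red = df2[("bm", "red")]   # one column, addressed by its full two-level key

print(df2.columns.tolist())
print(user_part)
print(bm_red.tolist())        # prints: ['1', '3']
```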

 

2. Pivot Table

Now consider data in which the pair (Item1, red) appears twice:

    data = OrderedDict((
        ("item", ['Item1', 'Item1', 'Item1', 'Item2']),
        ("color", ['red', 'blue', 'red', 'black']),
        ("user", ['1', '2', '3', '4']),
        ("bm", ['1', '2', '3', '4']),
    ))
    data = DataFrame(data)
    df = data.pivot(index='item', columns='color', values='user')

This raises an error:

    ValueError: Index contains duplicate entries, cannot reshape

Therefore, before calling pivot we must make sure that each combination of the index and columns values occurs only once. If we cannot guarantee that, we can use the pivot_table method instead.
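One way to find the offending rows before pivoting is DataFrame.duplicated on the two key columns. The sketch below, using the duplicated sample data from this section, lists the clashing rows and falls back to pivot_table when pivot raises. The choice of aggfunc='first' (keep the first duplicate) is an assumption made for illustration, not something the article prescribes.

```python
import pandas as pd

data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item1", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": ["1", "2", "3", "4"],
    "bm": ["1", "2", "3", "4"],
})

# keep=False marks every row of a duplicated (item, color) pair, not just the repeats
dupes = data.duplicated(subset=["item", "color"], keep=False)
print(data[dupes])  # the two ('Item1', 'red') rows

try:
    table = data.pivot(index="item", columns="color", values="user")
except ValueError as err:
    print("pivot failed:", err)
    # Illustrative fallback: keep the first entry of each duplicate group
    table = data.pivot_table(index="item", columns="color", values="user", aggfunc="first")

print(table)
```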

     
The pivot_table method works much like pivot, but it can also handle duplicate index/column combinations: it combines the duplicate entries into a single value using the mean, the median, or another aggregation function.
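To make the aggregation concrete, the sketch below assumes the user column holds numbers (in the article it holds strings) so that a mean is meaningful. The duplicated (Item1, red) cell holds 1 and 3, so aggfunc='mean' yields 2.0 and aggfunc='max' yields 3; 'median' works the same way and, with only two duplicates, would also give 2.0.

```python
import pandas as pd

# Assumption: numeric 'user' values so that mean/max are well defined
data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item1", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": [1, 2, 3, 4],
})

# The duplicated (Item1, red) entries 1 and 3 are collapsed into one value
mean_tbl = data.pivot_table(index="item", columns="color", values="user", aggfunc="mean")
max_tbl = data.pivot_table(index="item", columns="color", values="user", aggfunc="max")

print(mean_tbl)
print(max_tbl)
```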

First, let's look at the pivot_table() signature and docstring (from an older pandas release; newer releases add further parameters such as observed and sort):

    def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
                    fill_value=None, margins=False, dropna=True, margins_name='All'):
        """
        Create a spreadsheet-style pivot table as a DataFrame. The levels in the
        pivot table will be stored in MultiIndex objects (hierarchical indexes) on
        the index and columns of the result DataFrame

        Parameters
        ----------
        data : DataFrame
        values : column to aggregate, optional
        index : column, Grouper, array, or list of the previous
            If an array is passed, it must be the same length as the data. The
            list can contain any of the other types (except list).
            Keys to group by on the pivot table index. If an array is passed,
            it is being used as the same manner as column values.
        columns : column, Grouper, array, or list of the previous
            If an array is passed, it must be the same length as the data. The
            list can contain any of the other types (except list).
            Keys to group by on the pivot table column. If an array is passed,
            it is being used as the same manner as column values.
        aggfunc : function or list of functions, default numpy.mean
            If list of functions passed, the resulting pivot table will have
            hierarchical columns whose top level are the function names
            (inferred from the function objects themselves)
        fill_value : scalar, default None
            Value to replace missing values with
        margins : boolean, default False
            Add all row / columns (e.g. for subtotal / grand totals)
        dropna : boolean, default True
            Do not include columns whose entries are all NaN
        margins_name : string, default 'All'
            Name of the row / column that will contain the totals
            when margins is True.
        """

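The fill_value, margins and list-of-aggfunc options described in this docstring can be tried on a small table. In this sketch the numeric user column is again an assumption; margins=True adds an 'All' row and column holding the totals, and fill_value=0 replaces the empty cells.

```python
import pandas as pd

data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item1", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": [1, 2, 3, 4],  # assumption: numeric values
})

# Sum duplicates, fill empty cells with 0, and add 'All' totals (default margins_name)
totals = data.pivot_table(index="item", columns="color", values="user",
                          aggfunc="sum", fill_value=0, margins=True)
print(totals)

# A list of functions gives hierarchical columns whose top level is each function's name
multi = data.pivot_table(index="item", columns="color", values="user",
                         aggfunc=["min", "max"])
print(multi.columns.get_level_values(0).unique().tolist())  # prints: ['min', 'max']
```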
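Let's look at an example that uses the minimum as the aggregation function:

    data = OrderedDict((
        ("item", ['Item1', 'Item1', 'Item1', 'Item2']),
        ("color", ['red', 'blue', 'red', 'black']),
        ("user", ['1', '2', '3', '4']),
        ("bm", ['1', '2', '3', '4']),
    ))
    data = DataFrame(data)
    # 'min' keeps the smallest of the duplicate entries; aggfunc=np.min also works,
    # but pandas 2.1 and later recommend passing the string name instead
    df = data.pivot_table(index='item', columns='color', values='user', aggfunc='min')
    print(df)
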
The result is:

    color black  blue   red
    item
    Item1  None     2     1
    Item2     4  None  None

In fact, pivot_table() is a generalization of pivot(): it allows several values in the dataset that map to the same target cell to be aggregated into one.
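One way to see the generalization: when every (item, color) pair is unique, pivot_table with an aggregation that simply passes the single value through should reproduce the output of pivot. The sketch uses aggfunc='first' for that pass-through, an illustrative choice, on the article's original duplicate-free data.

```python
import pandas as pd

data = pd.DataFrame({
    "item": ["Item1", "Item1", "Item2", "Item2"],
    "color": ["red", "blue", "red", "black"],
    "user": ["1", "2", "3", "4"],
})

pivoted = data.pivot(index="item", columns="color", values="user")
# With unique keys each group has exactly one row, so 'first' returns that row's value
table = data.pivot_table(index="item", columns="color", values="user", aggfunc="first")

# Compare cell by cell; fillna('') makes the empty cells comparable (NaN != NaN)
same = (pivoted.fillna("") == table.fillna("")).all().all()
print(same)
```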

 

3. Stack/Unstack

     
In fact, pivoting a table is just a special case of stacking a DataFrame. Suppose we have a DataFrame with multi-level indexes on both the rows and the columns. Stacking the DataFrame moves the innermost column index level so that it becomes the innermost row index level. The inverse operation, unstacking, moves the innermost row index level so that it becomes the innermost column index level. For example:
    from pandas import DataFrame
    import pandas as pd
    import numpy as np

    # Build a two-level row index
    row_idx_arr = list(zip(['r0', 'r0'], ['r-00', 'r-01']))
    row_idx = pd.MultiIndex.from_tuples(row_idx_arr)
    # Build a two-level column index
    col_idx_arr = list(zip(['c0', 'c0', 'c1'], ['c-00', 'c-01', 'c-10']))
    col_idx = pd.MultiIndex.from_tuples(col_idx_arr)
    # Create the DataFrame
    d = DataFrame(np.arange(6).reshape(2, 3), index=row_idx, columns=col_idx)
    # Replace each number x with the tuple (x // 3, x % 3)
    # (applymap is renamed DataFrame.map in pandas 2.1 and later)
    d = d.applymap(lambda x: (x // 3, x % 3))
    # Stack / unstack
    s = d.stack()
    u = d.unstack()
    print(s)
    print(u)

The results are as follows:

                       c0      c1
    r0 r-00 c-00   (0, 0)     NaN
            c-01   (0, 1)     NaN
            c-10      NaN  (0, 2)
       r-01 c-00   (1, 0)     NaN
            c-01   (1, 1)     NaN
            c-10      NaN  (1, 2)

            c0                              c1
          c-00            c-01            c-10
          r-00    r-01    r-00    r-01    r-00    r-01
    r0  (0, 0)  (1, 0)  (0, 1)  (1, 1)  (0, 2)  (1, 2)

In fact, Pandas lets us stack or unstack at any level of the index, so in the previous example we could also have stacked at the outermost column level.
By default (and in the most typical case), however, stacking and unstacking operate on the innermost index level.
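A sketch of the level argument, using the same row and column MultiIndexes as the example above but plain integers instead of tuples: level=0 stacks or unstacks the outermost level instead of the innermost one. It also checks a round trip: unstacking the level that was just stacked restores the original layout once the all-NaN columns, which unstack creates for combinations that never existed, are dropped.

```python
import numpy as np
import pandas as pd

# Same two-level row and column indexes as in the example above
row_idx = pd.MultiIndex.from_tuples([("r0", "r-00"), ("r0", "r-01")])
col_idx = pd.MultiIndex.from_tuples([("c0", "c-00"), ("c0", "c-01"), ("c1", "c-10")])
d = pd.DataFrame(np.arange(6).reshape(2, 3), index=row_idx, columns=col_idx)

inner = d.stack()              # default: innermost column level (c-00 / c-01 / c-10) moves to the rows
outer = d.stack(level=0)       # outermost column level (c0 / c1) moves to the rows instead
u_outer = d.unstack(level=0)   # outermost row level (r0) moves to the columns

print(outer)
print(u_outer)

# Round trip: unstack the level that was stacked; combinations that never existed
# (e.g. c0 / c-10) come back as all-NaN columns, so drop those
restored = inner.unstack().dropna(axis=1, how="all")
print(restored.columns.tolist() == d.columns.tolist())
print(restored.to_numpy().tolist())
```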

©2019-2020 Toolsou All rights reserved,