As usual, the answer comes first.
Why replace sum with join when concatenating strings in a groupby?
Because join is several times faster than sum! If you want your code to run efficiently, the replacement is recommended.
Let's try it and see what happens.
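import pandas as pd
from time import time

# Test DataFrame: 'a' counts from 0 to 999999, 'b' holds the string 'test'
df = pd.DataFrame(zip(range(1000000), ['test'] * 1000000), columns=['a', 'b'])
# Group key 'c': the last digit of 'a', as a string
df['c'] = df.apply(lambda x: str(x.a)[-1], axis=1)

# Concatenate the strings in each group with sum, 10 times
start = time()
for i in range(10):
    data = df[['b', 'c']].groupby('c', as_index=False).sum()
print('sum Time: {:5.2f}s'.format(time() - start))

def is_join(data_df):
    # Concatenate all values of one group with a single join
    res_str = "".join(map(str, list(data_df)))
    return res_str

# Concatenate the strings in each group with join, 10 times
start = time()
for i in range(10):
    data = df[['b', 'c']].groupby('c', as_index=False).agg(is_join)
print('join Time: {:5.2f}s'.format(time() - start))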
The code builds a df with 1,000,000 rows (100w) as test data.
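Before comparing speed, it is worth checking that the two approaches give the same result. The sketch below uses a tiny frame with the same layout as the test data. The names `small` and `join_strings` are made up for this illustration, and it assumes a pandas version in which `sum` on a string column concatenates the values.

```python
import pandas as pd


def join_strings(values):
    # Same idea as is_join: one join over all values in the group
    return "".join(map(str, values))


# 'b' holds the strings to concatenate, 'c' is the group key
small = pd.DataFrame({"b": ["x", "y", "z", "w"], "c": ["0", "1", "0", "1"]})

by_sum = small.groupby("c", as_index=False).sum()
by_join = small.groupby("c", as_index=False).agg(join_strings)

print(by_sum)
print(by_join)
# Both frames should hold the same concatenated string per group
print(by_sum["b"].tolist() == by_join["b"].tolist())
```

If the last line prints True, join can be swapped in for sum here without changing the output.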
The test results are as follows:
As can be seen, concatenating the strings with the join function is much faster than adding them up with sum, by nearly 20 times.
The larger the data volume, the more obvious the effect!
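To see how the gap changes with data volume on your own machine, something like the following could be used. It is only a rough sketch: the helper `compare`, the row counts, and the repeat count are assumptions chosen for illustration, and the timings will differ by machine and pandas version.

```python
from time import perf_counter

import pandas as pd


def join_strings(values):
    # One join over all values in the group
    return "".join(map(str, values))


def compare(n_rows, repeats=3):
    """Return (sum_seconds, join_seconds) for a frame with n_rows rows."""
    a = pd.Series(range(n_rows))
    # Group key 'c' is the last digit of the row number, as in the test above
    df = pd.DataFrame({"b": ["test"] * n_rows, "c": a.astype(str).str[-1]})

    start = perf_counter()
    for _ in range(repeats):
        df.groupby("c", as_index=False).sum()
    sum_seconds = perf_counter() - start

    start = perf_counter()
    for _ in range(repeats):
        df.groupby("c", as_index=False).agg(join_strings)
    join_seconds = perf_counter() - start
    return sum_seconds, join_seconds


# Collect timings for a few illustrative sizes
timings = {}
for n_rows in (1_000, 10_000, 100_000):
    timings[n_rows] = compare(n_rows)
    sum_s, join_s = timings[n_rows]
    print(f"{n_rows:>7} rows  sum: {sum_s:.3f}s  join: {join_s:.3f}s")
```

Comparing the two columns row by row shows whether the difference grows with the number of rows.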
I am an ant on the move, and I hope we can move forward together.
If this helped you a little, one like is enough. Thanks!
Note: if you find any mistakes in this blog or have suggestions, please point them out. I would greatly appreciate it!