- 2020-07-05 09:28
*views 1*- pandas

1.5.2 2. practice

1.5.2.1 【 Exercise one 】： There is a copy about diamonds Data set of , The number of carats is recorded in columns , colour , Mining depth , Price , Please solve the following problems ：

pd.read_csv(‘data/Diamonds.csv’).head()

carat color depth price

0 0.23 E 61.5 326

1 0.21 E 59.8 326

2 0.23 E 56.9 327

3 0.29 I 62.4 334

4 0.31 J 63.3 335

1.5.2.2 (a) At all weights over 1 In carats of diamonds , What is the price range ?

1.5.2.3 (b)

If the mining depth is 0.2\0.4\0.6\0.8 Quantile is the basis of grouping , Which is the most colorful diamond in each group ? Is this color the most expensive per unit weight in the group on average ?

1.5.2.4 © Group by weight (0-0.5,0.5-1,1-1.5,1.5-2,2+), Sort indexes by increasing depth , Find the maximum of the length of the continuous strictly increasing price series in each group .

1.5.2.5 (d) Please group by color , The regression coefficients of price with respect to carat number are calculated respectively .（ Simple linear regression of single variable , And only use Pandas and Numpy complete ）

1.5.2.6 【 Exercise 2 】： There's a copy about America 10 Year to 17 Illicit drug data sets for , Columns record the year separately , State （5 individual ）, county , Drug type , Number of reports , Please solve the following problems ：

pd.read_csv(‘data/Drugs.csv’).head()

YYYY State COUNTY SubstanceName DrugReports

0 2010 VA ACCOMACK Propoxyphene 1

1 2010 OH ADAMS Morphine 9

2 2010 PA ADAMS Methadone 2

3 2010 VA ALEXANDRIA CITY Heroin 5

4 2010 PA ALLEGHENY Hydromorphone 5

1.5.2.7 (a) Statistics by year , Which county has the most reports ? Was the county's state also the most reported in that year ?

1.5.2.8 (b) from 14 Year to year 15 year ,Heroin Which state has increased the most ? Is it the biggest increase of all drugs in the state ? If not , Please find the medicine that meets the condition .

1.5 The first 3 chapter ： Exercise one

1.5.1 (a) The range is 17561

df = pd.read_csv(‘data/Diamonds.csv’)

df.head()

carat color depth price

0 0.23 E 61.5 326

1 0.21 E 59.8 326

2 0.23 E 56.9 327

3 0.29 I 62.4 334

4 0.31 J 63.3 335

df_r = df.query(‘carat>1’)[‘price’]

df_r.max()-df_r.min()

17561

1.5.2 (b) 0-0.2 The most quantile range was ‘E’, The rest of the interval is ‘G’

bins = df[‘depth’].quantile(np.linspace(0,1,6)).tolist()

cuts = pd.cut(df[‘depth’],bins=bins) # Optional label Add custom label

df[‘cuts’] = cuts

df.head()

carat color depth price cuts

0 0.23 E 61.5 326 (60.8, 61.6]

1 0.21 E 59.8 326 (43.0, 60.8]

2 0.23 E 56.9 327 (43.0, 60.8]

3 0.29 I 62.4 334 (62.1, 62.7]

4 0.31 J 63.3 335 (62.7, 79.0]

color_result = df.groupby(‘cuts’)[‘color’].describe()

color_result

count unique top freq

cuts

(43.0, 60.8] 11294 7 E 2259

(60.8, 61.6] 11831 7 G 2593

(61.6, 62.1] 10403 7 G 2247

(62.1, 62.7] 10137 7 G 2193

(62.7, 79.0] 10273 7 G 2000

1.5.3 The first three quantiles do not satisfy the condition , The color with the most quantity in the last two intervals is indeed the most expensive in average weight price

df[‘ Average price ’]=df[‘price’]/df[‘carat’]

color_result[‘top’] == [i[1] for i in df.groupby([‘cuts’

,‘color’])[‘ Average price ’].mean().groupby([‘cuts’]).idxmax().values]

cuts

(43.0, 60.8] False

(60.8, 61.6] False

(61.6, 62.1] False

(62.1, 62.7] True

(62.7, 79.0] True

Name: top, dtype: bool

1.5.4 © The results are shown below ：

df = df.drop(columns=‘ Average price ’)

cuts = pd.cut(df[‘carat’],bins=[0,0.5,1,1.5,2,np.inf]) # Optional label Add custom label

df[‘cuts’] = cuts

df.head()

carat color depth price cuts

0 0.23 E 61.5 326 (0.0, 0.5]

1 0.21 E 59.8 326 (0.0, 0.5]

2 0.23 E 56.9 327 (0.0, 0.5]

3 0.29 I 62.4 334 (0.0, 0.5]

4 0.31 J 63.3 335 (0.0, 0.5]

def f(nums):

if not nums:

return 0

res = 1

cur_len = 1

for i in range(1, len(nums)):

if nums[i-1] < nums[i]:

cur_len += 1

res = max(cur_len, res)

else:

cur_len = 1

return res

for name,group in df.groupby(‘cuts’):

group = group.sort_values(by=‘depth’)

s = group[‘price’]

print(name,f(s.tolist()))

(0.0, 0.5] 8

(0.5, 1.0] 8

(1.0, 1.5] 7

(1.5, 2.0] 11

(2.0, inf] 7

1.5.5 (d) The results are as follows ：

for name,group in df[[‘carat’,‘price’,‘color’]].groupby(‘color’):

L1 =

np.array([np.ones(group.shape[0]),group[‘carat’]]).reshape(2,group.shape[0])

L2 = group[‘price’]

result = (np.linalg.inv(L1.dot(L1.T)).dot(L1)).dot(L2).reshape(2,1)

print(‘ When the color is %s Time , The intercept is ：%f, The regression coefficient is ：%f’%(name,result[0],result[1]))

When the color is D Time , The intercept is ：-2361.017152, The regression coefficient is ：8408.353126

When the color is E Time , The intercept is ：-2381.049600, The regression coefficient is ：8296.212783

When the color is F Time , The intercept is ：-2665.806191, The regression coefficient is ：8676.658344

When the color is G Time , The intercept is ：-2575.527643, The regression coefficient is ：8525.345779

When the color is H Time , The intercept is ：-2460.418046, The regression coefficient is ：7619.098320

When the color is I Time , The intercept is ：-2878.150356, The regression coefficient is ：7761.041169

When the color is J Time , The intercept is ：-2920.603337, The regression coefficient is ：7094.192092

1.6 The first 3 chapter ： Exercise 2

1.6.1 (a)

df = pd.read_csv(‘data/Drugs.csv’)

df.head()

YYYY State COUNTY SubstanceName DrugReports

0 2010 VA ACCOMACK Propoxyphene 1

1 2010 OH ADAMS Morphine 9

2 2010 PA ADAMS Methadone 2

3 2010 VA ALEXANDRIA CITY Heroin 5

4 2010 PA ALLEGHENY Hydromorphone 5

idx=pd.IndexSlice

for i in range(2010,2018):

county = (df.groupby([‘COUNTY’,‘YYYY’]).sum().loc[idx[:,i],:].idxmax()[0][0])

state = df.query(‘COUNTY == “%s”’%county)[‘State’].iloc[0]

state_true = df.groupby([‘State’,‘YYYY’]).sum().loc[idx[:,i],:].idxmax()[0][0]

if statestate_true:

print(‘ stay %d year ,%s Counties have the largest number of reports , The state it belongs to %s It is also the largest number of reports ’%(i,county,state))

else:

print(‘ stay %d year ,%s Counties have the largest number of reports , But the state it belongs to %s Not the most reported ,%s State reports the most ’%(i,county,state,state_true))

stay 2010 year ,PHILADELPHIA Counties have the largest number of reports , The state it belongs to PA It is also the largest number of reports

stay 2011 year ,PHILADELPHIA Counties have the largest number of reports , But the state it belongs to PA Not the most reported ,OH State reports the most

stay 2012 year ,PHILADELPHIA Counties have the largest number of reports , But the state it belongs to PA Not the most reported ,OH State reports the most

stay 2013 year ,PHILADELPHIA Counties have the largest number of reports , But the state it belongs to PA Not the most reported ,OH State reports the most

stay 2014 year ,PHILADELPHIA Counties have the largest number of reports , But the state it belongs to PA Not the most reported ,OH State reports the most

stay 2015 year ,PHILADELPHIA Counties have the largest number of reports , But the state it belongs to PA Not the most reported ,OH State reports the most

stay 2016 year ,HAMILTON Counties have the largest number of reports , The state it belongs to OH It is also the largest number of reports

stay 2017 year ,HAMILTON Counties have the largest number of reports , The state it belongs to OH It is also the largest number of reports

1.6.2 (b) OH States increased the most ,Heroin It's the largest increment , But the biggest increase was Acetyl fentanyl

df_b = df[(df[‘YYYY’].isin([2014,2015]))&(df[‘SubstanceName’]‘Heroin’)]

df_add = df_b.groupby([‘YYYY’,‘State’]).sum()

(df_add.loc[2015]-df_add.loc[2014]).idxmax()

DrugReports OH

dtype: object

df_b = df[(df[‘YYYY’].isin([2014,2015]))&(df[‘State’]==‘OH’)]

df_add = df_b.groupby([‘YYYY’,‘SubstanceName’]).sum()

display((df_add.loc[2015]-df_add.loc[2014]).idxmax()) # The feature of index alignment is used here

display((df_add.loc[2015]/df_add.loc[2014]).idxmax())

DrugReports Heroin

dtype: object

DrugReports Acetyl fentanyl

dtype: object

Technology

Daily Recommendation

views 4

views 2

views 2

©2019-2020 Toolsou All rights reserved,

keras from dataframe Read data and enhance the data for training （ classification + division ）unity Get the car speed and forward or backward Share has surpassed Ningde Era !LG Chemical confirmation to spin off battery business unit ( Essence )2020 year 7 month 15 day Wechat applet template Use of ( Essence )2020 year 6 month 26 day C# Class library Enum( Extension method )Hackbar Using the tutorial Vue Get the text and option value of the drop-down box Python realization switch method Mybatis Mapping file Mapper.xml in # and $ The difference between Thoughts on multi tenant system