- 2020-06-16 18:54
- R language

The data were downloaded from the Bureau of Statistics in preparation for the final assignment.

Save the data in CSV format, then import it into RGui.

Year,Birth rate,Population mortality,Natural population growth rate

2000,1.403,0.645,0.758

2001,1.338,0.643,0.695

2002,1.286,0.641,0.645

2003,1.241,0.64,0.601

2004,1.229,0.642,0.587

2005,1.24,0.651,0.589

2006,1.209,0.681,0.528

2007,1.21,0.693,0.517

2008,1.214,0.706,0.508

2009,1.195,0.708,0.487

2010,1.19,0.711,0.479

2011,1.193,0.714,0.479

2012,1.21,0.715,0.495

2013,1.208,0.716,0.492

2014,1.237,0.716,0.521

2015,1.207,0.711,0.496

2016,1.295,0.709,0.586

2017,1.243,0.711,0.532

2018,1.094,0.713,0.381

2019,1.048,0.714,0.334

(Copy the data yourself.)
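A minimal sketch of the import step. The CSV text is inlined so the example runs on its own; with a real file you would pass a path to `read.csv()` instead. The data frame name `table` follows the code later in this post.

```r
# Inline copy of the data listed above
csv_text <- "Year,Birth rate,Population mortality,Natural population growth rate
2000,1.403,0.645,0.758
2001,1.338,0.643,0.695
2002,1.286,0.641,0.645
2003,1.241,0.64,0.601
2004,1.229,0.642,0.587
2005,1.24,0.651,0.589
2006,1.209,0.681,0.528
2007,1.21,0.693,0.517
2008,1.214,0.706,0.508
2009,1.195,0.708,0.487
2010,1.19,0.711,0.479
2011,1.193,0.714,0.479
2012,1.21,0.715,0.495
2013,1.208,0.716,0.492
2014,1.237,0.716,0.521
2015,1.207,0.711,0.496
2016,1.295,0.709,0.586
2017,1.243,0.711,0.532
2018,1.094,0.713,0.381
2019,1.048,0.714,0.334"

# read.csv() replaces the spaces in the header with dots,
# e.g. "Birth rate" becomes Birth.rate
table <- read.csv(text = csv_text)
names(table)
str(table)
```

Note that `table` shadows the name of a base R function; this is harmless here, but a name such as `pop` would avoid the clash.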

<> I. Purpose of the assignment:

Drawing on this semester's statistical theory, R programming skills, and data analysis case studies, pick a topic of personal interest, then collect, organize, present, and analyze your own data and bring out its value. Use R correctly and appropriately for the data analysis, and write the case up as a data analysis report.

<> II. Assignment requirements:

1. Descriptive statistics (mean, standard deviation, median, skewness) that reflect the characteristics of the data, with the important variables shown in charts.

2. Data analysis: state clearly which problem the analysis is meant to solve, which analytical approach, methods, and models are used, and what conclusion or result is finally reached.

3. During the analysis, explain, based on the characteristics of the data, why function xx is used to solve the problem, and interpret the parameters it returns.

For example, in parameter estimation: the data are normally distributed, σ is unknown, and the sample size is below 30 (a small sample), so the standardized sample mean follows a t distribution with n-1 degrees of freedom. We therefore use the t.test function ...

For hypothesis testing, state the hypotheses, the conclusion, and the corresponding code.

For modeling, state the specific steps and interpret the model parameters.
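The t.test example above can be sketched as follows. Treating the 20 yearly birth rates as an independent normal sample is an assumption made only for illustration, and the vector name `birth` is hypothetical.

```r
# Birth rates for 2000-2019, copied from the data above
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
n <- length(birth)

# 95% confidence interval for the mean, based on the t distribution with n - 1 df
res <- t.test(birth, conf.level = 0.95)
res$conf.int

# The same interval computed by hand: mean +/- t quantile * s / sqrt(n)
manual <- mean(birth) + c(-1, 1) * qt(0.975, df = n - 1) * sd(birth) / sqrt(n)
manual
```

Both intervals should agree, which shows that `t.test()` uses exactly the small-sample formula described in the requirement.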

<> Descriptive statistics
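A minimal sketch of the four required statistics for two of the variables. Base R has no built-in skewness function, so the helper `skewness()` below (a simple moment-based estimate) is defined here for illustration; the names `birth` and `death` are also hypothetical.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)

# Moment-based sample skewness: third central moment / (second central moment)^1.5
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# One column per variable, one row per statistic
desc <- sapply(data.frame(birth, death), function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), skewness = skewness(x))
})
round(desc, 4)
```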

<> Box plots, and histograms with rug lines and density estimates
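A minimal base-graphics sketch of these plots for the birth rate. This is one possible way to draw them, not necessarily the code used for the original figures; the vector name `birth` is hypothetical.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)

par(mfrow = c(1, 2))

# Box plot
boxplot(birth, main = "Birth rate", ylab = "Rate")

# Histogram on the density scale, so a density curve can be overlaid
h <- hist(birth, freq = FALSE, main = "Birth rate", xlab = "Rate")
rug(birth)                           # rug lines mark each observation on the axis
lines(density(birth), col = "red")   # kernel density estimate
```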

<> Kernel density plots

Birth rate:

Population mortality:

Natural population growth rate:

<> Time series plot
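A minimal sketch of a time series plot of the birth and death rates, using `ts()` with yearly observations starting in 2000. The object names are hypothetical and the styling is only one possible choice.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)

# Annual series, 2000-2019
birth_ts <- ts(birth, start = 2000, frequency = 1)
death_ts <- ts(death, start = 2000, frequency = 1)

# Draw both series on a common y axis
plot(birth_ts, ylim = range(birth, death), col = "blue",
     xlab = "Year", ylab = "Rate", main = "Birth and death rates")
lines(death_ts, col = "red", lty = 2)
legend("right", legend = c("Birth rate", "Death rate"),
       col = c("blue", "red"), lty = c(1, 2))
```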

<> Data analysis

* Determining the relationship between variables

First, birth rate and death rate:

As the birth rate declines, the death rate rises, and the observations are scattered around a straight line, so the two have a negative linear relationship. The two box plots show that neither the birth rate nor the death rate is symmetrically distributed. The fitted curve shows some linear character, so the two variables can be considered linearly related.

Second, birth rate and natural growth rate:

The code is the same as above.

As the birth rate rises, the natural growth rate rises too, and the observations are scattered around a straight line, so the two are positively linearly correlated. The two box plots show that the birth rate and death rate are roughly symmetrically distributed. The fitted curve shows no obvious nonlinearity, which indicates a linear relationship between the two variables.

Finally, death rate and natural growth rate:

As the death rate falls, the natural growth rate rises, and the observations are scattered around a straight line, so the two are negatively linearly correlated. The two box plots show that neither the birth rate nor the death rate is symmetrically distributed. The linear character of the fitted curve is not very pronounced, which indicates that the linear relationship between the two variables is not strong.

* Calculating and testing the correlation coefficient
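A minimal sketch of this step for the birth rate and death rate, using `cor()` and `cor.test()` from base R (Pearson correlation by default). The vector names are hypothetical.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)

# Pearson correlation coefficient
r <- cor(birth, death)
r

# Test H0: the true correlation is 0, against H1: it is not 0
ct <- cor.test(birth, death)
ct$statistic   # t statistic with n - 2 degrees of freedom
ct$p.value
```

For a simple linear regression with one predictor, this t statistic and p-value coincide with those of the slope test, and r squared equals the model's R^2.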

* Regression model and regression equation
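A minimal sketch of fitting the simple linear regression of birth rate on death rate with `lm()` and reading off the regression equation. The vector names are hypothetical; the post's own code later uses a data frame named `table`.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)

# Fit: birth = b0 + b1 * death + error
model <- lm(birth ~ death)
b <- coef(model)
b

# Regression equation as text
cat(sprintf("birth = %.4f + (%.4f) * death\n", b[1], b[2]))

# Scatter plot with the fitted line
plot(death, birth, xlab = "Death rate", ylab = "Birth rate")
abline(model, col = "red")
```

For these data the intercept comes out at roughly 2.186 and the slope at roughly -1.396, i.e. a negative relationship.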

The arrows function adds arrows to a plot. You only need to specify the start and end coordinates; further parameters change the shape and size of the arrowhead.

x0, y0 specify the x and y coordinates of the start point; x1, y1 specify the x and y coordinates of the end point

arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4)

x0, y0, x1, y1 can each take several values at once, so multiple arrows are drawn in one call

arrows(x0 = c(1, 1), y0 = c(1, 2), x1 = c(4, 4), y1 = c(4, 5))

length: the length of the arrowhead edges, in inches. Only one value can be set per call, and the default is 0.25. To give different arrows different head sizes, draw them in separate calls

par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 0.1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.1)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 0.5")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 1)

[Figure: resulting plot]

code: sets the arrow type. code = 1 draws the head at the start point, code = 2 (the default) at the end point, and code = 3 at both ends. Only one value can be set per call

par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 1)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 2")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 2)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 3")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 3)

[Figure: resulting plot]

angle: sets the angle, in degrees, between the arrow shaft and the edge of the arrowhead. The default is 30, and only one value can be set per call

par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 15")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 15)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 45")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 45)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 60")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 60)

[Figure: resulting plot]

Besides these arrows-specific parameters, general graphical parameters such as col, lty, and lwd are also supported.
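A short sketch of the general parameters mentioned above, in the same style as the earlier examples. The colors and line settings are arbitrary choices for illustration.

```r
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "col, lty, lwd")

# Thick red arrow
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, col = "red", lwd = 2)

# Dashed blue arrow, drawn in a separate call so it can have its own settings
arrows(x0 = 1, y0 = 2, x1 = 4, y1 = 5, col = "blue", lty = 2)
```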

Birth rate and natural growth rate

Mortality and natural growth rate

<> Goodness of fit of the model (birth and death rates as the example)

The coefficient of determination for the birth rate and death rate model is R^2=0.3447=34.47%. This means that 34.47% of the total variation in the birth rate can be explained by its linear relationship with the death rate, so the model fits the data rather poorly.
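As a check on the figure above, R^2 can be computed by hand as 1 - SSE/SST and compared with the value reported by `summary()`. A minimal sketch with hypothetical vector names:

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
model <- lm(birth ~ death)

sst <- sum((birth - mean(birth))^2)   # total sum of squares
sse <- sum(residuals(model)^2)        # residual (error) sum of squares
r2 <- 1 - sse / sst                   # share of variation explained by the model
r2
summary(model)$r.squared              # same value, as reported by R
```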

<> Residual standard error

The residual standard error for the birth rate and death rate model is 0.06245. In other words, when the death rate is used to predict the birth rate, the typical prediction error is about 0.06245, in the same units as the birth rate.
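The residual standard error is the square root of SSE divided by the residual degrees of freedom (n - 2 for a simple regression). A minimal sketch with hypothetical vector names:

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
model <- lm(birth ~ death)

# By hand: sqrt(SSE / (n - 2))
s_e <- sqrt(sum(residuals(model)^2) / df.residual(model))
s_e

# As reported by summary()
summary(model)$sigma
```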

<> Model significance test: test of the linear relationship (F test)

#H0: the linear relationship is not significant; H1: the linear relationship is significant

#F=9.469, p=0.006496<0.05, so we reject the null hypothesis: the linear relationship is significant
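The F statistic and its p-value can be pulled out of `summary()` directly. A minimal sketch with hypothetical vector names; the p-value is the upper tail of the F distribution with 1 and n - 2 degrees of freedom.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
model <- lm(birth ~ death)

# fstatistic holds the F value and its numerator/denominator degrees of freedom
fs <- summary(model)$fstatistic
fs

# p-value: P(F > observed value) under H0
p_value <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
unname(p_value)
```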

<> Testing and inference for the regression coefficient

Birth and death rates

H0: β1=0 (the independent variable has no significant effect on the dependent variable); H1: β1≠0 (the effect is significant)

t=-3.077, p=0.0065<0.05, so we reject the null hypothesis: the independent variable has a significant effect on the dependent variable
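The slope's t statistic, p-value, and a confidence interval can be read from the coefficient table. A minimal sketch with hypothetical vector names:

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
model <- lm(birth ~ death)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(model)$coefficients
coefs["death", ]

# 95% confidence interval for the slope; it excludes 0 when H0 is rejected at the 5% level
ci <- confint(model, "death", level = 0.95)
ci
```

With a single predictor, the squared t statistic of the slope equals the F statistic of the overall model test, so the two tests always agree.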

<> Prediction with the regression equation

# Point predictions (pre_model), confidence intervals (con_int), and prediction intervals (pre_int)
# Assumes `table` was read with read.csv(), which turns the headers "Birth rate" and
# "Population mortality" into Birth.rate and Population.mortality
model <- lm(Birth.rate ~ Population.mortality, data = table)
x0 <- table$Population.mortality
pre_model <- predict(model)
con_int <- predict(model, data.frame(Population.mortality = x0), interval = "confidence", level = 0.95)
pre_int <- predict(model, data.frame(Population.mortality = x0), interval = "prediction", level = 0.95)
pre <- data.frame(Birth.rate = table$Birth.rate, Point.prediction = pre_model,
                  Lower.confidence.limit = con_int[, 2], Upper.confidence.limit = con_int[, 3],
                  Lower.prediction.limit = pre_int[, 2], Upper.prediction.limit = pre_int[, 3])
pre

<> Regression model diagnostics

# Fitted values (pre), residuals (res), and standardized residuals (zre) for birth rate vs. death rate
# Assumes `table` was read with read.csv(), so the columns are Birth.rate and Population.mortality
model <- lm(Birth.rate ~ Population.mortality, data = table)
pre <- fitted(model)
res <- residuals(model)
# Residuals divided by the residual standard error
zre <- model$residuals / sqrt(deviance(model) / df.residual(model))
mysummary <- data.frame(Birth.rate = table$Birth.rate, Point.prediction = pre,
                        Residual = res, Standardized.residual = zre)
mysummary

<> Testing the linear relationship

# Component-plus-residual plot
# Assumes `table` was read with read.csv(), so the columns are Birth.rate and Population.mortality
model_1 <- lm(Birth.rate ~ Population.mortality, data = table)
library(car)
par(mai = c(.7, .7, .1, .1), cex = .8)
crPlots(model_1)  # linearity
# Check normality
par(mfrow = c(2, 2), cex = 0.8, cex.main = 0.7)
plot(model_1)
# Test homogeneity of variance
ncvTest(model_1)
# Draw the spread-level plot
spreadLevelPlot(model_1)
# Test the independence of the residuals
durbinWatsonTest(model_1)

In the component-plus-residual plot, the horizontal axis is the observed value of the independent variable and the vertical axis is the component plus residual (the partial residual). The fitted curve shows no obvious nonlinear pattern between birth rate and death rate, which indicates that the linear relationship between them holds.

The upper-right graph is the normal Q-Q plot of the standardized residuals, used to check the normality assumption for the residuals. Most of the points are scattered randomly around the line with no fixed pattern, so in the linear model of birth rate on death rate the normality assumption for ε basically holds.

The upper-left graph plots the residuals against the fitted values.

The lower-left graph is the scale-location plot.

The lower-right graph is the residuals versus leverage plot, used to identify outliers, high-leverage points, and influential points in the sample data.

The null hypothesis of the homogeneity-of-variance test is that the error term has constant variance. With p=0.98995 we do not reject the null hypothesis, so homogeneity of variance can be considered satisfied.

The spread-level plot, however, shows an obvious nonlinear pattern, which suggests that the homogeneity-of-variance assumption is not satisfied.

The null hypothesis of the Durbin-Watson test is that the residuals are not autocorrelated. With p=0 we reject the null hypothesis, which indicates that the residuals are autocorrelated.
