The data were downloaded from the Bureau of Statistics in preparation for the final assignment.
Save the data in CSV format, then import the file into RGui.

Year,Birth rate,Population mortality,Natural population growth rate
2000,1.403,0.645,0.758
2001,1.338,0.643,0.695
2002,1.286,0.641,0.645
2003,1.241,0.64,0.601
2004,1.229,0.642,0.587
2005,1.24,0.651,0.589
2006,1.209,0.681,0.528
2007,1.21,0.693,0.517
2008,1.214,0.706,0.508
2009,1.195,0.708,0.487
2010,1.19,0.711,0.479
2011,1.193,0.714,0.479
2012,1.21,0.715,0.495
2013,1.208,0.716,0.492
2014,1.237,0.716,0.521
2015,1.207,0.711,0.496
2016,1.295,0.709,0.586
2017,1.243,0.711,0.532
2018,1.094,0.713,0.381
2019,1.048,0.714,0.334
(Copy the data above into your own file.)
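A minimal sketch of the import step. The original does not give a file name, so the example reads the first few rows inline with `read.csv(text = ...)`; the header wording and the object name `table` (used by the code later in this post) are assumptions.

```r
# First rows of the data, held inline so the example is self-contained.
# A saved file would be read with read.csv("your_file.csv") instead
# (the file name is hypothetical).
csv_text <- "Year,Birth rate,Population mortality,Natural population growth rate
2000,1.403,0.645,0.758
2001,1.338,0.643,0.695
2002,1.286,0.641,0.645"

# By default read.csv() turns spaces in header names into dots,
# e.g. "Birth rate" becomes Birth.rate.
table <- read.csv(text = csv_text)
names(table)
str(table)
```

Naming the data frame `table` masks nothing in practice, because R still finds the base function `table()` when it is called, but a more specific name is easier to read.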

<> 1. Assignment purpose

Drawing on this semester's statistical theory, R programming skills, and data analysis case studies, collect data on a topic of personal interest, then organize, present, and analyze the data and extract its value. Use R correctly and appropriately for the analysis, and write the case up as a data analysis report.

<> 2. Assignment requirements

1. Descriptive statistics (mean, standard deviation, median, skewness) that reflect the characteristics of the data, with the important variables shown in charts.
2. Data analysis: state clearly which problem the analysis is meant to solve, the analytical approach, the methods and models used, and the conclusion or result finally reached.
3. During the analysis, explain on the basis of the data's characteristics why a given function is used to solve the problem, and interpret the parameters it returns.
For example, in parameter estimation: the data are normally distributed, σ is unknown, and the sample size is below 30 (a small sample), so the standardized sample mean follows a t distribution with n-1 degrees of freedom, and we therefore use the t.test function ...
For hypothesis testing, state the hypotheses, the conclusion, and the corresponding code.
For modeling, state the specific steps and interpret the model parameters.
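The parameter-estimation example above can be sketched as follows, using the birth rate column as the sample. The vector name `birth` and the reference value `mu = 1.2` are assumptions chosen only for illustration, and the sketch treats the 20 yearly values as a simple random sample, which a time series may not be.

```r
# Birth rate series from the table above (n = 20 < 30, sigma unknown)
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)

# H0: mu = 1.2 against H1: mu != 1.2 (mu = 1.2 is a hypothetical reference value)
res <- t.test(birth, mu = 1.2, conf.level = 0.95)

res$parameter  # degrees of freedom, n - 1
res$conf.int   # 95% confidence interval for the mean
res$p.value    # reject H0 at the 5% level only if this is below 0.05
```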

<> Data descriptive statistics
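A minimal sketch of the descriptive statistics asked for in the requirements (mean, standard deviation, median, skewness). The vector names and the helper `skewness()` are introduced here for illustration; base R has no built-in skewness function, so a moment-based version is defined by hand.

```r
birth  <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
            1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death  <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
            0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
growth <- c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
            0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
rates <- data.frame(birth, death, growth)

# Moment-based sample skewness: third central moment over variance^(3/2)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# One column per variable, one row per statistic
desc <- sapply(rates, function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), skewness = skewness(x))
})
round(desc, 4)
```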

<> Box plot, and histogram with rug and density estimate
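The original figures are not reproduced here. A sketch of how such charts might be drawn for the birth rate in base R follows; colors and titles are arbitrary choices, not the author's.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)

par(mfrow = c(1, 2))

# Box plot
boxplot(birth, main = "Birth rate", col = "lightblue")

# Histogram on the density scale, with a rug and a kernel density curve on top
h <- hist(birth, freq = FALSE, main = "Birth rate", xlab = "Birth rate")
rug(birth)
lines(density(birth), col = "red", lwd = 2)
```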

<> Kernel density plots

Birth rate:

Population mortality:

Natural population growth rate:
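The three density figures are missing from this copy. A sketch that would produce one kernel density plot per variable, using `density()` with its default bandwidth (an assumption, since the original settings are not shown):

```r
rates <- list(
  `Birth rate` = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
                   1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  `Population mortality` = c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                             0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  `Natural population growth rate` = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                                       0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

# Gaussian kernel density estimate for each variable
dens <- lapply(rates, density)

par(mfrow = c(1, 3))
for (nm in names(dens)) {
  plot(dens[[nm]], main = nm)
  rug(rates[[nm]])  # mark the individual observations on the x axis
}
```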

<> Time series plot
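A sketch of a time series plot of the three rates from 2000 to 2019, with all series on one panel. The object name `rates_ts` and the styling are illustrative assumptions.

```r
birth  <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
            1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death  <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
            0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
growth <- c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
            0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)

# Annual multivariate time series starting in 2000
rates_ts <- ts(cbind(birth, death, growth), start = 2000, frequency = 1)

cols <- c("blue", "red", "darkgreen")
plot(rates_ts, plot.type = "single", col = cols, lty = 1:3,
     xlab = "Year", ylab = "Rate")
legend("topright", legend = colnames(rates_ts), col = cols, lty = 1:3)
```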

<> Data analysis

* Determining the relationship between variables
First, the birth rate and the death rate:


As the birth rate falls, the death rate rises, and the observations are scattered around a straight line, so the two variables have a negative linear relationship. The two box plots show that neither the birth rate nor the death rate is symmetrically distributed. The fitted curve is roughly linear, so a linear relationship between the two variables can be assumed.
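The plotting code for this figure is missing from this copy. A base R sketch of a comparable scatter plot with a least-squares line and a smoothed curve follows; it does not reproduce the marginal box plots mentioned above, and the original may have used a different function.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)

plot(death, birth, pch = 19, xlab = "Population mortality", ylab = "Birth rate")

# Straight-line fit and a lowess smoother to judge how linear the trend is
fit <- lm(birth ~ death)
abline(fit, col = "blue", lwd = 2)
lines(lowess(death, birth), col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("Linear fit", "Lowess"),
       col = c("blue", "red"), lty = 1:2, lwd = 2)
```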
Second, the birth rate and the natural growth rate.
The code is the same as above.


As the birth rate rises, the natural population growth rate rises too, and the observations are scattered around a straight line, so the two variables have a positive linear correlation. The two box plots show that the two variables are roughly symmetrically distributed. The fitted curve shows no obvious nonlinearity, which indicates a linear relationship between the two variables.
Finally, the death rate and the natural growth rate:


As the death rate falls, the natural population growth rate rises, and the observations are scattered around a straight line, so the two variables have a negative linear correlation. The two box plots show that the two variables are not symmetrically distributed. The fitted curve is not clearly linear, which indicates that the linear relationship between the two variables is not strong.
* Calculating and testing the correlation coefficients
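The output for this step is not included in this copy. A sketch of how the Pearson correlation coefficients might be computed and tested with `cor()` and `cor.test()`; the vector names are illustrative.

```r
birth  <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
            1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death  <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
            0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
growth <- c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
            0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)

# Pairwise Pearson correlation matrix
round(cor(data.frame(birth, death, growth)), 4)

# H0: the population correlation is 0; H1: it is not 0
ct <- cor.test(birth, death)
ct$estimate  # sample correlation coefficient r
ct$p.value   # reject H0 at the 5% level if below 0.05
```

For a simple linear regression, the square of this correlation coefficient equals the R^2 of the model, and the p-value matches that of the slope test.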

* Regression model and regression equation

The arrows function adds arrows to an existing plot. You only need to specify the start and end coordinates; further arguments change the shape and size of the arrowhead.
x0, y0 give the x and y coordinates of the start point; x1, y1 give the x and y coordinates of the end point.
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n")  # empty canvas; arrows() needs an open plot
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4)
x0, y0, x1, and y1 accept vectors, so several arrows can be drawn in one call:
arrows(x0 = c(1, 1), y0 = c(1, 2), x1 = c(4, 4), y1 = c(4, 5))
length: the length of the arrowhead edges, in inches. It takes a single value per call, and the default is 0.25. To give different arrows different head sizes, call arrows once for each size.
par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 0.1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.1)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 0.5")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 1)
Result figure (not shown): three panels titled "length = 0.1", "length = 0.5", and "length = 1".
code: sets which end of the arrow gets a head. The values used here are 1 (head at the start point), 2 (head at the end point, the default), and 3 (heads at both ends). It takes a single value per call.
par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 1)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 2")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 2)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 3")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 3)
Result figure (not shown): three panels titled "code = 1", "code = 2", and "code = 3".
angle: the angle, in degrees, between the shaft of the arrow and each edge of the arrowhead. The default is 30. It takes a single value per call.
par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 15")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 15)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 45")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 45)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 60")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 60)
Result figure (not shown): three panels titled "angle = 15", "angle = 45", and "angle = 60".
Besides these arrows-specific arguments, general graphical parameters such as col, lty, and lwd are also supported.
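The regression code and figure for this step are missing from this copy. One way the pieces above might fit together is sketched below: fit the regression of birth rate on death rate, draw the fitted line, and use `arrows()` to mark each residual. This is an illustrative assumption, not necessarily how the original figure was drawn.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)

# Simple linear regression: birth = b0 + b1 * death + error
fit <- lm(birth ~ death)
coef(fit)  # estimated intercept b0 and slope b1 of the regression equation

plot(death, birth, pch = 19, xlab = "Population mortality", ylab = "Birth rate")
abline(fit, col = "blue", lwd = 2)

# One arrow per observation, from the observed point to its fitted value
arrows(x0 = death, y0 = birth, x1 = death, y1 = fitted(fit),
       length = 0.05, col = "grey40")
```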

Birth rate and natural growth rate

Mortality and natural growth rate

<> Goodness of fit of the model (birth rate and death rate as the example)


The coefficient of determination for the birth rate and death rate model is R^2 = 0.3447 = 34.47%. This means that 34.47% of the total variation in the birth rate is explained by its linear relationship with the death rate, so the model fits the data rather poorly.

<> Residual standard error

The residual standard error of the birth rate and death rate model is 0.06245. In other words, predicting the birth rate from the death rate is off by about 0.06245 on average, measured in the units of the birth rate.
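As a check on the two figures above, the sketch below computes R^2 and the residual standard error by hand from the sums of squares and compares them with what `summary()` reports. The variable names are illustrative.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
fit <- lm(birth ~ death)

sst <- sum((birth - mean(birth))^2)  # total sum of squares
sse <- sum(residuals(fit)^2)         # residual (error) sum of squares

r2        <- 1 - sse / sst                   # coefficient of determination
sigma_hat <- sqrt(sse / df.residual(fit))    # residual standard error, df = n - 2

# The same quantities as reported by summary()
s <- summary(fit)
c(R2 = r2, R2_summary = s$r.squared, sigma = sigma_hat, sigma_summary = s$sigma)
```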

<> Model significance test: test of the linear relationship (F test)

# H0: the linear relationship is not significant; H1: the linear relationship is significant
# F = 9.469, p = 0.006496 < 0.05: reject the null hypothesis; the linear relationship is significant

<> Testing and inference for the regression coefficient

Birth rate and death rate
H0: β1 = 0 (the independent variable has no significant effect on the dependent variable); H1: β1 ≠ 0 (the effect is significant)
t = -3.077, p = 0.0065 < 0.05: reject the null hypothesis; the independent variable has a significant effect on the dependent variable.
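The F and t statistics quoted above can be pulled out of the fitted model as sketched below. The names are illustrative; `summary()` stores the F statistic with its degrees of freedom, and its coefficient table holds the t statistic and p-value for the slope.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
s <- summary(lm(birth ~ death))

# F test of the overall linear relationship
f   <- s$fstatistic  # named vector: value, numdf, dendf
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# t test of the slope (H0: beta1 = 0)
coefs   <- coef(s)  # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_slope <- coefs["death", "t value"]
p_t     <- coefs["death", "Pr(>|t|)"]

c(F = f[["value"]], p_F = p_f, t = t_slope, p_t = p_t)
```

With a single predictor the two tests are equivalent: the F statistic is the square of the slope's t statistic, and the p-values coincide.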

<> Prediction with the regression equation
# Compute point predictions (pre_model), confidence intervals (con_int), and prediction intervals (pre_int).
# Assumes `table` was read with read.csv() from a header of "Birth rate" and "Population mortality",
# which R converts to the column names Birth.rate and Population.mortality.
model <- lm(Birth.rate ~ Population.mortality, data = table)
x0 <- table$Population.mortality
pre_model <- predict(model)
con_int <- predict(model, data.frame(Population.mortality = x0),
                   interval = "confidence", level = 0.95)
pre_int <- predict(model, data.frame(Population.mortality = x0),
                   interval = "prediction", level = 0.95)
pre <- data.frame(birth_rate = table$Birth.rate,
                  point_prediction = pre_model,
                  conf_lower = con_int[, 2], conf_upper = con_int[, 3],
                  pred_lower = pre_int[, 2], pred_upper = pre_int[, 3])
pre

<> Regression model diagnostics
# Compute fitted values (pre), residuals (res), and standardized residuals (zre) for birth rate on death rate.
# Assumes the columns of `table` are named Birth.rate and Population.mortality.
model <- lm(Birth.rate ~ Population.mortality, data = table)
pre <- fitted(model)
res <- residuals(model)
# Residuals divided by the residual standard error
zre <- model$residuals / sqrt(deviance(model) / df.residual(model))
mysummary <- data.frame(birth_rate = table$Birth.rate, point_prediction = pre,
                        residual = res, standardized_residual = zre)
mysummary

<> Test linear relationship
# Assumes the columns of `table` are named Birth.rate and Population.mortality,
# and that the car package is installed.
library(car)
model_1 <- lm(Birth.rate ~ Population.mortality, data = table)

# Component-plus-residual plot (checks linearity)
par(mai = c(.7, .7, .1, .1), cex = .8)
crPlots(model_1)

# Check normality (the four standard diagnostic plots)
par(mfrow = c(2, 2), cex = 0.8, cex.main = 0.7)
plot(model_1)

# Test homogeneity of variance
ncvTest(model_1)

# Spread-level plot
spreadLevelPlot(model_1)

# Test independence of the residuals
durbinWatsonTest(model_1)


In the component-plus-residual plot, the horizontal axis is the observed value of the independent variable and the vertical axis is the partial residual, that is, the fitted component for that variable plus the residual. The fitted curve shows no obvious nonlinear pattern between the birth rate and the death rate, so the assumption of a linear relationship holds.


The upper-right plot is the normal Q-Q plot of the standardized residuals, used to check the assumption that the residuals are normally distributed. Most points lie close to the line with no fixed pattern, so in the linear model of birth rate on death rate the normality assumption for ε is basically satisfied.
The upper-left plot shows the residuals against the fitted values.
The lower-left plot is the scale-location plot.
The lower-right plot shows the residuals against leverage and is used to identify outliers, high-leverage points, and influential points in the sample.

The null hypothesis of the homogeneity-of-variance test is that the error term has constant variance. With p = 0.98995 we fail to reject the null hypothesis, so the constant-variance assumption can be considered satisfied.

The spread-level plot, however, shows a clearly nonlinear pattern, which suggests that the constant-variance assumption is not satisfied.

The null hypothesis of the Durbin-Watson test is that the residuals are not autocorrelated. With p = 0 we reject the null hypothesis, which indicates that the residuals are autocorrelated.
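For readers without the car package, two of these checks can be approximated in base R. The sketch below runs a Shapiro-Wilk normality test on the residuals and computes the Durbin-Watson statistic by hand; it does not reproduce the p-values of `ncvTest()` or `durbinWatsonTest()`, and the variable names are illustrative.

```r
birth <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.240, 1.209, 1.210, 1.214, 1.195,
           1.190, 1.193, 1.210, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)
death <- c(0.645, 0.643, 0.641, 0.640, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
           0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
e <- residuals(lm(birth ~ death))

# Normality of residuals; H0: the residuals are normally distributed
sw <- shapiro.test(e)
sw$p.value

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the sum of squared residuals. It lies between 0 and 4;
# values near 2 suggest no first-order autocorrelation, values well below 2
# suggest positive autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)
dw
```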
