The data were downloaded from the Bureau of Statistics in preparation for the final assignment.
Save the data in CSV format, then import them into RGui:

Year,Birth rate,Population mortality,Natural population growth rate
2000,1.403,0.645,0.758
2001,1.338,0.643,0.695
2002,1.286,0.641,0.645
2003,1.241,0.64,0.601
2004,1.229,0.642,0.587
2005,1.24,0.651,0.589
2006,1.209,0.681,0.528
2007,1.21,0.693,0.517
2008,1.214,0.706,0.508
2009,1.195,0.708,0.487
2010,1.19,0.711,0.479
2011,1.193,0.714,0.479
2012,1.21,0.715,0.495
2013,1.208,0.716,0.492
2014,1.237,0.716,0.521
2015,1.207,0.711,0.496
2016,1.295,0.709,0.586
2017,1.243,0.711,0.532
2018,1.094,0.713,0.381
2019,1.048,0.714,0.334
(Copy the data yourself.)
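Here is a minimal sketch of the import step. It assumes the table above was saved as a CSV file; the file name `birth_death.csv` is hypothetical. The columns are renamed to short, syntactic names (`year`, `birth_rate`, `mortality`, `growth_rate`) so they can be used directly in model formulas. These names are an assumption made for the examples, not the original report's. To keep the example self-contained, it first writes the first three rows to a temporary file.

```r
# Write the first three rows of the table to a temporary CSV file
# (stands in for the hypothetical file birth_death.csv)
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Year,Birth rate,Population mortality,Natural population growth rate",
  "2000,1.403,0.645,0.758",
  "2001,1.338,0.643,0.695",
  "2002,1.286,0.641,0.645"
), csv_file)

# Read the CSV, then replace the header with short, syntactic column names
table <- read.csv(csv_file)
names(table) <- c("year", "birth_rate", "mortality", "growth_rate")
str(table)
```

With the real file you would call `read.csv("birth_death.csv")` instead and keep the renaming line.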

<> 1. Purpose of the assignment

Make comprehensive use of this semester's statistical theory, R programming skills and data-analysis case studies. Following your own interests, collect your own data, then organize, display and analyze it to extract its value. Use R correctly and appropriately for the analysis, and write the case up as a data-analysis report.

<> 2. Assignment requirements

1. Descriptive statistics (mean, standard deviation, median, skewness) that reflect the characteristics of the data, with the important variables shown in charts.
2. Data analysis: state clearly which problem the analysis is meant to solve, which analytical approach, methods and models to use, and what conclusion or effect is finally reached.
3. During the analysis, explain from the characteristics of the data why a given function xx is used to solve the problem, and interpret the parameters the computation returns.
For example, for parameter estimation: the data are normally distributed with σ unknown and the sample size is below 30, so it is a small sample; after standardization the sample mean follows a t distribution with n - 1 degrees of freedom, so we use the t.test function ...
For hypothesis tests, state the hypotheses, the conclusion and the corresponding code.
For modeling, define the specific steps and interpret the model parameters.
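The t.test example in requirement 3 can be sketched on the 20 birth-rate observations, where n = 20 < 30 and σ is unknown. This is a minimal illustration, assuming the birth rate is approximately normal. The null value `mu = 1.2` is an arbitrary choice for illustration only. The hand computation shows that the interval uses the t distribution with n - 1 = 19 degrees of freedom.

```r
birth_rate <- c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048)

# One-sample t test of H0: mean birth rate = 1.2 (illustrative null value)
tt <- t.test(birth_rate, mu = 1.2, conf.level = 0.95)
tt

# The same 95% interval by hand: mean +/- t_{0.975, n-1} * s / sqrt(n)
n <- length(birth_rate)
half_width <- qt(0.975, df = n - 1) * sd(birth_rate) / sqrt(n)
manual_ci <- c(mean(birth_rate) - half_width, mean(birth_rate) + half_width)
manual_ci
```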

<> Data descriptive statistics
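Here is a minimal sketch of the descriptive statistics listed in requirement 1, assuming the data are entered as a data frame `table` with the renamed columns. Base R has no skewness function, so the helper `skewness` below is hypothetical. It uses the moment-based definition, which is only one of several conventions; packages such as e1071 or moments offer others.

```r
table <- data.frame(
  year = 2000:2019,
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  growth_rate = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                  0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

# Hypothetical helper: moment-based skewness m3 / m2^(3/2)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# One column of statistics per variable
vars <- table[, c("birth_rate", "mortality", "growth_rate")]
desc <- sapply(vars, function(x) c(mean = mean(x), sd = sd(x),
                                   median = median(x), skewness = skewness(x)))
round(desc, 4)
```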

<> Box plots, and histograms with rugs and density estimates
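The charts named in this heading can be sketched in base R as follows. The original figures are not reproduced, so the titles and layout here are assumptions. The histogram is drawn on the density scale (`freq = FALSE`) so that a kernel density curve can be overlaid, and `rug()` adds the tick marks (the "axis whiskers") for the individual observations.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  growth_rate = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                  0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

par(mfrow = c(1, 2))
# Side-by-side box plots of the three rates
bp <- boxplot(table, names = c("Birth rate", "Mortality", "Growth rate"),
              main = "Box plots")

# Density-scale histogram + rug of the observations + kernel density curve
h <- hist(table$birth_rate, freq = FALSE, main = "Birth rate", xlab = "Birth rate")
rug(table$birth_rate)
lines(density(table$birth_rate), lwd = 2)
```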

<> Kernel density plots

Birth rate:

Population mortality:

Natural population growth rate:
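The three kernel density plots can be sketched with `density()`. By default it uses a Gaussian kernel with the `bw.nrd0` bandwidth; whether the original report used the same settings is not stated, so treat this as an assumption.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  growth_rate = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                  0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

# One Gaussian kernel density estimate per variable (default bandwidth bw.nrd0)
dens <- lapply(table, density)

par(mfrow = c(1, 3))
for (v in names(dens)) {
  plot(dens[[v]], main = v)          # density curve
  polygon(dens[[v]], col = "grey85") # shade the area under the curve
  rug(table[[v]])                    # mark the individual observations
}
```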

<> Time series plots
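The time series plots can be sketched by turning the three columns into an annual `ts` object starting in 2000. The two-view layout (one panel per series, then all series on one panel) is an assumption, since the original figure is not reproduced.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  growth_rate = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                  0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

# Annual multivariate time series, 2000-2019
rates <- ts(table, start = 2000, frequency = 1)

# One panel per series
plot(rates, main = "Birth, mortality and natural growth rates")

# All three series on a single panel
ts.plot(rates, col = 1:3, lty = 1:3, xlab = "Year", ylab = "Rate")
legend("topright", legend = colnames(rates), col = 1:3, lty = 1:3)
```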

<> Data analysis

* Determining the relationships between the variables
First, birth rate and death rate:

As the birth rate declines, the death rate rises. The observation points scatter around a straight line, indicating a negative linear relationship. The two box plots show that neither the birth rate nor the death rate is symmetrically distributed. The fitted curve has some linear character, so a linear relationship between the two variables can be assumed.
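Here is a base-R sketch of this kind of plot: a scatter plot with the least-squares line and a lowess smooth (to reveal any nonlinearity), plus a box plot of each variable. The original figure is not reproduced; the report may have used another function, such as `car::scatterplot`, which draws marginal box plots directly. The column names are the renamed ones assumed throughout.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)

fit <- lm(birth_rate ~ mortality, data = table)

par(mfrow = c(1, 3))
plot(birth_rate ~ mortality, data = table,
     xlab = "Population mortality", ylab = "Birth rate")
abline(fit, lty = 2)                              # least-squares line
lines(lowess(table$mortality, table$birth_rate))  # smooth curve to show nonlinearity
boxplot(table$birth_rate, main = "Birth rate")
boxplot(table$mortality, main = "Population mortality")
```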
Second, birth rate and natural growth rate.
The code is the same as above.

As the birth rate rises, the natural growth rate also rises. The observation points scatter around a straight line, indicating a positive linear correlation. The two box plots show that the birth rate and the natural growth rate are roughly symmetrically distributed. The fitted curve shows no obvious nonlinearity, indicating a linear relationship between the two variables.
Finally, mortality and natural growth rate:

As the death rate falls, the natural growth rate rises. The observation points scatter around a straight line, indicating a negative linear correlation. The two box plots show that mortality and the natural growth rate are not symmetrically distributed. The linear character of the fitted curve is not very pronounced, suggesting that the linear relationship between these two variables is not strong.
* Calculating and testing the correlation coefficients
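Here is a minimal sketch of the correlation step: `cor()` gives the Pearson correlation matrix, and `cor.test()` tests H0: ρ = 0 for each pair. In a simple linear regression the squared correlation equals R², and the `cor.test` t statistic equals the slope's t statistic. This ties the correlation results to the regression sections.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  growth_rate = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                  0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

# Pearson correlation matrix
round(cor(table), 4)

# Test H0: rho = 0 for each pair of variables
ct_bm <- cor.test(~ birth_rate + mortality, data = table)
ct_bg <- cor.test(~ birth_rate + growth_rate, data = table)
ct_mg <- cor.test(~ mortality + growth_rate, data = table)
ct_bm
```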

* Regression model and regression equation
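Here is a minimal sketch of fitting the birth-rate/death-rate regression with `lm()` and printing the estimated equation. It assumes the renamed columns used throughout; the later sections (R², F test, t test) read their numbers from `summary()` of a model like this one.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)

# Simple linear regression: birth rate explained by population mortality
model <- lm(birth_rate ~ mortality, data = table)
summary(model)

# Write out the estimated regression equation
b <- coef(model)
cat(sprintf("birth_rate = %.4f + (%.4f) * mortality\n", b[1], b[2]))
```

The OLS slope equals cov(x, y) / var(x), which is a quick way to check the estimate by hand.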

The arrows function adds arrows to a plot: you only need to specify the start and end coordinates, and you can change the shape and size of the arrows through additional parameters.
x0 and y0 give the x and y coordinates of the start point; x1 and y1 give those of the end point:
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4)
x0, y0, x1 and y1 can each take several values, which draws several arrows in one call:
arrows(x0 = c(1, 1), y0 = c(1, 2), x1 = c(4, 4), y1 = c(4, 5))
length: the length of the edges of the arrow head, in inches (default 0.25). Only one value can be set per call, so to draw arrows with different head sizes, call arrows separately for each:
par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 0.1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.1)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 0.5")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "length = 1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 1)
(Figure: resulting plots)
code: the kind of arrow. code = 1 draws the head at the start point (x0, y0), code = 2 (the default) at the end point (x1, y1), and code = 3 at both ends. Only one value can be set per call:
par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 1")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 1)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 2")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 2)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "code = 3")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, code = 3)
(Figure: resulting plots)
angle: the angle, in degrees, from the shaft of the arrow to the edge of the arrow head (default 30). Only one value can be set per call:
par(mfrow = c(1, 3))
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 15")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 15)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 45")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 45)
plot(1:5, 1:5, xlim = c(0, 6), ylim = c(0, 6), type = "n", main = "angle = 60")
arrows(x0 = 1, y0 = 1, x1 = 4, y1 = 4, length = 0.5, angle = 60)
(Figure: resulting plots)
Besides these arrow-specific parameters, arrows also accepts general graphical parameters such as col, lty and lwd.

Birth rate and natural growth rate

Mortality and natural growth rate
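The other two regressions can be sketched the same way. The text does not say which variable is the response, so treating `growth_rate` as the dependent variable in both models is an assumption. The model names `model_bg` and `model_mg` are hypothetical.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714),
  growth_rate = c(0.758, 0.695, 0.645, 0.601, 0.587, 0.589, 0.528, 0.517, 0.508, 0.487,
                  0.479, 0.479, 0.495, 0.492, 0.521, 0.496, 0.586, 0.532, 0.381, 0.334)
)

# Assumed direction: natural growth rate as the response in both models
model_bg <- lm(growth_rate ~ birth_rate, data = table)  # birth rate -> growth rate
model_mg <- lm(growth_rate ~ mortality, data = table)   # mortality -> growth rate
summary(model_bg)
summary(model_mg)
```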

<> Goodness of fit of the model (birth rate vs. death rate)

The coefficient of determination of the birth-rate/death-rate model is R^2 = 0.3447: 34.47% of the total variation in the birth rate is explained by its linear relationship with the death rate, so the model fits poorly.
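The definition R^2 = 1 - SSE/SST can be checked against `summary()` with a short sketch. It refits the birth-rate/death-rate model on the renamed columns assumed throughout.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)
model <- lm(birth_rate ~ mortality, data = table)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
sse <- sum(residuals(model)^2)
sst <- sum((table$birth_rate - mean(table$birth_rate))^2)
r2 <- 1 - sse / sst
c(manual = r2, summary = summary(model)$r.squared)
```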

<> Residual standard error

The residual standard error of the birth-rate/death-rate model is 0.06245: when the death rate is used to predict the birth rate, the typical prediction error is about 0.062, measured in the same units as the birth-rate column.
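The residual standard error is sqrt(SSE / (n - 2)), with n - 2 = 18 degrees of freedom here. A short sketch compares the hand computation with `sigma()`, which returns the same quantity for an `lm` fit.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)
model <- lm(birth_rate ~ mortality, data = table)

# sqrt(SSE / residual degrees of freedom)
s_manual <- sqrt(sum(residuals(model)^2) / df.residual(model))
c(manual = s_manual, sigma = sigma(model))
```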

<> Model significance test: testing the linear relationship (F test)

# H0: the linear relationship is not significant; H1: the linear relationship is significant
# F = 9.469, p = 0.006496 < 0.05: reject H0; the linear relationship is significant
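Here is a minimal sketch of where these F-test numbers come from. `summary(model)$fstatistic` holds the F value and its degrees of freedom (1 and 18 here), the p-value is the upper tail of the F distribution, and `anova()` reports the same F.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)
model <- lm(birth_rate ~ mortality, data = table)

# F statistic with numerator and denominator degrees of freedom
fs <- summary(model)$fstatistic
p_value <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
p_value

# The analysis-of-variance table reports the same F value
anova(model)
```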

<> Tests and inference for the regression coefficients

Birth rate and death rate:
H0: β1 = 0 (the independent variable has no significant effect on the dependent variable); H1: β1 ≠ 0 (the effect is significant)
t = -3.077, p = 0.0065 < 0.05: reject H0; the independent variable has a significant effect on the dependent variable.
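The coefficient table from `coef(summary(model))` holds the estimates, standard errors, t values and p-values, and `confint()` gives the 95% confidence intervals. In simple regression, the squared slope t statistic equals the overall F statistic, which is consistent with (-3.077)^2 ≈ 9.469 above. A short sketch:

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)
model <- lm(birth_rate ~ mortality, data = table)

# Estimate, Std. Error, t value, Pr(>|t|) for each coefficient
ctab <- coef(summary(model))
ctab

# 95% confidence intervals for beta0 and beta1
confint(model, level = 0.95)

# Slope t statistic squared vs. overall F statistic
c(t_squared = ctab["mortality", "t value"]^2,
  F = summary(model)$fstatistic[["value"]])
```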

<> Prediction with the regression equation
# Point predictions (pre_model), confidence intervals (con_int) and prediction intervals (pre_int)
# Assumes `table` holds the data with columns birth_rate and mortality
model <- lm(birth_rate ~ mortality, data = table)
x0 <- table$mortality
pre_model <- predict(model)
con_int <- predict(model, data.frame(mortality = x0), interval = "confidence", level = 0.95)
pre_int <- predict(model, data.frame(mortality = x0), interval = "prediction", level = 0.95)
pre <- data.frame(birth_rate = table$birth_rate, point_pred = pre_model,
                  conf_lower = con_int[, 2], conf_upper = con_int[, 3],
                  pred_lower = pre_int[, 2], pred_upper = pre_int[, 3])
pre
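The same `predict()` call also works for a mortality value outside the sample. In this sketch, the value 0.72 is hypothetical, chosen only for illustration. The prediction interval is always wider than the confidence interval, because it also includes the error variance of a single new observation.

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)
model <- lm(birth_rate ~ mortality, data = table)

new_x <- data.frame(mortality = 0.72)  # hypothetical new value
conf_int <- predict(model, new_x, interval = "confidence", level = 0.95)
pred_int <- predict(model, new_x, interval = "prediction", level = 0.95)
rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```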

<> Regression diagnostics
# Fitted values (pre), residuals (res) and standardized residuals (zre) for the birth-rate/mortality model
# Assumes `table` holds the data with columns birth_rate and mortality
model <- lm(birth_rate ~ mortality, data = table)
pre <- fitted(model)
res <- residuals(model)
# Residuals divided by the residual standard error (rstandard() would also adjust for leverage)
zre <- res / sqrt(deviance(model) / df.residual(model))
mysummary <- data.frame(birth_rate = table$birth_rate, point_pred = pre, residual = res,
                        std_residual = zre)
mysummary

<> Test linear relationship
# Component-plus-residual plot (checks linearity)
# Assumes `table` holds the data with columns birth_rate and mortality
model_1 <- lm(birth_rate ~ mortality, data = table)
library(car)
par(mai = c(.7, .7, .1, .1), cex = .8)
crPlots(model_1)
# Check normality (the four standard diagnostic plots)
par(mfrow = c(2, 2), cex = 0.8, cex.main = 0.7)
plot(model_1)
# Test homogeneity of variance
ncvTest(model_1)
# Spread-level plot
spreadLevelPlot(model_1)
# Test independence of the residuals
durbinWatsonTest(model_1)

In the component-plus-residual plot, the horizontal axis is the observed value of the independent variable and the vertical axis is its fitted linear component plus the residual. The smoothed curve shows no obvious nonlinear pattern between the birth rate and the death rate, which supports the assumption that their relationship is linear.

The upper-right panel is the normal Q-Q plot of the standardized residuals, used to check the normality assumption for the residuals. Most points lie close to the line without any systematic pattern, so in the linear model of birth rate on death rate the normality assumption for ε roughly holds.
The upper-left panel plots the residuals against the fitted values.
The lower-left panel is the scale-location plot.
The lower-right panel plots the residuals against leverage; it is used to identify outliers, high-leverage points and influential points in the sample.

The null hypothesis of the homogeneity-of-variance test (ncvTest) is that the error variance is constant. With p = 0.98995, H0 is not rejected, so homogeneity of variance can be assumed.

The spread-level plot, however, shows a clearly nonlinear pattern, so by this check the homogeneity-of-variance assumption does not appear to be satisfied.

The null hypothesis of the Durbin-Watson test is that the residuals are not autocorrelated. The reported p-value is 0, so H0 is rejected: the residuals are autocorrelated.
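The Durbin-Watson statistic itself can be computed with base R as sum of squared successive residual differences divided by the residual sum of squares. `car::durbinWatsonTest` reports this same D-W statistic, together with a bootstrapped p-value. D lies between 0 and 4, and D ≈ 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals. So a value well below 2 points to positive autocorrelation. A minimal sketch:

```r
table <- data.frame(
  birth_rate = c(1.403, 1.338, 1.286, 1.241, 1.229, 1.24, 1.209, 1.21, 1.214, 1.195,
                 1.19, 1.193, 1.21, 1.208, 1.237, 1.207, 1.295, 1.243, 1.094, 1.048),
  mortality = c(0.645, 0.643, 0.641, 0.64, 0.642, 0.651, 0.681, 0.693, 0.706, 0.708,
                0.711, 0.714, 0.715, 0.716, 0.716, 0.711, 0.709, 0.711, 0.713, 0.714)
)
model <- lm(birth_rate ~ mortality, data = table)

e <- residuals(model)
# Durbin-Watson statistic: sum of squared successive differences / SSE
dw <- sum(diff(e)^2) / sum(e^2)
# Lag-1 autocorrelation of the residuals (element 1 is lag 0)
r1 <- acf(e, lag.max = 1, plot = FALSE)$acf[2]
c(DW = dw, approx_from_r1 = 2 * (1 - r1))
```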
