"Do data analysts need to know about databases?"
Lao Li has been hearing this question a lot lately. Many newcomers assume that if they stay on the business side and off the technical track, they can skip learning databases and need not prepare for database questions in interviews.
After all, many newcomers who have just joined a company get a headache the moment they see the complex joins between tables in a database. Quite a few cannot even tell SQL apart from a database, let alone explain data warehouse concepts.
To the disappointment of many, though, database knowledge is the topic I ask about most often when interviewing newcomers, bar none.
Do data analysts need to understand databases?
How much database knowledge you need depends on what kind of data analyst you aim to be. I have seen data analysts who do not understand databases at all and work entirely from Excel files as their data source. Their strength is on the business side.
However, they usually have to rely on the IT department to supply the basic data for analysis. With a solid understanding of the business and of the business data, they can still do good analysis.
There is another kind of data analyst who starts from the database itself: the pure data analyst in a business intelligence (BI) role. By combining business understanding with data processing skills, they can also do well in analysis.
One of their advantages is that they do not depend much on the IT department. Given the right permissions, they can work directly against a unified data source, and sometimes a single SQL query is itself a piece of data analysis.
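To make "a single SQL query is a piece of data analysis" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are all made up for illustration; a real analyst would run a similar query against the company's own data source.

```python
import sqlite3

def revenue_by_region(conn):
    """Return (region, total_revenue) rows, largest total first."""
    query = """
        SELECT region, SUM(amount) AS total_revenue
        FROM orders
        GROUP BY region
        ORDER BY total_revenue DESC
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 50.0)],
)

result = revenue_by_region(conn)
print(result)
```

The one query already answers a business question (which region brings in the most revenue) without any help from IT.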
So, as a data analyst, I think an extra skill never hurts. Having one more way to obtain data from different channels is naturally a good thing.
What is a database?
Before looking at databases, we first need to understand how data is stored.
We all know that when our ancestors still lived a primitive life, they learned to record things by tying knots in rope. Those knotted ropes were "data", although data in that form was hard to preserve and hard to retrieve.
Later, our ancestors used oracle bones, bamboo slips, and paper to store text, and in modern times recorders and cameras were invented to store sound and images. Although the carrier kept changing, the way data was stored did not change much. All of these belong to the traditional storage methods.
It was not until the information age that data storage changed dramatically, developing in two directions: files and databases.
1. Files: the data is stored in something like an Excel workbook as a readable and writable file, and tools such as Python are then used to filter, process, and extract the file's data;
2. Databases: the data is stored on a computer according to its structure, forming a large collection of data. A database is like a filing cabinet for storing documents.
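As a rough sketch of the file route, the snippet below filters a small CSV (the kind of file you might export from Excel) with Python's standard csv module. The file contents, column names, and the `filter_rows` helper are invented for this example.

```python
import csv
import io

# Hypothetical file contents; in practice this would be a CSV exported from Excel.
FILE_TEXT = """name,city,sales
Ann,Beijing,120
Bob,Shanghai,80
Cid,Beijing,45
"""

def filter_rows(text, city):
    """Read CSV text and keep only the rows for one city."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["city"] == city]

beijing = filter_rows(FILE_TEXT, "Beijing")
# Every value read from a CSV is a string, so convert before doing arithmetic.
total = sum(int(row["sales"]) for row in beijing)
print(total)
```

Note that the file itself knows nothing about data types or rules. All the checking and converting is left to the script.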
Storing data in a database is the most popular approach today, because a database offers persistent storage and fast reads and writes. More importantly, a database can do much to ensure the validity of the data, unlike Excel, where it is easy to introduce errors while editing.
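One way a database protects the validity of data is through constraints declared on the table. The sketch below uses SQLite through Python's sqlite3 module. The `sales` table, its rules, and the `try_insert` helper are assumptions made for the example, and other database systems word their constraints in much the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,                   -- a customer name is required
        amount REAL NOT NULL CHECK (amount >= 0)  -- negative amounts are rejected
    )
    """
)
conn.execute("INSERT INTO sales (customer, amount) VALUES (?, ?)", ("Ann", 120.0))

def try_insert(customer, amount):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO sales (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted_negative = try_insert("Bob", -5.0)  # violates the CHECK rule
accepted_missing = try_insert(None, 10.0)    # violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(accepted_negative, accepted_missing, row_count)
```

In a spreadsheet, both bad rows would have been saved silently. Here the database refuses them and only the valid row remains.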
How are databases classified?
According to early database theory, there were three popular database models: hierarchical, network, and relational. In today's Internet companies, the two most commonly used kinds are relational databases and non-relational databases.
The relational model reduces complex data structures to simple relations, that is, two-dimensional tables. In a relational database, almost all operations work on one or more related tables, and data is managed by classifying, merging, joining, or selecting from those tables.
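The selecting and joining operations can be sketched with two small related tables. The example uses Python's sqlite3 module, and the `customers` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'Beijing'), (2, 'Bob', 'Shanghai');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 30.0), (3, 2, 80.0);
""")

# Select: keep only the rows that match a condition.
beijing_customers = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Beijing",)
).fetchall()

# Join: connect the two tables through the shared customer id.
joined = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(beijing_customers)
print(joined)
```

Each table stays simple on its own. The `customer_id` column is what relates one table to the other.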
Put simply, relational means that data exists as two-dimensional tables. You can think of it as the way books are arranged in a library.
The bookshelves and floors can be understood as the relational data structure, and the books are the data. The librarians are the database's processes, each responsible for a different job: some put out fires (data recovery, backups), and some tidy the shelves and books (data organization, archiving). The user processes are the patrons who come to the library. They read books and move them around, and the librarians then take care of the upkeep.
Relational databases have been around for some 40 years since their birth, going from theory to development to actual products such as the familiar MySQL and Oracle. Oracle rose to a dominant position in the database field, creating a huge industry market worth tens of billions of US dollars a year. MySQL is also a database that cannot be ignored, so much so that Oracle ended up acquiring it at great expense (through its purchase of Sun Microsystems).
Non-relational databases are also known as NoSQL databases. The name means "Not Only SQL", and they serve as an effective complement to traditional databases. In specific scenarios, NoSQL databases can deliver remarkably high efficiency and performance.
With the rise of Web 2.0 websites, massive data volumes placed heavy demands on the storage capacity of relational databases. A single machine could not meet the demand, clusters were often needed to solve the problem, and relational databases fell short there.
Non-relational databases were born as a result. In fact, they are purpose-built database products for specific scenarios, designed for high performance and ease of use, such as Google's BigTable and Amazon's Dynamo.
The relationship between SQL and databases
At this point many people confuse SQL with the concept of a database. Many newcomers think SQL is a database for storing data, and others think SQL is a type of database.
To make this easier to understand, let me offer an analogy:
If the data are forms, we can sort them into different folders according to how the forms relate to each other. Such a folder is the basic building block of a database: the data table.
And when our folders become numerous and complex, we can store them in a filing cabinet according to their contents, and each cabinet may have many compartments holding different folders. This filing cabinet is equivalent to the database.
And when we want to find a document in the filing cabinet, we need to follow certain rules, for instance "contract documents are on the third level, fourth row". Carrying out such lookup rules is the job of the database management system (DBMS), which acts like a file manager and helps us manage the data in the database.
The most common database management systems include SQL Server, MySQL, and Oracle.
And if we want to give instructions to this file manager, we need a language to communicate in. That language is SQL. SQL stands for Structured Query Language, and it is used to operate the database management system.
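The analogy can be tried out in a few lines. In this sketch SQLite plays the role of the DBMS (via Python's sqlite3 module), the in-memory database is the filing cabinet, the `documents` table is a folder, and every quoted string is SQL. The table name, columns, and rows are made up for illustration.

```python
import sqlite3

# The DBMS (here SQLite) opens a database; ":memory:" keeps it in RAM.
conn = sqlite3.connect(":memory:")

# Each string below is SQL: an instruction handed to the DBMS.
conn.execute(
    "CREATE TABLE documents (title TEXT, kind TEXT, floor INTEGER, row_no INTEGER)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("Lease 2020", "contract", 3, 4),
        ("Q1 report", "report", 2, 1),
        ("Supplier deal", "contract", 3, 4),
    ],
)

# Ask the DBMS where the contracts are instead of searching by hand.
contracts = conn.execute(
    "SELECT title, floor, row_no FROM documents WHERE kind = ? ORDER BY title",
    ("contract",),
).fetchall()

# The DBMS also keeps a catalog of the tables stored in the database.
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(contracts)
print(tables)
```

SQL appears here only as text passed to `execute`. It stores nothing itself; the data lives in the database, and the DBMS does the work.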
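The relationship between them can be summed up this way: we use SQL to give instructions to the DBMS, the DBMS manages the database, and the database holds the data tables.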
With that, we should no longer mistake SQL for a database, right?