"Do data analysts need to know about databases?"

Old Li has been hearing this question a lot lately. Many newcomers feel that if they only work on the business side and are not on a technical track, they may not need to study databases at all. Can they really skip it when preparing for interviews?

After all, many newcomers have only just joined a company, and the moment they see the complex joins between tables in a database, their heads start to ache. Some cannot even tell SQL apart from a database, let alone grasp data warehouse concepts.


However, to the disappointment of many, database knowledge is the topic I ask about most often when interviewing newcomers, bar none.


Do data analysts need to understand databases?

How much database knowledge you need depends on what kind of data analyst you aim to be. I have seen data analysts who do not understand databases at all. Their analysis relies entirely on Excel files as the data source, and their strength is the business.

However, they usually have to rely on the IT department to supply the basic data for analysis. With a solid understanding of the business and its data, they can still do good analysis.


There is another kind of data analyst who starts from the database itself: the business intelligence (BI) style, pure data analyst. Combining business understanding with data processing skills, they can also do well in analysis.

One of their advantages is that they do not depend heavily on the IT department. Given the appropriate permissions, they can work directly against a unified data source, and sometimes a single SQL query is itself a piece of the analysis.
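As a rough illustration of how a single SQL query can be a piece of analysis, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample values are all made up for this example.

```python
import sqlite3

# Hypothetical sales records; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 50.0)],
)

# One query answers a business question:
# total and average order value per region.
rows = conn.execute(
    "SELECT region, SUM(amount), AVG(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()

for region, total, average in rows:
    print(region, total, average)

conn.close()
```

An analyst with read access to the data source could run a query like this directly, without asking IT for an export first.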


So, as a data analyst, I think extra skills never hurt. Having one more way to get data from different channels is naturally a good thing.

What is a database?

Before we can understand databases, we first need to understand how data is stored.

We all know that when our ancestors were still living in the wild, they learned to record events by tying knots in rope. Those knotted ropes were "data", although data of this kind was hard to preserve and hard to retrieve.


Later, people used oracle bones, bamboo slips, and paper to store written data, and in modern times the tape recorder and the camera were invented to store audio and visual data. Although the medium kept changing, the way data was stored did not change much. All of these belong to the traditional storage model.

It was not until the information age that data storage changed dramatically, developing in two directions: files and databases.

1. Files: the data is stored in something like an Excel file that can be read and written, and tools such as Python are then used to filter, process, and extract the data;

2. Databases: the data is stored on the computer organized according to its structure, forming a large collection of data. It is like a filing cabinet for documents.


Storing data in a database is currently the most popular approach, because a database offers persistent storage and fast reads and writes. More importantly, a database can do a great deal to ensure the validity of the data, whereas in Excel it is easy to introduce errors when making changes.
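To make the point about data validity concrete, here is a small sketch using Python's built-in sqlite3 module. The employees table and its rules are assumptions invented for illustration. The idea is that the database itself refuses a bad value, while a spreadsheet cell would accept it silently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the rules on name and age are declared once,
# and the database enforces them on every write.
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age BETWEEN 16 AND 100))"
)
conn.execute("INSERT INTO employees (name, age) VALUES (?, ?)", ("Li", 35))

# An obviously wrong age is rejected instead of being stored.
rejected = False
try:
    conn.execute("INSERT INTO employees (name, age) VALUES (?, ?)", ("Wang", -3))
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, count)

conn.close()
```

Only the valid row ends up in the table; the invalid insert raises an error that the calling code can handle.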

How are databases classified?

According to early database theory, there were three popular database models: hierarchical, network, and relational. In today's internet companies, two kinds are most commonly used: relational databases and non-relational databases.

The relational model reduces complex data structures to simple binary relations, that is, two-dimensional tables. In a relational database, almost every operation on data works on one or more related tables, and data is managed by classifying, merging, joining, or selecting from these tables.
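The select and join operations mentioned above can be sketched with two small related tables. This uses Python's built-in sqlite3 module, and the customers and orders tables, along with their contents, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice', 'Beijing'), (2, 'Bob', 'Shanghai');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 70.0), (3, 2, 20.0);
""")

# Select: keep only the rows of one table that match a condition.
beijing = conn.execute(
    "SELECT name FROM customers WHERE city = 'Beijing'"
).fetchall()

# Join: connect two tables through the key they share (customer id).
joined = conn.execute(
    "SELECT c.name, o.amount FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

print(beijing)
print(joined)
conn.close()
```

Each result is itself a two-dimensional table of rows and columns, which is why the output of one operation can feed into the next.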


Put simply, in a relational database the data exists as two-dimensional tables. You can think of it like the arrangement of books in a library.

The bookshelves and floors can be understood as the relational data structure, and the books are the data. The librarians are the database's processes, each with a different job: some put out fires (data recovery, backups), and some tidy the shelves and books (data organization, archiving). The user processes are the patrons who come to the library. They read books and move them around, and the librarians take care of the maintenance that follows.


Relational databases have been around for some 40 years, going from theory through development to real products such as the widely used MySQL and Oracle. Oracle rose to a dominant position in the database field, creating an industry worth tens of billions of US dollars a year. MySQL is also a database that cannot be ignored, so much so that Oracle took it over in a major acquisition (through its purchase of Sun Microsystems).


Non-relational databases are also known as NoSQL databases. The name stands for "Not Only SQL", and they are meant as an effective complement to traditional databases. In specific scenarios, a NoSQL database can deliver remarkably high efficiency and performance.

With the rise of Web 2.0 websites, massive data volumes put heavy demands on the storage capacity of relational databases. A single machine could no longer meet the demand, so clusters were often needed, and relational databases fell short there.


That is how non-relational databases came about. In practice they are purpose-built database products designed for specific scenarios, aiming at high performance and ease of use, such as Google's BigTable and Amazon's Dynamo.

The relationship between SQL and databases

At this point many people confuse SQL with the concept of a database. Many newcomers think SQL is a database that stores data, and others think SQL is a type of database.

To make this easier to understand, here is an analogy:

If the data are individual sheets of paper, we can put them into different folders according to how they relate to each other. Each folder corresponds to the basic building block of a database: the data table.


When there are many folders, we can store them in a filing cabinet according to what they contain. Each cabinet may have many compartments holding different folders. This filing cabinet is the equivalent of a database.


And when we want to find a document in the filing cabinet, we need to follow certain rules, for instance "contract documents are on the third floor, fourth row of shelves". Carrying out such lookup rules is the job of the database management system (DBMS), which acts like a file clerk and helps us manage the data in the database.

The most common database management systems include SQL Server, MySQL, and Oracle.


And if we want to give instructions to the file clerk, we need a language to communicate in. That language is SQL, the Structured Query Language, which is used to operate the database management system.
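The layers in this analogy can be lined up in a few lines of Python. This is only a sketch: SQLite (through Python's built-in sqlite3 module) stands in for the DBMS, and the file name company.db and the contracts table are hypothetical.

```python
import os
import sqlite3
import tempfile

# The database: a single file on disk (the "filing cabinet").
db_path = os.path.join(tempfile.mkdtemp(), "company.db")

# The DBMS: SQLite, reached through the sqlite3 module (the "file clerk").
conn = sqlite3.connect(db_path)

# SQL: the language used to give the DBMS instructions.
# This statement creates a data table (a "folder").
conn.execute("CREATE TABLE contracts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO contracts (title) VALUES (?)", ("Supplier agreement",))
conn.commit()

# Ask the DBMS which tables the database holds, then read one of them.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
titles = [row[0] for row in conn.execute("SELECT title FROM contracts")]
print(tables, titles)

conn.close()
```

Notice that SQL never stores anything by itself. Every statement is a request handed to the DBMS, which does the actual work on the database file.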

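The relationship between them can be summarized like this: SQL is the language we use to instruct the DBMS, the DBMS manages the database, and the database is made up of data tables.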


Seen this way, we should no longer mistake SQL for a database, right?

