“Do data analysts need to know about databases?”

Old Li has been hearing this question a lot lately. Many newcomers feel that if they only work on the business side and not on the technical track, they can skip learning databases. Does that mean there is no need to prepare for them in interviews either?

After all, many newcomers who have just joined a company get a headache as soon as they see the complex joins between tables in a database. Some cannot even tell SQL apart from a database, let alone understand concepts such as the data warehouse.


Unfortunately for many of them, database knowledge is the topic I ask about most often when interviewing newcomers, bar none.


Do data analysts need to understand databases?

How much database knowledge you need depends on what kind of data analyst you want to be. I have seen data analysts who do not understand databases at all. Their analysis is based entirely on Excel files, and their strength is on the business side.

However, they usually have to rely on the IT department to supply the basic data for their analysis. With a good understanding of the business and of the business data, they can still do data analysis well.


There is another class of data analysts who start from the database itself: pure data analysts in a business intelligence (BI) role. By combining business understanding with data processing skills, they can also do well in analysis.

One of their advantages is that they do not depend heavily on the IT department. Given the right permissions, they can run their analysis directly against a unified data source, and sometimes a single SQL query is itself a piece of data analysis.
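To illustrate how a single SQL query can already be a small piece of analysis, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and not taken from any real system.

```python
import sqlite3

# Hypothetical sales data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 50.0)],
)

# One query answers a business question:
# total and average order value per region.
summary = conn.execute(
    "SELECT region, SUM(amount), AVG(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(summary)
```

The analyst gets the per-region totals directly from the data source, without asking IT for an exported file first.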


So, as a data analyst, I think it never hurts to have more skills. One more way of getting data from different channels is naturally a good thing.

What is a database?

Before we can understand databases, we first need to understand how data is stored.

We all know that even in primitive times our ancestors learned to record things by tying knots in rope. Those knotted ropes were “data”, although data of this kind was hard to preserve and hard to retrieve.


Later, our ancestors used oracle bones, bamboo slips, and paper to store written data. In modern times the sound recorder and the camera were invented to store audio and visual data. Although the carrier kept changing, the way data was stored did not change much; all of these belong to the traditional mode of storage.

It was not until the information age that data storage changed dramatically, developing in two directions: files and databases.

1. Files: the data is stored in something like an Excel workbook, as files that can be read and written, and is then filtered, processed, and extracted with tools such as Python;

2. Databases: the data is stored in the computer according to its structure, forming a large, organized collection of data, much like a filing cabinet for documents.


Storing data in a database is the most popular approach today, because a database offers persistent storage and fast reads and writes. More importantly, a database can do a great deal to guarantee the validity of the data, unlike Excel, where it is easy to introduce errors when making changes.
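As a rough illustration of how a database can guard data validity, the sketch below uses Python's built-in sqlite3 module. The `employees` table, its constraints, and the sample values are assumptions made up for this example; a spreadsheet would accept the bad value silently, while the database refuses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the constraints describe what counts as valid data.
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age BETWEEN 16 AND 100))"
)
conn.execute("INSERT INTO employees (name, age) VALUES (?, ?)", ("Li", 35))

rejected = False
try:
    # An obviously wrong age is refused by the database itself.
    conn.execute(
        "INSERT INTO employees (name, age) VALUES (?, ?)", ("Wang", 350)
    )
except sqlite3.IntegrityError:
    rejected = True

# Only the valid row was stored.
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
conn.close()

print(rejected, row_count)
```

The rules live with the data rather than in each user's head, so every tool that writes to the table is held to the same checks.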

Types of databases

According to early database theory, there were three popular database models: hierarchical, network, and relational. In today's internet companies, the two most commonly used models are relational databases and non-relational databases.

The relational model reduces complex data structures to simple binary relations (that is, two-dimensional tables). In a relational database, almost all operations on data are performed on one or more related tables, and data is managed by classifying, merging, joining, or selecting from these tables.
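The join and select operations mentioned above can be sketched as follows, using Python's built-in sqlite3 module. The `customers` and `orders` tables and their contents are hypothetical, chosen only to show two related tables being combined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical related tables, linked by orders.customer_id.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Li', 'Beijing'), (2, 'Wang', 'Shanghai');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 70.0), (3, 2, 20.0);
""")

# Join the two tables on the shared key, then select only the rows of interest.
rows = conn.execute(
    "SELECT c.name, o.amount "
    "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
    "WHERE c.city = ? ORDER BY o.id",
    ("Beijing",),
).fetchall()
conn.close()

print(rows)
```

Each table stays a simple two-dimensional grid; the relationship between them is expressed only through the matching key values.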


Put simply, in a relational database the data exists as two-dimensional tables. You can think of it like the arrangement of books in a library.

The bookshelves and floors can be understood as the relational data structure, and the books are the data. The librarians are the database's processes, each responsible for a different job: some put out fires (data recovery and backup), and some tidy the shelves and books (data organization and archiving). The user processes are the patrons who come to the library. They read and move books, and the librarians take care of the maintenance.


It has been about 40 years since the relational database was born, going from theory to development to commercial products such as the well-known MySQL and Oracle. Oracle rose to a dominant position in the database field, which has grown into an industry worth tens of billions of US dollars a year. MySQL is also a database that cannot be ignored, so much so that it ended up being acquired by Oracle.


Non-relational databases are also known as NoSQL databases. The name stands for “Not Only SQL”, and they serve as an effective complement to traditional databases. In specific scenarios, a NoSQL database can deliver remarkably high efficiency and performance.

With the rise of Web 2.0 websites, massive data volumes placed heavy demands on the storage capacity of relational databases. A single machine could no longer meet the demand, and clusters were often needed to solve the problem, which is where relational databases fall short.


This is how non-relational databases came about. In practice, they are database products built for specific scenarios, with specialized features aimed at high performance and ease of use, such as Google's BigTable and Amazon's Dynamo.

The relationship between SQL and databases

At this point many people confuse SQL with the concept of a database. Many newcomers think SQL is a database used for storing data, and others think SQL is a type of database.

To make this easier to understand, let me give an analogy:

If the data are individual sheets, we can put them into different folders according to how they relate to each other. Such a folder is the basic building block of a database: the data table.


When the folders become numerous and complex, we can store them in a filing cabinet according to what they contain. Each cabinet may have many compartments holding different folders. This filing cabinet is the equivalent of a database.


When we want to find a document in the filing cabinet, we need to follow certain rules, for instance “contract documents are on the third floor, fourth row of shelves”. Carrying out this kind of lookup rule is the job of the database management system (DBMS), which acts like a file clerk helping us manage the data in the database.

The most common database management systems include SQL Server, MySQL, and Oracle.


And if we want to give instructions to that file clerk, we need a language to communicate in. That language is SQL. SQL stands for Structured Query Language, and it is used to operate the database management system.
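The analogy above can be sketched in a few lines of Python. This example assumes SQLite (through Python's built-in sqlite3 module) as the DBMS, which is a stand-in chosen for convenience rather than one of the systems named in this article; the `contracts` table and its contents are hypothetical.

```python
import sqlite3

# The DBMS (SQLite here) opens a database: the "filing cabinet".
conn = sqlite3.connect(":memory:")

# SQL is the language we use to tell the DBMS what to do.
# This statement creates a data table: a "folder" in the cabinet.
conn.execute("CREATE TABLE contracts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO contracts (title) VALUES (?)", ("Lease 2020",))

# Ask the DBMS which tables the database contains...
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]

# ...and ask it to fetch the documents in one of them.
titles = [row[0] for row in conn.execute("SELECT title FROM contracts")]
conn.close()

print(tables, titles)
```

Nothing in this sketch stores data "in SQL": the data sits in the table inside the database, and SQL is only the text of the requests sent to the DBMS.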

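The relationship between them can be summarized like this: SQL is the language used to talk to the DBMS, the DBMS manages the database, and the database is made up of data tables.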


Seen this way, we shouldn't mistake SQL for a database anymore, right?
