Online troubleshooting: have you run into any of the following scenarios?


One, Count the machine's connections

Problem: sshd listens on port 22. How do you count the sshd service's connections in each state (TIME_WAIT / CLOSE_WAIT / ESTABLISHED)?


Common methods:

netstat -n | grep ':22 ' | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

netstat -lnpta | grep ssh | grep -E "TIME_WAIT|CLOSE_WAIT|ESTABLISHED"

n [ Alibaba cloud only ]


Explanation: netstat is a common tool for tracking down network connection problems, and combined with grep/awk it becomes a very powerful one. Of course, on Alibaba Cloud there are even more convenient ways.
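Filtering netstat output with a plain grep on the port number can also match other columns, such as an IP address or a foreign port that happens to contain 22. A minimal sketch that counts only connections whose local port matches, assuming the usual `netstat -n` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the helper name `count_tcp_states` is hypothetical:

```shell
#!/usr/bin/env bash
# count_tcp_states PORT  (hypothetical helper)
# Reads `netstat -n` output on stdin and prints "<STATE> <count>" for TCP
# connections whose local address (column 4) ends in ":PORT".
# Assumes netstat's column order: Proto Recv-Q Send-Q Local Foreign State.
count_tcp_states() {
  awk -v p=":$1" '
    /^tcp/ && $4 ~ (p "$") { ++S[$NF] }   # tally by the last column (State)
    END { for (s in S) print s, S[s] }
  ' | sort
}

# Usage: netstat -n | count_tcp_states 22
```

Anchoring the match to the end of the local-address field means a connection on, say, port 8022 is not counted as sshd traffic.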


Two, Query data in logs that have already been backed up

Problem: In the backed-up log suyun.2019-06-26.log.bz2, count how many log lines contain the keyword.


Common methods:

bzcat suyun.2019-06-26.log.bz2 | grep '' | wc -l

bzgrep '' suyun.2019-06-26.log.bz2 | wc -l

less suyun.2019-06-26.log.bz2 | grep '' | wc -l


Explanation: Production log files are usually kept compressed with bz2. Decompressing them before querying costs a lot of disk space and time, so bzcat and bzgrep are tools worth mastering.
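A minimal sketch of the same streaming approach as a reusable function. It uses `grep -c` instead of a separate `wc -l`, and `-F` so the keyword is matched literally rather than as a regex. The helper name `count_in_bz2` and the keyword `ERROR` in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# count_in_bz2 FILE KEYWORD  (hypothetical helper)
# Counts lines of a .bz2 log that contain KEYWORD as a fixed string.
# The data is decompressed only into the pipe, so nothing extra hits the disk.
count_in_bz2() {
  bzcat -- "$1" | grep -cF -- "$2"
}

# Usage (ERROR is a placeholder keyword):
#   count_in_bz2 suyun.2019-06-26.log.bz2 ERROR
# bzgrep forwards grep options, so this is similar:
#   bzgrep -c ERROR suyun.2019-06-26.log.bz2
```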


Three, Tips for backing up a service

Problem: Back up the /opt/web/suyun_web directory as a tarball, excluding the logs directory and its contents, and store the archive in the /opt/backup directory.


Common methods:

tar -zcvf /opt/backup/shenjian.tar.gz \
    --exclude=/opt/web/suyun_web/logs \
    /opt/web/suyun_web
Explanation: This command is used frequently in production. When a project has to be packaged for migration, the log directory usually needs to be left out, so --exclude is a parameter worth mastering.
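A minimal sketch of the same backup wrapped in a reusable function, assuming GNU tar. The helper name `backup_without_logs` is hypothetical. It uses `-C` so the archive stores paths relative to the directory's parent (`suyun_web/...`) instead of the full `opt/web/suyun_web/...` path:

```shell
#!/usr/bin/env bash
# backup_without_logs SRC_DIR ARCHIVE  (hypothetical helper, assumes GNU tar)
# Creates a gzip tarball of SRC_DIR, skipping SRC_DIR/logs and everything under it.
backup_without_logs() {
  local src="${1%/}" archive="$2"
  local parent base
  parent=$(dirname "$src")
  base=$(basename "$src")
  mkdir -p "$(dirname "$archive")"
  # --exclude is placed before the directory argument: recent GNU tar
  # applies it only to names that follow it on the command line.
  tar -czf "$archive" -C "$parent" --exclude="$base/logs" "$base"
}

# Usage: backup_without_logs /opt/web/suyun_web /opt/backup/shenjian.tar.gz
```

Listing the result with `tar -tzf ARCHIVE` is a quick way to confirm the logs directory really was left out.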


Four, Query the number of threads

Problem: Query the number of threads running on the server, so that when the machine's thread count exceeds the alarm threshold, you can quickly find the related process and thread information.


Reference answer:

ps -eLf | wc -l

pstree -p | wc -l
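Both counts above are approximations. `ps -eLf` prints one line per thread plus a header line, and `pstree -p` puts a parent and its first child on the same line, so `wc -l` counts printed lines rather than threads. A sketch that reads the same information directly from /proc, assuming Linux; the helper names `total_threads` and `top_thread_procs` are hypothetical:

```shell
#!/usr/bin/env bash
# total_threads  (hypothetical helper, Linux only)
# Every thread appears as /proc/<pid>/task/<tid>, so counting those entries
# gives the system-wide thread count with no header line to subtract.
total_threads() {
  local n=0 t
  for t in /proc/[0-9]*/task/[0-9]*; do
    [ -d "$t" ] && n=$((n + 1))
  done
  echo "$n"
}

# top_thread_procs [N]  (hypothetical helper)
# Prints "THREADS PID NAME" for the N processes with the most threads, taken
# from the Threads:, Pid: and Name: fields of /proc/<pid>/status.
# Names containing spaces are cut to their first word.
top_thread_procs() {
  local f
  for f in /proc/[0-9]*/status; do
    # A process may exit between the glob and the read; ignore those errors.
    awk '/^Name:/ {n = $2} /^Pid:/ {p = $2} /^Threads:/ {t = $2}
         END {if (t) print t, p, n}' "$f" 2>/dev/null
  done | sort -rn | head -n "${1:-10}"
}

# Usage: total_threads; top_thread_procs 5
```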


Five, Disk alarm: empty the largest file

Problem: On a server, a running Tomcat has generated a large volume of exception logs. Find the file and free up the space. Assume the file name contains the keyword log and the file is larger than 1G.


Common methods:

Step 1: Find the file

find / -type f -name "*log*" -print0 | xargs -0 ls -lSh | more

du -a / | sort -rn | grep log | more

find / -name '*log*' -size +1000M -exec du -h {} \;
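A minimal sketch that combines the name and size filters and sorts by exact byte size, assuming GNU find (for `-printf`). `-xdev` keeps the search on the filesystem being searched, which is usually the one that raised the alarm. The helper name `find_big_logs` is hypothetical:

```shell
#!/usr/bin/env bash
# find_big_logs DIR [SIZE]  (hypothetical helper, assumes GNU find)
# Lists regular files under DIR whose name contains "log" and whose size is
# above SIZE (find -size units, default 1G), largest first, as "BYTES<TAB>PATH".
find_big_logs() {
  find "$1" -xdev -type f -name '*log*' -size "+${2:-1G}" -printf '%s\t%p\n' 2>/dev/null |
    sort -rn
}

# Usage: find_big_logs / 1G | head
```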


Step 2: Empty the file

Assume the file found is a.log.

The right way to do this is:

echo "" > a.log

This empties the file in place, so the space is freed immediately.

Many students will instead use:

rm -rf a.log

This removes the file name, but because the running Tomcat process still holds the file open, the space is not released. Tomcat has to be restarted (which closes the file) before the space is freed.
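If the file has already been removed with rm, its blocks stay allocated for as long as some process keeps it open. A minimal sketch that finds such descriptors through /proc, assuming Linux (`lsof +L1` reports the same thing where lsof is installed); the helper name `list_deleted_open_files` is hypothetical:

```shell
#!/usr/bin/env bash
# list_deleted_open_files  (hypothetical helper, Linux only)
# Prints "PID FD TARGET" for every file descriptor that still refers to a
# deleted file. The kernel appends " (deleted)" to such fd links, and their
# disk blocks are not released until the descriptor is closed.
list_deleted_open_files() {
  local fd target pid
  for fd in /proc/[0-9]*/fd/*; do
    # Skip fds we cannot read (other users' processes without root) or that vanished.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *' (deleted)')
        pid=${fd#/proc/}
        pid=${pid%%/*}
        echo "$pid ${fd##*/} $target"
        ;;
    esac
  done
}

# Usage: list_deleted_open_files | grep tomcat
```

Once the PID and FD are known, truncating through the descriptor link, for example `: > /proc/<PID>/fd/<FD>`, should free the blocks without restarting the process, the same way emptying the log in place does.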


Six, Display a file with comments filtered out

Problem: Display the server.conf file, hiding the comment lines that begin with #.


Common methods:

sed -n '/^[#]/!p' server.conf

sed -e '/^#/d' server.conf

grep -v "^#" server.conf
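The three commands above hide only lines whose very first character is #. Config files often also contain indented comments and blank lines. A sketch that hides both while keeping lines with a trailing comment after a real setting; the helper name `show_config` is hypothetical:

```shell
#!/usr/bin/env bash
# show_config FILE  (hypothetical helper)
# Prints FILE without comment lines (a "#" preceded only by optional
# whitespace) and without blank or whitespace-only lines.
show_config() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage: show_config server.conf
```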


Seven, Troubleshooting disk IO

Problem: How do you troubleshoot abnormal disk IO, such as slow writes or unusually high utilization? Find the ID of the process causing the abnormally high disk IO.


Common methods:

Step 1:

iotop -o

This shows the IDs of all processes that are currently doing disk IO.
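When iotop is not installed, a rough substitute is to sample the kernel's per-process IO counters in /proc/<pid>/io twice and compare them. This is a swapped-in technique, not what iotop does internally. A sketch assuming Linux with task IO accounting enabled and bash (for process substitution). Without root, only your own processes' counters are readable. The helper names `snapshot_write_bytes` and `top_writers` are hypothetical:

```shell
#!/usr/bin/env bash
# snapshot_write_bytes  (hypothetical helper)
# Prints "PID WRITE_BYTES" for each readable /proc/<pid>/io. write_bytes counts
# the bytes the process has caused to be sent to the storage layer.
snapshot_write_bytes() {
  local f pid
  for f in /proc/[0-9]*/io; do
    pid=${f#/proc/}
    pid=${pid%/io}
    awk -v pid="$pid" '/^write_bytes:/ {print pid, $2}' "$f" 2>/dev/null
  done
}

# top_writers [SECONDS] [N]  (hypothetical helper)
# Samples twice, SECONDS apart, and prints "BYTES PID COMM" for the N
# processes whose write_bytes grew the most in between.
top_writers() {
  local interval="${1:-5}" n="${2:-10}" before after bytes pid
  before=$(snapshot_write_bytes)
  sleep "$interval"
  after=$(snapshot_write_bytes)
  # First input is the earlier snapshot, second the later one; join on PID.
  awk 'NR == FNR {b[$1] = $2; next}
       ($1 in b) && $2 > b[$1] {printf "%.0f %s\n", $2 - b[$1], $1}' \
      <(printf '%s\n' "$before") <(printf '%s\n' "$after") |
    sort -rn | head -n "$n" |
    while read -r bytes pid; do
      echo "$bytes $pid $(cat "/proc/$pid/comm" 2>/dev/null)"
    done
}

# Usage: sudo bash -c '. ./top_writers.sh; top_writers 5 10'
```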

Step 2: If the write figures are very low at this point, meaning there are essentially no large writes, check the disk itself. Start with the system log:

cat /var/log/messages

and look for disk errors (on Debian/Ubuntu the equivalent file is /var/log/syslog). You can also touch an empty file on the slow disk to see whether writes to it fail.
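The touch test can be wrapped with a time limit so that a slow disk produces a clear failure message instead of a silent wait. A sketch using coreutils `timeout`; the helper name `probe_write` is hypothetical. Note that a process stuck in uninterruptible disk sleep may not respond to the timeout signal, so on a badly hung disk this can still block:

```shell
#!/usr/bin/env bash
# probe_write DIR [SECONDS]  (hypothetical helper, uses coreutils timeout)
# Tries to create, then remove, an empty file in DIR within SECONDS (default 10).
probe_write() {
  local dir="$1" secs="${2:-10}"
  local f="$dir/.io_probe.$$"
  if timeout "$secs" touch "$f" 2>/dev/null; then
    rm -f "$f"
    echo "OK: created a file in $dir"
  else
    echo "FAIL: could not create a file in $dir within ${secs}s"
    return 1
  fi
}

# Usage: probe_write /data 5
```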

I hope this helps those of you who often work on production systems. If you know better practices, you are welcome to share them.

