Making your servers a bit more "green" with smart system administration, part 2: tools
August 12, 2009
"Green IT" can only work if you have a combined approach of optimization on all levels. System administration can play an important role. What you need is a good set of tools.
The less time your programs run, the more your servers can idle, or the more programs you can run on your current servers (making extra servers less of a necessity). Finding performance bottlenecks in your programs can be made a lot easier, as long as you know what tools to run. We want to show you a few of our favourite Linux tools that we have used in the past.
The top tool gives you a good indication of how system resources are used and what the load is. When using top it is good to know what you are looking at, especially if a machine has a high load. A high load does not always mean that the CPU is overloaded; it indicates the number of processes that are running or waiting, whether for the CPU or, on Linux, for disk I/O.
Much more important is the combination of load and other resources, such as memory, disk space and CPU usage. If the load is high but the CPU appears to be mostly idle, you know there is a bottleneck somewhere, and it is not the CPU.
By default top sorts processes by CPU usage, but you can easily instruct it to sort by something else, for example by memory usage (press M, that is Shift+m, while top is running).
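To put a load figure into perspective, it helps to relate it to the number of CPUs in the machine. The following is a minimal sketch, assuming a Linux system with /proc/loadavg; the function name load_per_cpu is ours and purely illustrative.

```shell
#!/bin/sh
# load_per_cpu LOAD CPUS: print LOAD divided by CPUS with two decimals.
# (Hypothetical helper name, not part of any standard tool.)
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# On Linux the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r load1 rest < /proc/loadavg
    # Number of online processors; fall back to 1 if getconf is unavailable.
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "load per CPU: $(load_per_cpu "$load1" "$cpus")"
fi
```

A value well above 1 per CPU while top shows the CPUs mostly idle suggests that processes are waiting for something other than the CPU, typically disk or network I/O.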
pstree is a neat little tool that displays the processes running on a system as a tree, with init as the root of the tree. This way you can quickly see how many subprocesses were started by some process.
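As a small illustration, pstree can be limited to a single process and told to print PIDs. If pstree is not installed, the number of direct children can also be counted from /proc. This is only a sketch, assuming Linux; the helper name count_children is made up for this example.

```shell
#!/bin/sh
# count_children PID: count processes whose parent is PID, based on the
# "PPid:" line in /proc/<pid>/status (Linux only, hypothetical helper).
count_children() {
    grep -s -h '^PPid:' /proc/[0-9]*/status |
        awk -v parent="$1" '$2 == parent { n++ } END { print n + 0 }'
}

# Show the subtree below the current shell, with PIDs, if pstree exists.
if command -v pstree >/dev/null 2>&1; then
    pstree -p "$$"
fi
```

For example, `count_children 1` prints how many processes were started directly by init.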
A process can open a lot of files during its lifetime. A problem can occur if a program does not close files it no longer needs. With lsof ('list open files') you can list which files a process has opened. By sorting them by size you can also easily spot the biggest open files on a system:
- lsof | grep REG | sort -k 7 -n -r | less
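To spot a program that keeps files open, it can also help to count open file descriptors per process. The sketch below assumes Linux, where every open descriptor appears as an entry in /proc/PID/fd; count_fds is a name we made up. Without root privileges the count is 0 for processes owned by other users.

```shell
#!/bin/sh
# count_fds PID: number of open file descriptors of PID (0 if unreadable).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the five processes with the most open descriptors (count, then PID).
for dir in /proc/[0-9]*; do
    [ -d "$dir" ] || continue
    pid=${dir#/proc/}
    echo "$(count_fds "$pid") $pid"
done | sort -rn | head -n 5
```

A count that keeps growing over time for the same PID is a hint that the program is leaking descriptors; `lsof -p PID` then shows which files are involved.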
Every process makes system calls during its lifetime. In fact, most processes make a lot of system calls, like opening files, opening sockets, reading files, and so on. With strace you can capture these. The output of strace is very verbose, but it contains a wealth of information. If you see that a process searches for files in a location where you know in advance it will never find them, you could, for example, fix the directory search order. Another example is an application that issues a lot of DNS requests to look up names and/or IP addresses. If the system is using an external DNS server, it might be a good idea to run a local caching DNS server to avoid network traffic.
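Because strace output is so verbose, a small filter helps to find the failed file lookups mentioned above. The following sketch reads a strace log on standard input and prints how often each path failed with ENOENT ("No such file or directory"). The function name enoent_paths is hypothetical, and it assumes the usual strace line format in which the path is the first quoted argument.

```shell
#!/bin/sh
# enoent_paths: read strace output on stdin and print "count path" for every
# path whose lookup failed with ENOENT, most frequent first.
enoent_paths() {
    grep 'ENOENT' |
        sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' |
        sort | uniq -c | sort -rn
}

# Possible usage (some_command is a placeholder):
#   strace -f -e trace=file -o trace.log some_command
#   enoent_paths < trace.log
```

If you are only interested in totals, `strace -c` prints a summary of the number of calls and the time spent per system call instead of the full trace.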
With netstat you can easily get a list of open Internet connections and Unix sockets for a process. If a process has a lot of unexpected open connections it is worth looking into, because it signals that something could be wrong (either inefficiency, or an attack, or something else). With the -p option you can relate open sockets directly to a process:
- netstat -pan | less
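On a busy server the full netstat listing can be long, so a summary per connection state is a useful first look. This is a sketch only: tcp_states is a made-up helper, and it assumes the Linux net-tools output layout, in which the state is the sixth column of every tcp line.

```shell
#!/bin/sh
# tcp_states: read "netstat -an" style output on stdin and print the number
# of TCP sockets per state, highest count first.
tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print count[s], s }' |
        sort -rn
}

# Possible usage:
#   netstat -ant | tcp_states
```

An unusually large number of sockets in a single state, for example TIME_WAIT or SYN_RECV, is a good reason to look at the individual connections with the -p option as shown above.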
The tools described here are a good start if you want to look into performance problems on your system and can reveal problems you might not have expected. If needed there are more advanced tools, such as Valgrind and systemtap, which are really powerful and will give you a lot more information than the tools described here.