Just like an automobile, a computer system needs tuning and optimization to deliver its best performance for a given configuration and workload. Fortunately, Sun includes a number of utilities for monitoring and changing system settings to get optimum results from your installation. Here, I will focus on Solaris 10, although the methods described also apply to earlier Solaris versions and, in some cases, to other UNIX-based operating systems.
Some factors that can affect a computer system's performance are available memory, disk I/O, and CPU utilization. Solaris provides the vmstat and iostat utilities (among others) to monitor these system aspects.
vmstat is short for "virtual memory statistics." It reports on kernel threads, physical and virtual memory, paging, disk I/O, and CPU utilization. When run without any command-line options, it returns a single line representing an average of system activity since the last boot. I find it more useful to run vmstat 1. This prints the average-since-last-boot on the first line, then samples data once a second until CTRL-C is pressed. If such fine-grained detail is not needed, vmstat 5 prints the system average and then one activity snapshot every five seconds. Here is example output from vmstat 1 running for four seconds:
 kthr      memory                page                disk         faults         cpu
 r b w    swap    free   re  mf   pi po fr de sr  f0 s0 s1 s6    in   sy   cs  us sy id
 0 0 0 1922440 1812392   36 232  115  5  5  0  0   0  0  0  0   315  310  198  10  3 87
 0 1 0 1525552 1790400    0 185 1426  0  0  0  0   0  0  0  0  1396 1279 1099   0  1 99
 0 0 0 1595216 1806368    8 173 1204  0  0  0  0   0  0  0  0  1282 1304  862   2  2 96
 0 0 0 1595456 1806672    0   5    0  0  0  0  0   0  0  0  0   818 2150  897   1  1 98
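From left to right, the fields cover the following areas:

kernel thread activity: r is threads in the run queue, b is threads blocked waiting for resources such as I/O or paging, and w is swapped-out lightweight processes.
memory: swap is the available swap space and free is the size of the free list, both in kilobytes.
paging: re is page reclaims, mf is minor faults, pi and po are kilobytes paged in and out, fr is kilobytes freed, de is the anticipated short-term memory shortfall, and sr is pages scanned by the page scanner.
disk: operations per second.
faults: in is interrupts, sy is system calls, and cs is CPU context switches.
CPU utilization: the percentage of time spent in user mode (us), system mode (sy), and idle (id).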
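To post-process captured vmstat output (for example, from vmstat 1 > vmstat.log), it helps to turn each line into named fields. The following is a minimal Python sketch. It assumes the exact 22-column layout of the sample above. The disk column names (f0, s0, s1, s6) differ from machine to machine, so treat the hypothetical VMSTAT_COLUMNS list as something to adjust.

```python
# Column names in the order Solaris 10 vmstat printed them in the sample.
# The disk columns (f0 s0 s1 s6) depend on the machine, so this list is an
# assumption to adjust, not a fixed format. The faults "sy" column is
# renamed "syscalls" so it does not clash with the CPU "sy" column.
VMSTAT_COLUMNS = [
    "r", "b", "w",                               # kernel threads
    "swap", "free",                              # memory (KB)
    "re", "mf", "pi", "po", "fr", "de", "sr",    # paging
    "f0", "s0", "s1", "s6",                      # disk operations per second
    "in", "syscalls", "cs",                      # faults
    "us", "sy", "id",                            # CPU (% of time)
]


def parse_vmstat(text: str, skip_first: bool = True) -> list[dict[str, int]]:
    """Turn captured vmstat output into one dict per sample line."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines contain only integers; the two header lines do not.
        if len(fields) != len(VMSTAT_COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        samples.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    # The first data line is the average since boot, not a live sample.
    return samples[1:] if skip_first else samples


SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s6   in   sy   cs us sy id
 0 0 0 1922440 1812392 36 232 115 5 5 0 0 0 0 0 0 315 310 198 10 3 87
 0 1 0 1525552 1790400 0 185 1426 0 0 0 0 0 0 0 0 1396 1279 1099 0 1 99
 0 0 0 1595216 1806368 8 173 1204 0 0 0 0 0 0 0 0 1282 1304 862 2 2 96
 0 0 0 1595456 1806672 0 5 0 0 0 0 0 0 0 0 0 818 2150 897 1 1 98
"""

for s in parse_vmstat(SAMPLE):
    print(f"free={s['free']} KB  pi={s['pi']}  po={s['po']}  sr={s['sr']}  idle={s['id']}%")
```

Skipping the first line by default keeps the since-boot average from being mixed in with the live samples.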
Diagnosing your system's trouble spots
The first section to watch is free memory. When a system runs low on physical memory, it pages data out of RAM to swap space on disk (a process often loosely called "swapping"). Inactive processes can also have their address space paged out until they're needed again. Whenever possible, give a system enough physical RAM to avoid heavy paging. Disk storage can be orders of magnitude slower than RAM, so relying on paging space in lieu of real memory can severely degrade performance. If a system runs out of both free physical memory and swap space, it can stop functioning properly or lock up.
An old guideline for allocating swap space was "make it double the amount of physical memory." RAM has become much cheaper, though, and the amount installed in today's systems has grown so much that following this guideline can allocate far more paging space than will ever be used. Consider a system's intended workload when sizing swap. Software vendors usually publish minimum system requirements for their products, and these should be met. For general-purpose systems (those with 4 GB of memory or less), I usually allocate 2 GB of swap.
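The arithmetic behind the two guidelines is simple, but it shows how quickly the old rule diverges from the one I use. This is a small sketch under stated assumptions. The function names are hypothetical, "GB" is taken as 1024³ bytes, and systems above 4 GB deliberately get no number, because their swap should come from the workload and vendor requirements rather than a formula.

```python
from typing import Optional

GB = 1024 ** 3


def swap_old_rule(ram_bytes: int) -> int:
    """Traditional guideline: swap = 2 x physical memory."""
    return 2 * ram_bytes


def swap_general_purpose(ram_bytes: int) -> Optional[int]:
    """The rule of thumb from the text: 2 GB of swap for general-purpose
    systems with 4 GB of RAM or less. Larger systems return None because
    their swap should be sized from the workload and vendor requirements."""
    if ram_bytes <= 4 * GB:
        return 2 * GB
    return None


for ram_gb in (2, 4, 32):
    ram = ram_gb * GB
    old_gb = swap_old_rule(ram) // GB
    new = swap_general_purpose(ram)
    new_text = f"{new // GB} GB" if new is not None else "size from workload"
    print(f"{ram_gb} GB RAM: 2x rule -> {old_gb} GB; general-purpose rule -> {new_text}")
```

At 4 GB of RAM, the old rule reserves 8 GB of swap where 2 GB is usually enough. At 32 GB, it would reserve 64 GB.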
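The next section is paging. A page fault happens when a process accesses a page of memory that is not currently mapped into its address space. If that page has been paged out to disk, the kernel must read it back from disk before the process can continue, which is far slower than reading RAM. Pay close attention to pi (kilobytes paged in) and po (kilobytes paged out). Consistently large values can be a symptom of inadequate physical memory for the system's workload. Keep in mind that Solaris also moves ordinary file system data through the paging system, so pi and po include normal file I/O. A sustained nonzero sr (pages scanned by the page scanner looking for memory to free) is a stronger sign that the system is short of RAM. In the sample above, pi reaches 1426 while po and sr stay at zero, a pattern more consistent with file reads than with a memory shortage.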
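The key word is sustained: a single burst of paging is normal, while paging that persists interval after interval is not. Here is a minimal sketch of that check, assuming samples are mappings with integer "po" and "sr" keys, one per vmstat interval, with the since-boot first line excluded. The function name, window length, and limits are hypothetical defaults for illustration, not Sun guidance.

```python
def sustained_paging(samples, window=5, po_limit=0, sr_limit=0):
    """Return True if `window` consecutive samples all show page-outs (po)
    above po_limit or page-scanner activity (sr) above sr_limit."""
    run = 0
    for sample in samples:
        if sample["po"] > po_limit or sample["sr"] > sr_limit:
            run += 1
            if run >= window:
                return True
        else:
            run = 0  # a quiet interval breaks the streak
    return False


# The three live samples from the vmstat output above: no page-outs, no scanning.
article = [{"po": 0, "sr": 0}] * 3
print(sustained_paging(article))          # False

# A made-up stretch of heavy scanning that lasts five intervals.
pressured = [{"po": 800, "sr": 1200}] * 5
print(sustained_paging(pressured))        # True
```

Resetting the counter on any quiet interval means short spikes never trigger the check. Only an unbroken run of `window` busy intervals does.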
After the paging columns comes disk activity, measured in disk operations (reads or writes) per second. vmstat shows the activity of up to four disks; in the sample above, these are f0, s0, s1, and s6.
The last three columns show CPU usage as percentages: us (user time), sy (system time), and id (idle time). On a multiprocessor system, these numbers are averages across all CPUs. High CPU usage is not a bad thing in itself, because it just means the processors are being fully utilized. However, high CPU usage combined with heavy disk activity and heavy swapping can indicate a problem with the hardware configuration. Low CPU usage with high disk activity can mean that the system is spending too much time waiting on disk I/O. Heavy swapping with normal disk and CPU activity can indicate that the system does not have enough physical RAM. All of these evaluations are highly situation- and application-specific; a system that is perfectly configured for one workload can be severely underpowered for another.
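These rules of thumb can be written down as a small decision function, which makes their order of precedence explicit. This is a sketch, not a diagnostic tool. The function name is hypothetical. It assumes CPU busy time is us + sy, disk load is the sum of the vmstat disk columns, and "swapping" is measured by po. The three thresholds are placeholders you would calibrate against your own baseline.

```python
def diagnose(cpu_busy_pct, disk_ops_per_sec, paging_kb_per_sec,
             busy_pct=80, heavy_disk_ops=100, heavy_paging_kb=1000):
    """Map one interval's averages onto the rules of thumb in the text.

    cpu_busy_pct is us + sy, disk_ops_per_sec is the sum of the disk
    columns, and paging_kb_per_sec is po. All three thresholds are
    hypothetical placeholders; calibrate them against your own baseline.
    """
    high_cpu = cpu_busy_pct >= busy_pct
    high_disk = disk_ops_per_sec >= heavy_disk_ops
    heavy_paging = paging_kb_per_sec >= heavy_paging_kb

    if high_cpu and high_disk and heavy_paging:
        return "possible hardware configuration problem"
    if not high_cpu and high_disk:
        return "may be spending too much time waiting on disk"
    if heavy_paging and not high_cpu and not high_disk:
        return "may not have enough physical RAM"
    if high_cpu:
        return "CPUs busy, which is not a problem by itself"
    return "no obvious imbalance"


# Second sample line from the vmstat output above: us=0, sy=1, no disk ops, po=0.
print(diagnose(cpu_busy_pct=1, disk_ops_per_sec=0, paging_kb_per_sec=0))
```

The checks run from most specific to least specific, so a busy CPU on its own is reported as normal utilization rather than as a fault.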
Historical and real-time monitoring tools
For a more detailed look at disk activity, Solaris provides the iostat command. Like vmstat, it can produce a running log of activity. Here's sample output from iostat 1 running for four seconds (as with vmstat, the first line of output is an average of activity since the last system boot):
   tty         sd0           sd1           sd6          nfs1           cpu
 tin tout  kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt  id
   1   51   57   5   30  151  16   18    0   0    0    0   0    0  10  3  2  85
   0  635    0   0    0   64   8   21    0   0    0    0   0    0   0  0  0 100
   0  331    0   0    0   61   8   18    0   0    0    0   0    0   2  0  0  97
   0  329    0   0    0  109  13   13    0   0    0    0   0    0  13  1  0  86
Terminal activity is reported as characters read (tin) and written (tout) per second. Each disk, and the NFS mount nfs1, is reported in three measures: kilobytes transferred per second (kps), transactions per second (tps), and average service time per request in milliseconds (serv). CPU activity is shown as user (us), system (sy), wait (wt), and idle (id) time, similar to the output of vmstat.
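One useful number that iostat does not print directly is the average transfer size, kps / tps. In the last line of the sample, sd1 moves 109 KB/s in 13 transactions per second, or roughly 8.4 KB per request. The following Python sketch derives it for each device. It assumes the column layout of the sample above; the DEVICES list is taken from that header and will differ on other systems.

```python
# Device groups in the order they appear in the sample header; these names
# come from the sample and will differ on other systems.
DEVICES = ["sd0", "sd1", "sd6", "nfs1"]


def per_device_stats(line: str) -> dict[str, dict[str, float]]:
    """Split one iostat data line into kps/tps/serv per device and add the
    average transfer size, kps / tps (KB per transaction)."""
    fields = [float(f) for f in line.split()]
    stats = {}
    for i, dev in enumerate(DEVICES):
        start = 2 + 3 * i  # skip the tty tin/tout columns
        kps, tps, serv = fields[start:start + 3]
        stats[dev] = {
            "kps": kps,
            "tps": tps,
            "serv_ms": serv,
            "kb_per_io": kps / tps if tps else 0.0,  # idle device: no transfers
        }
    return stats


# Last data line of the iostat sample above.
for dev, s in per_device_stats("0 329 0 0 0 109 13 13 0 0 0 0 0 0 13 1 0 86").items():
    print(f"{dev}: {s['kps']:.0f} KB/s, {s['tps']:.0f} IO/s, "
          f"{s['kb_per_io']:.1f} KB per IO, {s['serv_ms']:.0f} ms service")
```

Guarding the division keeps idle devices (tps of 0) from raising an error; they simply report 0.0 KB per transfer.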
Again, it's very application-specific, but a good general approach is to keep physical memory, virtual memory, disk (capacity and I/O), and CPU utilization balanced, so that no single resource is loaded far more heavily than the rest. You don't want CPUs sitting idle waiting on disk I/O, disks waiting on an overloaded CPU, or inadequate physical RAM causing heavy swapping that slows down the rest of the system. This is not a universal rule, though; a database server, for example, may show heavy disk activity that does not indicate misconfiguration.
For historical (rather than real-time) performance monitoring over a set period, Solaris includes sar, the System Activity Reporter. sar data is recorded in binary files that are not directly human-readable. sar -f can print the recorded data as text, and many other utilities exist to parse sar data into reports and graphs. The best known of these is probably the commercial product SarCheck.
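Even without a commercial tool, the text that sar prints can be mined with a few lines of code, for example to find the busiest periods of a day. Here is a minimal sketch that ranks intervals by %usr + %sys. It assumes the CPU report produced by something like sar -u -f /var/adm/sa/saDD (the usual location of the daily files). It also assumes each data line reads "HH:MM:SS %usr %sys %wio %idle". The function name is hypothetical, and the sample report is invented for illustration, not real measurements.

```python
import re

# A sar -u data line: "HH:MM:SS  %usr  %sys  %wio  %idle".
# This column order is an assumption based on Solaris sar -u text output.
SAR_U_LINE = re.compile(r"^(\d\d:\d\d:\d\d)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def busiest_intervals(report: str, top: int = 3) -> list[tuple[str, int]]:
    """Return up to `top` (time, %usr + %sys) pairs, busiest first."""
    rows = []
    for line in report.splitlines():
        m = SAR_U_LINE.match(line.strip())
        if m:  # header, "Average", and blank lines do not match
            stamp, usr, sys_pct, _wio, _idle = m.groups()
            rows.append((stamp, int(usr) + int(sys_pct)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Made-up report text for illustration only, not real measurements.
REPORT = """\
00:00:00    %usr    %sys    %wio   %idle
08:00:00       5       2       0      93
09:00:00      40      10       3      47
10:00:00      70      15       5      10
11:00:00      20       5       1      74

Average       34       8       2      56
"""

print(busiest_intervals(REPORT, top=2))
```

The regular expression requires numeric columns after the timestamp. That is how the column-header line, which also starts with a timestamp, is filtered out.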
In most cases, a Solaris system is self-tuning with regard to its kernel parameters. However, if you encounter a problem that adequate physical RAM, disk, and CPU capacity cannot solve, manual kernel tuning may be needed. In these cases, an application vendor may specify which parameters to change in the /etc/system file. Alternatively, the values for specific features can be found in Sun's Solaris Tunable Parameters Reference Manual. Changes to /etc/system take effect only after a reboot.
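Before changing anything, it is worth knowing which parameters a system's /etc/system already overrides. Here is a minimal sketch that lists the set assignments, assuming the set [module:]symbol = value form and treating lines that begin with * or # as comments. The | and & operator forms are not handled. The function name is hypothetical, and the tunable names in the sample are placeholders, not real parameters or recommended values.

```python
import re

# Matches "set module:symbol = value" or "set symbol = value".
SET_LINE = re.compile(
    r"^set\s+(?:(?P<module>\w+):)?(?P<symbol>\w+)\s*=\s*(?P<value>\S+)\s*$"
)


def read_tunables(text: str) -> dict[str, str]:
    """Collect 'set' assignments from /etc/system-style text.

    Lines starting with '*' or '#' are skipped as comments. Only the plain
    '=' form is recognized; a later assignment to the same name replaces
    an earlier one in the returned dict.
    """
    tunables = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "*#":
            continue
        m = SET_LINE.match(line)
        if m:
            key = f"{m['module']}:{m['symbol']}" if m["module"] else m["symbol"]
            tunables[key] = m["value"]
    return tunables


SAMPLE = """\
* Hypothetical /etc/system fragment; names and values are placeholders.
set example_tunable = 1024
set somemodule:example_limit = 0x4000
# set example_tunable = 1
"""

print(read_tunables(SAMPLE))
```

Values are kept as strings, so hexadecimal settings such as 0x4000 appear exactly as written in the file.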
The next level after system tuning is application tuning. Unfortunately, an in-depth look at application tuning is beyond the scope of this short article, but plenty of online resources are available.
Further reading for Solaris system optimization and tuning
The prstat utility, included with Solaris, provides real-time views of system activity and resource consumption, broken down by individual processes and threads.
truss is another Solaris utility used to trace system and library calls made by a process.
Solaris 10 introduced DTrace, which lets a system administrator or application developer dynamically trace both the operating system and application processes without having to run the process under a tracing tool such as truss. The DTrace How To Guide gives a quick introduction, and the Solaris Dynamic Tracing Guide provides complete DTrace documentation.
A number of books have been written solely about Solaris system optimization and tuning. Here are some of my favorites:
Sun Performance Tuning: Java and the Internet is an older title, but it is still useful for general system optimization concepts and older releases of Solaris.
Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture and Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris were originally a single title, but the book ended up being so large that the authors split it up into two publications. The authors also maintain the Solaris Internals and Performance FAQ Wiki that contains how-to guides, best practices documents, scripts, and tutorials on Solaris system optimization.
This short article can only scratch the surface of system tuning and optimization, but I hope these basic guidelines and pointers to more advanced resources will help you solve your system performance issues.