|
|
Statspack
David Clement
September 2004
Oracle versions 7-8
Statspack resembles Bstat and Estat in reporting system-wide performance
for a period of time. Unlike Bstat and Estat, Statspack can be run from
within SQL*Plus. It has its own user, perfstat, with default
password perfstat, and its own tables and views in the Tools
tablespace. The report that it generates is even longer and more complex
than the one that Estat generates, but it is much better organized. The most
informative data is on the first page, and the rest of the report provides a
finely detailed drill-down.
To install Statspack, connect as the system DBA and run the scripts that
drop and create the Statspack objects. For example, in a Unix or Linux
environment you could do this.
sqlplus "/ as sysdba" @$ORACLE_HOME/rdbms/admin/spdrop.sql
sqlplus "/ as sysdba" @$ORACLE_HOME/rdbms/admin/spcreate.sql
The second script is interactive; you will have to provide the perfstat
user's password and default and temporary tablespaces.
To start the Statspack performance analysis, issue the following commands
from the SQL*Plus prompt.
connect perfstat/perfstat
execute statspack.snap;
This reads the system performance metrics and stores them in tables for
later use. At the end of the sampling interval, whether it be 10 minutes
or an hour, run the Statspack snapshot procedure again, just as before.
Statspack does its analysis by comparing snapshots, so it requires at
least two of them.
You can run the procedure as often as you like. Statspack gurus (such as
Oracle's Tom Kyte) recommend running the snapshot procedure at fairly short
intervals of 10 to 15 minutes. You can list the snapshot IDs and times with
the following query.
select snap_id, snap_time from stats$snapshot;
If the Statspack tables grow large, they can be purged occasionally with
another script, sppurge.sql.
Once you have snapshots, you can generate Statspack reports with the script
spreport.sql. This is another interactive script, which expects snapshot
IDs so that it can tell which snapshots to compare.
The first page of the report is a summary page (see below for a sample
page). It has five sections, consisting of a header, the Cache Sizes, the
Load Profile, the Instance Efficiency Percentages, and the Top 5 Wait
Events. The first two sections are mainly for reference, so that the
analyst can tell on what server and for what time this report was run, and
how the SGA of the instance was configured.
The Load Profile shows what the system is doing. These numbers are more
meaningful in relation to the application design and to each other than in
isolation. For example, if the application design involves extensive
calculations and comparisons, the number of physical reads (which represent
hard disk i/o) should probably be low in comparison to the number of
logical reads. In such a design, there is little need for extensive reads
from disk. A data warehouse, on the other hand, which would typically
retrieve large amounts of data for ad hoc queries, would show a far higher
proportion of physical to logical reads.
The number of transactions per second should be high if the application is
a classic OLTP application, involving many quick updates of individual
records, but low if the application is processing substantial amounts of
data in (for instance) batch jobs.
One of the most revealing sets of numbers is the parsing percentages.
Applications submit predictable SQL statements; even large applications
have only a few thousand separate SQL statements. So a well-tuned
application should show little hard parsing. Hard parsing comes either from
unpredictable SQL -- that is, the users are making up queries on the fly,
as in a data warehouse -- or from a flaw in the application that is forcing
previously parsed statements to be parsed again.
A useful way of reading the Load Profile section is to compare it across
Statspack reports. This is a quick, reliable way of checking how stable the
Oracle instance is.
After the Load Profile comes the section on Instance Efficiency
Percentages. These numbers should be close to 100 percent. Figures lower
than about 95 to 97 percent are cause for concern, although naturally the
application architecture and performance determine what is unacceptable.
The shared pool statistics, under Instance Efficiency Percentages, describe
how SQL statements, once parsed, are maintained in the SGA. The first line
shows how much of the shared pool is in use, and the next two lines show
how much is used by SQL statements that are executed more than once during
the time the report covers. In general, the higher the reuse of SQL
statements, the more efficient the application. However, this is
completely dependent on how long the statements take to execute versus how
long the Statspack report covers.
The last of the major sections is the Top 5 Wait Events. Some Statspack
gurus argue that this is the most important section in the summary page. It
definitely reveals a lot about where the instance is spending its time.
This list excludes wait events that might show mere idling (as when the
instance is waiting for another server to transmit data). The wait events
are listed in descending order of the percentage of total wait time that
they consume. Some common wait events and their causes are listed in the
table after the sample summary page.
Statspack Report: Sample Summary Page
STATSPACK report for
DB Name DB Id Instance Inst Num Release OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
TEST 3397982562 TEST 1 8.1.7.4.0 NO hpux5
Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 62 03-Jun-03 15:15:02 234
End Snap: 63 03-Jun-03 16:15:02 234
Elapsed: 60.00 (mins)
Cache Sizes
~~~~~~~~~~~
db_block_buffers 76800 log_buffer: 2097152
db_block_size 8192 shared_pool_size: 200000000
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 330,129.18 1,852.18
Logical reads: 39,292.95 220.45
Block changes: 2,080.05 11.67
Physical reads: 1,788.78 10.04
Physical writes: 133.85 0.75
User calls: 1,900.53 10.66
Parses: 304.20 1.71
Hard parses: 7.02 0.04
Sorts: 594.51 3.34
Logons: 0.06 0.00
Executes: 1,753.15 9.84
Transactions: 178.24
% Blocks changed per Read: 5.29 Recursive Call %: 59.25
Rollback per transaction %: 0.29 Rows per Sort: 13.14
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.97 Redo NoWait %: 100.00
Buffer Hit %: 95.45 In-memory Sort %: 100.00
Library Hit %: 99.15 Soft Parse %: 97.69
Execute to Parse %: 82.65 Latch Hit %: 99.94
Parse CPU to Parse Elapsd %: 71.30 % Non-Parse CPU: 96.91
Shared Pool Statistics Begin End
----- -----
Memory Usage %: 67.35 71.78
% SQL with executions>1: 41.03 39.86
% Memory for SQL w/exec>1: 40.43 39.58
Top 5 Wait Events
~~~~~~~~~~~~~~~~~
Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
db file sequential read 1,537,677 371,787 35.98
log file sync 265,557 270,484 26.18
log file parallel write 285,344 180,055 17.43
db file scattered read 930,129 116,226 11.25
latch free 98,534 61,813 5.98
-------------------------------------------------------------
Statspack Report: Some Wait Events
|
Name
|
Description
|
buffer busy wait
|
A server process is trying to access a buffer in the database buffer cache
that another server process is holding. For example, while a process is
writing to a buffer, other processes have to wait to read that buffer.
There are several causes of this event but it indicates that the system is
i/o-bound in some way.
|
db file parallel write
|
The DBWR process has waited on multiple simultaneous i/o requests (dirty
block buffer cache writes in this case).
|
db file scattered read
|
A server process is accessing data in multiple non-contiguous blocks in the
database buffer cache, and another process had to wait for this to finish.
Typically this results from a full table scan, or from a full index scan.
If this is the most frequent wait event, the system is probably healthy,
unless the database design is such that there should be few or no full
table scans, as in a classic OLTP design.
|
db file sequential read
|
A server process is accessing data in contiguous blocks in the database
buffer cache, and another process had to wait for this to finish. Typically
this results from contention for access to index segments. Again, if this
is a common wait event, it is probably not a cause for concern unless the
database was designed not to do this, as in a data warehouse.
|
direct path read (write)
|
A process waited for completion of a data read or write from a database
file directly into another process's PGA (process global area) bypassing
the SGA (system global area). Often this implies time-consuming sorts on
disk, because the SGA does not manage sort segments.
|
enqueue
|
A process is waiting for a row-level lock to be released so that it can
lock the same row (for an update, for example). This lock should be visible
in v$lock.
|
latch free
|
A process is waiting to acquire one of Oracle's many memory-access latches.
Because hard parsing and cursor allocation both use many latches,
unsharable SQL statements and poor cursor (closing too much, not closing
enough) are common causes of waits on latches.
|
library cache load lock
|
More than one process has tried to load an object into the library cache at
the same time.
|
library cache pin
|
A process is waiting to be able to pin an object in the library cache,
normally because it wants to parse or compile that object. The object might
be a view or a stored procedure.
|
log buffer space
|
The redo log buffer has filled up before LGWR has been able to clear it.
Typically, this means the application is generating more redo than the
system can handle as tuned.
|
log file parallel write
|
LGWR itself waited for redo blocks to be written to the whole redo group.
|
log file sequential read
|
A process waited for ARCH to finish archiving the redo logs.
|
log file sync
|
A process waited for LGWR to finish its writes to the redo logs: this
happens during a commit and is one result of overuse of the commit
statement.
|
SQL*Net more data from client
|
A process had to wait for multiple data packets to be reassembled;
this commonly means that long SQL statements have been submitted
in-line.
|
|
|