How is SQL Parsed?
David Clement
September 2004
Oracle versions 7-8
SQL parsing is a sequence of six steps in two phases called "soft
parsing" and "hard parsing." Soft parsing consists of the first three
steps of the process, and hard parsing of the last three.
The first step in the soft parse is a syntax check of the SQL
statement that has been submitted to the parser. This step
determines if SQL keywords are identifiable, if a legal name follows
the keyword "from," and so forth and so on.
The second step is a semantics check to determine if the tables,
columns, stored code, and other entities referenced exist in the
environment and if they are related to each other as the SQL
statement requires, that is, if the columns unambiguously belong to
the tables and so forth. At the same time, the parser checks
permissions to determine if the user can access the data structures
referenced. This design choice is basically why Oracle returns the
ORA-00942 error, "Table does not exist," both for a semantics
violation and for a permissions violation.
Now comes the outline check. At this point in the process, it is
evident that this particular SQL statement could have been processed
before, and so the parser now tries to find the statement in the SQL
cache (also called the library cache) in the shared pool of the
System Global Area (SGA). This cache contains recently processed SQL
statements. The parser does not actually compare SQL statements byte
by byte. Instead, it generates a hash value from the input literal
and compares this to the hash values stored in the SQL cache. The
hashing algorithm is affected by capitalization and even by white
space, so the result of the comparison is almost the same as the
result of a byte-by-byte or character-by-character comparison of
string literals. If the hash value is matched, then the cursor and
other memory structures previously allocated for the statement can
be reused, if they still exist.
If the SQL statement could not be found in the cache or the memory
structures have been lost (for example, because the cursor was
overwritten by another cursor that was more urgently needed) then
three more steps are required. Performing these three steps is what
distinguishes the "hard parse" from the "soft parse."
Step four of the parsing process is generating execution plans for
the SQL statement. The memory structures used for this purpose
cannot be allowed to be corrupted and are therefore latched, that
is, access to them is controlled by other memory structures called
"latches." The execution plan generator must check for the latches
and wait until they are released and acquire them before it can
actually run.
Two Oracle initialization parameters,
optimizer_search_limit and
optimizer_max_permutations, limit the number of
execution plans that will be generated. Their default values (5! and
80000) represent a compromise between the need to find the best
execution plan and the need to finish generating execution plans in
a reasonable amount of time.
Next, step five, the optimizer can assign a cost to each step in
each execution plan. One of the optimizer's primary sources of
information is database statistics such as the number of rows in a
table, the table's high-water mark, the height of the indexes on
that table, and so forth. The other primary source of information is
operation ranks, assigned by Oracle engineers to the various
row-source operations in the knowledge that certain data access
methods and database kernel subroutines take longer than others.
From these two kinds of values, the optimizer calculates low costs
for simple operations on small tables, and high costs for complex
operations on large tables (unless something misleads it). Once this
calculation is done, the optimizer simply chooses the cheapest
execution plan from those available.
The final step is to generate a binary representation of the chosen
execution plan and to cache it for fast access.
The soft parse consumes much less time than the hard parse. Also,
since hard parsing requires memory latching, it scales worse than
soft parsing (since anything serialized scales worse than an
equivalent concurrent operation). For these reasons, it is important
to use coding techniques that reduce hard parsing to a
minimum.
|