Pretest all embedded SQL
Before
embedding SQL in an application program, you should test it using
SPUFI or QMF. This reduces the amount of program testing by ensuring
that all SQL code is syntactically correct and efficient before it is
placed in an application program.
Never use SELECT *
Explicitly
specify all column names in any SQL statement.
As
a general rule, a query should never ask DB2 for anything more than
is required to satisfy the desired task. For maximum flexibility and
efficiency, each query should access only the columns needed for the
function that will be performed.
Singleton SELECT versus A Cursor
To
return a single row, an application program can use a cursor or a
singleton SELECT. A cursor requires an OPEN, FETCH, and CLOSE to
retrieve one row, whereas a singleton SELECT requires only SELECT...
INTO. Usually, the singleton SELECT outperforms the cursor. Write the
SELECT to only return one row, or allow for the -811 SQL Return Code.
When
the selected row must be updated after it is retrieved, however,
using a cursor with the FOR UPDATE OF clause is recommended over a
singleton SELECT. The FOR UPDATE OF clause ensures the integrity of
the data in the row because it causes DB2 to hold an exclusive lock
on the page containing the row to be updated. The singleton select
provides no such locking.
Use FOR FETCH ONLY
When
a SELECT statement will be used only for retrieval, use the FOR FETCH
ONLY clause.
Avoid using DISTINCT
The
DISTINCT verb removes duplicate rows from an answer set. If
duplicates will not cause a problem, do not code distinct, because it
adds overhead by invoking a sort to remove the duplicates.
If
you need to use DISTINCT to get the correct answer, first make sure
that you don’t have an error in your keys.
If
your still need to use a DISTINCT, consider using a GROUP BY and
HAVING clauses.
Limit the data selected
Return
the minimum number of columns and rows needed by your application
program by making efficient use of the WHERE (SQL predicate) clause.
It
is almost always more efficient to allow DB2 to use the WHERE clause
to limit the data returned. This is because the big expense in DB2 is
actually moving the data off the DASD. Anything you can do to make
DB2 retrieve as little data as possible makes it faster.
Code predicates on indexed columns
DB2
usually performs more efficiently when it can satisfy a request using
an existing index rather than no index. Design all SQL statements to
take advantage of indexes.
Multicolumn indexes
If
a table has only multicolumn indexes, try to specify the high-level
column in the WHERE clause of your query. This results in an index
scan with at least one matching column.
Keep
in mind that underlying DB2 is VSAM. This means that it only uses
leading indexes. If you have a table with a five column
index and you supply keys for all the columns except the third one,
DB2 will only use the first two of the index for the direct DB2
access.
Order By Use
Use
ORDER BY only when sequence is important: Code the ORDER BY clause
when the sequence of rows being returned is important. Order only
those columns that are absolutely necessary in order to improve
efficiency.
Use equivalent data types
Use
the same data types and lengths when comparing column values to host
variables or literals. This eliminates the need for data conversion
and allows for the use of an index. For example, comparing a column
defined as CHAR(6) to a field which is CHAR(4) or CHAR(7) will cause
data conversion and should be avoided at all costs. The easiest way
to ensure data type consistency is to use the DCLGEN fields whenever
possible.
Use BETWEEN rather than <= and >=
BETWEEN
allows the optimizer to select a more efficient access path.
Use IN instead of LIKE
If
you know that only a certain number of occurrences exist, using IN
with the specific list is more efficient than using LIKE. The
functionality of LIKE can be imitated using a range of values. For
example, if you want to retrieve all employees with a last name
starting with "M," use
BETWEEN
'maaaaaaaaaaaaaaa' and 'mzzzzzzzzzzzzzzz' instead of LIKE 'm%'
Avoid using NOT (except with EXISTS)
Not
should only be used as an alternative to very complex predicates.
Code the most restrictive predicate first
When
you code predicates in a select statement, place the predicate that
will eliminate the greatest number of rows first.
Do not use arithmetic expressions in a predicate
DB2
will not use an index for a column when the column is in a predicate
that includes arithmetic. Perform calculations before the SQL
statement, then use the result in the query.
This
is only acceptable when using DB2 to do date calculations.
Avoid using the substring function in an SQL predicate.
Likewise
using the substring function in a predicate forces DB2 to ignore
indexes.
Use Date and Time arithmetic with care
DB2
provides the capability to add and subtract DATE, TIME, and TIMESTAMP
columns. Keep in mind the following rules:
=> both operands must be of the same data type (ie, both dates or
both times)
Example:
CURRENT DATE - 1 MONTH or CURRENT DATE + 1 DAY
=> if one operand is a timestamp, the other operand can be a time
or a date, but not a timestamp.
Example:
TIMESTAMP + 1 DAY or TIMESTAMP - 24 HOURS
=> date durations are expressed as DECIMAL(8,0). The valid date
durations are:
DAY,
MONTH, and YEAR and their plural equivalents.
=> time durations are expressed as DECIMAL(6,0). The valid time
durations are:
HOUR,
MINUTE, SECOND, and MICROSECOND and their plural equivalents.
Specify the number of rows to be returned
When
you are coding a cursor to fetch a predictable number of rows,
consider specifying the number of rows to be retrieved in the
OPTIMIZE FOR n ROWS clause of the CURSOR. This gives DB2 the
opportunity to select the optimal access path for the statement based
on actual use.
This
is also handy for using a SELECT instead of a CURSOR to retrieve
data. If you know you only need the first row of a CURSOR, write a
SELECT to retrieve only 1 row. See 2.3.
Try to sort only on indexed columns
When
using ORDER BY, GROUP BY, DISTINCT, and UNION, it is best to use only
indexed columns.
Joins instead of sub-queries
Joins
will give the DB2 optimizer more options for data access than a
sub-query.
The
exception is when retrieving the most current row when TIMESTAMP is
used for history. Then using a correlated sub-query for the MAX
TIMESTAMP value is the best choice.
Minimize number of tables used in a join
As
a general rule of thumb, avoid using more than 3 tables in a join.
Remember
to have the first table in the query limit the data as much as
possible. DB2 only joins two tables at a time. So if you have a three
table join, it joins the first two, creating a temporary table with
the result. The temporary table is then joined to the remaining
table. The problem is that temporary tables don’t have indexes and
always does a tablespace scan
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.