Friday, December 21, 2012

SQL Standards & Guidelines


Pretest all embedded SQL

Before embedding SQL in an application program, you should test it using SPUFI or QMF. This reduces the amount of program testing by ensuring that all SQL code is syntactically correct and efficient before it is placed in an application program.

Never use SELECT *

Explicitly specify all column names in any SQL statement.
As a general rule, a query should never ask DB2 for anything more than is required to satisfy the desired task. For maximum flexibility and efficiency, each query should access only the columns needed for the function that will be performed.

Singleton SELECT versus A Cursor

To return a single row, an application program can use a cursor or a singleton SELECT. A cursor requires an OPEN, FETCH, and CLOSE to retrieve one row, whereas a singleton SELECT requires only SELECT... INTO. Usually, the singleton SELECT outperforms the cursor. Write the SELECT to only return one row, or allow for the -811 SQL Return Code.
When the selected row must be updated after it is retrieved, however, using a cursor with the FOR UPDATE OF clause is recommended over a singleton SELECT. The FOR UPDATE OF clause ensures the integrity of the data in the row because it causes DB2 to hold an exclusive lock on the page containing the row to be updated. The singleton select provides no such locking.

Use FOR FETCH ONLY

When a SELECT statement will be used only for retrieval, use the FOR FETCH ONLY clause.

Avoid using DISTINCT

The DISTINCT verb removes duplicate rows from an answer set. If duplicates will not cause a problem, do not code distinct, because it adds overhead by invoking a sort to remove the duplicates.
If you need to use DISTINCT to get the correct answer, first make sure that you don’t have an error in your keys.
If your still need to use a DISTINCT, consider using a GROUP BY and HAVING clauses.

Limit the data selected

Return the minimum number of columns and rows needed by your application program by making efficient use of the WHERE (SQL predicate) clause.
It is almost always more efficient to allow DB2 to use the WHERE clause to limit the data returned. This is because the big expense in DB2 is actually moving the data off the DASD. Anything you can do to make DB2 retrieve as little data as possible makes it faster.

Code predicates on indexed columns

DB2 usually performs more efficiently when it can satisfy a request using an existing index rather than no index. Design all SQL statements to take advantage of indexes.

Multicolumn indexes

If a table has only multicolumn indexes, try to specify the high-level column in the WHERE clause of your query. This results in an index scan with at least one matching column.
Keep in mind that underlying DB2 is VSAM. This means that it only uses leading indexes. If you have a table with a five column index and you supply keys for all the columns except the third one, DB2 will only use the first two of the index for the direct DB2 access.

Order By Use

Use ORDER BY only when sequence is important: Code the ORDER BY clause when the sequence of rows being returned is important. Order only those columns that are absolutely necessary in order to improve efficiency.

Use equivalent data types

Use the same data types and lengths when comparing column values to host variables or literals. This eliminates the need for data conversion and allows for the use of an index. For example, comparing a column defined as CHAR(6) to a field which is CHAR(4) or CHAR(7) will cause data conversion and should be avoided at all costs. The easiest way to ensure data type consistency is to use the DCLGEN fields whenever possible.

Use BETWEEN rather than <= and >=

BETWEEN allows the optimizer to select a more efficient access path.

Use IN instead of LIKE

If you know that only a certain number of occurrences exist, using IN with the specific list is more efficient than using LIKE. The functionality of LIKE can be imitated using a range of values. For example, if you want to retrieve all employees with a last name starting with "M," use
BETWEEN 'maaaaaaaaaaaaaaa' and 'mzzzzzzzzzzzzzzz' instead of LIKE 'm%'

Avoid using NOT (except with EXISTS)

Not should only be used as an alternative to very complex predicates.

Code the most restrictive predicate first

When you code predicates in a select statement, place the predicate that will eliminate the greatest number of rows first.

Do not use arithmetic expressions in a predicate

DB2 will not use an index for a column when the column is in a predicate that includes arithmetic. Perform calculations before the SQL statement, then use the result in the query.
This is only acceptable when using DB2 to do date calculations.

Avoid using the substring function in an SQL predicate.

Likewise using the substring function in a predicate forces DB2 to ignore indexes.

Use Date and Time arithmetic with care

DB2 provides the capability to add and subtract DATE, TIME, and TIMESTAMP columns. Keep in mind the following rules:
=> both operands must be of the same data type (ie, both dates or both times)
Example: CURRENT DATE - 1 MONTH or CURRENT DATE + 1 DAY
=> if one operand is a timestamp, the other operand can be a time or a date, but not a timestamp.
Example: TIMESTAMP + 1 DAY or TIMESTAMP - 24 HOURS
=> date durations are expressed as DECIMAL(8,0). The valid date durations are:
DAY, MONTH, and YEAR and their plural equivalents.
=> time durations are expressed as DECIMAL(6,0). The valid time durations are:
HOUR, MINUTE, SECOND, and MICROSECOND and their plural equivalents.

Specify the number of rows to be returned

When you are coding a cursor to fetch a predictable number of rows, consider specifying the number of rows to be retrieved in the OPTIMIZE FOR n ROWS clause of the CURSOR. This gives DB2 the opportunity to select the optimal access path for the statement based on actual use.
This is also handy for using a SELECT instead of a CURSOR to retrieve data. If you know you only need the first row of a CURSOR, write a SELECT to retrieve only 1 row. See 2.3.

Try to sort only on indexed columns

When using ORDER BY, GROUP BY, DISTINCT, and UNION, it is best to use only indexed columns.

Joins instead of sub-queries

Joins will give the DB2 optimizer more options for data access than a sub-query.
The exception is when retrieving the most current row when TIMESTAMP is used for history. Then using a correlated sub-query for the MAX TIMESTAMP value is the best choice.

Minimize number of tables used in a join

As a general rule of thumb, avoid using more than 3 tables in a join.
Remember to have the first table in the query limit the data as much as possible. DB2 only joins two tables at a time. So if you have a three table join, it joins the first two, creating a temporary table with the result. The temporary table is then joined to the remaining table. The problem is that temporary tables don’t have indexes and always does a tablespace scan

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.