Thursday, December 8, 2011

DB2: Plans and packages


Let’s begin with what happened before we could create a package. You first needed a Database Request Module (DBRM), which is the input to the bind process and is created as one of the DB2 precompiler or SQL coprocessor outputs. However, to discuss the DBRM, we need to go back even further—to the first step in the program preparation process: the DB2 precompiler.

If your program is written in C, C++, COBOL, or PL/I, regardless of whether it uses static or dynamic SQL, that program must be “prepared” before you can execute it. The first step is determining how to handle the SQL. Many languages may not be ready to handle an SQL statement. In recent years, the option has been made available to use the SQL coprocessor in the compile step rather than running the program code through the DB2 pre-compiler. Both processes have the same effect: modified source and a DBRM.

Since DB2’s beginning, a method has been needed to “remove” SQL before the compiler could actually process source statements. The compiler had no idea how to process an SQL statement. The DB2 precompiler would replace the SQL statements with a CALL statement to the appropriate language interface in a modified source file. The actual SQL statement, in an internal format, would be added to a separate object, the DBRM. The modified source would serve as input to the compiler, and the DBRM as input to the bind.

Now, if you have the correct compiler release levels, there’s an SQL compiler option that lets the compile invoke the SQL coprocessor. This completely eliminates the precompiler step and lets the compiler “handle” an SQL statement and produce a DBRM. Both methods are acceptable and both create DBRMs. From here on, though, we’ll refer only to the precompiler step. The actual compiler step is executed, and an object module is produced as output of the compiler and passed to the linkage editor, which creates the executable representation of the application program, the load module. Now, let’s get back to that DBRM.

A DBRM is one of the outputs of the precompiler (or coprocessor) and contains all the program’s SQL in an internal format. The DBRM will eventually be input to the bind process. Bind creates the executable form of the SQL, similar to what the linkage editor does for the program. DBRMs can be bound into plans or packages. Our focus is on packages because no one would intentionally bind a DBRM directly into a plan anymore. There just isn’t any reason for that today.

Packages

A package is an executable representation of a single DBRM. Before the introduction of packages, the DBRMs representing the programs linked into a single load module were all bound into a single plan. Changes made to a single DBRM meant the entire plan had to be bound. If the DBRM was used in more than one plan, all plans containing that DBRM had to be bound. Depending on the number of DBRMs included in a plan, this could be a lengthy, disruptive process.

With packages, though, the process is simpler and offers some built-in fallback. Each package represents a single bound DBRM. A package is a direct, one-to-one relationship between an application program, the DBRM created by that application and used as input to the package BIND, and the eventual package itself. Even the package name is the same as the DBRM used to create it. A package name can’t even be specified at BIND time because there’s no package-name keyword on the BIND package statement. The package always assumes the name of the DBRM specified by the member or copy keyword. When a program is changed, creating a new DBRM, only that single package needs to be bound. At the next program execution, if versions aren’t being used, the plan will pick up the new package. No massive plan bind is needed.

You can’t specify a package name at bind time, but must specify a collection-id. A collection-id is a way of associating similar packages. However, a collection won’t exist on its own. It has meaning only when it’s used to identify a package. For example, a collection could contain all the payroll packages. To reference a specific package, PAYROLL.packagename would have to be coded, where PAYROLL is the collection-id. One advantage of using collections is referencing multiple packages from the same collection. A package list could simply be coded as PAYROLL.* rather than PAYROLL.PACKAGE1, PAYROLL.PACKAGE2, etc. In fact, if all packages used by this plan belong to the PAYROLL collection, PAYROLL.* would be the best-performing choice.

Another advantage of packages is versioning, the ability to keep multiple copies of the same package. You can maintain multiple versions by using package versions or by placing the same package name in multiple collections. A package version-id, if used, is created using the optional precompiler keyword, VERSION. A version can have a maximum length of 64 bytes and can be a user-defined string or generated automatically using the consistency token. If not specified, a blank string is used.

The combination of package name and version-id makes the package unique. By changing the version-id each time a program is precompiled, DB2 can have multiple copies of the same package available, each at a different “version.” By changing which load module is executed alters which package DB2 will reference. This same effect can also be accomplished by binding different copies of the same package into different collections. This method relies on manually moving packages from one collection to another, however, or changing the SET CURRENT PACKAGESET value. If you want to take advantage of versions, ensure you have a process to clean up older versions of packages no longer in use. Allowing the uncontrolled, unmanaged creation of multiple packages with different versions only increases the size of your catalog tables and will eventually become a management problem, if not a performance issue.

One difference between plans and packages is their use of the EXECUTE privilege. This can sometimes be confusing for the package user because DB2 has three different definitions, or implementations, for the same privilege keyword. When the EXECUTE privilege is granted on a package, the privilege allows only the authid to include that package in the package list of the plan being bound. If that same EXECUTE privilege is granted on a plan, the authid granted the privilege can execute that plan. In DB2 for z/OS, only plans can be executed. A package can’t normally be explicitly accessed except through a plan. Even packages bound on DB2 for z/OS from a workstation, where there’s no concept of a plan, still run under a special DB2 system plan called DISTSERV. Finally, DB2 allows the EXECUTE privilege to be granted over an entire collection. Granting EXECUTE in this fashion allows the authid to include any package in that collection into a plan’s package list and allows the inclusion of any packages added to this collection in the future.

The Search

The key to understanding how packages will affect your applications is to understand how DB2 decides which package in the package list it should be using. Whatever the connection type, a program always calls the DB2 language interface. This CALL statement is added to the source code by the DB2 precompiler replacing the corresponding SQL statement. A modified source file, with the SQL commented out and its new call statement added, is compiled and linked. The original SQL is extracted and placed into a DBRM and then used as input to the package bind discussed earlier.

The CALL statement’s parameter list contains the DBRM name, consistency token, SQL section number, and SQL statement number. The DBRM name is the name of the program containing the SQL statement. This also is the program name. The DBRM name is used to decide which package in the package list should be loaded when the program runs.

The consistency token is a unique identifier used to ensure that a load module uses the correct plan, package, or version of a package. Although the consistency token is sometimes referred to as a timestamp, it’s not. DB2 creates the consistency token from an MVS ISO timestamp, usually STCK format. However, that’s where the similarity ends. From there, DB2 manipulates the value into something called CONTOKEN.

Specifying the LEVEL keyword at precompile time can override the use of the timestamp when building the consistency token. When LEVEL is used, the value supplied by the LEVEL keyword is used as the consistency token instead of a timestamp. Users have been cautioned about using the LEVEL keyword. With packages, there’s yet another reason to avoid this option. When LEVEL is specified, and VERSION is automatically generated, bind will use the LEVEL as the VERSION. This defeats the purpose of using VERSION, which implies the use of multiple copies of the same package (while LEVEL implies reuse).

The SQL statement number in the parameter list corresponds to the state ment number of the modified call statement in the precompiler output listing PCLIST.LISTING. This number has nothing to do with the statement numbers in the original source listing. For now, we’re concerned only with the DBRM name and consistency token and how they’re used when passed to DB2. These are the only two pieces of information from the program usually used to find the correct DBRM/package in the plan.

Every plan contains a plan header, sometimes called a header page or section 0. This section contains a directory of all DBRMs bound directly into a plan followed by a list of all packages this plan can use. Using the DBRM name passed to DB2 from the call statement’s parameter list, the DBRM directory in section 0 is searched first for the existence of DBRMs bound directly into the plan. However, the search examples used in this article assume no DBRMs were bound directly into the plan. The search will therefore be limited to the package list in the plan header.

The list of packages in the plan header is searched serially. The order of the packages is the same as originally specified in the BIND PLAN command’s PKLIST keyword. The format of a package entry in the package list (PKLIST) is the concatenation of location, collection-id, and package-name, although location isn’t required. When the DBRM name passed from the call statement matches a package-name in the package list, a search value is built. It consists of the padded package list entry location, collection-id, and package-name with the consistency token from the call statement appended. This search argument is used against the DSNSPT01 index on the Skeleton Package Table (SKPT) to test for the package’s existence.

The first key column of the DSNSPT01 index is SPTPID. The three values used to form this column follow Distributed Relational Database Architecture (DRDA) rules. This means the location is 18 characters rather than DB2’s 16 (128 in V8) and package-name is also 18 characters rather than DB2’s eight (128 in V8). Collection-id is 18 characters, just as it is in DB2 (again 128 in V8). Besides the SPTPID, the index key also consists of the RDS number and the section number. If a “row not found” condition exists, the package list search continues with the next package that qualifies in the plan’s package list. An error is returned to the application process (-805) if no match is found after searching all the packages in the list. However, when the I/O is successful, the package header, or section 0 of the package, is moved into the EDM pool and used to verify that the SQL sections are contained in this package. When the appropriate section is needed, that section is moved into the EDM pool if it isn’t already present.

You can now build other scenarios. For example, what if you have a collection-id.*? Then any DBRM name from the call list will always qualify. The asterisk (*) works as a wild card for all packages in that collection. There’s no additional search necessary for that collection. If this package list contained only the one entry, collection-id.*, the shortest possible search is performed. If there were multiple collection-id.* entries in the package list, there’d still be only one search necessary for each of the collection entries. Contrast this with a long list of collection-id.package-name entries where the entire list must be searched until a package is found or a not found condition occurs. But even a long explicit list of packages doesn’t necessarily translate to poor performance. DB2 can sufficiently search a fairly long list of packages before it has a performance impact.

For DBRMs in a plan with or without packages, a binary search is used. For packages, a hashing algorithm is used to check if the package has already been allocated to the plan. If not, then the package list is searched in the order it was specified on BIND.

The CURRENT PACKAGESET special register affects the package search in two ways. First, it’s required if an * is used in place of a collection-id. A collection-id must be present for the search and an * doesn’t qualify. Whenever an * is used in place of the collection-id in the package list, the SET CURRENT PACKAGESET SQL statement must be coded in the program and executed before using any SQL in a package referenced by that collection. If not, the program will receive a negative SQLCODE (-812) at run-time. There’s one major drawback when using an * as the collection-id in the package list. DB2 verifies that the plan owner has the execute privilege on the package at run-time, not at plan bind. Performing authorization table lookups at run-time can be costly.

Second, when CURRENT PACKAGESET special register is set to a non-blank string, it limits the package search to the collection-id specified in CURRENT PACKAGESET. This can be useful when multiple collections are specified in the package list and the application wants to limit how much searching is performed. Changing the value of the CURRENT PACKAGESET special register is also a handy technique used to change which package is being referenced at run-time.

The version-id specified at precompiler time plays no direct part in this search. Only the DBRM name and consistency token are used. The version-id allows only multiple rows to be inserted into the package catalog tables for this package. Remember there’s always a one-to-one correlation between a version and consistency token. Because a version can only be created by the pre-compiler, every version will have a unique consistency token.


No comments:

Post a Comment

Note: Only a member of this blog may post a comment.