12/7/09

Hyperion SQR Performance Tuning

SQR Performance and SQL Statements
Whenever a program contains a BEGIN-SELECT, BEGIN-SQL, or EXECUTE command, it performs a SQL statement. Processing SQL statements typically consumes significant computing resources, and tuning SQL statements usually yields higher performance gains than tuning any other part of your program.
This paper focuses on SQR tools for simplifying SQL statements and reducing the number of SQL executions. There are several techniques, including:
• Simplify a complex select paragraph.
• Use LOAD-LOOKUP to simplify joins.
• Improve SQL performance with dynamic SQL.
• Examine SQL cursor status.
• Avoid temporary database tables.
• Create multiple reports in one pass.
• Tune SQR numerics.
• Compile SQR programs and use SQR Execute.
• Set processing limits.
• Buffer fetched rows.
• Run programs on the database server.

Simplifying a Complex Select Paragraph
With relational database design, information is often normalized by storing data entities in separate tables. To display the normalized information, we must write a select paragraph that joins these tables together. With many database systems, performance suffers when you join more than three or four tables in one select paragraph.
With SQR, we can perform multiple select paragraphs and nest them. In this way, we can break a large join into several simpler selects. For example, we can break a select paragraph that joins the orders and the products tables into two selects. The first select retrieves the orders in which we are interested. For each order that is retrieved, a second select retrieves the products that were ordered. The second select is correlated to the first select by having a condition such as:
where order_num = &order_num
This condition specifies that the second select retrieves only products for the current order.
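The following sketch illustrates this nesting; the ORDERS and PRODUCTS tables and their columns are hypothetical placeholders:
begin-procedure list_orders
begin-select
ORDER_NUM    (+1,1)
ORDER_DATE   (,15)
   do list_products      ! nested select for the current order
from ORDERS
end-select
end-procedure ! list_orders

begin-procedure list_products
begin-select
PRODUCT_CODE (+1,5)
DESCR        (,20)
from PRODUCTS
where ORDER_NUM = &ORDER_NUM
end-select
end-procedure ! list_products
For each order that the outer select returns, the inner select retrieves only the matching products, because &ORDER_NUM carries the current order number into the correlated condition.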
Similarly, if the report is based on products that were ordered, you can make the first select retrieve the products and make the second select retrieve the orders for each product.
This method improves performance in many cases, but not all. To achieve the best performance, we need to experiment with the different alternatives.

Using LOAD-LOOKUP to Simplify Joins
Database tables often contain key columns, such as an employee ID or customer number. To retrieve a certain piece of information, we join two or more tables that contain the same column. For example, to obtain a product description, we can join the orders table with the products table by using the product_code column as the key.
With LOAD-LOOKUP, you can reduce the number of tables that are joined in one select. Use this command with LOOKUP commands.
The LOAD-LOOKUP command defines an array containing a set of keys and values and loads it into memory. The LOOKUP command looks up a key in the array and returns the associated value. In some programs, this technique performs better than a conventional table join.
We can use LOAD-LOOKUP in the SETUP section or in a procedure. If used in the SETUP section, it is processed only once. If used in a procedure, it is processed each time it is encountered.
LOAD-LOOKUP retrieves two fields from the database: the KEY field and the RETURN_VALUE field. Rows are ordered by KEY and stored in an array. The KEY field must be unique and contain no null values.
When the LOOKUP command is used, the array is searched (by using a binary search) to find the RETURN_VALUE field corresponding to the KEY that is referenced in the lookup.
The following code example illustrates LOAD-LOOKUP and LOOKUP:
begin-setup
load-lookup
   name=NAMES
   table=EMPNAME
   key=EMPLOYEE_ID
   return_value=EMPLOYEE_NAME
end-setup
...
begin-select
BUSINESS_UNIT (+1,1)
EMPLOYEE_ID
   lookup NAMES &EMPLOYEE_ID $EMPLOYEE_NAME
   print $EMPLOYEE_NAME (,15)
from JOB
end-select
In this code example, the LOAD-LOOKUP command loads an array with the EMPLOYEE_ID and EMPLOYEE_NAME columns from the EMPNAME table. The lookup array is named NAMES. The EMPLOYEE_ID column is the key and the EMPLOYEE_NAME column is the return value. In the select paragraph, a LOOKUP on the NAMES array retrieves the EMPLOYEE_NAME for each EMPLOYEE_ID. This technique eliminates the need to join the EMPNAME table in the select.
If the JOB and EMPNAME tables were joined in the select (without LOAD-LOOKUP), the code would look like this:
begin-select
BUSINESS_UNIT (+1,1)
JOB.EMPLOYEE_ID
EMPLOYEE_NAME (,15)
from JOB, EMPNAME
where JOB.EMPLOYEE_ID = EMPNAME.EMPLOYEE_ID
end-select
Whether a database join or LOAD-LOOKUP is faster depends on the program. LOAD-LOOKUP improves performance when:
• It is used with multiple select paragraphs.
• It keeps the number of tables being joined from exceeding three or four.
• The number of entries in the LOAD-LOOKUP table is small compared to the number of rows in the select, and they are used often.
• Most entries in the LOAD-LOOKUP table are used.
Note. You can concatenate columns if you want RETURN_VALUE to return more than one column. The concatenation symbol is database specific.
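For example, on Oracle (where || is the concatenation operator), a lookup could combine two hypothetical name columns into one return value:
load-lookup
   name=NAMES
   table=EMPNAME
   key=EMPLOYEE_ID
   return_value=LAST_NAME||','||FIRST_NAME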

Improving SQL Performance with Dynamic SQL
We can use dynamic SQL in some situations to simplify a SQL statement and gain performance:
begin-select
BUSINESS_UNIT
from JOB, LOCATION_TBL
where JOB.BUSINESS_UNIT = LOCATION_TBL.BUSINESS_UNIT
and ($state = 'CA' and EFF_DATE > $start_date
or $state != 'CA' and TRANSFER_DATE > $start_date)
end-select
In this example, depending on the value of $state, either EFF_DATE or TRANSFER_DATE is compared to $start_date. The OR operator in the condition makes this possible within a single select. With most databases, however, an OR operator slows processing and can cause the database to perform more work than necessary.
However, the same work can be done with a simpler select. For example, if $state is 'CA', the following select works:
begin-select
BUSINESS_UNIT
from JOB, LOCATION_TBL
where JOB.BUSINESS_UNIT = LOCATION_TBL.BUSINESS_UNIT
and EFF_DATE > $start_date
end-select
Dynamic SQL enables you to check the value of $state and create the simpler condition:
if $state = 'CA'
let $datecol = 'EFF_DATE'
else
let $datecol = 'TRANSFER_DATE'
end-if
begin-select
BUSINESS_UNIT
from JOB, LOCATION_TBL
where JOB.BUSINESS_UNIT = LOCATION_TBL.BUSINESS_UNIT
and [$datecol] > $start_date
end-select
The [$datecol] substitution variable substitutes the name of the column to be compared with $start_date. The select is simpler and no longer uses an OR operator. In most cases, this use of dynamic SQL improves performance.

Examining SQL Cursor Status
Because SQR programs select and manipulate data from a SQL database, it is helpful to understand how SQR processes SQL statements and queries.
SQR programs can perform multiple SQL statements. Moreover, the same SQL statement can be run multiple times.
When a program runs, a pool of SQL statement handles—called cursors—is maintained. A cursor is a storage location for one SQL statement; for example, SELECT, INSERT, or UPDATE. Every SQL statement uses a cursor for processing. A cursor holds the context for the execution of a SQL statement.
The cursor pool contains 30 cursors, and its size cannot be changed. When a SQL statement is rerun, its cursor can be reused immediately if it is still in the cursor pool. When an SQR program runs more than 30 different SQL statements, cursors in the pool are reassigned.
To examine how cursors are managed, use the -S command-line flag. This flag displays cursor status information at the end of a run.
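For example, a hypothetical run against an Oracle database might look like this (the program name and connectivity string are placeholders):
sqr myreport.sqr sammy/baker@hrdmo -S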
The following information appears for each cursor:
Cursor #nn:
SQL =
Compiles = nn
Executes = nn
Rows = nn
The listing also includes the number of compiles, which varies according to the database and the complexity of the query. With Oracle, for example, a simple query is compiled only once. With SYBASE, a SQL statement is compiled before it is first run and recompiled for validation during the SQR compile phase. Therefore, you may see two compiles for a SQL statement. Later, when the SQL is rerun, if its cursor is found in the cursor pool, it can proceed without recompiling.

Avoiding Temporary Database Tables
Programs often use temporary database tables to hold intermediate results. However, creating, updating, and deleting temporary database tables consumes significant resources and can hurt your program’s performance. SQR provides two alternatives to temporary database tables.
The first alternative is to store intermediate results in an SQR array. The second is to store intermediate results in a local flat file. Both techniques can bring about a significant performance gain. You can use the SQR language to manipulate data stored in an array or a flat file.
These two methods are explained and demonstrated in the following sections. Methods for sorting data in SQR arrays or flat files are also explained.
Using and Sorting Arrays
An SQR array can hold as many records as can fit in memory. During the first pass, when records are retrieved from the database, you can store them in the array. Subsequent passes on the data can be made without additional database access.
The following code example retrieves records, prints them, and saves them into an array named
ADDRESS_DETAILS_ARRAY:
create-array name=ADDRESS_DETAILS_ARRAY size=1000
   field=ADDRESSLINE1:char field=ADDRESSLINE2:char
   field=PINCODE:char field=PHONE:char
let #counter = 0
begin-select
ADDRESSLINE1 (,1)
ADDRESSLINE2 (,7)
PINCODE (,24)
PHONE (,55)
   position (+1)
   put &ADDRESSLINE1 &ADDRESSLINE2 &PINCODE &PHONE into ADDRESS_DETAILS_ARRAY(#counter)
   add 1 to #counter
from ADDRESS
end-select
The ADDRESS_DETAILS_ARRAY array has four fields that correspond to the four columns that are selected from the ADDRESS table, and it can hold up to 1,000 rows. If the ADDRESS table had more than 1,000 rows, it would be necessary to create a larger array.
The select paragraph prints the data. The PUT command then stores the data in the array. You could use the LET command to assign values to array fields; however, the PUT command performs the same work with fewer lines of code. With PUT, you can assign all four fields in one command.
The #counter variable serves as the array subscript. It starts with zero and maintains the subscript of the next available entry. At the end of the select paragraph, the value of #counter is the number of records in the array.
The next code example retrieves the data from ADDRESS_DETAILS_ARRAY and prints it:
let #i = 0
while #i < #counter
   get $ADDRESSLINE1 $ADDRESSLINE2 $PINCODE $PHONE from ADDRESS_DETAILS_ARRAY(#i)
   print $ADDRESSLINE1 (,1)
   print $ADDRESSLINE2 (,7)
   print $PINCODE (,24)
   print $PHONE (,55)
   position (+1)
   add 1 to #i
end-while
In this code example, #i goes from 0 to #counter - 1. The fields from each record are moved into the corresponding variables: $ADDRESSLINE1, $ADDRESSLINE2, $PINCODE, and $PHONE. These values are then printed.
Using and Sorting Flat Files
An alternative to an array is a flat file. You can use a flat file when the required array size exceeds the available memory. As is the case with an array, you may need a sorting utility that supports NLS.
The code example in the previous section can be rewritten to use a file instead of an array. The advantage of using a file is that the program is not constrained by the amount of memory that is available. The disadvantage is that the program performs more input/output (I/O). However, it may still be faster than performing another SQL statement to retrieve the same data.
This program uses the UNIX/Linux sort utility to sort the file; the example can be extended to other operating systems. The following code example is rewritten to use the cust.dat file instead of the array:
Program ex25b.sqr
begin-program
do main
end-program

begin-procedure main
!
! Open cust.dat
!
open 'cust.dat' as 1 for-writing record=80:vary
begin-select
ADDRESSLINE1 (,1)
ADDRESSLINE2 (,7)
PINCODE (,24)
PHONE (,55)
   position (+1)
   ! Put data in the file
   write 1 from &ADDRESSLINE1:30 &ADDRESSLINE2:30 &PINCODE:6 &PHONE:10
from ADDRESS
order by PINCODE
end-select
position (+2)
!
! Close cust.dat
!
close 1
!
! Sort cust.dat by its first field (ADDRESSLINE1)
!
call system using 'sort cust.dat > cust2.dat' #status
if #status <> 0
display 'Error in sort'
stop
end-if
!
! Print the address records (now sorted by ADDRESSLINE1)
!
open 'cust2.dat' as 1 for-reading record=80:vary
while 1 ! loop until break
! Get data from the file
read 1 into $ADDRESSLINE1:30 $ADDRESSLINE2:30 $PINCODE:6 $PHONE:10
if #end-file
break ! End of file reached
end-if
print $ADDRESSLINE1 (,1)
print $ADDRESSLINE2 (,7)
print $PINCODE (,24)
print $PHONE (,55)
position (+1)
end-while
!
! close cust2.dat
close 1
end-procedure ! main
The program starts by opening a cust.dat file:
open 'cust.dat' as 1 for-writing record=80:vary
The OPEN command opens the file for writing and assigns it file number 1. You can open as many as 12 files in one SQR program. The file is set to support records of varying lengths with a maximum of 80 bytes (characters). For this example, you can also use fixed-length records.
As the program selects records from the database and prints them, it writes them to cust.dat:
write 1 from &ADDRESSLINE1:30 &ADDRESSLINE2:30 &PINCODE:6 &PHONE:10
The WRITE command writes the four columns into file number 1 (the currently open cust.dat). It writes ADDRESSLINE1 first, so the file can be sorted on that field. The program writes fixed-length fields. For example, &ADDRESSLINE1:30 specifies that the column occupies exactly 30 characters. If the actual value is shorter, it is padded with blanks. When the program has finished writing data to the file, it closes the file by using the CLOSE command.
The file is sorted with the UNIX sort utility:
call system using 'sort cust.dat > cust2.dat' #status
The sort cust.dat > cust2.dat command is sent to the UNIX system. It invokes the UNIX sort utility to sort cust.dat and direct the output to cust2.dat. The completion status is saved in #status; a status of 0 indicates success. Because ADDRESSLINE1 is at the beginning of each record, the file is sorted on that field.
Next, we open cust2.dat for reading. The following command reads one record from the file and places the first 30 characters in $ADDRESSLINE1:
read 1 into $ADDRESSLINE1:30 $ADDRESSLINE2:30 $PINCODE:6 $PHONE:10
The next 30 characters are placed in $ADDRESSLINE2, the next six in $PINCODE, and so on. When the end of the file is reached, the #end-file reserved variable is automatically set to 1 (true). The program checks #end-file and breaks out of the loop. Finally, the program closes the file by using the CLOSE command.
Creating Multiple Reports in One Pass
Sometimes you must create multiple reports that are based on the same data. In many cases, these reports are similar, with only a difference in layout or summary. Typically, you can create multiple programs and even reuse code. However, if each program is run separately, the database has to repeat the query. Such repeated processing is often unnecessary.
With SQR, one program can create multiple reports simultaneously. In this method, a single program creates multiple reports, making just one pass on the data and reducing the amount of database processing.
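A minimal sketch of this approach uses the DECLARE-REPORT and USE-REPORT commands; the report names, the #employee_count variable, and the columns selected from JOB are placeholders only:
begin-setup
declare-report detail
end-declare
declare-report summary
end-declare
end-setup

begin-program
do main
end-program

begin-procedure main
begin-select
BUSINESS_UNIT
EMPLOYEE_ID
   use-report detail                 ! write this row to the detail report
   print &BUSINESS_UNIT (+1,1)
   print &EMPLOYEE_ID   (,15)
   use-report summary                ! accumulate a count for the summary report
   add 1 to #employee_count
from JOB
end-select
use-report summary
print 'Total employees: ' (+1,1)
print #employee_count (,20) edit 99999
end-procedure ! main
Each declared report is typically written to its own output file, so a single pass over the JOB rows produces both the detail listing and the summary.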
Tuning SQR Numerics
SQR for PeopleSoft provides three types of numeric values:
• Machine floating point numbers
• Decimal numbers
• Integers
Machine floating point numbers are the default. They use the floating point arithmetic that is provided by the hardware. This method is very fast. It uses binary floating point and normally holds up to 15 digits of precision.
Some accuracy can be lost when converting decimal fractions to binary floating point numbers. To overcome this loss of accuracy, you can sometimes use the ROUND option of commands such as ADD, SUBTRACT, MULTIPLY, and DIVIDE. You can also use the round function of LET or numeric edit masks that round the results to the needed precision.
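For example (a sketch with hypothetical #line_amount and #order_total variables), both the ROUND qualifier and the round function keep machine floating point results at two decimal places:
add #line_amount to #order_total round=2
let #discounted_total = round(#order_total * 0.95, 2)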
Decimal numbers provide exact math and precision of up to 38 digits. Math is performed in the software. This is the most accurate method, but also the slowest.
You can use integers for numbers that are known to be integers. Using integers has several benefits. They:
• Enforce the integer type by not allowing fractions.
• Adhere to integer rules when dividing numbers.
Integer math is also the fastest, typically faster even than machine floating point.
If you use the DECLARE-VARIABLE command, the -DNT command-line flag, or the DEFAULT-NUMERIC entry in the Default-Settings section of the PSSQR.INI file, you can select the type of numbers that SQR uses. Moreover, you can select the type for individual variables in the program with the DECLARE-VARIABLE command. When you select decimal numbers, you can also specify the needed precision.
Selecting the numeric type for variables enables you to fine-tune the precision of numbers in your program. For most applications, however, this type of tuning does not yield a significant performance improvement, so decimal (which gives exact results) is usually the best choice. The default is machine floating point to provide compatibility with older releases of the product.
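The following SETUP-section sketch shows one way to mix numeric types; the variable names and the chosen precision are illustrative only:
begin-setup
declare-variable
   default-numeric=float                  ! unlisted numeric variables stay machine floating point
   decimal(16) #order_total #tax_amount   ! exact math, 16 digits of precision
   integer #row_count                     ! whole numbers only; fastest math
end-declare
end-setup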
Setting Processing Limits
Use a startup file and the Processing-Limits section of PSSQR.INI to define the sizes and limitations of some of the internal structures that are used by SQR. The -M command-line flag can specify a startup file whose entries override those in PSSQR.INI. If you use the -Mb command-line flag, then corresponding sections of the file are not processed. Many of these settings have a direct effect on memory requirements.
Tuning memory requirements used to be a factor with older, 16-bit operating systems, such as Windows 3.1. Today, most operating systems use virtual memory, and tuning memory requirements normally does not affect performance in any significant way. The only case in which you might need to be concerned with processing-limit settings is with large SQR programs that exceed the default limits. In such cases, you must increase the corresponding settings.
Buffering Fetched Rows
When a BEGIN-SELECT command is run, records are fetched from the database server. To improve performance, they are fetched in groups rather than one at a time. The default is groups of 10 records. The records are buffered, and a program processes these records one at a time. A database fetch operation is therefore performed after every 10 records, instead of after every single record. This is a substantial performance gain. If the database server is on another computer, then network traffic is also significantly reduced.
Modify the number of records to fetch together by using the -B command-line flag or, for an individual BEGIN-SELECT command, by using its -B option. In both cases, specify the number of records to be fetched together. For example, -B100 specifies that records be fetched in groups of 100, which further reduces the number of database fetch operations.
This feature is currently available with SQR for ODBC and SQR for the Oracle or SYBASE databases.
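For example, a hypothetical select against the JOB table could fetch rows in groups of 100:
begin-select -B100
BUSINESS_UNIT (+1,1)
EMPLOYEE_ID   (,15)
from JOB
end-select
Alternatively, running the program as sqr myreport.sqr sammy/baker@hrdmo -B100 (placeholder program and connectivity string) applies the same buffer size to every select in the program.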
Running Programs on the Database Server
To reduce network traffic and improve performance, run SQR programs directly on the database server machine. The SQR server is available on many server platforms including Windows NT and UNIX/Linux.
SQR Programming Principles
• Develop all SQRs using Structured Programming
Structured Programming
A technique for organizing and coding computer programs in which a hierarchy of modules is used, each having a single entry point and a single exit point, and in which control is passed downward through the structure without unconditional branches to higher levels of the structure.
Structured programming is often associated with a top-down approach to design. In this way, designers map out the large-scale structure of a program in terms of smaller operations, implement and test the smaller operations, and then tie them together into a whole program.
• Pseudo-Code
o Flowchart the program logic
o Research and ask a few questions:
▪ How much data will be pulled? Will it be 100 rows or 1,000 rows?
▪ How many rows will come from each table?
▪ What kind of data will be pulled? Student data? Setup data?
▪ What does the key structure look like for the tables being pulled?
▪ Is this table the parent or the child?
▪ How important is this table?
▪ What fields will be needed from this table? Are they available somewhere else?
o Write the SQL and test it in SQL*Plus before coding the SQR
• Use linear programming (straightforward, top-to-bottom flow) – it is easier to write and debug
• Comment your code
Summary
The following techniques can be used to improve the performance of your SQR programs:
• Simplify complex SELECT statements.
• Use LOAD-LOOKUP to simplify joins.
• Use dynamic SQL to replace complex conditions (such as OR conditions) in a SELECT statement.
• Avoid using temporary database tables. Two alternatives to temporary
database tables are SQR arrays and flat files.
• Write programs that create multiple reports with one pass on the data.
• Use the most efficient numeric type for numeric variables (machine
floating point, decimal, or integer).
• Save compiled SQR programs and rerun them with SQR Execute.
• Adjust settings in the [Processing-Limits] section of PSSQR.INI or in a
startup file.
• Increase buffering of rows in SELECT statements with the -B flag.
• Execute programs on the database server machine.
• Follow the SQR programming principles outlined above.