23 February 2007

Parallel Processing In PL/SQL

Parallel processing is to run the same process-usally a heavy process- simultaneously. It is called Parallel computing in computer science. Main idea of Parallel computing is to get faster responses. Wikipedi describes as


“Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination”


In database systems, Parallel processing takes role in huge datasets. For performance reasons, you can take advantages of Parallel processing. As Kyte mentions


“Parallel query is suitable for a certain class of large problems: very large problems that have no other solution. Parallel query is my last path of action for solving a performance problem; it's never my first course of action.”


Oracle has some Parallel procesing features as i gave in further readings section. You can investigate them. But the aim of this article is a question was asked by me in forums.oracle. According to business rules, i have to implement a kind of Parallel processing. The busines logic has implemented in pro C units. It was not possible to take logic into pl/sql because of business logic needs extra features that has to be implemented in C units. Business logic is a heavy process. To gain performance, it must be executed parallely. In case of updates, it is also prevented to get deadlocks. To achieve this situation, i find out some methods which i described below.


1.   Process

For demonstration a table called locks is used. Table is formed with an id and status columns.

drop table locks;
create table locks as SELECT rownum id FROM all_objects  WHERE rownum <= 100;
alter table locks add( status varchar2(16) );

Main process is taking records of locks table and updating them with given status.

UPDATE locks SET status = :new_status;

For simulating real world because of a heavy C routine executes, after every update statement, pl/sql engine forced to wait 3 seconds.

dbms_lock.sleep(3);

To analyze how much time spent to finih locks table, analyze_locks procedure will be used. This routine selects forever locks table until at least one status update is done. It takes the time and selects again locks table forever. When all rows are uptaded, time differenece is printed out.

Aim of this procedure to get exactly howmuch time passed to finish process. It will not be clear to taking out elapsed time SQL*Plus with set timing on because of Parallel processes that executes different sessions.

This procedure must be called when locks table is cleaned out and in seperate session before main process executes.

CREATE OR REPLACE PROCEDURE analyze_locks IS
  effected_row_count NUMBER := 0;
  start_time         NUMBER;
  diff               NUMBER;
BEGIN
  WHILE effected_row_count = 0 LOOP
    SELECT COUNT(*)
    INTO effected_row_count
    FROM locks
    WHERE status IS NULL;
  END LOOP;
  start_time := dbms_utility.get_time;
  WHILE effected_row_count > 0 LOOP
    SELECT COUNT(*)
    INTO effected_row_count
    FROM locks
    WHERE status IS NULL;
  END LOOP;
  diff := dbms_utility.get_time - start_time;
  DBMS_OUTPUT.PUT_LINE('Finished in ' || trunc(diff / 100) || ' ms.');
END;

Before processing starts, tables are recreated and analyze_locks procedure called.

2.   Methods

There are three methods called Usual, Bucket and Bounds mentioned

2.1.                   Usual


There is no Parallel process. Records of table processed as a single process. A cursor is opening and taking records one by one. Processing done with a procedure as shown below:

CREATE OR REPLACE PROCEDURE process_locks_table_usual_way(new_status IN VARCHAR2) AS

BEGIN

  FOR rec_locks IN (SELECT id, status
                      FROM locks
                     WHERE status IS NULL
                     ORDER BY id) LOOP
    UPDATE locks
       SET status = new_status || to_char(rec_locks.id)
     WHERE id = rec_locks.id;
    dbms_lock.sleep(3);
  END LOOP;
  COMMIT;

END;

2.1.1.   Processing

Because of there is no Parallel processes, only one session will be enough to execute procedure.

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table_usual_way('W');

PL/SQL procedure successfully completed

SQL>

Analyze results are

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec analyze_locks;

Finished in 367 ms.

PL/SQL procedure successfully completed

SQL>

The average execution time of this procedure after calling 3 times is 340 ms.

2.2.                   Bucket

In bucket way there is a pre process called bucketing. To making a Parallel process with bucketing, the records must be bucketed in a given size. This size can not be changed after bucketing. Every Parallel process will process only its records.

Bucketing can be renamed with packeting or grouping. When the table is bucketed, table is divided into groups with given size, logically. For instance when you give bucket size as 3, all records will be grouped in 3 different categories such as 1,2,3. The bucketing is done as balanced mode. That is all three categories has the same size(group 1 has 33 records, group 2 has 34 records, group 3 has 33 records)

Main drawback of method is, it is not possible to change bucket size when buckets are created. Processes must stopped and rebucketed the table with new bucket size. Another drawback is, when one process drops, it must be executed again with the same bucket number. Because other processes will not process records of dropped bucket number.

CREATE OR REPLACE PROCEDURE process_locks_table_bucket_way(
                                                 new_status IN VARCHAR2
                                                ,bucket_no  IN NUMBER) AS
BEGIN

  FOR rec_locks IN (SELECT id, status
                      FROM locks
                     WHERE bucket = bucket_no
                           AND status IS NULL
                     ORDER BY id) LOOP
    UPDATE locks
       SET status = new_status || to_char(rec_locks.id)
     WHERE id = rec_locks.id;
    dbms_lock.sleep(3);
  END LOOP;
  COMMIT;

END;

2.2.1.   Pre-Process : Bucketing


In order to bucketing it must be decided how many buckets will be created. Bucket size will determine number of Parallel processes. In this demonstration bucket size is defined as 3. Bucketing is done via rownum pseudocolumn and mod operator.

alter table locks add( bucket number );
update locks set bucket = mod( rownum, 3) + 1;
commit;

2.2.2.   Processing

Locks table is bucketed with size 3. That is, it is possible to execute 3 parallel process simultaneously in different sessions. On calling the procedure, it must be determined that which bucket will be processed in procedure.

In session 1

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table_bucket_way('A',1);

PL/SQL procedure successfully completed

SQL>

In session 2

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table_bucket_way('B',2);

PL/SQL procedure successfully completed

SQL>

In session 3

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table_bucket_way('C',3);

PL/SQL procedure successfully completed

SQL>

Analyze results are

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec analyze_locks;

Finished in 108 ms.

PL/SQL procedure successfully completed

The average execution time of this procedure after calling 3 times is 110 ms.


2.3.                   Bounds

Main idea of this method is to get bounds(especially primary key field of table) with a helper function. When one process starts, it will check whether more unprocessed records are left. If yes, it will get the bounds and process records only given range. Other process will take its own bounds and so on.

With Bounds method, there is no need of a predefined number of Parallel processes wheras Bucket method needs. Number of Parallel processes can be increased with requirements of business logic.When one of Parallel process dropped, other processes will process records. At least one process executes, it will process all records as a single process.

In this method an extra table called row_range is needed. This  table will store lower and upper bounds of locks table. It must be filled with 0,0 for initial bounds.

drop table row_range;
create table row_range(first_id number, last_id number );
INSERT INTO row_range VALUES( 0, 0);
commit;

A hepler procedure called get_bounds is called before one process start. Process takes a row range by the help of this procedure. row_count determines how many rows will be taken to create bounds. If it is specified with 1, every process will take only one record and when finished it will commit and take another record. LEAD analytic function is used for taking next row_countth record.

CREATE OR REPLACE PROCEDURE get_bounds(lower_bound OUT NUMBER
                                      ,upper_bound OUT NUMBER
                                      ,row_count   IN NUMBER) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  upper         NUMBER;
  next_upper    NUMBER;
  default_upper NUMBER := 999999999;
BEGIN
  SELECT r.last_id INTO upper FROM row_range r FOR UPDATE;
  BEGIN
    SELECT n
    INTO next_upper
    FROM (SELECT lead(id, row_count, default_upper) over(ORDER BY id) n
          FROM locks
          WHERE id >= upper)
    WHERE rownum = 1;
  EXCEPTION
    WHEN no_data_found THEN
      next_upper := default_upper;
   
  END;

  lower_bound := upper;
  upper_bound := next_upper;
  UPDATE row_range r SET r.first_id = lower_bound, r.last_id = upper_bound;
  COMMIT;

END;

Every process will take its bounds and process them as given procedure. There is an outer while loop that determines whether there is more unprocessed records left.

CREATE OR REPLACE PROCEDURE process_locks_table(
       new_status          IN VARCHAR2
      ,processed_row_count IN NUMBER) AS
  lower         NUMBER;
  upper         NUMBER;
  default_upper NUMBER := 999999999;

BEGIN
  get_bounds(lower, upper, processed_row_count);
  WHILE lower <> default_upper LOOP
   
    FOR rec_locks IN (SELECT id, status
                      FROM locks
                      WHERE status IS NULL
                            AND id >= lower
                            AND id < upper
                      ORDER BY id) LOOP
      UPDATE locks
      SET status = new_status || to_char(rec_locks.id)
      WHERE id = rec_locks.id;
      dbms_lock.sleep(3);
    END LOOP;
    COMMIT;
    get_bounds(lower, upper, processed_row_count);
 
  END LOOP;
END;

2.3.1.   Processing

On calling procedure, it must be defined howmany records will be processed in every Parallel process. In this demonstration, only three Parallel process is executed simultaneously. It can be increased in order to business needs.

In session 1

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table('A',1);

PL/SQL procedure successfully completed

SQL>

In session 2

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table('B',1);

PL/SQL procedure successfully completed

SQL>

In session 3

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec process_locks_table('C',1);

PL/SQL procedure successfully completed

SQL>

Analyze results are

Connected to Oracle Database 10g Express Edition Release 10.2.0.1.0
Connected as hr

SQL> exec analyze_locks;

Finished in 145 ms.

PL/SQL procedure successfully completed

The average execution time of this procedure after calling 3 times is 119 ms.

3.   Conclusion

To compare methods the table below can be formed.


Procedure
Time(ms)
Pros
Cons
Usual Way
340
No developing time
No Parallel process
Bucket Way
110
Parallel process
Buckets are determined first, it is not possible to change, when one process drops all records in that bucket will not process.
Bound Way
119
Parallel process, bounds are taking dynamically, you can execute an extra process while others are processing, when one process drops other processes will take place and process the records.



As shown in table above bucket method and bounds method are nearly the same in terms of average execution time. But in usual method because of no Parallel process is used, it took a long time(nearly 3 times other two methods, 3 is Parallel process count ). It can be reduced average time of executions with incrementing Parallel process counts.

In bucket method, first you must decide how many buckets will be created. After deciding it is not possible to add an extra process. But in bounds method, it is possible to add an extra process that can not effect others.

In bounds method, every process takes its record range. In bucket method every process will take its own bucket. So it will be solved deadlock problems in bucket and bounds methods.

If one process drops in bucket method, it has to be reexecuted in order to process that bucket records whereas other processes will take place of dropped process in bounds method.

No comments: