The cost of DBMS_PARALLEL_EXECUTE

I had an inquiry about the performance of DBMS_PARALLEL_EXECUTE today from a customer. Whilst the utility (as the name suggests) is designed to break up a large task into smaller chunks and then run those chunks in parallel, the customer asked what the overall cost of using the utility was, compared to not using it at all.

For example, if I could scan a large table in 100 seconds, and I then broke up the task into 10 smaller ones using DBMS_PARALLEL_EXECUTE, how much overhead would there be if I ran those 10 tasks in serial? Would it take 100 seconds? 110 seconds? Or perhaps much, much more?

Well, the answer to that is “All of the above”.

Ultimately, in most use cases, we will be converting a task that does a scan of a table along the lines of:


select *
from   my_table
where  [conditions]

into multiple tasks, each running an occurrence of:


select *
from   my_table
where  [conditions]
and    [some extra conditions to segment the data]

The additional cost will be directly proportional to the processing cost of those extra conditions. The best scenario is most likely to be the operating mode where the scan of the large table is done via ROWID ranges. We can mimic the operation of DBMS_PARALLEL_EXECUTE to measure that overhead.
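
For reference, this ROWID-based segmentation is what the utility’s task API does for you. Here is a minimal sketch of that usage (the task name SCAN_TX, the chunk size, and the do-nothing update statement are just placeholders for illustration):


begin
  -- register a task, then carve the table into ROWID-range chunks
  dbms_parallel_execute.create_task(task_name => 'SCAN_TX');
  dbms_parallel_execute.create_chunks_by_rowid(
       task_name   => 'SCAN_TX',
       table_owner => user,
       table_name  => 'TX',
       by_row      => false,     -- chunk_size is expressed in blocks, not rows
       chunk_size  => 10000);

  -- run the statement once per chunk; :start_id and :end_id are
  -- bound to each chunk's starting and ending ROWID
  dbms_parallel_execute.run_task(
       task_name      => 'SCAN_TX',
       sql_stmt       => 'update tx set y = y where rowid between :start_id and :end_id',
       language_flag  => dbms_sql.native,
       parallel_level => 4);

  dbms_parallel_execute.drop_task(task_name => 'SCAN_TX');
end;
/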

First I’ll create a tablespace that spans multiple files (to give an easy way to get some simple ROWID ranges).


SQL> create tablespace demo datafile
  2     'X:\ORACLE\ORADATA\DB19\PDB1\DEMO01.DBF' size 4g,
  3     'X:\ORACLE\ORADATA\DB19\PDB1\DEMO02.DBF' size 4g,
  4     'X:\ORACLE\ORADATA\DB19\PDB1\DEMO03.DBF' size 4g,
  5     'X:\ORACLE\ORADATA\DB19\PDB1\DEMO04.DBF' size 4g
  6  extent management local uniform size 32m;

Tablespace created.

And I’ll put a 1 billion row table into it. This will be our candidate for DBMS_PARALLEL_EXECUTE.


SQL> create table tx nologging pctfree 0 tablespace demo as
  2  select d.*
  3  from
  4   ( select rownum x, rownum y from dual connect by level <= 100000 ) d,
  5   ( select 1 from dual connect by level <= 10000 );

Table created.

DBMS_PARALLEL_EXECUTE scans the data dictionary to produce ROWID ranges based on the chunk size you nominate. I’ll mimic this with a simple query against DBA_EXTENTS and then use DBMS_ROWID to build some ROWIDs that will be used to do our scan.


SQL> with ext as
  2  (
  3      select o.data_object_id,e.relative_fno, min(e.block_id) start_block_id, max(e.block_id+e.blocks) end_block_id
  4      from dba_extents e
  5         , dba_objects o
  6      where e.owner = o.owner
  7      and e.segment_name = o.object_name
  8      and nvl(e.partition_name, '"') = nvl(o.subobject_name, '"')
  9      and e.segment_type = o.object_type
 10      and e.segment_type in ('TABLE', 'TABLE PARTITION', 'TABLE SUBPARTITION')
 11      and e.owner = user
 12      and e.segment_name = 'TX'
 13      group by
 14        data_object_id,
 15        relative_fno
 16  )
 17  select
 18   relative_fno,
 19   dbms_rowid.rowid_create(
 20        rowid_type=>1
 21       ,object_number=>data_object_id
 22       ,relative_fno=>relative_fno
 23       ,block_number=>start_block_id
 24       ,row_number=>0
 25    ) start_rowid,
 26   dbms_rowid.rowid_create(
 27        rowid_type=>1
 28       ,object_number=>data_object_id
 29       ,relative_fno=>relative_fno
 30       ,block_number=>end_block_id
 31       ,row_number=>32767
 32    ) end_rowid
 33  from ext ;

RELATIVE_FNO START_ROWID        END_ROWID
------------ ------------------ ------------------
         114 AAhFjhAByAAAACAAAA AAhFjhAByAABxCAH//
         115 AAhFjhABzAAAACAAAA AAhFjhABzAABwCAH//
         117 AAhFjhAB1AAAACAAAA AAhFjhAB1AABxCAH//
         116 AAhFjhAB0AAAACAAAA AAhFjhAB0AABxCAH//

I can now use a NESTED LOOP to loop through these 4 ROWID pairs and scan the 4 chunks of my large table.


SQL> set timing on
SQL> with ext as
  2  (
  3      select /*+ materialize */ o.data_object_id,e.relative_fno, min(e.block_id) start_block_id, max(e.block_id+e.blocks) end_block_id
  4      from dba_extents e
  5         , dba_objects o
  6      where e.owner = o.owner
  7      and e.segment_name = o.object_name
  8      and nvl(e.partition_name, '"') = nvl(o.subobject_name, '"')
  9      and e.segment_type = o.object_type
 10      and e.segment_type in ('TABLE', 'TABLE PARTITION', 'TABLE SUBPARTITION')
 11      and e.owner = user
 12      and e.segment_name = 'TX'
 13      group by
 14        data_object_id,
 15        relative_fno
 16  ),
 17  ranges as
 18  (
 19  select /*+ materialize */
 20   dbms_rowid.rowid_create(
 21        rowid_type=>1
 22       ,object_number=>data_object_id
 23       ,relative_fno=>relative_fno
 24       ,block_number=>start_block_id
 25       ,row_number=>0
 26    ) start_rowid,
 27   dbms_rowid.rowid_create(
 28        rowid_type=>1
 29       ,object_number=>data_object_id
 30       ,relative_fno=>relative_fno
 31       ,block_number=>end_block_id
 32       ,row_number=>32767
 33    ) end_rowid
 34  from ext
 35  )
 36  select /*+ leading(ranges tx) use_nl(tx) */ max(y)
 37  from ranges, tx
 38  where tx.rowid >= ranges.start_rowid
 39  and tx.rowid   <  ranges.end_rowid;

    MAX(Y)
----------
    100000

Elapsed: 00:00:29.63

It took just under 30 seconds to scan the table using the 4 ROWID chunks. Let’s compare that to a single pass through the table.


SQL> select  max(y)
  2  from tx;

    MAX(Y)
----------
    100000

Elapsed: 00:00:28.98
SQL>

Thus the overhead is negligible for the ROWID case. This makes sense because a full table scan is simply the database cycling through the list of extents for that segment, which maps very closely to starting/ending ROWIDs anyway.

However, if you opt for a different mechanism, for example, starting and ending values of a column in the table, then your performance is entirely at the whim of how efficient those segmenting queries are. If they map to partition keys, for example, then you can expect good performance. But if they map to unindexed columns, or even indexed columns that span large ranges of data, then you might be in for slower performance per chunk.
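
A sketch of that column-based alternative (again, the task name, chunk size, and dummy statement are arbitrary placeholders) chunks on a numeric column instead, so each chunk carries a BETWEEN predicate on that column rather than on ROWID:


begin
  dbms_parallel_execute.create_task(task_name => 'SCAN_TX_BY_COL');

  -- carve the data by ranges of column X instead of ROWID ranges
  dbms_parallel_execute.create_chunks_by_number_col(
       task_name    => 'SCAN_TX_BY_COL',
       table_owner  => user,
       table_name   => 'TX',
       table_column => 'X',
       chunk_size   => 10000);

  -- each execution now runs "where x between :start_id and :end_id",
  -- and its cost depends on how that predicate can be evaluated
  dbms_parallel_execute.run_task(
       task_name      => 'SCAN_TX_BY_COL',
       sql_stmt       => 'update tx set y = y where x between :start_id and :end_id',
       language_flag  => dbms_sql.native,
       parallel_level => 4);
end;
/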

This might still give better overall performance, because of course the entire motivation for using DBMS_PARALLEL_EXECUTE is to run the tasks in parallel, but you might end up chewing up more resources to do so, compared to the serial case.