I had an inquiry about the performance of DBMS_PARALLEL_EXECUTE today from a customer. Whilst the utility (as the name suggests) is designed to break up a large task into smaller chunks and then run those chunks in parallel, the customer asked what the overall cost of using the utility was, compared to not using it at all.
For example, if I could scan a large table in 100 seconds, and I then broke the task up into 10 smaller ones using DBMS_PARALLEL_EXECUTE, how much overhead would there be if I ran the 10 tasks in serial? Would it take 100 seconds? 110 seconds? Or perhaps much, much more?
Well, the answer to that is “All of the above”.
Ultimately, in most use cases, we will be converting a task that does a scan of a table along the lines of:
select *
from my_table
where [conditions]
into multiple tasks, each running an occurrence of:
select *
from my_table
where [conditions]
and [some extra conditions to segment the data]
The additional cost will be directly proportional to the processing cost of those extra conditions. The best-case scenario is most likely the operating mode where the scan of the large table is segmented via ROWID ranges, and we can mimic the operation of DBMS_PARALLEL_EXECUTE to measure that overhead.
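(For reference, the utility’s ROWID mode looks something like the sketch below. This is a minimal outline only; the task name, chunk size and the dummy update are placeholders I’ve invented for illustration, but CREATE_TASK, CREATE_CHUNKS_BY_ROWID and RUN_TASK are the real entry points, and the per-chunk statement must reference the :start_id and :end_id binds.)

begin
  dbms_parallel_execute.create_task('MY_TASK');

  -- carve the table into ROWID ranges of roughly 10000 blocks each
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'MY_TASK',
    table_owner => user,
    table_name  => 'TX',
    by_row      => false,
    chunk_size  => 10000);

  -- each chunk runs the statement with its own start/end ROWID pair
  dbms_parallel_execute.run_task(
    task_name      => 'MY_TASK',
    sql_stmt       => 'update tx set y = y + 0 where rowid between :start_id and :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 4);
end;
/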
First I’ll create a tablespace that spans multiple files (to give an easy way to get some simple ROWID ranges).
SQL> create tablespace demo datafile
2 'X:\ORACLE\ORADATA\DB19\PDB1\DEMO01.DBF' size 4g,
3 'X:\ORACLE\ORADATA\DB19\PDB1\DEMO02.DBF' size 4g,
4 'X:\ORACLE\ORADATA\DB19\PDB1\DEMO03.DBF' size 4g,
5 'X:\ORACLE\ORADATA\DB19\PDB1\DEMO04.DBF' size 4g
6 extent management local uniform size 32m;
Tablespace created.
And I’ll put a 1 billion row table into it. This will be our candidate for DBMS_PARALLEL_EXECUTE.
SQL> create table tx nologging pctfree 0 tablespace demo as
2 select d.*
3 from
4 ( select rownum x, rownum y from dual connect by level <= 100000 ) d,
5 ( select 1 from dual connect by level <= 10000 );
Table created.
DBMS_PARALLEL_EXECUTE scans the data dictionary to produce ROWID ranges based on the number of chunks you nominate. I’ll mimic this with a simple query against DBA_EXTENTS and then use DBMS_ROWID to build some ROWIDs that will be used to do our scan.
SQL> with ext as
2 (
3 select o.data_object_id,e.relative_fno, min(e.block_id) start_block_id, max(e.block_id+e.blocks) end_block_id
4 from dba_extents e
5 , dba_objects o
6 where e.owner = o.owner
7 and e.segment_name = o.object_name
8 and nvl(e.partition_name, '"') = nvl(o.subobject_name, '"')
9 and e.segment_type = o.object_type
10 and e.segment_type in ('TABLE', 'TABLE PARTITION', 'TABLE SUBPARTITION')
11 and e.owner = user
12 and e.segment_name = 'TX'
13 group by
14 data_object_id,
15 relative_fno
16 )
17 select
18 relative_fno,
19 dbms_rowid.rowid_create(
20 rowid_type=>1
21 ,object_number=>data_object_id
22 ,relative_fno=>relative_fno
23 ,block_number=>start_block_id
24 ,row_number=>0
25 ) start_rowid,
26 dbms_rowid.rowid_create(
27 rowid_type=>1
28 ,object_number=>data_object_id
29 ,relative_fno=>relative_fno
30 ,block_number=>end_block_id
31 ,row_number=>32767
32 ) end_rowid
33 from ext ;
RELATIVE_FNO START_ROWID END_ROWID
------------ ------------------ ------------------
114 AAhFjhAByAAAACAAAA AAhFjhAByAABxCAH//
115 AAhFjhABzAAAACAAAA AAhFjhABzAABwCAH//
117 AAhFjhAB1AAAACAAAA AAhFjhAB1AABxCAH//
116 AAhFjhAB0AAAACAAAA AAhFjhAB0AABxCAH//
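These ranges are essentially what DBMS_PARALLEL_EXECUTE itself would come up with. As an aside, if you want to see the real thing, the chunks for a task (for example, the hypothetical MY_TASK sketched earlier) are visible in the USER_PARALLEL_EXECUTE_CHUNKS view once CREATE_CHUNKS_BY_ROWID has been called:

select chunk_id, start_rowid, end_rowid, status
from   user_parallel_execute_chunks
where  task_name = 'MY_TASK'
order  by chunk_id;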
I can now use a NESTED LOOP to loop through these 4 ROWID pairs and scan the 4 chunks of my large table.
SQL> set timing on
SQL> with ext as
2 (
3 select /*+ materialize */ o.data_object_id,e.relative_fno, min(e.block_id) start_block_id, max(e.block_id+e.blocks) end_block_id
4 from dba_extents e
5 , dba_objects o
6 where e.owner = o.owner
7 and e.segment_name = o.object_name
8 and nvl(e.partition_name, '"') = nvl(o.subobject_name, '"')
9 and e.segment_type = o.object_type
10 and e.segment_type in ('TABLE', 'TABLE PARTITION', 'TABLE SUBPARTITION')
11 and e.owner = user
12 and e.segment_name = 'TX'
13 group by
14 data_object_id,
15 relative_fno
16 ),
17 ranges as
18 (
19 select /*+ materialize */
20 dbms_rowid.rowid_create(
21 rowid_type=>1
22 ,object_number=>data_object_id
23 ,relative_fno=>relative_fno
24 ,block_number=>start_block_id
25 ,row_number=>0
26 ) start_rowid,
27 dbms_rowid.rowid_create(
28 rowid_type=>1
29 ,object_number=>data_object_id
30 ,relative_fno=>relative_fno
31 ,block_number=>end_block_id
32 ,row_number=>32767
33 ) end_rowid
34 from ext
35 )
36 select /*+ leading(ranges tx) use_nl(tx) */ max(y)
37 from ranges, tx
38 where tx.rowid >= ranges.start_rowid
39 and tx.rowid < ranges.end_rowid;
MAX(Y)
----------
100000
Elapsed: 00:00:29.63
It took just under 30 seconds to scan the table using the 4 ROWID chunks. Let’s compare that to a single pass through the table.
SQL> select max(y)
2 from tx;
MAX(Y)
----------
100000
Elapsed: 00:00:28.98
SQL>
Thus the overhead is negligible for the ROWID case. This makes sense, because a full table scan is simply the database cycling through the list of extents for that segment, which maps very closely to the start/end ROWID ranges anyway.
However, if you opt for a different mechanism, for example starting and ending values of a column in the table, then your performance is totally at the whim of how efficient those chunk queries will be. If they map to partition keys, for example, then you can expect good performance. But if they map to unindexed columns, or even indexed columns that span large ranges of data, then you might be in for slower performance per chunk.
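To make that concrete, here is what the column-based mechanism might look like (again just a sketch; the task name, chunk size and dummy update are invented, but CREATE_CHUNKS_BY_NUMBER_COL is the real API). The thing to note is that if X is unindexed, the BETWEEN predicate in every chunk becomes its own full scan of TX, so running (say) 4 chunks in serial could cost you roughly 4 full scans of the table rather than one.

begin
  dbms_parallel_execute.create_task('MY_TASK_BY_COL');

  -- chunk on the numeric column X, 25000 values per chunk
  dbms_parallel_execute.create_chunks_by_number_col(
    task_name    => 'MY_TASK_BY_COL',
    table_owner  => user,
    table_name   => 'TX',
    table_column => 'X',
    chunk_size   => 25000);

  -- each chunk now filters on the column, not on ROWID;
  -- without an index on X this is a full scan per chunk
  dbms_parallel_execute.run_task(
    task_name      => 'MY_TASK_BY_COL',
    sql_stmt       => 'update tx set y = y + 0 where x between :start_id and :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 4);
end;
/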
This might still give better overall performance because, of course, the entire motivation for using DBMS_PARALLEL_EXECUTE is to run the tasks in parallel, but you might end up chewing through more resources to do so compared to the serial case.