Which two statements are true about the bloom filter in the execution plan?
A.
The bloom filter prevents all rows from table T1 that do not join T2 from being needlessly
distributed.
B.
The bloom filter prevents all rows from table T2 that do not join table T1 from being needlessly
distributed.
C.
The bloom filter prevents some rows from table T2 that do not join table T1 from being
needlessly distributed.
D.
The bloom filter is created in parallel by the set of parallel execution processes that scanned
table T2.
E.
The bloom filter is created in parallel by the set of parallel execution processes that later
perform join.
F.
The bloom filter is created in parallel by the set of parallel execution processes that scanned
table T1.
Explanation:
* PX JOIN FILTER CREATE
The bloom filter is created in line 4.
* PX JOIN FILTER USE
The bloom filter is used in line 11.
Note:
* You can identify a bloom pruning in a plan when you see :BF0000 in the Pstart and Pstop
columns of the execution plan and PART JOIN FILTER CREATE in the operations column.
* A Bloom filter is a probabilistic algorithm for doing existence tests in less memory than a full list
of keys would require. In other words, a Bloom filter is a method for representing a set of n
elements (also called keys) to support membership queries.
* The Oracle database makes use of Bloom filters in the following 4 situations:
– To reduce data communication between slave processes in parallel joins: mostly in RAC
– To implement join-filter pruning: in partition pruning, the optimizer analyzes FROM and WHERE
clauses in SQL statements to eliminate unneeded partitions when building the partition access list
– To support result caches: when you run a query, Oracle will first see if the results of that query
have already been computed and cached by some session or user, and if so, it will retrieve the
answer from the server result cache instead of gathering all of the database blocks
– To filter members in different cells in Exadata: Exadata performs joins between large tables and
small lookup tables, a very common scenario for data warehouses with star schemas. This is
implemented using Bloom filters as to determine whether a row is a member of the desired result
set.
B is incorrect. Bloom filter may produce false positives, that is it cannot prevent all rows.
C and E are corrects
Agree, C and E correct.
The example is taken from the article “Bloom filters” by Christian Antognini.
http://antognini.ch/papers/BloomFilters20080620.pdf
E & B are correct:
According to the PDF mentioned above, and also by looking at the execution plan, Q1,02 (which carries out the join eventually) creates the bloom filter.
B: The purpose of using the Bloom Filter is to avoid sending ANY/ALL rows from T2 that won’t find a match in T1, but it will also send the rows that MIGHT find a match; i.e. False Positive. Why should C be correct? Why should “some” of the T2 rows “that do not join T1” be sent to the Q1,02?! What’s the point of that?!
C & E are correct