Postgresql – Is it possible to optimize such a SQL query on a Postgres DB?

postgresql

Is there a correct way to optimize such a query? It currently takes around 250 ms per execution and is used for a duplicate check. The execution plan is here: https://explain.depesz.com/s/CRno. Thanks in advance.

select case when count(claim0_.clm_id_int)>0 then true else false end as col_0_0_ 
from claim_clm claim0_ 
inner join sup_mbr_sbp supplierbu1_ 
    on claim0_.clm_sbp_int_fk=supplierbu1_.sbp_id_int 
inner join merchant_program_mbr merchantprogr2_ 
    on supplierbu1_.sbp_mbr_int_fk=merchantprogr2_.mbr_id_int 
inner join merchant_buy merchant4_ 
    on merchantprogr2_.mbr_buy_int_fk=merchant4_.buy_id_int 
inner join supplier_sup supplier5_ 
    on supplierbu1_.sbp_sup_int_fk=supplier5_.sup_id_int 
inner join channel_cha channel3_ 
    on merchantprogr2_.mbr_cha_int_fk=channel3_.cha_id_int 
left outer join currency_cur currency6_ 
    on claim0_.clm_cur_int_fk=currency6_.cur_id_int 
where claim0_.clm_active_bln=true
and claim0_.clm_original_invoice_dtm='2019-11-25 00:00:00'
and claim0_.clm_original_maturity_dtm='2020-05-05 00:00:00'
and claim0_.clm_certified_amount_dbl=100
and supplierbu1_.sbp_id_int=2604 and currency6_.cur_id_int=3
and (claim0_.clm_status in ('NEW' ,  'VALIDATION_FAILED'))
and ((claim0_.clm_supplier_claim_reference_str is null -- "= NULL" never matches any row; IS NULL is presumably intended
        and supplier5_.sup_id_int=2620
        and supplier5_.sup_sup_clm_ref_in_dup_check_bln=true)
    or (claim0_.clm_merchant_claim_reference_str=''
        and merchant4_.buy_id_int=2055
        and merchant4_.buy_buy_clm_ref_in_dup_check_bln=true)
    or (channel3_.cha_duplication_check_type_str='CheckType1'
        and claim0_.clm_reference_str='TESTD00016121901'
        and merchant4_.buy_id_int=2055)
    or (channel3_.cha_duplication_check_type_str='CheckType2'
        and claim0_.clm_reference_str='TESTD00016121901'
        and channel3_.cha_id_int=2007));


Best Answer

Taking a top-down approach: you have a nested loop that retrieves 1,728,571 rows, only for the next node up to immediately filter them down to 11,010. Maybe an index could eliminate those rows before going through the work of fetching them. It isn't clear what index that would be, whether it already exists, or even whether one is possible. So I'd probably start with `set enable_mergejoin = off` to force the planner to choose a different plan. Maybe you already have the index, and the planner is just not using it because of misestimations.
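As a quick experiment, the setting can be toggled for the current session only, so it is safe to try and then revert (a sketch; substitute your actual query):

```sql
-- Discourage merge joins for this session only; the planner will fall
-- back to the next-cheapest strategy (e.g. a hash or nested-loop join).
SET enable_mergejoin = off;

-- Re-run the duplication-check query and compare the new plan/timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;

-- Revert once you have compared the plans.
RESET enable_mergejoin;
```

If the alternative plan is faster, that points to an estimation problem rather than a missing index, and `ANALYZE` on the involved tables (or raising their statistics targets) would be the next thing to look at.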

Taking a bottom-up approach: the bitmap heap scan is pretty slow (though apparently not the single slowest step). If you look at its filter:

Filter: (clm_active_bln AND (clm_sbp_int_fk = 2604) AND (clm_cur_int_fk = 3) AND (clm_original_invoice_dtm = '2019-11-25 00:00:00'::timestamp without time zone) AND (clm_original_maturity_dtm = '2020-05-05 00:00:00'::timestamp without time zone) AND (clm_certified_amount_dbl = '100'::numeric) AND ((clm_status)::text = ANY ('{NEW, VALIDATION_FAILED}'::text[])))
Rows Removed by Filter: 73395

Fetching that much data just to filter most of it out probably isn't very fast, and much of that filter could be handled by an index instead. A compound index on every column mentioned (except perhaps the boolean) would be a good start.
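For example, something along these lines (a sketch only: the index name is made up, and the column order is a guess, with the most selective equality columns first):

```sql
-- Covers the equality conditions from the filter above; clm_status goes
-- last because "IN ('NEW', 'VALIDATION_FAILED')" matches two values.
CREATE INDEX idx_clm_dup_check
    ON claim_clm (clm_sbp_int_fk,
                  clm_cur_int_fk,
                  clm_original_invoice_dtm,
                  clm_original_maturity_dtm,
                  clm_certified_amount_dbl,
                  clm_status)
    WHERE clm_active_bln;  -- partial index: the query only looks at active claims
```

The partial-index predicate keeps the boolean out of the key while still letting the planner use the index for this query; if other queries need inactive claims too, drop the `WHERE` clause and accept a somewhat larger index.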