Mysql subquery much faster than join -


i have following queries both return same result , row count:

select * (                select unix_timestamp(network_time) * 1000 epoch_network_datetime,                        hbrl.business_rule_id,                        display_advertiser_id,                        hbrl.campaign_id,                        truncate(sum(coalesce(hbrl.ad_spend_network, 0))/100000.0, 2) demand_ad_spend_network,                        sum(coalesce(hbrl.ad_view, 0)) demand_ad_view,                        sum(coalesce(hbrl.ad_click, 0)) demand_ad_click,                        truncate(coalesce(case when sum(hbrl.ad_view) = 0 0 else 100*sum(hbrl.ad_click)/sum(hbrl.ad_view) end, 0), 2) ctr_percent,                        truncate(coalesce(case when sum(hbrl.ad_view) = 0 0 else sum(hbrl.ad_spend_network)/100.0/sum(hbrl.ad_view) end, 0), 2) ecpm,                       truncate(coalesce(case when sum(hbrl.ad_click) = 0 0 else sum(hbrl.ad_spend_network)/100000.0/sum(hbrl.ad_click) end, 0), 2) ecpc                 hourly_business_rule_level hbrl                (publisher_network_id = 31534)                 , network_time between str_to_date('2017-08-13 17:00:00.000000', '%y-%m-%d %h:%i:%s.%f') , str_to_date('2017-08-14 16:59:59.999000', '%y-%m-%d %h:%i:%s.%f')                 , (network_time not null , display_advertiser_id > 0)                group network_time, hbrl.campaign_id, hbrl.business_rule_id                having demand_ad_spend_network > 0                or demand_ad_view > 0                or demand_ad_click > 0                or ctr_percent > 0                or ecpm > 0                or ecpc > 0                order epoch_network_datetime) atb        left join dim_demand demand on atb.display_advertiser_id = demand.advertiser_dsp_id         , atb.campaign_id = demand.campaign_id         , atb.business_rule_id = demand.business_rule_id  

ran explain extended, , these results:

+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+ | id | select_type | table                      | type | possible_keys                                                                 | key     | key_len | ref             | rows    | filtered |                                        | +----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+ |  1 | primary     | <derived2>                 |  | null                                                                          | null    | null    | null            | 1451739 |   100.00 | null                                         | |  1 | primary     | demand                     | ref  | primary,join_index                                                            | primary | 4       | atb.campaign_id |       1 |   100.00 | using                                  | |  2 | derived     | hourly_business_rule_level |  | _hourly_business_rule_level_supply_idx,_hourly_business_rule_level_demand_idx | null    | null    | null            | 1494447 |    97.14 | using where; using temporary; using filesort | +----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+ 

and other is:

select unix_timestamp(network_time) * 1000 epoch_network_datetime,         hbrl.business_rule_id,         display_advertiser_id,         hbrl.campaign_id,         truncate(sum(coalesce(hbrl.ad_spend_network, 0))/100000.0, 2) demand_ad_spend_network,         sum(coalesce(hbrl.ad_view, 0)) demand_ad_view,         sum(coalesce(hbrl.ad_click, 0)) demand_ad_click,         truncate(coalesce(case when sum(hbrl.ad_view) = 0 0 else 100*sum(hbrl.ad_click)/sum(hbrl.ad_view) end, 0), 2) ctr_percent,         truncate(coalesce(case when sum(hbrl.ad_view) = 0 0 else sum(hbrl.ad_spend_network)/100.0/sum(hbrl.ad_view) end, 0), 2) ecpm,         truncate(coalesce(case when sum(hbrl.ad_click) = 0 0 else sum(hbrl.ad_spend_network)/100000.0/sum(hbrl.ad_click) end, 0), 2) ecpc  hourly_business_rule_level hbrl join dim_demand demand on hbrl.display_advertiser_id = demand.advertiser_dsp_id  , hbrl.campaign_id = demand.campaign_id  , hbrl.business_rule_id = demand.business_rule_id  (publisher_network_id = 31534)  , network_time between str_to_date('2017-08-13 17:00:00.000000', '%y-%m-%d %h:%i:%s.%f') , str_to_date('2017-08-14 16:59:59.999000', '%y-%m-%d %h:%i:%s.%f')  , (network_time not null , display_advertiser_id > 0) group network_time, hbrl.campaign_id, hbrl.business_rule_id having demand_ad_spend_network > 0 or demand_ad_view > 0 or demand_ad_click > 0  or ctr_percent > 0 or ecpm > 0 or ecpc > 0 order epoch_network_datetime; 

and these results second query:

+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+ | id | select_type | table                      | type | possible_keys                                                                 | key     | key_len | ref                                                           | rows    | filtered |                                        | +----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+ |  1 | simple      | hourly_business_rule_level |  | _hourly_business_rule_level_supply_idx,_hourly_business_rule_level_demand_idx | null    | null    | null                                                          | 1494447 |    97.14 | using where; using temporary; using filesort | |  1 | simple      | demand                     | ref  | primary,join_index                                                            | primary | 4       | my6sense_datawarehouse.hourly_business_rule_level.campaign_id |       1 |   100.00 | using where; using index                     | +----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+ 

the first 1 takes 2 seconds while second 1 takes on 2 minutes!

why second query taking long? missing here?

thanks.

one possible reason number of rows have joined second table.

the group clause , having clause limit number of rows returned subquery. rows used join.

without subquery clause limiting number of rows join. join done before group , having clauses processed. depending on group size , selectivity of having conditions there more rows need joined.

consider following simplified example:

we have table users 1000 entries , columns id, email.

create table users(     id smallint auto_increment primary key,     email varchar(50) unique ); 

then have (huge) log table user_actions 1,000,000 entries , columns id, user_id, timestamp, action

create table user_actions(     id mediumint auto_increment primary key,     user_id smallint not null,     timestamp timestamp,     action varchar(50),     index (timestamp, user_id) ); 

the task find users have @ least 900 entries in log table since 2017-02-01.

the subquery solution:

select a.user_id, a.cnt, u.email (     select a.user_id, count(*) cnt     user_actions     a.timestamp >= '2017-02-01 00:00:00'     group a.user_id     having cnt >= 900 ) left join users u on u.id = a.user_id 

the subquery returns 135 rows (users). rows joined users table. subquery runs in 0.375 seconds. time needed join zero, full query runs in 0.375 seconds.

solution without subquery:

select a.user_id, count(*) cnt, u.email user_actions left join users u on u.id = a.user_id a.timestamp >= '2017-02-01 00:00:00' group a.user_id having cnt >= 900 

the condition filters table 866,081 rows. join has done 866k rows. after join group , having clauses processed , limit result 135 rows. query needs 0.815 seconds.

so can see, subquery can improve performance.

but let's make things worse , drop primary key in users table. way have no index can used join. first query runs in 0.455 seconds. second query needs 40 seconds - 100 times slower.

notes

it's difficult if same applies case. reasons are:

  • your queries quite complex , far away from beeing mvce.
  • i don't see beeng selected demand table - it's unclear why joining @ all.
  • you use left join in 1 query , inner join in one.
  • the relation between 2 tables unclear.
  • no information indexes. should provide create statements (show create table_name).

test setup

drop table if exists users; create table users(     id smallint auto_increment primary key,     email varchar(50) unique )     select seq id, rand(1) email     seq_1_to_1000 ;   drop table if exists user_actions; create table user_actions(     id mediumint auto_increment primary key,     user_id smallint not null,     timestamp timestamp,     action varchar(50),     index (timestamp, user_id) )     select seq id         , floor(rand(2)*1000)+1 user_id         #, '2017-01-01 00:00:00' + interval seq*20 second timestamp         , from_unixtime(unix_timestamp('2017-01-01 00:00:00') + seq*20) timestamp         , rand(3) action     seq_1_to_1000000 ; 

mariadb 10.0.19 sequence plugin.


Comments

Popular posts from this blog

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

python Tkinter Capturing keyboard events save as one single string -

sql server - Why does Linq-to-SQL add unnecessary COUNT()? -