Put many columns in group by clause in Oracle SQL -
in oracle 11g database, suppose have table, customer , payment follows
customer
customer_id | customer_name | customer_age | customer_creation_date -------------------------------------------------------------------- 001 john 30 1 jan 2017 002 jack 10 2 jan 2017 003 jim 50 3 jan 2017 payment
customer_id | payment_id | payment_amount | ------------------------------------------- 001 900 100.00 001 901 200.00 001 902 300.00 003 903 999.00 we want write sql columns table customer sum of payment of each customer. there many possible ways i ask 1 of following better.
solution 1
select c.customer_id , max(c.customer_name) customer_name , max(c.customer_age) customer_age , max(c.customer_creation_date) customer_creation_date , sum(p.payment_amount) total_payment_amount customer c join payment p on (p.customer_id = c.customer_id) group c.customer_id; solution 2
select c.customer_id , c.customer_name , c.customer_age , c.customer_creation_date , sum(p.payment_amount) payment_amount customer c join payment p on (p.customer_id = c.customer_id) group c.customer_id, c.customer_name, c.customer_age, c.customer_creation_date please notice in solution 1 use max not because want max results, because want "one" row columns know equal rows same customer_id
while in solution 2, avoid putting misleading max in select part putting columns in group by part instead.
with current knowledge, prefer solution 1 because more important comprehend logic in group by part in select part. put set of unique keys express intention of query, application can infer expected number of rows. don't know performance.
i ask question because reviewing code change of big sql put 50 columns in group by clause because editor want avoid max function in select part. know can refactor query in someway avoid putting irrelevant columns in both group by , select part, please discard option because affect application logic , require more time test.
update
i have done test on big query in both versions suggested. query complex, has 69 lines involving more 20 tables , execution plan more 190 lines, think not place show it.
my production data quite small now, has 4000 customers , query run against whole database. table customer , few reference table has table access full in execution plan, others tables have access indexes. execution plans both versions have little bit difference in join algorithm (hash group by vs sort aggregate) on part.
both versions use 13 minutes, no significant difference.
i have done test on simplified versions similar sql in question. both version has same execution plan , elapse time.
with current information, think reasonable answer is unpredictable unless test decide quality of both versions optimizer job. appreciate if give information convince or reject idea.
another option is
select c.customer_id , c.customer_name , c.customer_age , c.customer_creation_date , p.payment_amount customer c join ( select customer_id, sum(payment_amount) payment_amount payment group customer_id ) p on (p.customer_id = c.customer_id) to decide 1 of 3 better test them , see execution plans.
Comments
Post a Comment