customer

customer_id | customer_name | customer_age | customer_creation_date -------------------------------------------------------------------- 001                     john             30              1 jan 2017 002                     jack             10              2 jan 2017 003                      jim             50              3 jan 2017

payment

customer_id | payment_id | payment_amount |  ------------------------------------------- 001                   900            100.00 001                   901            200.00 001                   902            300.00 003                   903            999.00

we want write sql columns table customer sum of payment of each customer. there many possible ways i ask 1 of following better.

solution 1

select c.customer_id , max(c.customer_name) customer_name , max(c.customer_age) customer_age , max(c.customer_creation_date) customer_creation_date , sum(p.payment_amount) total_payment_amount customer c join payment p on (p.customer_id = c.customer_id) group c.customer_id;

solution 2

select c.customer_id , c.customer_name , c.customer_age , c.customer_creation_date , sum(p.payment_amount) payment_amount customer c join payment p on (p.customer_id = c.customer_id) group c.customer_id, c.customer_name, c.customer_age, c.customer_creation_date

please notice in solution 1 use max not because want max results, because want "one" row columns know equal rows same customer_id

while in solution 2, avoid putting misleading max in select part putting columns in group by part instead.

with current knowledge, prefer solution 1 because more important comprehend logic in group by part in select part. put set of unique keys express intention of query, application can infer expected number of rows. don't know performance.

i ask question because reviewing code change of big sql put 50 columns in group by clause because editor want avoid max function in select part. know can refactor query in someway avoid putting irrelevant columns in both group by , select part, please discard option because affect application logic , require more time test.

update

i have done test on big query in both versions suggested. query complex, has 69 lines involving more 20 tables , execution plan more 190 lines, think not place show it.

my production data quite small now, has 4000 customers , query run against whole database. table customer , few reference table has table access full in execution plan, others tables have access indexes. execution plans both versions have little bit difference in join algorithm (hash group by vs sort aggregate) on part.

both versions use 13 minutes, no significant difference.

i have done test on simplified versions similar sql in question. both version has same execution plan , elapse time.

with current information, think reasonable answer is unpredictable unless test decide quality of both versions optimizer job. appreciate if give information convince or reject idea.

another option is

select c.customer_id , c.customer_name , c.customer_age , c.customer_creation_date , p.payment_amount customer c join (  select customer_id, sum(payment_amount) payment_amount  payment   group customer_id ) p on (p.customer_id = c.customer_id)

to decide 1 of 3 better test them , see execution plans.

Search This Blog

LP

Put many columns in group by clause in Oracle SQL -

customer

payment

solution 1

solution 2

update

Comments

Post a Comment

Popular posts from this blog

python - Passing parameters between cells in a Jupyter Noteboook -

javascript - generate date range base on integers -

PHP and MySQL WP -