Find unique entities with multiple UUID identifiers in redshift -
having event table multiple types of uuid's per user, come way stitch uuids highest possible definition of single user.
for example:
uuid1 | uuid2 1 1 2 2 b 3 c 4 c there 2 users here, first 1 uuid1={1,2} , uuid2={a,b}, second 1 uuid1={3,4} , uuid2={c}. these chains potentially long. there no intersections (i.e. 1c doesn't exist) , rows timestamp ordered.
is there way in redshift generate these unique "guest" identifiers without creating immense query many joins?
thanks in advance!
create test data table
-- drop table uuid_test; create temp table uuid_test select 1 row_id, 1::int uuid1, 'a'::char(1) uuid2 union select 2 row_id, 1::int uuid1, 'a'::char(1) uuid2 union select 3 row_id, 2::int uuid1, 'a'::char(1) uuid2 union select 4 row_id, 2::int uuid1, 'b'::char(1) uuid2 union select 5 row_id, 3::int uuid1, 'c'::char(1) uuid2 union select 6 row_id, 4::int uuid1, 'c'::char(1) uuid2 union select 7 row_id, 4::int uuid1, 'd'::char(1) uuid2 union select 8 row_id, 5::int uuid1, 'e'::char(1) uuid2 union select 9 row_id, 6::int uuid1, 'e'::char(1) uuid2 union select 10 row_id, 6::int uuid1, 'f'::char(1) uuid2 union select 11 row_id, 7::int uuid1, 'f'::char(1) uuid2 union select 12 row_id, 8::int uuid1, 'g'::char(1) uuid2 union select 13 row_id, 8::int uuid1, 'h'::char(1) uuid2 ; the actual problem solved using strict ordering find every place unique user changes, capturing lookup table , applying original data.
-- create lookup table from-to range of ids each unique user unique_user ( -- calculate end of id range using lead() ahead -- use inline max() find ending id last entry select row_id from_id , nvl(lead(row_id,1) on (order row_id)-1, (select max(row_id) uuid_test) ) to_id , unique_uuid -- mark unique user change when there discontinuity in either uuid (select row_id ,case when nvl(lag(uuid1,1) on (order row_id), 0) <> uuid1 , nvl(lag(uuid2,1) on (order row_id), '') <> uuid2 md5(uuid1||uuid2) else null end unique_uuid uuid_test) t unique_uuid not null order row_id ) -- apply unique user value each row using range join lookup table select a.row_id, a.uuid1, a.uuid2, b.unique_uuid uuid_test join unique_user b on a.row_id between b.from_id , b.to_id order a.row_id ; here's output
row_id | uuid1 | uuid2 | unique_uuid --------+-------+-------+---------------------------------- 1 | 1 | | efaa153b0f682ae5170a3184fa0df28c 2 | 1 | | efaa153b0f682ae5170a3184fa0df28c 3 | 2 | | efaa153b0f682ae5170a3184fa0df28c 4 | 2 | b | efaa153b0f682ae5170a3184fa0df28c 5 | 3 | c | 5fcfcb7df376059d0075cb892b2cc37f 6 | 4 | c | 5fcfcb7df376059d0075cb892b2cc37f 7 | 4 | d | 5fcfcb7df376059d0075cb892b2cc37f 8 | 5 | e | 18a368e1052b5aa0388ef020dd9a1e20 9 | 6 | e | 18a368e1052b5aa0388ef020dd9a1e20 10 | 6 | f | 18a368e1052b5aa0388ef020dd9a1e20 11 | 7 | f | 18a368e1052b5aa0388ef020dd9a1e20 12 | 8 | g | 321fcc2447163a81d470b9353e394121 13 | 8 | h | 321fcc2447163a81d470b9353e394121
Comments
Post a Comment