Find unique entities with multiple UUID identifiers in redshift -


having event table multiple types of uuid's per user, come way stitch uuids highest possible definition of single user.

for example:

   uuid1 | uuid2    1         1         2         2      b    3      c    4      c 

there 2 users here, first 1 uuid1={1,2} , uuid2={a,b}, second 1 uuid1={3,4} , uuid2={c}. these chains potentially long. there no intersections (i.e. 1c doesn't exist) , rows timestamp ordered.

is there way in redshift generate these unique "guest" identifiers without creating immense query many joins?

thanks in advance!

create test data table

-- drop table uuid_test; create temp table uuid_test           select   1 row_id, 1::int uuid1, 'a'::char(1) uuid2 union select   2 row_id, 1::int uuid1, 'a'::char(1) uuid2 union select   3 row_id, 2::int uuid1, 'a'::char(1) uuid2 union select   4 row_id, 2::int uuid1, 'b'::char(1) uuid2 union select   5 row_id, 3::int uuid1, 'c'::char(1) uuid2 union select   6 row_id, 4::int uuid1, 'c'::char(1) uuid2 union select   7 row_id, 4::int uuid1, 'd'::char(1) uuid2 union select   8 row_id, 5::int uuid1, 'e'::char(1) uuid2 union select   9 row_id, 6::int uuid1, 'e'::char(1) uuid2 union select  10 row_id, 6::int uuid1, 'f'::char(1) uuid2 union select  11 row_id, 7::int uuid1, 'f'::char(1) uuid2 union select  12 row_id, 8::int uuid1, 'g'::char(1) uuid2 union select  13 row_id, 8::int uuid1, 'h'::char(1) uuid2 ; 

the actual problem solved using strict ordering find every place unique user changes, capturing lookup table , applying original data.

-- create lookup table from-to range of ids each unique user unique_user (  -- calculate end of id range using lead() ahead -- use inline max() find ending id last entry  select row_id from_id      , nvl(lead(row_id,1) on (order row_id)-1, (select max(row_id) uuid_test) ) to_id      , unique_uuid  -- mark unique user change when there discontinuity in either uuid (select row_id             ,case when nvl(lag(uuid1,1) on (order row_id),  0) <> uuid1                    , nvl(lag(uuid2,1) on (order row_id), '') <> uuid2              md5(uuid1||uuid2)              else null end unique_uuid       uuid_test) t unique_uuid not null order row_id )  -- apply unique user value each row using range join lookup table select a.row_id, a.uuid1, a.uuid2, b.unique_uuid uuid_test   join unique_user b   on a.row_id between b.from_id , b.to_id order a.row_id ; 

here's output

 row_id | uuid1 | uuid2 |           unique_uuid --------+-------+-------+----------------------------------       1 |     1 |     | efaa153b0f682ae5170a3184fa0df28c       2 |     1 |     | efaa153b0f682ae5170a3184fa0df28c       3 |     2 |     | efaa153b0f682ae5170a3184fa0df28c       4 |     2 | b     | efaa153b0f682ae5170a3184fa0df28c       5 |     3 | c     | 5fcfcb7df376059d0075cb892b2cc37f       6 |     4 | c     | 5fcfcb7df376059d0075cb892b2cc37f       7 |     4 | d     | 5fcfcb7df376059d0075cb892b2cc37f       8 |     5 | e     | 18a368e1052b5aa0388ef020dd9a1e20       9 |     6 | e     | 18a368e1052b5aa0388ef020dd9a1e20      10 |     6 | f     | 18a368e1052b5aa0388ef020dd9a1e20      11 |     7 | f     | 18a368e1052b5aa0388ef020dd9a1e20      12 |     8 | g     | 321fcc2447163a81d470b9353e394121      13 |     8 | h     | 321fcc2447163a81d470b9353e394121 

Comments

Popular posts from this blog

PHP and MySQL WP -

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

go - golang pprof for c library code -