R - data.table: calculate percentile for all numeric variables


I have data like this:

    library(data.table)

    set.seed(1)
    dt <- data.table(id = c("a", "a", "b", "b", "c", "c"),
                     var1 = c(1:6),
                     var2 = rnorm(6))

    > dt
       id var1       var2
    1:  a    1 -0.6264538
    2:  a    2  0.1836433
    3:  b    3 -0.8356286
    4:  b    4  1.5952808
    5:  c    5  0.3295078
    6:  c    6 -0.8204684

but with dozens of numeric variables. I'd like to calculate the percentile of each observation, for every numeric variable, using data.table, while keeping the key identifier (id) intact. In dplyr I would do this:

    mutate_if(dt, is.numeric, function(x) { ecdf(x)(x) })

      id      var1      var2
    1  a 0.1666667 0.5000000
    2  a 0.3333333 0.6666667
    3  b 0.5000000 0.1666667
    4  b 0.6666667 1.0000000
    5  c 0.8333333 0.8333333
    6  c 1.0000000 0.3333333

I would also be happy with a result that includes the original var1 and var2.

What is the best way to approach this?

Thanks for your help!

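You can calculate the ecdf of the numeric columns in a separate data table like this:

    # Non-numeric columns (here: id) return NULL and do not appear in dt2
    dt2 = as.data.table(lapply(dt, function(x) { if (is.numeric(x)) { ecdf(x)(x) } }))

Result: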

    > dt2
            var1      var2
    1: 0.1666667 0.5000000
    2: 0.3333333 0.6666667
    3: 0.5000000 0.1666667
    4: 0.6666667 1.0000000
    5: 0.8333333 0.8333333
    6: 1.0000000 0.3333333

If you want to cbind the result to the original dt, you can first change the column names using paste0 so they do not clash:

    colnames(dt2) = paste0("centile_", colnames(dt2))

Result:

    > dt2
       centile_var1 centile_var2
    1:    0.1666667    0.5000000
    2:    0.3333333    0.6666667
    3:    0.5000000    0.1666667
    4:    0.6666667    1.0000000
    5:    0.8333333    0.8333333
    6:    1.0000000    0.3333333
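As a possible alternative to building a second table and binding it back, the percentile columns could be added to dt by reference. This is only a sketch, not part of the answer above: it assumes the same example dt, and the helper names num_cols and new_cols are made up for illustration. It relies on data.table's `:=` with `.SDcols`, so id and the original var1 and var2 stay in place.

```r
library(data.table)

set.seed(1)
dt <- data.table(id = c("a", "a", "b", "b", "c", "c"),
                 var1 = 1:6,
                 var2 = rnorm(6))

# Names of the numeric columns only; the character id column is left out
# (num_cols is a hypothetical helper name)
num_cols <- names(dt)[sapply(dt, is.numeric)]

# Names for the new columns, reusing the centile_ prefix
# (new_cols is a hypothetical helper name)
new_cols <- paste0("centile_", num_cols)

# ecdf(x) builds the empirical CDF of x; ecdf(x)(x) evaluates it at every
# observation, i.e. the share of values less than or equal to that observation.
# := adds the new columns to dt in place, without copying the table.
dt[, (new_cols) := lapply(.SD, function(x) ecdf(x)(x)), .SDcols = num_cols]

print(dt)
```

With dozens of numeric variables the same call should work unchanged, since num_cols is derived from the column types rather than typed out by hand. If the original values are not needed afterwards, using `(num_cols)` on the left-hand side instead of `(new_cols)` would overwrite them, which mirrors the dplyr output in the question.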
