r - How to properly return character values for dplyr's do?

Consider the following code:

library(dplyr)

# Randomly return a one-row data frame labelled "low" or "high"
foo <- function() {
  if (runif(1) < 0.5) {
    return(data.frame(result="low"))
  } else {
    return(data.frame(result="high"))
  }
}

df = data.frame(val=c(1,2,3,4,5,6))
df %>% group_by(val) %>% do(foo())

The outcome is random, but if both "low" and "high" results are returned, you'll see warnings like this:

Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
2: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
3: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
4: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
5: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector

I believe the first value returned (say, "low") is converted to a factor with one level, and when the other level comes along, it incurs dplyr's wrath.
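The suspicion above can be checked in base R alone. This is a minimal sketch, assuming the string-to-factor conversion is in effect; it passes stringsAsFactors=TRUE explicitly because that was only the default before R 4.0.0. The names low and high are used purely for illustration.

```r
# Explicitly request string-to-factor conversion (the default before R 4.0.0)
low  <- data.frame(result = "low",  stringsAsFactors = TRUE)
high <- data.frame(result = "high", stringsAsFactors = TRUE)

class(low$result)    # "factor"
levels(low$result)   # "low"
levels(high$result)  # "high"

# Each column is a factor with its own single level, so the level sets differ.
# This mismatch is what the row-binding step has to reconcile.
identical(levels(low$result), levels(high$result))  # FALSE
```

Each call builds its factor in isolation, so neither data frame knows about the other's level.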

What is the proper way to code this example to avoid the warnings?

Edit: one solution is this:

foo <- function() {
  if (runif(1) < 0.5) {
    return(data.frame(result=factor("low", levels=c("low", "high"))))
  } else {
    return(data.frame(result=factor("high", levels=c("low", "high"))))
  }
}

But what if I don't know the factor levels ahead of time?

Also, more fundamentally, I'd like to return a character vector, not a factor.


  • Use stringsAsFactors=FALSE: return(data.frame(..., stringsAsFactors=FALSE))


  • Use data_frame: return(data_frame(...))

See ?data.frame for more on how factors are treated.
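Applied to the example from the question, the first suggestion might look like the sketch below. It assumes dplyr is installed; the intermediate variables label and out are introduced here for illustration and are not part of the original code.

```r
library(dplyr)

# Return the label as a plain character column rather than a factor
foo <- function() {
  label <- if (runif(1) < 0.5) "low" else "high"
  data.frame(result = label, stringsAsFactors = FALSE)
}

df <- data.frame(val = c(1, 2, 3, 4, 5, 6))
out <- df %>% group_by(val) %>% do(foo())

# The combined column is character, so there are no factor levels to reconcile
class(out$result)  # "character"
```

Because every piece already carries a character column, the set of possible values does not need to be known ahead of time.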

