R - Aggregate values for all combinations of factor levels, including missing ones
I'm trying to find the minimum value of a data frame based on multiple columns. I'm able to do this using the aggregate function, as shown below. However, the result does not contain the combinations of factors for which there is no data in the input data frame.
What I've got:
    # possibilities of fruits, cities, and vegetables:
    fruits <- c('apple', 'banana', 'grape')
    cities <- c('new york', 'chicago', 'los angeles')
    vegetables <- c('cucumber', 'mushroom')

    # my input (i.e., a sample test):
    inputdf <- data.frame(
      fruit     = c('apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'grape', 'grape', 'grape'),
      city      = c('new york', 'new york', 'new york', 'new york', 'chicago', 'los angeles', 'chicago', 'chicago', 'chicago'),
      vegetable = c('cucumber', 'cucumber', 'mushroom', 'cucumber', 'mushroom', 'mushroom', 'cucumber', 'cucumber', 'cucumber'),
      value     = c(5, 3, 4, 6, 5, 7, 2, 7, 4)
    )

    # my aggregation:
    outdf <- aggregate(value ~ fruit + city + vegetable, inputdf, function(x) min(x))
The output is:

    fruit   city         vegetable  value
    grape   chicago      cucumber   2
    apple   new york     cucumber   3
    banana  new york     cucumber   6
    banana  chicago      mushroom   5
    banana  los angeles  mushroom   7
    apple   new york     mushroom   4
This is correct; however, I also want rows for the combinations of columns that didn't exist at all in the input df:

    fruit   city         vegetable  value
    apple   new york     cucumber   3
    apple   new york     mushroom   4
    apple   chicago      cucumber   NA
    apple   chicago      mushroom   NA
    apple   los angeles  cucumber   NA
    apple   los angeles  mushroom   NA
    banana  new york     cucumber   6
    banana  new york     mushroom   NA
    banana  chicago      cucumber   NA
    banana  chicago      mushroom   5
    banana  los angeles  cucumber   NA
    banana  los angeles  mushroom   7
    grape   new york     cucumber   NA
    grape   new york     mushroom   NA
    grape   chicago      cucumber   2
    grape   chicago      mushroom   NA
    grape   los angeles  cucumber   NA
    grape   los angeles  mushroom   NA
I'd like to be able to do this for any number of columns to combine on. Is there a simple way to do that? The reason I want this output is that I need to transform the NAs into a specific value and then average the values over the same subsets again. Thanks!
You can do this by using expand.grid to generate all the combinations, then using merge with all.y=TRUE to keep the combinations that have no data:

    # minimum per combination that actually occurs in the input
    outdf <- aggregate(value ~ fruit + city + vegetable, inputdf, function(x) min(x))
    # every possible combination; expand.grid names unnamed columns Var1, Var2, Var3
    df <- expand.grid(fruits, cities, vegetables)
    # keep all rows of the grid (all.y=TRUE) so missing combinations get NA
    outdf <- merge(outdf, df, by.x=c('fruit','city','vegetable'), by.y=c('Var1','Var2','Var3'), all.y=TRUE)

    > outdf
        fruit        city vegetable value
    1   apple     chicago  cucumber    NA
    2   apple     chicago  mushroom    NA
    3   apple los angeles  cucumber    NA
    4   apple los angeles  mushroom    NA
    5   apple    new york  cucumber     3
    6   apple    new york  mushroom     4
    7  banana     chicago  cucumber    NA
    8  banana     chicago  mushroom     5
    9  banana los angeles  cucumber    NA
    10 banana los angeles  mushroom     7
    11 banana    new york  cucumber     6
    12 banana    new york  mushroom    NA
    13  grape     chicago  cucumber     2
    14  grape     chicago  mushroom    NA
    15  grape los angeles  cucumber    NA
    16  grape los angeles  mushroom    NA
    17  grape    new york  cucumber    NA
    18  grape    new york  mushroom    NA
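
Since the question asks for something that works with any number of columns, here is a minimal sketch of one way to generalize the same expand.grid plus merge idea. It assumes the possible levels are kept in a named list whose names match the grouping columns of the input. The names levels_list, keys, full, filled and fill_value are introduced here only for illustration, and the fill value of 0 is a placeholder for whatever value the NAs should become.

```r
# Sample input from the question
inputdf <- data.frame(
  fruit     = c('apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'grape', 'grape', 'grape'),
  city      = c('new york', 'new york', 'new york', 'new york', 'chicago', 'los angeles', 'chicago', 'chicago', 'chicago'),
  vegetable = c('cucumber', 'cucumber', 'mushroom', 'cucumber', 'mushroom', 'mushroom', 'cucumber', 'cucumber', 'cucumber'),
  value     = c(5, 3, 4, 6, 5, 7, 2, 7, 4)
)

# Hypothetical helper structure: every possible level per grouping column.
# The list names must match the column names in inputdf.
levels_list <- list(
  fruit     = c('apple', 'banana', 'grape'),
  city      = c('new york', 'chicago', 'los angeles'),
  vegetable = c('cucumber', 'mushroom')
)
keys <- names(levels_list)

# Minimum of value for each combination that occurs in the data
agg <- aggregate(inputdf["value"], by = inputdf[keys], FUN = min)

# All possible combinations; a named list gives named columns,
# so no Var1/Var2/... renaming is needed in merge
grid <- expand.grid(levels_list, stringsAsFactors = FALSE)

# Keep every row of the grid; combinations without data get NA
full <- merge(grid, agg, by = keys, all.x = TRUE)

# Replace the NAs with a chosen value (0 is only a placeholder assumption)
fill_value <- 0
filled <- full
filled$value[is.na(filled$value)] <- fill_value
```

With the sample data, full has 3 * 3 * 2 = 18 rows. Adding or removing a grouping column only requires changing levels_list, because keys drives the aggregation, the grid, and the merge.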