dataframe - R data.table: How to sum variables by group based on a condition?
Let's say I have the following R data.table (though I'm happy to work with base R or a data.frame as well):
library(data.table)
dt = data.table(category = c("first", "first", "first", "second", "third", "third", "second"),
                frequency = c(10, 15, 5, 2, 14, 20, 3),
                times = c(0, 0, 0, 3, 3, 1, 0))
> dt
   category frequency times
1:    first        10     0
2:    first        15     0
3:    first         5     0
4:   second         2     3
5:    third        14     3
6:    third        20     1
7:   second         3     0
If I wished to sum frequencies by category, I would use the following:
dt[, sum(frequency), by = category]
However, let's say I wanted to sum frequency by category if, and only if, times is non-zero and not equal to NA?
How does one make this sum conditional based on the values of a separate column?
Edit: Apologies for the obvious question. A quick addition: what if the elements of a column are strings?
e.g.
> dt
   category frequency times
1:    first       ten     0
2:    first       ten     0
3:    first         5     0
4:   second         5     3
5:    third         5     3
6:    third         5     1
7:   second       ten     0
sum() will not calculate the frequencies of ten versus five.
Remember the logic of data.table: dt[i, j, by] means take dt, subset rows using i, then calculate j, grouped by by.
dt[times != 0 & !is.na(times), sum(frequency), by = category]
   category V1
1:   second  2
2:    third 34
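For the follow-up about a string column, the same i, j, by pattern should still apply. Here is a minimal sketch, assuming the goal is to count how often each string value occurs per category (that reading of the question is my assumption). It reuses the string table from the edit as printed there, and swaps sum() for data.table's .N row counter, grouping by both category and frequency:

```r
library(data.table)

# Table from the question's edit: frequency holds strings, not numbers
dt <- data.table(
  category  = c("first", "first", "first", "second", "third", "third", "second"),
  frequency = c("ten", "ten", "5", "5", "5", "5", "ten"),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# i: keep rows where times is non-zero and not NA
# j: .N counts the rows in each group
# by: one group per (category, frequency) pair
counts <- dt[times != 0 & !is.na(times), .N, by = .(category, frequency)]
print(counts)
```

Dropping the condition in i (dt[, .N, by = .(category, frequency)]) would count every row instead. As a side note, data.table treats NA in a logical i as FALSE, so times != 0 alone would already drop NA rows; keeping !is.na(times) just makes the intent explicit.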