R data.table: How to sum variables by group based on a condition?
Let's say I have the following R data.table
(though I'm happy to work with base R and data.frame as well):
library(data.table)
dt = data.table(category = c("first", "first", "first", "second", "third", "third", "second"),
                frequency = c(10, 15, 5, 2, 14, 20, 3),
                times = c(0, 0, 0, 3, 3, 1, 0))
> dt
   category frequency times
1:    first        10     0
2:    first        15     0
3:    first         5     0
4:   second         2     3
5:    third        14     3
6:    third        20     1
7:   second         3     0
If I wished to sum the frequencies by category, I would use the following:
dt[, sum(frequency), by = category]
However, let's say I wanted to sum frequency by category if, and only if, times is non-zero and not equal to NA?
How does one make a sum conditional based on the values of a separate column?
Edit: apologies for the obvious question. A quick addition: what if the elements of a column are strings?
E.g.
> dt
   category frequency times
1:    first       ten     0
2:    first       ten     0
3:    first      five     0
4:   second      five     3
5:    third      five     3
6:    third      five     1
7:   second       ten     0
sum() will not calculate the frequencies of ten versus five.
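For the string case, summing is not defined, so counting occurrences is probably what is wanted. A minimal sketch, assuming the frequency column holds the words "ten" and "five" as character values (the name dt_str and the spelled-out values are assumptions for illustration, not from the original post):

```r
library(data.table)

# Hypothetical string version of the example table; "five" is assumed
# from the question's wording ("ten versus five")
dt_str <- data.table(
  category  = c("first", "first", "first", "second", "third", "third", "second"),
  frequency = c("ten", "ten", "five", "five", "five", "five", "ten"),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# Count rows per (category, frequency) pair; .N is the row count of each group
counts <- dt_str[, .N, by = .(category, frequency)]

# The same count, restricted to rows where times is non-zero and not NA
counts_cond <- dt_str[times != 0 & !is.na(times), .N, by = .(category, frequency)]
```

Grouping by both columns gives one row per distinct combination, so the N column plays the role that the summed frequency plays in the numeric case.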
Remember the logic of data.table: dt[i, j, by], that is, take dt, subset rows using i, then calculate j grouped by by.
dt[times != 0 & !is.na(times), sum(frequency), by = category]
   category V1
1:   second  2
2:    third 34
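Since the question also mentions base R, here is a rough sketch of the same conditional sum done both ways. The result column name total and the variable names res_dt, res_base, df and keep are illustrative choices, and the seventh times value is taken from the printed table in the question:

```r
library(data.table)

# Example data; the last `times` value (0) follows the printed table
dt <- data.table(
  category  = c("first", "first", "first", "second", "third", "third", "second"),
  frequency = c(10, 15, 5, 2, 14, 20, 3),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# data.table: filter in i, aggregate in j, group with by;
# wrapping j in .() lets the result column be named instead of V1
res_dt <- dt[times != 0 & !is.na(times), .(total = sum(frequency)), by = category]

# Base R: build the row filter explicitly, then aggregate the kept rows by group
df <- as.data.frame(dt)
keep <- !is.na(df$times) & df$times != 0
res_base <- aggregate(frequency ~ category, data = df[keep, ], FUN = sum)
```

Both results should contain one row for second and one for third; first drops out because all of its times values are zero.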