r - Delete rows without a full year data -


i got big data set contains monthly returns of given stock. i'd delete rows not have full year data. subset of data shown below example:

date        return  year         9/1/2009    0.71447 2009 10/1/2009   0.48417 2009 11/1/2009   0.90753 2009 12/1/2009   -0.7342 2009 1/1/2010    0.83293 2010 2/1/2010    0.18279 2010 3/1/2010    0.19416 2010 4/1/2010    0.38907 2010 5/1/2010    0.37834 2010 6/1/2010    0.6401  2010 7/1/2010    0.62079 2010 8/1/2010    0.42128 2010 9/1/2010    0.43117 2010 10/1/2010   0.42307 2010 11/1/2010   -0.1994 2010 12/1/2010   -0.2252 2010 

ideally, code remove first 4 observations since don't have full year of observation.

the op has requested remove rows large data set of monthly values not make full year. although solution suggested wen seems working op suggest more robust approach.

wen's solution counts number of rows per year assuming there one row per month. more robust count number of unique months per year in case there duplicate entries in production data set. (from experience, 1 cannot careful enough when dealing production data , check assumptions).

library(data.table) # count number of unique months per year, # keep complete years, omit counts # result data.table 1 column year full_years <- dt[, uniquen(month(date)), = year][v1 == 12l, -"v1"] full_years 
   year 1: 2010 
# right join original table, rows belonging full year returned dt[full_years, on = "year"] 
          date   return year  1: 2010-01-01  0.83293 2010  2: 2010-02-01  0.18279 2010  3: 2010-03-01  0.19416 2010  4: 2010-04-01  0.38907 2010  5: 2010-05-01  0.37834 2010  6: 2010-06-01  0.64010 2010  7: 2010-07-01  0.62079 2010  8: 2010-08-01  0.42128 2010  9: 2010-09-01  0.43117 2010 10: 2010-10-01  0.42307 2010 11: 2010-11-01 -0.19940 2010 12: 2010-12-01 -0.22520 2010 

note approach avoids add count column each row of potentially large data set.

the code can written more concisely as:

dt[dt[, uniquen(month(date)), = year][v1 == 12l, -"v1"], on = "year"] 

it possible check data duplicate months, e.g.,

stopifnot(all(dt[, .n, = .(year, month(date))]$n == 1l)) 

this code counts number of occurrences each year , month , halts execution when there more one.


Comments

Popular posts from this blog

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

python Tkinter Capturing keyboard events save as one single string -

sql server - Why does Linq-to-SQL add unnecessary COUNT()? -