R: replace NA by the mean of non-NA values based on multiple conditions (hour, day of week and temperature range)
I have a data frame with the columns "date/hour", "power" and "temp" (plus a "day_week" column). They hold the power measured every 10 minutes and the outdoor air temperature. The entries cover a whole year of data acquisition, so there is a large number of rows. A sample:
df1 <- data.frame("date/hour" = c("2016-08-09 12:00", "2016-08-09 12:10", "2016-08-09 12:15", "2016-08-09 12:20"),
                  "power" = c(500.21, 500.23, 500.24, 500.30),
                  "day_week" = c("tuesday", "tuesday", "tuesday", "tuesday"),
                  "temp" = c(21.1, 21.5, 21.3, 21.0),
                  check.names = FALSE,        # keep the column name "date/hour" as written
                  stringsAsFactors = FALSE)   # keep text columns as character
The problem is that during some periods, due to an equipment problem, the power values are NA. To fill the gaps, I want to use the mean of the non-NA values that satisfy the following 3 conditions:
- they occur at the same hour;
- they occur on the same day of the week;
- (the intriguing one) they occur within a temperature range of +/- 0.5, which is widened by 0.1 if there are no non-NA values to replace with.
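One way to state the three conditions precisely, assuming the window is widened in 0.1 steps only until at least one non-NA value falls inside it (this reading and the symbols below are my own, not from the question): for a row $i$ with missing power $P_i$, time slot $h_i$, weekday $w_i$ and temperature $T_i$,

```latex
\begin{aligned}
\tau_k &= 0.5 + 0.1\,k, \qquad k = 0, 1, 2, \dots \\
D_i(\tau) &= \{\, j : h_j = h_i,\; w_j = w_i,\; |T_j - T_i| \le \tau,\; P_j \ne \mathrm{NA} \,\} \\
k^\ast &= \min\{\, k : D_i(\tau_k) \ne \emptyset \,\} \\
\hat{P}_i &= \frac{1}{|D_i(\tau_{k^\ast})|} \sum_{j \in D_i(\tau_{k^\ast})} P_j
\end{aligned}
```

In practice $k$ needs an upper bound, because a slot with no measured value at all would otherwise never produce a non-empty $D_i$.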
To solve the problem, I first created a new column in df1, hour_minute, which turns the values of the "date/hour" column into hour:minute, ignoring the date:
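df1$hour_minute <- format(as.POSIXct(df1$`date/hour`, format = "%Y-%m-%d %H:%M"), "%H:%M")  # "%H:%M" = hour:minute; "%h:%m" would give month name and month number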
The temperature range is calculated with a fixed tolerance of +/- 0.5 and stored in two more columns of the data frame:
df1$upper_temp <- df1$temp + 0.5
df1$lower_temp <- df1$temp - 0.5
Because I couldn't calculate the mean directly on a data frame, I created a numeric vector (index) from the power column of df1:
index <- as.numeric(as.character(df1$power))  # as.character() first in case power was read as a factor
Then I created another column (df1$eq_power) that gets the mean of the index vector over the rows meeting the 3 conditions above, using a for loop:
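df1$eq_power <- NA_real_   # initialise the result column
for (i in seq_along(df1$power)) {
  df1$eq_power[i] <- mean(index[which(df1$hour_minute == df1$hour_minute[i] &
                                      df1$day_week == df1$day_week[i] &
                                      df1$temp <= df1$upper_temp[i] &
                                      df1$temp >= df1$lower_temp[i])],
                          na.rm = TRUE)  # skip the NA power values themselves
}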
The for loop gives me a usable result, though it takes a long time. However, for the missing values I didn't manage to use the incremental temperature ranges. If anyone can help me, I would appreciate it a lot.