r - Using dplyr to fill in missing values (through a join?) -
i have data frame (df1) has missing values (city, state):
siteid city statebasedin lat lon var1 var2 4227 richmond ky -39 -113 6 0 4987 nashville tn -33 -97 7 0 4000 newark nj -39 -95 8 0 4925 miami fl -40 -99 0 0 4437 montgomery al -32 -117 4 1 4053 jonesboro ar -30 -98 8 1 df1 <- structure(list(siteid = c(4227l, 4987l, 4000l, 4925l, 4437l, 4053l, 4482l, 4037l, 4020l, 1787l, 2805l, 3025l, 3027l, 3028l, 3029l, 3030l, 3031l, 3033l), city = structure(c(10l, 7l, 8l, 5l, 6l, 4l, 2l, 9l, 3l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l), .label = c("", "arcata", "jackson", "jonesboro", "miami", "montgomery", "nashville", "newark", "portland", "richmond"), class = "factor"), statebasedin = structure(c(6l, 10l, 8l, 5l, 2l, 3l, 4l, 9l, 7l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l), .label = c("", "al", "ar", "ca", "fl", "ky", "ms", "nj", "or", "tn"), class = "factor"), lat = c(-39l, -33l, -39l, -40l, -32l, -30l, -38l, -31l, -35l, -38l, -30l, -39l, -38l, -32l, -39l, -31l, -38l, -34l), lon = c(-113l, -97l, -95l, -99l, -117l, -98l, -98l, -95l, -112l, -120l, -114l, -81l, -117l, -90l, -109l, -115l, -81l, -104l), var1 = c(6l, 7l, 8l, 0l, 4l, 8l, 1l, 8l, 0l, 3l, 3l, 7l, 4l, 8l, 0l, 8l, 1l, 3l), var2 = c(0l, 0l, 0l, 0l, 1l, 1l, 1l, 1l, 1l, 0l, 1l, 1l, 0l, 0l, 1l, 0l, 1l, 0l)), .names = c("siteid", "city", "statebasedin", "lat", "lon", "var1", "var2"), class = "data.frame", row.names = c(na, -18l))
and fill values in merging data frame (df2) has 3 of same columns, not of columns in df1:
siteid city statebasedin 1787 lusby md 2805 springdale ar 3025 saukville wi 3027 saukville wi 3028 saukville wi 3029 saukville wi df2 <- structure(list(siteid = c(1787l, 2805l, 3025l, 3027l, 3028l, 3029l, 3030l, 3031l, 3033l), city = structure(c("lusby", "springdale", "saukville", "saukville", "saukville", "saukville", "saukville", "mequon", "mequon"), .dim = c(9l, 1l)), statebasedin = structure(c("md", "ar", "wi", "wi", "wi", "wi", "wi", "wi", "wi"), .dim = c(9l, 1l))), row.names = c(na, -9l), class = "data.frame", .names = c("siteid", "city", "statebasedin"))
so retain of information in df1, , input missing values available df2. i'm not familiar of dplyr options yet, tried different 'join' options had no luck. tried use 'merge' in base package still no success. there way (preferably dplyr)?
this solution not stylish, @ least solution.
library(dplyr) library(magrittr) aux <- df1 %>% # filter missing values filter(city == "") %>% # delete city , statebasedin columns # not duplicates after join select(-c(city, statebasedin)) %>% # inner join second dataframe inner_join(df2, = "siteid") %>% # change order of columns select(siteid, city, statebasedin, lat, lon, var1, var2) df1 %<>% # filter rows values not missing filter(city != "") %>% # bind auxiliary dataframe rbind(aux)
results in:
siteid city statebasedin lat lon var1 var2 1 4227 richmond ky -39 -113 6 0 2 4987 nashville tn -33 -97 7 0 3 4000 newark nj -39 -95 8 0 4 4925 miami fl -40 -99 0 0 5 4437 montgomery al -32 -117 4 1 6 4053 jonesboro ar -30 -98 8 1 7 4482 arcata ca -38 -98 1 1 8 4037 portland or -31 -95 8 1 9 4020 jackson ms -35 -112 0 1 10 1787 lusby md -38 -120 3 0 11 2805 springdale ar -30 -114 3 1 12 3025 saukville wi -39 -81 7 1 13 3027 saukville wi -38 -117 4 0 14 3028 saukville wi -32 -90 8 0 15 3029 saukville wi -39 -109 0 1 16 3030 saukville wi -31 -115 8 0 17 3031 mequon wi -38 -81 1 1 18 3033 mequon wi -34 -104 3 0
Comments
Post a Comment