Calculating a Score via Maximum Value of one column depending on another in a list of data frames in R -
i have list of dataframes (former csv files) need format. did in formula. want calculate score value in each data frame. each data frame looks this:
country year [indicator] afghanistan 2007 10.000 andorra 2008 18.000 andorra 2009 20.000 anguilla 2007 15.000 anguilla 2009 1.000 ...
so there indicator each country , each year (some countries not have data every year)
i want add new column score per year. in 2007 country highest score gets 10 , others less. calculate via: score = (value of country in year / maximum value of countries in year) * 10
i've done this:
importger_max <- aggregate(import ~ year, data = importger_2, max) importger_max[which(importger_max$year == 2007),2] x <- 2006 while(x < 2016) {x <- x+1; importger_2$importgerrank[importger_2$year == x] <- importger_max[which(importger_max$year == x),2]} importger_2$importgerscore <- (importger_2$import/importger_2$importgerrank) * 10 this surely not elegant way, works. alas, 1 specific dataset, want work on list of dataframes.
my code creating list , formatting csv files follows:
mycsv <- dir(pattern="*csv") n <- length(mycsv) mylist <- vector("list", n) for(i in 1:n) mylist[[i]] <- read.csv(mycsv[i], header = t, sep = ";") mylist <- lapply(mylist, function(x) { df_join <- join(x, translate, = c("country"), type = c("left"), match = "all") df_join2 <- join(template, df_join, = c("standard"), type = c("left"), match = "all") df_join2$country <- null df_join2[,3] <- as.numeric(as.character(df_join2[,3])) df_join2[is.na(df_join2)] <- -10000 return(df_join2) }) i not manage include code calculating score in formula data frames.
edit: use plyr package join function.
edit2: found solution myself, see code below if interested:
> mycsv <- dir(pattern="*csv") > n <- length(mycsv) > mylist <- vector("list", n) > > > > for(i in 1:n) mylist[[i]] <- read.csv(mycsv[i], header = t, sep = ";") > > mylist <- lapply(mylist, function(x) { > > df_join <- join(x, translate, = c("country"), type = c("left"), match = "all") > df_join2 <- join(template, df_join, = c("standard"), type = c("left"), match = "all") > df_join2$country <- null > df_join2[,3] <- as.numeric(as.character(df_join2[,3])) > df_join2[is.na(df_join2)] <- -10000 > colnames(df_join2)[3] <- "indicator" > df_joinmax <- aggregate(indicator ~ year, data = df_join2, max) > > x <- 2006 > > while(x < 2020) {x <- x+1; > df_join2$rank[df_join2$year == x] <- df_joinmax[which(df_joinmax$year == x),2]} > > df_join2$score <- (df_join2$indicator/df_join2$rank) * 10 > > return(df_join2) > }) again, bet there better solutions start, gives output wanted.
does work?
# load library library(dplyr) # calculate score reformat.df <- function(file.name){ df <- read.csv(file.name, header = t, sep = ";") df %>% group_by(.[[2]]) %>% mutate(score = 10 * .[[3]]/max(.[[3]])) %>% ungroup() } # csv files mylist <- lapply(as.list(dir(pattern = ".csv")), reformat.df)
Comments
Post a Comment