Calculating a Score via Maximum Value of one column depending on another in a list of data frames in R -


i have list of dataframes (former csv files) need format. did in formula. want calculate score value in each data frame. each data frame looks this:

country year [indicator] afghanistan 2007 10.000 andorra 2008 18.000 andorra 2009 20.000 anguilla 2007 15.000 anguilla 2009 1.000 

...

so there indicator each country , each year (some countries not have data every year)

i want add new column score per year. in 2007 country highest score gets 10 , others less. calculate via: score = (value of country in year / maximum value of countries in year) * 10

i've done this:

importger_max <- aggregate(import ~ year, data = importger_2, max) importger_max[which(importger_max$year == 2007),2] x <- 2006  while(x < 2016) {x <- x+1; importger_2$importgerrank[importger_2$year == x] <- importger_max[which(importger_max$year == x),2]}  importger_2$importgerscore <- (importger_2$import/importger_2$importgerrank) * 10 

this surely not elegant way, works. alas, 1 specific dataset, want work on list of dataframes.

my code creating list , formatting csv files follows:

mycsv <- dir(pattern="*csv") n <- length(mycsv) mylist <- vector("list", n)  for(i in 1:n) mylist[[i]] <- read.csv(mycsv[i], header = t, sep = ";")  mylist <- lapply(mylist, function(x) {    df_join <- join(x, translate, = c("country"), type = c("left"), match = "all")   df_join2 <- join(template, df_join, = c("standard"), type = c("left"), match = "all")   df_join2$country <- null   df_join2[,3] <- as.numeric(as.character(df_join2[,3]))   df_join2[is.na(df_join2)] <- -10000    return(df_join2)   }) 

i not manage include code calculating score in formula data frames.

edit: use plyr package join function.

edit2: found solution myself, see code below if interested:

>     mycsv <- dir(pattern="*csv") >     n <- length(mycsv) >     mylist <- vector("list", n) >      >      >     >     for(i in 1:n) mylist[[i]] <- read.csv(mycsv[i], header = t, sep = ";") >      >     mylist <- lapply(mylist, function(x) { >        >       df_join <- join(x, translate, = c("country"), type = c("left"), match = "all") >       df_join2 <- join(template, df_join, = c("standard"), type = c("left"), match = "all") >       df_join2$country <- null >       df_join2[,3] <- as.numeric(as.character(df_join2[,3])) >       df_join2[is.na(df_join2)] <- -10000 >       colnames(df_join2)[3] <- "indicator" >       df_joinmax <- aggregate(indicator ~ year, data = df_join2, max) >        >       x <- 2006 >        >       while(x < 2020) {x <- x+1; >       df_join2$rank[df_join2$year == x] <- df_joinmax[which(df_joinmax$year == x),2]} >        >       df_join2$score <- (df_join2$indicator/df_join2$rank) * 10 >       >       return(df_join2) >       }) 

again, bet there better solutions start, gives output wanted.

does work?

# load library library(dplyr)  # calculate score reformat.df <- function(file.name){   df <- read.csv(file.name, header = t, sep = ";")    df %>%      group_by(.[[2]]) %>%      mutate(score = 10 * .[[3]]/max(.[[3]])) %>%      ungroup() }  # csv files mylist <- lapply(as.list(dir(pattern = ".csv")), reformat.df) 

Comments

Popular posts from this blog

PHP and MySQL WP -

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

go - golang pprof for c library code -