R - Transposing a dataframe with repeats
I have a data frame that has 2 columns: one of gene symbols, and one of functional pathways. The pathways column has repeated values, as there are a number of genes that belong to each pathway. I would like to reorder this dataset so that each column is a single pathway, and each row in those columns is a gene that belongs in that pathway.
Starting dataframe:
data.frame(pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"), gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"))
Desired dataframe:
data.frame(p1 = c("g1", "g2", "g3", "g4"), p2 = c("g33", "g43", "g10", ""))
I know that not all columns will be the same length, and having blank values is preferable to NAs.
Here is an option.
- Split into a list, using pathway as the splitting element
- Get the max length across the groups, and set the other groups to the same length
- Turn the list into a data frame
Here is the code.
mydf <- data.frame(pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
                   gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"))

# Function to run on each element in the list.
# Extending the length pads the vector with NA.
set_to_max_length <- function(x) {
  length(x) <- max.length
  return(x)
}

# 1. Split into a list
mydf.split <- split(mydf$gene.symbol, mydf$pathway)

# 2.a Get the max length of the columns
max.length <- max(sapply(mydf.split, length))

# 2.b Set each list element to the max length
mydf.split.2 <- lapply(mydf.split, set_to_max_length)

# 3. Combine back into a data frame
data.frame(mydf.split.2)
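The code above pads the shorter groups with NA, whereas the question asks for blank values. A minimal base R sketch of the same split, pad, and combine steps that pads with empty strings instead is shown here. The names `genes_by_pathway`, `max_len`, `padded`, and `wide_blank` are made up for illustration, and `stringsAsFactors = FALSE` is set on the assumption that the gene symbols should stay as character strings on any R version.

```r
# Same example data as in the question; keep gene symbols as character
mydf <- data.frame(
  pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
  gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"),
  stringsAsFactors = FALSE
)

# 1. Split the gene symbols into a list, one element per pathway
genes_by_pathway <- split(mydf$gene.symbol, mydf$pathway)

# 2.a Length of the longest group
max_len <- max(lengths(genes_by_pathway))

# 2.b Pad each shorter group with "" (not NA) up to that length
padded <- lapply(genes_by_pathway, function(x) {
  c(x, rep("", max_len - length(x)))
})

# 3. Combine the padded list into a data frame, one column per pathway
wide_blank <- data.frame(padded, stringsAsFactors = FALSE)
wide_blank
```

If the real pathway names are not valid R names (for example, they contain spaces), `data.frame()` will alter them unless `check.names = FALSE` is passed.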
Edit
Here is another option using the tidyverse, which is more succinct:
library(tidyverse)

mydf <- data.frame(pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
                   gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"))

mydf %>%
  group_by(pathway) %>%
  mutate(rownum = row_number()) %>%
  ungroup() %>%
  spread(pathway, gene.symbol) %>%
  select(-1)
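In current tidyr, `spread()` is superseded by `pivot_wider()`, which can also fill the missing cells directly. The following is a sketch of the same pipeline using `pivot_wider()`, assuming tidyr 1.0.0 or later is installed. The result name `wide_tidy` is made up for illustration, and `values_fill` is given as a named list so that the blank string is applied to the `gene.symbol` values.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question; keep gene symbols as character
mydf <- data.frame(
  pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
  gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"),
  stringsAsFactors = FALSE
)

wide_tidy <- mydf %>%
  group_by(pathway) %>%
  mutate(rownum = row_number()) %>%   # position of each gene within its pathway
  ungroup() %>%
  pivot_wider(
    names_from = pathway,                 # one column per pathway
    values_from = gene.symbol,            # cells hold the gene symbols
    values_fill = list(gene.symbol = "")  # blank instead of NA for short groups
  ) %>%
  select(-rownum)                     # drop the helper index column

wide_tidy
```

The helper column `rownum` is what lets each gene land in its own row; without it, `pivot_wider()` would have no identifier to distinguish the genes within a pathway.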