R - Transposing a dataframe with repeats


I have a data frame that has 2 columns: one of gene symbols, and one of functional pathways. The pathways column has repeated values, since there are a number of genes that belong to each pathway. I would like to reorder this dataset so that each column is a single pathway, and each row in those columns is a gene that belongs in that pathway.

Starting dataframe:

data.frame(pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
           gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"))

Desired dataframe:

data.frame(p1 = c("g1", "g2", "g3", "g4"), p2 = c("g33", "g43", "g10",  "")) 

I know that not all columns will be the same length, and having blank values would be preferable to NAs.

Here is one option.

  1. Split the gene symbols into a list, using pathway as the splitting element
  2. Get the maximum group length, and set every group to that same length
  3. Turn the list into a data frame

Here is the code.

mydf <- data.frame(pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
                   gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"))

# function to run on each element in the list
# (relies on max.length, which is defined below before the function is called;
#  extending a vector's length pads it with NA)
set_to_max_length <- function(x) {
  length(x) <- max.length
  return(x)
}

# 1. split into a list
mydf.split <- split(mydf$gene.symbol, mydf$pathway)

# 2.a get the max length of the columns
max.length <- max(sapply(mydf.split, length))

# 2.b set each list element to the max length
mydf.split.2 <- lapply(mydf.split, set_to_max_length)

# 3. combine into a df
data.frame(mydf.split.2)
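Since the question prefers blank values to NAs, the same split-and-pad idea can be sketched with explicit blank padding. This is only a minimal sketch: pad_with_blanks is a hypothetical helper name, and stringsAsFactors = FALSE is an assumption made so that the gene symbols stay character vectors on older versions of R.

```r
mydf <- data.frame(
  pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
  gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"),
  stringsAsFactors = FALSE  # assumption: keep gene symbols as character
)

# hypothetical helper: pad a character vector with "" up to length n
pad_with_blanks <- function(x, n) {
  c(x, rep("", n - length(x)))
}

# split the gene symbols by pathway
genes.by.pathway <- split(mydf$gene.symbol, mydf$pathway)

# length of the longest group
max.length <- max(lengths(genes.by.pathway))

# pad every group to the same length, then combine into a data frame
padded <- lapply(genes.by.pathway, pad_with_blanks, n = max.length)
result <- data.frame(padded, stringsAsFactors = FALSE)
result
```

Passing the target length as an argument, rather than reading it from the global environment, keeps the helper usable regardless of where max.length is defined.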

edit

Here is another option using the tidyverse, which is more succinct:

library(tidyverse)

mydf <- data.frame(pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
                   gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"))

mydf %>%
  group_by(pathway) %>%
  mutate(rownum = row_number()) %>%
  ungroup() %>%
  spread(pathway, gene.symbol) %>%
  select(-1)
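In newer versions of tidyr, spread() is superseded by pivot_wider(), which can also fill the gaps with blanks instead of NAs. The following is a sketch under a few assumptions: a reasonably recent tidyr that accepts a scalar values_fill, gene symbols stored as character (hence stringsAsFactors = FALSE), and wide as a name chosen here purely for illustration.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(
  pathway = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
  gene.symbol = c("g1", "g2", "g3", "g4", "g33", "g43", "g10"),
  stringsAsFactors = FALSE  # assumption: keep gene symbols as character
)

wide <- mydf %>%
  group_by(pathway) %>%
  mutate(rownum = row_number()) %>%   # position of each gene within its pathway
  ungroup() %>%
  pivot_wider(
    names_from = pathway,
    values_from = gene.symbol,
    values_fill = ""                  # blank instead of NA for shorter pathways
  ) %>%
  select(-rownum)                     # drop the helper column

wide
```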
