R - Find all indices of duplicates and write them in new columns
I have a data.frame with a single column, a vector of strings.
These strings have duplicate values. I want to find the character strings that have duplicates in the vector, and write the index of each of their positions in new columns.
So, for example, consider I have:
    dt <- data.frame(string = c("a","b","c","d","e","f","a","c","f","z","a"))
I want to get:

    string match1 match2 match3 matchx....
    a      1      7      11
    b      2      NA     NA
    c      3      8      NA
    d      4      NA     NA
    e      5      NA     NA
    f      6      9      NA
    a      1      7      11
    c      3      8      NA
    f      6      9      NA
    z      10     NA     NA
    a      1      7      11
The real vector is much longer than in this example, and I do not know the maximum number of columns I will need.
What is an effective way to do this? I know there is the duplicated function, but I am not sure how to use it to get the result I want here.
Many thanks!
Here is one option with data.table. After grouping by 'string', get the sequence (seq_len(.N)) and the row index (.I), dcast to 'wide' format, and join with the original dataset on 'string'.
    library(data.table)
    dcast(setDT(dt)[, .(seq_len(.N), .I), string],
          string ~ paste0("match", V1))[dt, on = "string"]
    #     string match1 match2 match3
    #  1:      a      1      7     11
    #  2:      b      2     NA     NA
    #  3:      c      3      8     NA
    #  4:      d      4     NA     NA
    #  5:      e      5     NA     NA
    #  6:      f      6      9     NA
    #  7:      a      1      7     11
    #  8:      c      3      8     NA
    #  9:      f      6      9     NA
    # 10:      z     10     NA     NA
    # 11:      a      1      7     11
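To see the long-format intermediate that gets reshaped to wide, here is a rough sketch in base R. It is not the answer's code: ave() stands in for the grouped seq_len(.N), seq_len(nrow(dt)) stands in for .I, and the names long, k and idx are made up for illustration.

```r
# Same example data as in the question
dt <- data.frame(string = c("a","b","c","d","e","f","a","c","f","z","a"),
                 stringsAsFactors = FALSE)

# One row per original row:
#   k   = running count of this string so far (1st, 2nd, 3rd occurrence)
#   idx = position of the row in the original data
long <- data.frame(
  string = dt$string,
  k      = ave(seq_along(dt$string), dt$string, FUN = seq_along),
  idx    = seq_len(nrow(dt)),
  stringsAsFactors = FALSE
)
long
```

Casting this table so that each value of k becomes its own column (match1, match2, ...) filled with idx gives one row per distinct string, which is what the join back onto the original data then uses.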
Or another option is to split the sequence of rows by 'string', pad the list elements with NA where the length is less, and merge with the original dataset (using base R methods).
    lst <- split(seq_len(nrow(dt)), dt$string)
    merge(dt, do.call(rbind, lapply(lst, `length<-`, max(lengths(lst)))),
          by.x = "string", by.y = "row.names")
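The split-and-pad steps can also be run one at a time to inspect what each produces. The sketch below is a variation, not the code above: it looks up rows by name instead of calling merge, which keeps the rows in their original order. The names idx_mat, matched and res, and the match1, match2, ... column labels, are assumptions chosen to mirror the desired output.

```r
# Same example data as in the question
dt <- data.frame(string = c("a","b","c","d","e","f","a","c","f","z","a"),
                 stringsAsFactors = FALSE)

# Row positions grouped by value, e.g. lst$a is c(1, 7, 11)
lst <- split(seq_len(nrow(dt)), dt$string)

# Pad every index vector with NA up to the length of the longest group,
# then stack them into a matrix with one row per distinct string
idx_mat <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
colnames(idx_mat) <- paste0("match", seq_len(ncol(idx_mat)))

# Pick the matrix row for each original string (by row name), in original order
matched <- idx_mat[dt$string, , drop = FALSE]
rownames(matched) <- NULL

res <- cbind(dt, matched)
res
```

The number of match columns comes from max(lengths(lst)), so it does not need to be known in advance.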
Data

    dt <- data.frame(string = c("a","b","c","d","e","f","a","c",
                                "f","z","a"), stringsAsFactors = FALSE)