R: read Excel by column names
So I have a bunch of Excel files that I want to loop through, reading specific, discontinuous columns into a data frame. Using readxl works for the basic stuff, like this:
```r
library(readxl)
library(plyr)

wb <- list.files(pattern = "*.xls")
dflist <- list()

for (i in wb) {
  dflist[[i]] <- data.frame(read_excel(i, sheet = "sheetname", skip = 3, col_names = TRUE))
}

# put them all into one data frame
data <- ldply(dflist, data.frame, .id = NULL)
```

This works (barely), but the problem is that the Excel files have 114 columns and I only want specific ones. I also do not want to let R guess the col_types, because it messes up some of them (e.g. for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: how do I specify specific, discontinuous columns to read? The range argument uses the cellranger package, which does not allow reading discontinuous columns. Is there an alternative?
The read.xlsx() function from the openxlsx package has a parameter, cols, that takes a numeric index specifying which columns to read.
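A minimal sketch of `cols`, assuming .xlsx input (openxlsx does not read legacy .xls files). The small data frame and temporary file are invented only so the example runs on its own; for the question's files you would pass the real path, and `startRow = 4` should presumably correspond to `skip = 3` (header on row 4).

```r
library(openxlsx)

# Build a small demo workbook so the example is self-contained
demo <- data.frame(id = 1:3, code = c("1A", "2B", "3C"), flag = c(TRUE, FALSE, TRUE),
                   value = c(10.5, 20.25, 30), note = c("a", "b", "c"))
path <- tempfile(fileext = ".xlsx")
write.xlsx(demo, path)

# Read only columns 1, 2 and 5 (discontinuous) by numeric index
picked <- read.xlsx(path, sheet = 1, cols = c(1, 2, 5))
str(picked)
```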
It seems to read columns as characters if at least one column is character.
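Another option, not mentioned in the answer and offered only as a hedged sketch: readxl's `col_types` argument accepts `"skip"` for columns to drop, and an explicit type such as `"text"` stops readxl from guessing, which also sidesteps the numeric-guessing crash described in the question. Reading the header first lets you pick discontinuous columns by name. `read_cols_by_name()` is a hypothetical helper, and the demo uses the `datasets.xlsx` file bundled with readxl; for the question's files you would pass `skip = 3`.

```r
library(readxl)

# Hypothetical helper: read only the named columns, all as text
read_cols_by_name <- function(path, wanted, sheet = 1, skip = 0) {
  # Read the header plus one row as text (nothing is guessed) to learn the names
  header <- names(read_excel(path, sheet = sheet, skip = skip,
                             n_max = 1, col_types = "text"))
  absent <- setdiff(wanted, header)
  if (length(absent) > 0) {
    stop("Columns not found: ", paste(absent, collapse = ", "))
  }
  # One type per column: "text" for the wanted columns, "skip" for all others
  types <- ifelse(header %in% wanted, "text", "skip")
  read_excel(path, sheet = sheet, skip = skip, col_types = types)
}

path <- readxl_example("datasets.xlsx")
flowers <- read_cols_by_name(path, c("Sepal.Length", "Petal.Length", "Species"),
                             sheet = "iris")
flowers
```

Because every kept column comes back as text, converting the numeric ones afterwards (e.g. with `as.numeric()`) is left to you; that trades automatic guessing for explicit control.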
Edit: for .xls files, see the XLConnect package. Installing rJava might be tricky, though. The keep and drop parameters of readWorksheet() accept column names too, and the colTypes parameter deals with column types. This way works for me:
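```r
# Point R at the JDK before loading rJava (adjust the path to your installation)
options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\")
library(rJava)
library(XLConnect)

workbook <- loadWorkbook("test.xls")
# keep takes column indices (or column names)
readWorksheet(workbook, sheet = "sheet0", keep = c(1, 2, 5))
```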
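Applied to the loop from the question, a hedged sketch might look like this. It assumes XLConnect with a working rJava setup; `readWorksheetFromFile()` forwards `startRow` and `keep` to `readWorksheet()`, and `startRow = 4` is meant to mirror `skip = 3`. The helper `read_kept_cols()` and the column names `"ID"` and `"Name"` are hypothetical placeholders, and the two demo files exist only to make the example runnable.

```r
library(XLConnect)

# Hypothetical helper: read the same named columns from every file and stack them
read_kept_cols <- function(files, sheet, keep, start_row = 1) {
  parts <- lapply(files, function(f) {
    readWorksheetFromFile(f, sheet = sheet, startRow = start_row, keep = keep)
  })
  do.call(rbind, parts)
}

# Demo: two small .xls files with three empty rows above the header
demo <- data.frame(ID = 1:2, Junk = c("x", "y"), Name = c("1st", "2nd"))
files <- c(tempfile(fileext = ".xls"), tempfile(fileext = ".xls"))
for (f in files) {
  writeWorksheetToFile(f, data = demo, sheet = "sheet0", startRow = 4)
}

combined <- read_kept_cols(files, sheet = "sheet0", keep = c("ID", "Name"), start_row = 4)
combined
```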