Iterate over Variables for Linear Regression in R
I am looking to run a linear regression on the data frame below.
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)
The data frame "test" is a sample of my data. The original data frame contains 100 columns. I want to create a script for predicting values using linear regression. In my case, I want to build many models with different input variables.
For example, in the given data frame, abc is the y variable. I want to build one model with city1_1, city1_2, city1_3, city1_4 (leaving out city1_0, city2_0), another model with city1_2, city1_3, city1_4 (leaving out city1_0, city1_1, city2_0, city2_1), a third model with the input variables city1_3, city1_4 (leaving out city1_0, city1_1, city1_2, city2_0, city2_1), and so on.
These variables are the inputs to the linear regression.
I have 40 data frames like this. The output variable name remains the same in every data frame.
You can create a list of formulas using regular expressions, and then use lapply() over that list:
# create data
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)

# create list of formulas
myformulas <- list(
  as.formula(paste("abc", paste(grep("city1_[123456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")),
  as.formula(paste("abc", paste(grep("city1_[23456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")),
  as.formula(paste("abc", paste(grep("city1_[3456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ "))
)

# check formulas
> myformulas
[[1]]
abc ~ city1_1 + city1_2 + city1_3 + city1_4

[[2]]
abc ~ city1_2 + city1_3 + city1_4

[[3]]
abc ~ city1_3 + city1_4

# loop over formulas
mylms <- lapply(myformulas, function(x) lm(x, data = test))

# output of linear regressions
> mylms
[[1]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_1      city1_2      city1_3      city1_4
     5.8987      -0.2480       0.6316       1.1810      -1.0420

[[2]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_2      city1_3      city1_4
     4.8903       0.7114       1.1673      -1.0595

[[3]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_3      city1_4
      7.909        1.047       -1.102
You can also prespecify the grep() patterns and create the formulas in a loop:
mygreps <- c("city1_[123456789]", "city1_[23456789]", "city1_[3456789]")

myformulas <- lapply(mygreps, function(x)
  as.formula(paste("abc", paste(grep(x, names(test), value = TRUE), collapse = " + "), sep = " ~ ")))
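The hand-written patterns above can also be replaced by a start index. The following is only a sketch under assumptions: make_formula() is a hypothetical helper (not part of the original answer) that picks the columns whose numeric suffix is at least `start` and builds the formula with reformulate() from base R's stats package. It assumes column names of the form prefix followed by digits.

```r
# Sample data from the question
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)

# Hypothetical helper: build `response ~ <prefix><start> + <prefix><start+1> + ...`
# using only columns that actually exist in `data`.
make_formula <- function(data, prefix, start, response = "abc") {
  # columns named exactly <prefix><digits>
  vars <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  # numeric suffix of each matching column
  idx  <- as.integer(sub(prefix, "", vars, fixed = TRUE))
  sel  <- idx >= start
  # order by suffix so the formula terms are in ascending order
  keep <- vars[sel][order(idx[sel])]
  # note: reformulate() raises an error if `keep` is empty
  reformulate(keep, response = response)
}

# one formula per start index, then one model per formula
myformulas <- lapply(1:3, function(s) make_formula(test, "city1_", s))
mylms      <- lapply(myformulas, function(f) lm(f, data = test))
```

Comparing the numeric suffix avoids character classes such as [123456789], which would not match two-digit suffixes like city1_10 as intended.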
Edit:
You can define the value range of the city variables and use paste() to generate the strings. For example:
myranges <- lapply(1:16, function(x) x:16)
myvars <- paste0("city", 1:10, "_")
With these, create the formulas using a nested lapply() call:
myformulas <- lapply(myvars, function(x)
  lapply(myranges, function(y)
    as.formula(paste("abc", paste(x, y, sep = "", collapse = " + "), sep = " ~ "))))
myformulas will then contain 10 lists (one each for city1_ to city10_) with 16 formulas in every list (each including a decreasing number of variables, beginning with 16 and ending with only cityX_16).
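Formulas generated this way can name columns that a given data frame does not have, and lm() fails on those. As a sketch only, assuming the 8-column sample data from the question and deliberately smaller ranges (1:4 and two prefixes, chosen here for illustration), one option is to drop every formula whose variables are not all present:

```r
# Sample data from the question
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)

# Smaller ranges than in the answer, chosen to fit the sample data
myranges <- lapply(1:4, function(x) x:4)
myvars   <- paste0("city", 1:2, "_")

# Same nested construction as in the answer
myformulas <- lapply(myvars, function(x)
  lapply(myranges, function(y)
    as.formula(paste("abc", paste(x, y, sep = "", collapse = " + "), sep = " ~ "))))

# Keep only formulas whose variables all exist as columns of `test`
valid <- lapply(myformulas, function(fs)
  Filter(function(f) all(all.vars(f) %in% names(test)), fs))

# Number of usable formulas per city prefix
lengths(valid)
```

With the sample data, all four city1_ formulas survive, while every city2_ formula is dropped because city2_2 to city2_4 do not exist.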
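Now loop over myformulas to get a list of linear regression outputs (this assumes the data frame actually contains every column named in the formulas; the 8-column sample test does not):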
# loop over formulas
mylms <- lapply(myformulas, function(x)
  lapply(x, function(y) lm(y, data = test)))
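Since the question mentions 40 data frames that share the same output variable, the same idea extends to a named list of data frames. This is a minimal sketch, not the answer's own code: `dfs` and its second element `df2` are made up for illustration (df2 is just the sample data with abc shifted by 1), and the three formulas are written out with reformulate().

```r
# Sample data from the question
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)

# Hypothetical list standing in for the 40 data frames;
# df2 is an artificial copy with the response shifted by 1.
dfs <- list(df1 = test, df2 = transform(test, abc = abc + 1))

# The three nested models from the answer
myformulas <- list(
  reformulate(paste0("city1_", 1:4), response = "abc"),
  reformulate(paste0("city1_", 2:4), response = "abc"),
  reformulate(paste0("city1_", 3:4), response = "abc")
)

# Fit every formula on every data frame: fits[["df1"]][[2]] is model 2 on df1
fits <- lapply(dfs, function(d)
  lapply(myformulas, function(f) lm(f, data = d)))

# In-sample predictions: one matrix per data frame (rows = observations, columns = models)
preds <- lapply(fits, function(ms) sapply(ms, predict))

# R-squared per model (rows) and data frame (columns)
r2 <- sapply(fits, function(ms) sapply(ms, function(m) summary(m)$r.squared))
```

To predict for new observations rather than the training rows, predict() also accepts a newdata argument containing the same input columns.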