Iterate over Variables for Linear Regression in R



I am looking to run a linear regression on the data frame below:

test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

The data frame "test" is a sample of my data; the original data frame contains 100 columns. I want to create a script that predicts values using linear regression. In this case, I want to build many models with different input variables.

For example, in the given data frame, abc is the y variable. I want to build one model with city1_1, city1_2, city1_3, city1_4 (leaving out city1_0, city2_0 and city2_1), another model with city1_2, city1_3, city1_4 (leaving out city1_0, city1_1, city2_0, city2_1), a third model with the input variables city1_3, city1_4 (leaving out city1_0, city1_1, city1_2, city2_0, city2_1), and so on.

These variables are the inputs to the linear regression.

I have 40 data frames like this. The output variable name remains the same in every data frame.
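For the 40 data frames, one approach is to keep them in a list and nest two lapply() calls, so the same formulas are fitted to every data frame. This is a minimal sketch under stated assumptions: `mydfs` is a hypothetical name for that list (filled here with two copies of the sample data as placeholders), every data frame is assumed to contain all the columns used in the formulas, and the formulas are written out by hand to keep the example self-contained.

```r
# Sample data from the question
test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

# Hypothetical list standing in for the 40 data frames (placeholders only);
# in practice this would hold each of the real data frames
mydfs <- list(first = test, second = test)

# One shared set of formulas; abc is the response in every data frame
myformulas <- list(abc ~ city1_1 + city1_2 + city1_3 + city1_4,
                   abc ~ city1_2 + city1_3 + city1_4,
                   abc ~ city1_3 + city1_4)

# Outer loop over data frames, inner loop over formulas:
# allfits[["first"]][[3]] is the third model fitted to the first data frame
allfits <- lapply(mydfs, function(df)
  lapply(myformulas, function(f) lm(f, data = df)))

# Coefficients of every model, kept in the same nested layout
allcoefs <- lapply(allfits, function(fits) lapply(fits, coef))
```

If the data frames already exist as separate objects, mget() can gather them into such a list by name, for example mget(paste0("df", 1:40)) if they happen to be called df1 to df40 (a hypothetical naming).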

You can create a list of formulas using regular expressions, and then lapply over that list:

# create data
test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

# create list of formulas
myformulas <- list(as.formula(paste("abc", paste(grep("city1_[123456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")),
                   as.formula(paste("abc", paste(grep("city1_[23456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")),
                   as.formula(paste("abc", paste(grep("city1_[3456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")))

# check formulas
> myformulas
[[1]]
abc ~ city1_1 + city1_2 + city1_3 + city1_4

[[2]]
abc ~ city1_2 + city1_3 + city1_4

[[3]]
abc ~ city1_3 + city1_4

# loop on formulas
mylms <- lapply(myformulas, function(x) lm(x, data = test))

# output of linear regressions
> mylms
[[1]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_1      city1_2      city1_3      city1_4
     5.8987      -0.2480       0.6316       1.1810      -1.0420

[[2]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_2      city1_3      city1_4
     4.8903       0.7114       1.1673      -1.0595

[[3]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_3      city1_4
      7.909        1.047       -1.102
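Because the goal is to predict values, the fitted models can be passed to predict() in the same way. A minimal sketch, assuming predictions are wanted for the rows of the sample data itself (for new observations, pass those as `newdata` instead); it also collects each model's R² from summary(). Since the three models are nested, R² can only stay the same or fall as variables are dropped.

```r
# Sample data from the question
test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

# The same three formulas the grep() calls above produce
myformulas <- list(abc ~ city1_1 + city1_2 + city1_3 + city1_4,
                   abc ~ city1_2 + city1_3 + city1_4,
                   abc ~ city1_3 + city1_4)
mylms <- lapply(myformulas, function(x) lm(x, data = test))

# Predictions: one column per model, one row per observation in newdata
mypreds <- sapply(mylms, predict, newdata = test)

# R-squared of each model, for comparing the variable sets
myr2 <- sapply(mylms, function(m) summary(m)$r.squared)
```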

You can also prespecify the grep() patterns and create the formulas in a loop:

mygreps <- c("city1_[123456789]", "city1_[23456789]", "city1_[3456789]")
myformulas <- lapply(mygreps, function(x) as.formula(paste("abc", paste(grep(x, names(test), value = TRUE), collapse = " + "), sep = " ~ ")))
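One caveat for the full 100-column data: each character class matches only the first digit after the underscore, so "city1_[23456789]" does not pick up city1_10 to city1_16 (their suffixes start with 1), while "city1_[123456789]" matches all of them. A minimal sketch of an alternative, assuming the columns follow the prefix_number naming of the sample: it parses the numeric suffix and keeps the columns at or above a starting value, and it swaps in base R's reformulate() for the paste()/as.formula() pair. `formulas_from` is a hypothetical helper name.

```r
# Sample data from the question
test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

# Hypothetical helper: for each start value k, build
# response ~ (all prefix columns whose numeric suffix is >= k)
formulas_from <- function(df, prefix, starts, response = "abc") {
  # Columns that are exactly the prefix followed by digits
  cols <- grep(paste0("^", prefix, "[0-9]+$"), names(df), value = TRUE)
  # Numeric suffix of each column, e.g. "city1_12" -> 12
  suffix <- as.integer(sub(prefix, "", cols, fixed = TRUE))
  # Order columns by suffix so the formula terms are in numeric order
  cols <- cols[order(suffix)]
  suffix <- sort(suffix)
  lapply(starts, function(k) reformulate(cols[suffix >= k], response = response))
}

myformulas <- formulas_from(test, "city1_", 1:3)
mylms <- lapply(myformulas, function(x) lm(x, data = test))
```

On the full data, formulas_from(df, "city1_", 1:16) would return 16 formulas for one prefix, assuming columns city1_1 to city1_16 exist there.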

Edit:

You can define the value range of the city variables and use paste() to generate the strings.

For example:

myranges <- lapply(1:16, function(x) x:16)
myvars <- paste0("city", 1:10, "_")
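To see what these two objects contain, here is the same construction scaled down (start values 1 to 4 and two prefixes) so the results are short enough to read, together with the inner paste() call used to build each formula: with `collapse`, pasting one prefix against a range joins the variable names with " + ".

```r
# Scaled-down version of the two objects: 1:4 instead of 1:16, two prefixes
myranges <- lapply(1:4, function(x) x:4)
myvars <- paste0("city", 1:2, "_")

myranges[[1]]  # 1 2 3 4
myranges[[4]]  # 4
myvars         # "city1_" "city2_"

# The prefix is recycled over the range, then collapse joins the names
paste(myvars[1], myranges[[2]], sep = "", collapse = " + ")
# "city1_2 + city1_3 + city1_4"
```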

Then, with these, you can create the formulas with a nested lapply() call:

myformulas <- lapply(myvars, function(x)
  lapply(myranges, function(y)
    as.formula(paste("abc", paste(x, y, sep = "", collapse = " + "), sep = " ~ "))))

myformulas will include 10 lists (one for each prefix from city1_ to city10_), with 16 formulas in every list, each using a decreasing number of variables: the first uses all 16 (cityx_1 to cityx_16) and the last uses only cityx_16.
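These counts can be checked directly: 10 prefixes times 16 ranges gives 160 formulas. The sketch below rebuilds myformulas exactly as in the previous step and inspects its shape; it only constructs formulas, so it does not need the 100-column data.

```r
myranges <- lapply(1:16, function(x) x:16)
myvars <- paste0("city", 1:10, "_")

myformulas <- lapply(myvars, function(x)
  lapply(myranges, function(y)
    as.formula(paste("abc", paste(x, y, sep = "", collapse = " + "), sep = " ~ "))))

length(myformulas)        # 10 lists, one per city prefix
lengths(myformulas)       # 16 formulas in each
sum(lengths(myformulas))  # 160 formulas in total

# First formula: response plus 16 predictors; last formula: a single predictor
length(all.vars(myformulas[[1]][[1]])) - 1  # 16
all.vars(myformulas[[10]][[16]])            # "abc" "city10_16"
```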

Now loop over myformulas to get a nested list of linear regression outputs:

# loop on formulas
mylms <- lapply(myformulas, function(x) lapply(x, function(y) lm(y, data = test)))
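On the 7-row sample data this loop would stop with an error, because columns such as city1_5 to city1_16 do not exist there; it is meant for the full data. With 16 predictors plus an intercept, each data frame also needs at least 17 rows for every coefficient to be estimable; otherwise lm() reports NA for the aliased terms. A minimal sketch, scaled down to the columns the sample does have (city1_1 to city1_4), that runs the same nested loop and then collects the R² of every model into a matrix with one row per prefix; the `from_` column labels are an arbitrary choice.

```r
# Sample data from the question
test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

# Scaled down to the sample's columns: start values 1..4, one prefix
myranges <- lapply(1:4, function(x) x:4)
myvars <- "city1_"

myformulas <- lapply(myvars, function(x)
  lapply(myranges, function(y)
    as.formula(paste("abc", paste(x, y, sep = "", collapse = " + "), sep = " ~ "))))

# Same nested loop as above
mylms <- lapply(myformulas, function(x) lapply(x, function(y) lm(y, data = test)))

# R-squared of every model: one row per prefix, one column per start value
myr2 <- t(sapply(mylms, function(x) sapply(x, function(m) summary(m)$r.squared)))
dimnames(myr2) <- list(myvars, paste0("from_", seq_along(myranges)))
```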
