how to exclude variables (columns) from an R

Question

This is the R code for my logistic regression model:

> hrlogis1 <- glm(Attrition~. -Age -DailyRate -Department -Education
>                 -EducationField -HourlyRate -JobLevel
>                 -JobRole -MonthlyIncome -MonthlyRate
>                 -PercentSalaryHike -PerformanceRating
>                 -StandardHours -StockOptionLevel
>                 , family=binomial(link = "logit"),data=hrtrain)

Here, Attrition is the dependent variable and all the remaining columns are independent variables.
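For readers unfamiliar with the `. - name` formula syntax used above, here is a minimal sketch of how it expands. The data frame `toy` and its columns `y`, `x1`, `x2`, `x3` are hypothetical stand-ins for `hrtrain`, not part of the original data.

```r
# Hypothetical toy data standing in for hrtrain
toy <- data.frame(
  y  = c(0, 1, 0, 1, 1, 0),
  x1 = c(1.2, 3.4, 2.2, 4.1, 3.9, 1.0),
  x2 = c(5, 3, 6, 2, 1, 7),
  x3 = c("a", "b", "a", "b", "a", "b")
)

# "." expands to every column except the response; "- x2" removes x2 again
f <- y ~ . - x2

# terms() resolves the dot against the data, so we can inspect what is left
kept <- attr(terms(f, data = toy), "term.labels")
print(kept)  # x1 and x3 remain
```

Note that this mechanism works on whole columns only. A factor column enters or leaves the formula as a unit, so a single dummy coefficient such as `MaritalStatusMarried` cannot be subtracted this way.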

Below is the summary of the model:

Coefficients:

                                Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.25573    0.84329   1.489 0.136464    
BusinessTravelTravel_Frequently  1.86022    0.47410   3.924 8.72e-05 ***
BusinessTravelTravel_Rarely      1.28273    0.44368   2.891 0.003839 ** 
DistanceFromHome                 0.03869    0.01138   3.400 0.000673 ***
EnvironmentSatisfaction         -0.36484    0.08714  -4.187 2.83e-05 ***
GenderMale                       0.52556    0.19656   2.674 0.007499 ** 
JobInvolvement                  -0.59407    0.13259  -4.480 7.45e-06 ***
JobSatisfaction                 -0.37315    0.08671  -4.303 1.68e-05 ***
MaritalStatusMarried             0.23408    0.26993   0.867 0.385848    
MaritalStatusSingle              1.37647    0.27511   5.003 5.63e-07 ***
NumCompaniesWorked               0.16439    0.04034   4.075 4.59e-05 ***
OverTimeYes                      1.67531    0.20054   8.354  < 2e-16 ***
RelationshipSatisfaction        -0.23865    0.08726  -2.735 0.006240 ** 
TotalWorkingYears               -0.12385    0.02360  -5.249 1.53e-07 ***
TrainingTimesLastYear           -0.15522    0.07447  -2.084 0.037124 *  
WorkLifeBalance                 -0.30969    0.13025  -2.378 0.017427 *  
YearsAtCompany                   0.06887    0.04169   1.652 0.098513 .  
YearsInCurrentRole              -0.10812    0.04880  -2.216 0.026713 *  
YearsSinceLastPromotion          0.14006    0.04452   3.146 0.001657 ** 
YearsWithCurrManager            -0.09343    0.04984  -1.875 0.060834 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now I want to remove the terms that are not significant; in this case, "MaritalStatusMarried" is not significant. MaritalStatus is a variable (column) with two levels, "Married" and "Single".

What do you mean remove? From the dataframe `hrtrain`? And what have the levels to do with it? — desertnaut, Jun 15 '18 at 08:46
Possible duplicate of [Drop data frame columns by name](https://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name) — desertnaut, Jun 15 '18 at 09:04
I want to exclude only "MaritalStatusMarried" because it is not significant for the model. That's what I mean. — Bala, Jun 15 '18 at 09:07
So, you are just asking how to remove columns from an R dataframe... Your question is very poorly expressed (and it has nothing to do with logistic regression itself) - see answer in the link above — desertnaut, Jun 15 '18 at 09:14
It's not just a column. I will give an example: suppose a column "Gender" contains two categories, "male" and "female", and say that male is not significant for the model. Hence I need to exclude only male from the Gender column. — Bala, Jun 15 '18 at 09:21
So you have a factor with *two* levels and want to drop one of them. And expect the other to be significant? How is that possible? — Rui Barradas, Jun 15 '18 at 17:43

score 0 · Answer 1 · answered Jun 15 '18 at 17:07

How about:

data$MaritalStatus[data[,num]="Married"] <- NA

(where num = number of the column in the data)

The values for Married will be replaced with NAs, and then you can run the glm model again.
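A minimal sketch of what this approach does, using the comparison operator `==`. Everything here is an assumption for illustration: the toy data frame, its values, and the third level name "Divorced" (the summary shows two MaritalStatus coefficients, which suggests a third reference level, but its name is not given in the question). The sketch also shows one alternative, a hand-made indicator column `IsSingle`, which keeps all rows and folds "Married" into the baseline.

```r
# Hypothetical toy data; level names and values are assumptions
toy <- data.frame(
  Attrition     = factor(c("Yes", "No", "No", "Yes", "No", "Yes", "No", "No")),
  MaritalStatus = factor(c("Single", "Married", "Divorced", "Single",
                           "Married", "Divorced", "Married", "Single"))
)

# Approach from the answer: blank out the "Married" entries
na_version <- toy
na_version$MaritalStatus[na_version$MaritalStatus == "Married"] <- NA

# Rows with NA are dropped under na.omit (the usual default for glm),
# so the model would be fit on fewer observations
rows_used <- nrow(model.frame(Attrition ~ MaritalStatus,
                              data = na_version, na.action = na.omit))
print(rows_used)

# Alternative sketch: a 0/1 indicator for "Single" only, so that
# "Married" is merged into the reference group and no rows are lost
collapsed <- toy
collapsed$IsSingle <- as.integer(collapsed$MaritalStatus == "Single")
X <- model.matrix(Attrition ~ IsSingle, data = collapsed)
print(colnames(X))
print(nrow(X))
```

The two approaches answer different questions: the NA version discards the married employees from the fit entirely, while the indicator version keeps them and compares single employees against everyone else. Which one is appropriate depends on the analysis.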

answered Jun 15 '18 at 17:07

Érica Wong


I am getting an error: Error: unexpected '=' in "hrdata$MaritalStatus[hrdata[,16]=" – Bala Jun 15 '18 at 17:31
Try this, with `==` for the comparison: data$MaritalStatus[data[,num]=="Married"] <- NA – Érica Wong Jun 15 '18 at 18:59
