Question 4

Based on the clean data set from question 2, create a new column named predicted_price and use the regression model you developed in question 2 to fill it with the predicted price of each house. Then create another column named good_investment: if predicted_price is greater than price, good_investment equals 1; otherwise, it equals 0. Choose at most four of the most relevant numerical variables as independent variables and develop a logistic regression model with good_investment as the dependent variable.
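A minimal sketch of how the two new columns could be derived, assuming the question-2 model is a linear regression. The intercept, the coefficients and the column names sqft_living and bedrooms are hypothetical placeholders, not values from question 2; substitute your own fitted model.

```python
# Hypothetical coefficients standing in for the question-2 regression model.
INTERCEPT = 50_000.0
COEFS = {"sqft_living": 200.0, "bedrooms": -10_000.0}  # hypothetical columns

def add_investment_columns(rows):
    """Add predicted_price and good_investment to each row (a dict)."""
    for row in rows:
        pred = INTERCEPT + sum(b * row[name] for name, b in COEFS.items())
        row["predicted_price"] = pred
        # 1 if the model's prediction exceeds the actual price, else 0
        row["good_investment"] = 1 if pred > row["price"] else 0
    return rows

# Hypothetical example rows.
houses = [
    {"price": 400_000.0, "sqft_living": 2_000.0, "bedrooms": 3.0},
    {"price": 300_000.0, "sqft_living": 1_000.0, "bedrooms": 2.0},
]
add_investment_columns(houses)
print([(h["predicted_price"], h["good_investment"]) for h in houses])
```

The first house is predicted at 420,000, above its price of 400,000, so it is flagged as a good investment; the second is predicted at 230,000, below its price, so it is not.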

  • Partition the data into a 70 % training set and a 30 % test set. Present the logistic regression equation for the training set and for the test set, and comment on it. Explain the procedure you used to select the most relevant variables. Use fewer than 150 words for this section.
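The partition and fit can be sketched in plain Python as below, assuming a fixed random seed for the 70/30 shuffle and fitting logit(p) = b0 + b1*x1 + ... + bk*xk by batch gradient descent on the log-loss. This is a self-contained illustration only; a statistics package (for example statsmodels or scikit-learn) would normally be used instead. The data, the predictor name x1, the learning rate and the epoch count are hypothetical.

```python
import math
import random

def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of rows and split it into training and test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logit(p) = b0 + b1*x1 + ... + bk*xk by gradient descent on log-loss."""
    n, k = len(X), len(X[0])
    b = [0.0] * (k + 1)  # b[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi))) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        b = [bj - lr * g / n for bj, g in zip(b, grad)]
    return b

def equation(b, names):
    """Format the fitted coefficients as a readable logit equation."""
    terms = " ".join(f"{bj:+.3f}*{name}" for bj, name in zip(b[1:], names))
    return f"logit(p) = {b[0]:.3f} {terms}"

# Hypothetical rows: one standardized predictor x1 and the 0/1 target.
data = [(x, 1 if x > 0 else 0)
        for x in (-2.0, -1.5, -1.0, -0.8, -0.5, 0.5, 0.8, 1.0, 1.5, 2.0)]
train, test = train_test_split(data)
b_train = fit_logistic([[x] for x, _ in train], [y for _, y in train])
b_test = fit_logistic([[x] for x, _ in test], [y for _, y in test])
print("train:", equation(b_train, ["x1"]))
print("test: ", equation(b_test, ["x1"]))
```

A positive coefficient means that a larger value of that predictor raises the log-odds that good_investment equals 1. Comparing the training and test equations gives a rough check of whether the coefficients are stable across partitions.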

Note: delete all irrelevant variables from the dataset and keep only the four or fewer numeric independent variables and the one dependent variable. Otherwise, you increase the chance of crashing the app.
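One possible selection procedure, sketched here as an assumption rather than the required method, is to rank the numeric candidates by the absolute Pearson correlation with good_investment, keep the top four or fewer, and drop every other column. The column names a, b, c and d and the rows are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_columns(rows, candidates, target="good_investment", k=4):
    """Keep the k candidates most correlated with target, plus target itself."""
    ys = [r[target] for r in rows]
    ranked = sorted(candidates,
                    key=lambda c: abs(pearson([r[c] for r in rows], ys)),
                    reverse=True)
    keep = ranked[:k]
    # Drop every column that is neither selected nor the dependent variable.
    reduced = [{c: r[c] for c in keep + [target]} for r in rows]
    return keep, reduced

# Hypothetical rows with four numeric candidates.
rows = [{"a": float(i), "b": float(-i), "c": float(i % 2), "d": 1.0,
         "good_investment": 1 if i >= 3 else 0} for i in range(6)]
keep, reduced = select_columns(rows, ["a", "b", "c", "d"], k=2)
print(keep, reduced[0])
```

Here a and b are kept because they track good_investment most closely, while the constant column d carries no information. In practice, you might also drop one of two candidates that are highly correlated with each other.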


