Housing Price Forecasting (Coursework Sample)
Carlos Figueroa
Colorado State University Global
MIS470: Data Science Foundation
Kelly Wibbenmeyer
Due date
Introduction.
The importance of housing price forecasting is undeniable, as real estate is one of the most critical sectors of the economy. The 2007-2009 subprime financial crisis revealed that real estate, as an asset class, is interconnected with the rest of the economy through the financial system by way of leverage, collateral, and the securitization of loans. Forecasting models for housing prices therefore deserve careful study.
Housing prices influence the formation and bursting of bubbles and macroeconomic processes such as business cycles, unemployment, and aggregate consumption. Real estate also has price dynamics that differ from those of other assets such as bonds, commodities, and currencies, because properties are heterogeneous in location and physical attributes (Ghysels, Plazzi, Torous, & Valkanov, 2013).
Housing price predictability faces many challenges. To begin with, houses are traded infrequently, so real estate price series are relatively short, unlike those of bonds, commodities, and currencies, which generate yearly, monthly, daily, and even minute-by-minute data. Housing transactions also carry high costs, and houses are inherently illiquid (Ghysels, Plazzi, Torous, & Valkanov, 2013).
Acknowledging these limitations, we take a first approach to analyzing and forecasting housing prices based on sale prices in Ames, Iowa. We want to know which factors are significant in our model and how good our predictions are. We used a data set of 1,000 observations to fit a linear regression model and then, with a different data set, tested how well the predicted prices fit the observed prices.
1. Examining housing.testing.csv.
Figure 1. Training Data Summary
Figure 2. Testing Data Summary
We can observe that the minimum value differs markedly between the two data sets, so we should expect the forecasted prices to deviate more for low-priced houses.
The first quartile, the median (second quartile), and the third quartile of the two data sets are very close to each other, so we should expect the data to be similarly distributed across price ranges. The mean and the maximum of the training data are higher than those of the testing data, but not substantially so.
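The comparison above rests on the statistics that a standard data summary reports (minimum, quartiles, mean, maximum). As a rough sketch of how they can be computed, the following Python snippet uses only the standard library. The price lists are made-up illustrative values, not the Ames data, and `summarize` is a hypothetical helper name.

```python
import statistics

def summarize(prices):
    """Return min, quartiles, mean, and max of a list of prices."""
    # method="inclusive" treats the list as the full data set and
    # interpolates between observed values
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return {
        "min": min(prices),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(prices),
        "q3": q3,
        "max": max(prices),
    }

# Hypothetical sale prices (illustrative only, not the Ames data)
training_prices = [35000, 120000, 150000, 163000, 210000, 250000, 755000]
testing_prices = [52000, 125000, 148000, 165000, 205000, 240000, 625000]

for name, prices in [("training", training_prices), ("testing", testing_prices)]:
    print(name, summarize(prices))
```

Placing the two summaries side by side in this way makes it easy to see where the data sets agree (the quartiles) and where they differ (the extremes).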
Figure 3. Sale Price Histogram for Training Data
Figure 4. Sale Price Histogram for Testing Data
The histograms show that both data sets are right-skewed, with most sales concentrated at lower prices and a long tail of expensive houses, and that they are similarly distributed except at small values of SalePrice.
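The direction of skew can also be checked numerically rather than read off a histogram: a positive skewness coefficient indicates a long right tail, a negative one a long left tail. The sketch below uses the moment-based definition and a small list of hypothetical prices. `skewness` is an illustrative helper, not a function from the original analysis.

```python
import statistics

def skewness(values):
    """Moment-based skewness: third central moment / (variance ** 1.5)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical prices: most values are moderate, one is very high
prices = [90000, 110000, 120000, 130000, 140000, 160000, 450000]
print("right-skewed" if skewness(prices) > 0 else "left-skewed or symmetric")
```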
2. Combining the two data sets.
Figure 5. Sale Price Histogram for Combined Data
The histogram of the combined data is similar to the training data histogram. The similarity arises because the training data set (1,000 observations) is larger than the testing data set (460 observations) and therefore dominates the combined sample.
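The weight of each data set in the combined sample follows directly from the observation counts. A small sketch of that arithmetic is given here. The counts come from the text, while the two mean values passed to `combined_mean` are hypothetical placeholders, since the actual summaries appear only in the figures.

```python
n_train, n_test = 1000, 460  # observation counts stated in the text

def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of the pooled sample as a size-weighted average of two means."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)

train_share = n_train / (n_train + n_test)
print(f"training share of combined data: {train_share:.1%}")

# Hypothetical means, for illustration only
print(combined_mean(182000.0, n_train, 178000.0, n_test))
```

With 1,000 of 1,460 observations, the training data contributes roughly 68.5% of the combined sample, so any pooled statistic sits closer to the training value.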
3. Linear regression.
We estimated a linear regression model using the training data set, with SalePrice expressed as a linear function of the remaining variables:
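$$
\begin{aligned}
\text{SalePrice} ={}& \beta_0 + \beta_1\,\text{MSSubClass} + \beta_2\,\text{LotFrontage} + \beta_3\,\text{LotArea} + \beta_4\,\text{OverallQual} \\
&+ \beta_5\,\text{OverallCond} + \beta_6\,\text{YearBuilt} + \beta_7\,\text{YearRemodAdd} + \beta_8\,\text{MasVnrArea} \\
&+ \beta_9\,\text{TotalBsmtSF} + \beta_{10}\,\text{GrLivArea} + \beta_{11}\,\text{FullBath} + \beta_{12}\,\text{HalfBath} \\
&+ \beta_{13}\,\text{BedroomAbvGr} + \beta_{14}\,\text{KitchenAbvGr} + \beta_{15}\,\text{TotRmsAbvGrd} + \beta_{16}\,\text{Fireplaces} \\
&+ \beta_{17}\,\text{GarageYrBlt} + \beta_{18}\,\text{GarageCars} + \beta_{19}\,\text{GarageArea} + \beta_{20}\,\text{WoodDeckSF} \\
&+ \beta_{21}\,\text{OpenPorchSF} + \beta_{22}\,\text{MoSold} + \beta_{23}\,\text{YrSold} + e
\end{aligned}
$$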
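To make the estimation step concrete, here is a minimal, dependency-free sketch of ordinary least squares that solves the normal equations (X'X)b = X'y by Gaussian elimination. It is not the software used to produce the coursework results. The names `fit_ols` and `predict`, the two-predictor toy data (living area in thousands of square feet and a quality score), and the coefficients used to generate the toy prices are all assumptions for illustration. In practice a statistics package would be used, which would also report standard errors and p-values.

```python
def fit_ols(rows, y):
    """Estimate [b0, b1, ..., bk] by solving the normal equations.

    rows: list of predictor lists (without the intercept column)
    y:    list of observed responses
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept
    p = len(X[0])
    # Build the augmented system [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear predictors?)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - tail) / A[i][i]
    return beta

def predict(beta, row):
    """Fitted value: intercept plus the weighted sum of the predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Toy data (hypothetical): [living area in 1000 sq ft, quality score]
rows = [[1.0, 5], [1.5, 6], [1.2, 7], [2.0, 8], [1.8, 5]]
# Prices (in thousands) generated from assumed coefficients 10, 50, 15
y = [10 + 50 * area + 15 * qual for area, qual in rows]

beta = fit_ols(rows, y)
print([round(b, 6) for b in beta])
print(round(predict(beta, [1.6, 7]), 6))
```

Because the toy prices are generated without noise, the fit recovers the assumed coefficients. With real sale data the residual term e is nonzero and the estimates carry sampling error.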
As noted above, a linear regression model is a first approximation to predicting housing prices: it uses the characteristics of each sale in a linear specification. More advanced models are considerably more complex and use data-rich approaches, "which summarize a large amount of information in a relatively small number of estimated factors and use these to forecast house price fluctuations" (Bork & Stig, 2018).
This does not mean that we dismiss the reach of our model, only that we recognize its limitations. Still, as we will see in part 5, this simple linear model has high predictive power.
The regression output was:
Figure 6. Regression Output
Before interpreting our model, some considerations are in order. The model was specified to analyze whether the independent variables affect housing prices, with SalePrice as a linear function of 23 variables. One difficulty that can arise with such a model is overfitting: the model is highly predictive for the training data but fails to replicate in future samples. An overfitted model yields overly optimistic results, and its "findings" will not replicate in the population (Babyak, 2004).
For this reason, it is helpful to test the model's predictive ability in part 5 on a testing data set that is different from the data used to estimate the model (the training data).
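One common way to quantify how well a fitted model carries over to held-out data is to compare its predictions with the observed values using the root mean squared error and an out-of-sample R². The sketch below is only an illustration of that idea. The helper names and the four observed and predicted prices are hypothetical and do not come from the testing data set.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """1 - SSE/SST; can be negative out of sample if predictions are poor."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical held-out prices and model predictions (in thousands)
observed = [120.0, 150.0, 200.0, 310.0]
predicted = [130.0, 145.0, 190.0, 300.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A large gap between the in-sample fit and these held-out measures would be a sign of overfitting.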
Additionally, we can observe that R² = 0.847 and adjusted R² = 0.8423. R² measures the goodness of fit of the model, but it never decreases when a new variable is added. Adjusted R², on the other hand, does not always go up when a variable is added (Hill, Griffiths, & Lim, 2011, p. 237). Although adjusted R² penalizes the inclusion of many variables, the two indicators are very close, which suggests that we have not overfitted the model significantly.
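The penalty can be made explicit with the usual textbook formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations used in the fit and k the number of predictors. The sketch below applies it with R² and k taken from the text. The value n = 1000 is an assumption: the effective sample may be smaller if rows with missing values were dropped, and R² is reported rounded, so the output need not reproduce the reported 0.8423 exactly.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R^2 and k come from the text; n_obs = 1000 is an assumed sample size
print(round(adjusted_r2(0.847, n_obs=1000, n_predictors=23), 4))

# With the same R^2, fewer observations mean a heavier penalty
print(round(adjusted_r2(0.847, n_obs=100, n_predictors=23), 4))
```

The second call shows why the gap between the two measures is informative: with 23 predictors and only 100 observations, the same R² would be adjusted downward much more sharply.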
4. Model interpretation
We performed a hypothesis test on each regression coefficient to determine whether it is statistically significant. The hypotheses were:
H0: βi = 0. The coefficient is not statistically significant; the variable has no impact on housing prices.
H1: βi ≠ 0. The coefficient is statistically significant; the variable has an impact on housing prices.
To perform each test, we used the p-value criterion: if the p-value < α, the variable is statistically significant. We condensed the results in a table that groups the variables by p-value at different significance levels:
Significant at α=.001
Significant at α=.01
Significant at α=.05
Significant at α=.10
Non-significant at any level
LotArea
OverallQual
OverallCond
YearBuilt
MasVnrArea
TotalBsmtSF
GrLivArea
BedroomAbvGr
KitchenAbvGr
TotRmsAbvGrd
GarageArea
MSSubClass
Fireplaces
YearRemodAdd
...
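The grouping rule behind such a table can be sketched in a few lines: each coefficient is assigned to the strictest significance level whose threshold its p-value falls below. The function name `strictest_level` and the p-values in the example are hypothetical and are not taken from the regression output.

```python
LEVELS = (0.001, 0.01, 0.05, 0.10)  # significance levels used in the table

def strictest_level(p_value):
    """Return the smallest alpha with p_value < alpha, or None if none is met."""
    for alpha in LEVELS:
        if p_value < alpha:
            return alpha
    return None

# Hypothetical p-values, not taken from the regression output
examples = {"var_a": 0.0004, "var_b": 0.03, "var_c": 0.42}
for name, p in examples.items():
    level = strictest_level(p)
    label = f"significant at alpha={level}" if level else "non-significant"
    print(name, label)
```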