Group Memo:

Home Sale Pricing Trends in Ames, IA

Professor: Dr. Janet Fraser

Co-authors: Nicholas Gmernicki, Hanna Traggiai

Statistical Methods I - Fall 2023

MEMORANDUM

TO: Dr. Janet Fraser

FROM: Veronique Nedeau

Nicholas Gmernicki

Hanna Traggiai

DATE: September 30, 2023

SUBJECT: Home Sale Pricing Trends in Ames, IA

Objective

Develop a model for a property management company to estimate home price trends in Ames, Iowa, based on home sales from 2002 - 2011.

Variable Consideration

While no single variable determines the value of a home, some variables have more bearing than others. Accordingly, Wise Group Analytics focused on the following variables to develop an effective predictive model: overall quality, year remodeled, square footage of living area above grade, and lot area.

Figure 1: Scatterplots comparing log(Sale Price) to Square Footage and log(Lot Area)

Model Description and Interpretation

After investigating multiple models with different variables, Wise Group Analytics chose the final model based on the strong linear relationship between log(Sale Price) and the variables mentioned above (Figure 1). The team noted possible redundancy in including both square footage and lot area, since lot area includes the land the home sits on. However, further investigation showed that removing either variable reduced the strength of the model. Because square footage measures only living area above grade (above ground level), it captures different information than lot area, and both variables were kept in the model.

The results of the model show that, holding the other variables constant, each additional point in overall quality is associated with an increase of about 17 percent in the sale price of the home (Table 1). Because lot area enters the model on the log scale, a 1 percent increase in lot area is associated with an increase of about 0.15 percent in sale price. For every year closer to the sale date that the remodel was done, there is a 0.3 percent increase in the sale price. For each additional square foot of living space (above grade level), there is a 0.003 percent increase in sale price.
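The percent interpretations above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual computation: the coefficient values below are hypothetical stand-ins chosen to be in the neighborhood of the percentages quoted in this memo (Table 1 is the authoritative source), and the function names are assumptions introduced here. The sketch shows that multiplying a coefficient by 100 is only an approximation for an untransformed predictor, and that a log-transformed predictor is read as a percent-for-percent effect.

```python
import math

def pct_change_log_linear(coef, delta=1.0):
    """Exact percent change in price for a `delta`-unit increase in an
    untransformed predictor when the response is log(price)."""
    return 100.0 * (math.exp(coef * delta) - 1.0)

def pct_change_log_log(coef, pct_increase=1.0):
    """Percent change in price when a log-transformed predictor
    increases by `pct_increase` percent."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** coef - 1.0)

# Hypothetical coefficients for illustration only (NOT read from Table 1).
coefs = {
    "overall_quality": 0.17,   # untransformed predictor
    "year_remodeled": 0.003,   # untransformed predictor
    "log_lot_area": 0.15,      # log-transformed predictor
}

quality_pct = pct_change_log_linear(coefs["overall_quality"])
remodel_pct = pct_change_log_linear(coefs["year_remodeled"])
lot_pct = pct_change_log_log(coefs["log_lot_area"], pct_increase=1.0)

print(f"One more quality point: {quality_pct:.2f}% change in price")
print(f"Remodel one year later: {remodel_pct:.3f}% change in price")
print(f"1% larger lot:          {lot_pct:.3f}% change in price")
```

For small coefficients the exact figure and the "coefficient times 100" shortcut nearly coincide. For larger ones, such as the hypothetical quality coefficient, the exact figure is somewhat higher than the shortcut suggests.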

Table 1: Model Results

Significance of Model

The model used for this memo accounts for 80.6% of the variability in the (log-transformed) home sale price data for Ames, IA. This fit indicates that the model would be effective at explaining home prices if used by realtors.
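As a reminder of what the 80.6% figure measures, the sketch below computes R-squared from observed and fitted values. The numbers are made up for illustration and are not drawn from the Ames dataset. The function name `r_squared` is an assumption introduced here.

```python
def r_squared(observed, fitted):
    """Share of the variability in `observed` explained by `fitted`."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variability left unexplained by the model.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total sum of squares: variability around the mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Made-up log sale prices and model fits (illustrative only).
observed = [11.8, 12.0, 12.3, 12.1, 11.6]
fitted = [11.9, 12.0, 12.2, 12.0, 11.7]

r2 = r_squared(observed, fitted)
print(f"R-squared: {r2:.3f}")
```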

The p-values of all the predictor variables are nearly zero, marked by the ‘***’ in Table 1. This indicates that the variables are statistically significant at all conventional significance levels in relation to the dependent variable, log(Sale Price). However, statistical significance does not imply practical importance: although the change in sale price relative to a change in square footage is statistically significant, it is not necessarily as important as the other indicators, as the results show an increase of only 0.003 percent per additional square foot.
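One way to judge the practical size of the square footage effect is to compound the per-square-foot figure over a realistic difference between two homes. The sketch below does this arithmetic. The 500 square foot difference is a hypothetical value chosen for illustration, and the function name is an assumption introduced here.

```python
def cumulative_pct_change(pct_per_unit, units):
    """Compound a per-unit percent change over `units` units."""
    return 100.0 * ((1.0 + pct_per_unit / 100.0) ** units - 1.0)

per_sqft_pct = 0.003  # per-square-foot effect quoted in this memo
extra_sqft = 500      # hypothetical difference in living area

total_pct = cumulative_pct_change(per_sqft_pct, extra_sqft)
print(f"{extra_sqft} extra sq ft: about {total_pct:.2f}% change in price")
```

The per-unit effect looks small largely because one square foot is a small unit. Over several hundred square feet the effect is still modest compared with a one-point change in overall quality.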

What does the model mean?

While the team at Wise Group Analytics ultimately decided on the model described throughout this memo, additional variables from the dataset could have been used to increase the share of variability in home sale prices that the model explains. A key concern throughout the analytical process was overfitting. As more variables are added, the fit to the existing data improves, but the likelihood that the model overfits the data also increases. An overfit model can fail to estimate home sale prices accurately when new data is introduced.
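A common guard against rewarding extra variables is adjusted R-squared, which penalizes the fit for each added predictor. The sketch below applies the standard formula to the 80.6% figure from this memo. The sample size of 1,000 and the 40-predictor comparison are hypothetical values chosen for illustration, and the memo does not state that the team used this measure.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

r2 = 0.806    # R-squared reported in this memo
n_obs = 1000  # hypothetical sample size (not stated in the memo)

adj_4 = adjusted_r_squared(r2, n_obs, 4)    # the four predictors used
adj_40 = adjusted_r_squared(r2, n_obs, 40)  # hypothetical larger model

print(f"Adjusted R-squared with 4 predictors:  {adj_4:.4f}")
print(f"Adjusted R-squared with 40 predictors: {adj_40:.4f}")
```

At the same raw R-squared, the larger model scores lower after adjustment. Extra variables have to improve the fit by more than the penalty to be worth keeping.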

The dependent variable, log(Sale Price), was logarithmically transformed, which adds a step to model interpretation. The transformation was performed to reduce the influence of a few extremely high sale prices in the dataset. As a result, estimates generated by the model are on the log scale and must be back-transformed to be interpreted on the original scale of sale prices. The same back-transformation would need to be applied to the Lot Area predictor variable to bring its values back to the original scale. For example, assuming the natural logarithm was used, the sale price can be back-transformed with the formula: Sale Price = e^(log(Sale Price)).
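The back-transformation can be sketched as follows, assuming the natural logarithm was used for both Sale Price and Lot Area. The predicted value and the lot size below are hypothetical, and `back_transform` is a helper name introduced here for illustration.

```python
import math

def back_transform(log_value):
    """Undo a natural-log transformation."""
    return math.exp(log_value)

# Hypothetical model output on the log scale (not a result from this memo).
predicted_log_price = 12.0
price = back_transform(predicted_log_price)

# Round trip for a hypothetical lot area of 9,500 square feet.
log_lot_area = math.log(9500.0)
lot_area = back_transform(log_lot_area)

print(f"Predicted sale price: ${price:,.0f}")
print(f"Lot area on original scale: {lot_area:,.0f} sq ft")
```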

The independent variables in the linear regression model discussed throughout this memo are the main determinants of model performance. The Wise Group Analytics team determined that overall quality, year remodeled, square footage, and lot area had the greatest influence on home sale price in Ames, IA between 2002 and 2011. The independent variables are also known as predictor variables because they are used to make predictions. The coefficient for each variable indicates how much the dependent variable is expected to change for a one-unit change in that predictor, holding the other predictors constant. Before making a final decision on the variables to include in the prediction model, the team performed additional statistical tests to ensure significance.
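Written out, the model described in this memo has roughly the following form. This is a sketch under assumptions: the symbols are notation introduced here, the coefficient values are those reported in Table 1, and lot area is taken to enter on the log scale, as Figure 1 suggests.

```latex
\log(\text{SalePrice}) = \beta_0
  + \beta_1\,\text{OverallQuality}
  + \beta_2\,\text{YearRemodeled}
  + \beta_3\,\text{SquareFootage}
  + \beta_4\,\log(\text{LotArea})
  + \varepsilon
```

Each coefficient gives the expected change in log(Sale Price) for a one-unit change in its predictor, with the other three held constant.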

Figure 2: Distribution of Sales by Month

To further analyze the model, the dataset was subset by month, which allowed the model's fit to be analyzed on smaller portions of the larger dataset. Sub-setting helps to simplify complex datasets and often provides clearer visuals by reducing overcrowding of data points. After sub-setting the data by month, the team at Wise Group Analytics found that the model fit best for the month of July. This conclusion rests on the number of data points contained within this month (Figure 2), as well as an R-squared value higher than that of the non-subset model.
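The sub-setting step can be sketched as grouping sales by month and computing R-squared within each group. For brevity, this illustration fits a single predictor (square footage) rather than the four-predictor model in this memo, and the records are made up rather than taken from the Ames dataset. The function and variable names are assumptions introduced here.

```python
from collections import defaultdict

def fit_r_squared(xs, ys):
    """R-squared of a one-predictor least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For simple linear regression, R-squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)

# Made-up records: (month sold, square footage, log sale price).
sales = [
    (6, 1200, 11.70), (6, 1500, 12.10), (6, 1800, 12.00), (6, 2100, 12.40),
    (7, 1100, 11.60), (7, 1400, 11.85), (7, 1700, 12.05), (7, 2000, 12.30),
    (7, 2300, 12.50),
]

# Group the records by month sold.
by_month = defaultdict(list)
for month, sqft, log_price in sales:
    by_month[month].append((sqft, log_price))

# Fit each monthly subset separately and record its R-squared.
r2_by_month = {}
for month, rows in sorted(by_month.items()):
    xs = [row[0] for row in rows]
    ys = [row[1] for row in rows]
    r2_by_month[month] = fit_r_squared(xs, ys)

for month, r2 in r2_by_month.items():
    print(f"Month {month}: n = {len(by_month[month])}, R-squared = {r2:.3f}")
```

Reporting the number of sales alongside each R-squared helps, since a high R-squared on a small subset is weaker evidence of good fit than the same value on a large one.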
