Accuracy of Home Energy Rating Systems

Chapter I: Executive Summary


Approximately 20% of all the energy consumed in the United States is consumed by the residential sector. Much of this energy can now be cost-effectively saved by constructing new houses to be more energy efficient and by retrofitting existing houses with more efficient equipment. Unfortunately, most of the opportunities to save energy, natural resources, and money in houses are not captured because of market barriers such as lack of information and lack of financing. Home Energy Rating Systems (HERS) and related financial products, like Energy Improvement Mortgages (EIMs), have the potential to facilitate identification and financing of a tremendous number of such opportunities. A recent study by the Indiana State Energy Office estimated that penetration by HERS and EIMs into just 20% of the nation's annual housing market would result in:

A HERS is a computer simulation-based method for assessing a home's estimated energy use under standard conditions (similar to a miles-per-gallon rating for a car) and its potential for improvement. A rating usually requires a detailed home inspection by a trained rater. HERS typically generate three types of output: (1) a rating score (e.g. 0 to 100 points); (2) energy use and energy cost estimates for specific end-uses like heating and hot water and for the whole house; and (3) a list of recommended improvements that are calculated to be cost-effective.

Rating a house is difficult because every house is different and there are many potential sources of error, such as rater mistakes, imprecise simulation algorithms, and incorrect assumptions about physical features like air infiltration rates. Furthermore, ratings are designed to rate the house and not the occupants, so standard assumptions are made for all occupant-related inputs such as the number of people, number of appliances, and thermostat settings. Thus a rating that is accurate for the "typical" family could still be highly inaccurate for any particular family. (Accuracy can be roughly defined as the degree to which rating estimates correspond to actual home energy use, energy cost, and potential savings.)

While HERS experts do not currently consider the accuracy of rating systems to be the most important barrier to widespread use of HERS, all agree that accuracy is important for long-term credibility and success and that research is needed to assess and improve it. To date, however, very little research has been done on the subject and almost no data have been made publicly available. Thus the goals of this research have been:

The principal methodology was to compare estimated energy use and energy cost from ratings with actual energy use and cost from utility bills. We sought data from HERS organizations on houses that had recently been rated and for which utility billing data had already been collected or could easily be collected. Although tens of thousands of houses have been rated in the last several years, few HERS organizations were willing and able to supply actual ratings.

The first data set we received, and the one we examined most extensively, was from the California Home Energy Efficiency Rating System (CHEERS). CHEERS supplied us with approximately 200 ratings: about 1/3 from Eureka (a relatively cold California climate) and 2/3 from Fresno (a relatively hot California climate). The houses were rated in 1994 using CHEERS Rate Tool Version I, which has since been replaced by the entirely new Version II tool. CHEERS also helped us to obtain three years of monthly gas and electric billing data for the rated homes from Pacific Gas and Electric.

The average energy cost estimation error for the CHEERS sample was about 50%. In other words, CHEERS tended to overestimate the actual energy cost by about 50%. The standard deviation was 80%, meaning that about 1/3 of the houses were overestimated by more than 130% or underestimated by more than about 30% (see Table 1). While some of the estimation error is attributable to occupant behavior, the magnitude of some of the errors and the consistent tendency to overestimate energy use clearly imply that it is possible to improve both the average error and the variance by addressing other sources of error.
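The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: it assumes the per-house error is defined as (estimated - actual) / actual, which fits the rough definition of accuracy given earlier but is not spelled out in this summary, and it relies on the usual rule of thumb that about 1/3 of roughly normally distributed values fall more than one standard deviation from the mean. The function names and the four sample houses are hypothetical.

```python
from statistics import mean, pstdev


def percent_error(estimated_cost, actual_cost):
    """Signed estimation error as a percentage of actual cost (assumed definition)."""
    return 100.0 * (estimated_cost - actual_cost) / actual_cost


def one_sigma_band(avg_error, std_dev):
    """Range of errors lying within one standard deviation of the average error."""
    return (avg_error - std_dev, avg_error + std_dev)


# Hypothetical (estimated, actual) annual energy costs in dollars, not study data.
houses = [(1500, 1000), (1200, 1200), (2000, 1000), (900, 1000)]
errors = [percent_error(est, act) for est, act in houses]

avg_error = mean(errors)    # positive means overestimation on average
spread = pstdev(errors)     # population standard deviation of the errors

# Band implied by the reported 50% average error and 80% standard deviation.
low, high = one_sigma_band(50.0, 80.0)
print(f"sample avg error {avg_error:.0f}%, std dev {spread:.0f}%")
print(f"reported figures imply a one-sigma band of {low:.0f}% to {high:.0f}%")
```

With the reported figures, the band runs from -30% to 130%, which is where the "overestimated by more than 130% or underestimated by more than about 30%" thresholds come from.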

While utility billing data for rated homes cannot pinpoint specific sources of error in ratings, they can yield valuable clues for improving HERS. For example, CHEERS overestimated gas use more in Eureka than in Fresno and overestimated electricity use more in Fresno than in Eureka, which suggests that CHEERS may be using incorrect heating and cooling setpoints, infiltration rates, conduction rates, or similar assumptions: the more heating or cooling required, the greater the overstatement. Another important trend found in the CHEERS data is that some raters tended to produce more accurate ratings than others, which emphasizes the need for rater training, oversight, and retraining, and the need to minimize rater judgment calls in the rating procedures. We also found that the average cost estimation error and standard deviation decreased as house age decreased. In fact, for houses built between 1990 and 1994, CHEERS underestimated energy cost by 8% on average.


Table 1. Summary of Case Study Results

                          CHEERS          CHEERS          Midwest-        HERO-           ERHC-
                          (all homes)     (new only)      Kansas          Ohio            Colorado *
sample size               185             30              16              14              276
avg. yr. built            1959            '90-'94         1995            N/A             1969
blower door test?         no              no              yes             yes             yes
avg. energy cost error    51%             -8%             -7%             -14%            -3%
std dev in errors         63%             44%             15%             20%             35%
avg. actual energy cost   $1,154          $1,327          $1,463          $1,697
std dev in actual cost    46%             48%             24%             41%             51%
avg. HDD/yr. for '84-'95  2791            2791            4954            5371            6354
HDD in study yr.(s)       3% below avg.   3% below avg.   N/A             5% above avg.   3% above avg.

* Error and standard deviations for Colorado are for site energy use, not energy cost, because actual cost data were not available.


Other case study data that we received showed a smaller average estimation error and smaller standard deviation. For example, for a sample of 276 houses rated by Energy Rated Homes of Colorado certified raters, the average energy use estimate was only 3% lower than the average actual energy use (see Table 1). However, directly comparing the accuracy of the rating systems based on these case studies is almost like comparing apples and oranges because each sample of homes and each HERS is unique. Differences in the samples and rating systems include the following:

One of our most surprising discoveries was that none of the HERS we examined showed any clear relationship between rating score and total energy use or energy cost. Technically, rating scores only measure a house's individual potential for energy improvement and therefore should not be used to compare different houses. However, many consumers and HERS-related housing programs expect and assume that houses with higher rating scores will have lower energy costs. Yet even when compared to houses of similar size, houses with higher scores did not tend to use any less energy than houses with lower scores. One possible explanation is the "takeback effect": the higher scoring houses are indeed more efficient and would use less energy if they were operated in the same manner as lower scoring houses, but they are not operated in the same manner. Occupants of more efficient, higher scoring houses are likely to be more affluent, to have more appliances, and to choose more comfortable heating and cooling setpoints. Thus they "take back" some of the expected savings in the form of higher levels of service.

The takeback effect is supported by another phenomenon common to all the HERS studied: a clearly downward sloping trendline, or linear regression line, when estimation error is plotted against overall score (see Figure 1). For example, CHEERS tended to significantly overestimate energy cost for houses with very low scores. Overestimation decreases as the score increases, to the point where CHEERS tends to underestimate energy cost for houses with very high scores. While the CHEERS trendline appears to cross the x-axis (0% estimation error) at a score of around 90, the Kansas trendline crosses at a score of about 80, and the Colorado and Ohio rating systems both have their lowest average error at a score of approximately 60. Thus it appears that it is possible to adjust the point where this trendline intersects the x-axis, but not to flatten out the downward slope, because HERS assume the same standard occupant behavior for all house types when in fact behavior varies according to the energy efficiency of the house.
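The trendline and its x-axis crossing can be illustrated with an ordinary least-squares fit of estimation error against rating score. The sketch below is a generic textbook fit written for illustration, not the procedure used in the study; the function names and the six data points are hypothetical and were chosen to lie exactly on a line that crosses zero error at a score of 90.

```python
def fit_trendline(scores, errors):
    """Ordinary least-squares line: error = slope * score + intercept."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, errors))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def zero_error_score(slope, intercept):
    """Rating score at which the trendline crosses 0% estimation error."""
    return -intercept / slope


# Hypothetical rating scores and cost estimation errors (percent), not study data.
scores = [50, 60, 70, 80, 90, 100]
errors = [80, 60, 40, 20, 0, -20]

slope, intercept = fit_trendline(scores, errors)
crossing = zero_error_score(slope, intercept)
print(f"slope {slope:.1f} points of error per rating point, zero error at score {crossing:.0f}")
```

A negative slope corresponds to the downward trend described above: overestimation at low scores that shrinks and eventually reverses as the score rises.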

Furthermore, in addition to being a function of socio-economic behavior dynamics, the particular slope of this trendline may be an inevitable function of climate severity. The percentage difference in energy use between the energy "misers" and the energy "hogs" in a mild climate is greater than the difference in a severe climate. Indeed, California has the mildest climate and the steepest trendline, while Colorado has the most severe climate and the flattest trendline.

The trendline can be translated vertically (i.e. the average error can be brought close to zero) by calibrating ratings with utility billing data for a statistically representative sample of houses. Calibration means adjusting one or more of a number of standard assumptions such as heating setpoint, cooling setpoint, home operating profile, internal gains, infiltration rates, and hot water usage.
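As a rough illustration of what calibration achieves, the sketch below applies a single multiplier to the estimates so that the average percentage error over a sample becomes zero. This is a deliberate simplification and an assumption of this example: actual calibration would adjust simulation inputs such as setpoints or infiltration rates rather than scale the outputs, and the function names and sample values are hypothetical.

```python
def mean_percent_error(estimated, actual):
    """Average signed error of the estimates, as a percentage of actual cost."""
    total = sum((est - act) / act for est, act in zip(estimated, actual))
    return 100.0 * total / len(actual)


def calibration_factor(estimated, actual):
    """Multiplier for the estimates that makes the average percent error zero."""
    ratios = [est / act for est, act in zip(estimated, actual)]
    return len(ratios) / sum(ratios)


# Hypothetical annual energy costs in dollars, not study data.
estimated = [1800.0, 1500.0, 1200.0]
actual = [1000.0, 1000.0, 1000.0]

before = mean_percent_error(estimated, actual)
k = calibration_factor(estimated, actual)
calibrated = [k * est for est in estimated]
after = mean_percent_error(calibrated, actual)

print(f"average error before calibration: {before:.0f}%")
print(f"average error after calibration:  {after:.0f}%")
# Individual houses are still over- or underestimated after calibration.
print([round(100.0 * (c - a) / a) for c, a in zip(calibrated, actual)])
```

In this example the average error drops from 50% to zero, but individual houses are still off by 20% in either direction. Calibration of this kind moves the average without removing the house-to-house spread or the slope of the trendline.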


Figure 1. Linear Regression of Energy Cost Error vs. Rating Score

Making accurate recommendations for cost-effective improvements is the most difficult objective of a HERS; it is also the most difficult objective to validate. One way for us to gauge the accuracy of recommendations was to compare the actual energy use of the CHEERS homes to the predicted total energy savings if the occupants implemented all the recommendations. Many of the ratings predicted that it was possible to save over 50%, and in some cases over 100%, of current consumption. Since a house cannot save more energy than it actually uses, at least some of the predicted savings are overstated, and it is likely that at least some of the recommendations would not be cost effective.

The HERS marketplace (homeowners, banks, HERS providers, builders, etc.) does not seem to demand a high degree of accuracy. HERS experts do not rank accuracy high on the list of keys to HERS success, and do not report many questions or complaints about accuracy from consumers. In response to this perception, HERS providers choose not to discuss accuracy in marketing literature or on the rating forms themselves. Indeed, it appears that lending institutions that participate in Energy Improvement Mortgages and other HERS-related energy efficiency financing products are not exposed to any significant risk related to HERS accuracy because the agreement between expected and actual energy bills does not appear to affect the probability that a homeowner will default on a mortgage (Horowitz 1996).

Homeowners, however, are bearing at least some risk related to HERS accuracy. One risk is that a homeowner will incorrectly conclude that one home is more energy efficient than another based on inaccurate HERS scores. Another risk is of making an uneconomical investment based on an inaccurate recommendation. This risk is real and may be significant. It is ironic to note, however, that a homeowner can make an uneconomical investment and never know it. There is another risk that both the homeowner and the lender face, which is probably more significant than the risk from inaccuracy: the risk that the homeowner (or the bank, in the case of foreclosure) will not be able to resell the house for a price that recovers both the original sale price and the cost of the energy improvements (i.e. the risk that the market will not value energy efficiency features at their true economic value).

This is not to say that HERS accuracy is unimportant. Lack of data regarding accuracy may be impeding the growth and acceptance of HERS amongst certain consumers, lenders, and other groups nationwide. Furthermore, a lack of accuracy may eventually impact some HERS and cause irreparable credibility problems, which could spread to all HERS. For these reasons, HERS organizations and HERS providers continue to strive to improve accuracy.

One way to improve accuracy is to collect more utility billing data on some of the thousands of homes that have been rated with HERS in the last several years. A great deal of very detailed data now exist and must simply be gathered. As this research project has demonstrated, this sort of analysis can be fairly inexpensive to perform and can yield valuable information for calibrating rating systems and for identifying sources of error. Other forms of research such as submetering of energy end-uses, software-to-software comparisons, and pre/post-retrofit analyses are also needed.

Based on the wealth of knowledge that can be gained from comparing ratings and utility data, HERS providers may want to consider some fundamental modifications to HERS. For example, it may be possible to greatly improve HERS accuracy by incorporating a few key pieces of information about the current or prospective occupants into a rating. Occupant-specific input could be as simple as the number of occupants or could include other characteristics such as preferred temperature settings, appliances owned, annual weeks of vacation, hours at home, etc. Another modification that has the potential to greatly reduce the cost of performing ratings without compromising accuracy is to switch from the current simulation-based system to a prescriptive rating system. Detailed statistical analysis of ratings and utility bills could yield a regression equation that allows accurate prediction of energy use and energy costs based on a much smaller number of variables than are now collected for ratings. Such a system has the ability to account for the "takeback effect" and could generate scores and recommendations as well.
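The regression-based approach suggested above can be sketched as a least-squares fit of annual energy cost on a handful of house and occupant characteristics. Everything in this example is an assumption made for illustration: the two predictors (floor area and number of occupants), the made-up sample, and the helper names are hypothetical, and a real prescriptive system would need many more houses and careful variable selection.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(rows, targets):
    """Return [intercept, coef_1, ...] minimizing squared prediction error."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Predicted annual cost for one house from the fitted coefficients."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical sample: (floor area in hundreds of sq. ft., number of occupants).
houses = [(10, 1), (15, 2), (20, 4), (25, 3), (12, 5)]
# Hypothetical annual energy bills in dollars, built as 200 + 50*area + 100*occupants.
bills = [800.0, 1150.0, 1600.0, 1750.0, 1300.0]

coefs = fit_least_squares(houses, bills)
estimate = predict(coefs, (18, 3))
print([round(c, 2) for c in coefs], round(estimate, 2))
```

Because occupant characteristics enter the equation directly, a model of this kind could in principle capture behavior-related effects such as takeback that a simulation with fixed standard occupant assumptions cannot.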



This web page last modified by Brian Pon on April 27, 2000.