In this first of a series of Q&As with researchers in the Environmental Energy Technologies Division, Michael Sohn talks about probability and uncertainty in modeling.
Michael Sohn is an environmental engineer working to understand how chemicals and energy are used in the world at various scales—global, state, and building. His research interests are in mathematical modeling of environmental systems and quality, uncertainty analysis, value-of-information decision analysis, water-energy integrated assessment, and sensor-data fusion.
Sohn has a PhD in Civil and Environmental Engineering and an MS in Engineering and Public Policy from Carnegie Mellon University. He also has MS and BS degrees in Mechanical Engineering. Before coming to Lawrence Berkeley National Laboratory (Berkeley Lab) 15 years ago, Mike worked at an environmental engineering firm, where he conducted environmental health risk assessments. At Berkeley Lab's Environmental Energy Technologies Division, he is deputy leader of the Sustainable Energy Systems Group and former leader of the Airflow and Pollutant Transport Group.
A: Let's say we believe that children are being exposed to pesticides but we don't know the route. We can measure the toxins in a child, but how do we estimate the exposure pathways? There is no exact answer, but we can guess that exposure probably occurs through multiple pathways, and that there is variability (or noise) in the measurements. The best we can do is describe the likely, or predominant, pathways. Assessing uncertainty helps us understand how much (and how little) we know from the existing measurements, and whether additional measurements are likely to improve our analysis.
Another example, relevant to my current work in buildings: suppose I manage many buildings of a similar type, say 50 or 60 across the United States. Suppose I want to weigh the costs and benefits of replacing the buildings' heating systems with more efficient ones. Suppose also that analyzing each building, one by one, is cost-prohibitive. Well, there is no exact solution, because one doesn't exist. But that might be okay. If we can provide a strong estimate of the range (or uncertainty) in the expected energy change, the building owner, a financier, or a portfolio manager will have the information needed to assess financial risk in their cost-benefit analysis. Uncertainty assessment is often key to making good policy decisions.
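The kind of range estimate Sohn describes can be illustrated with a short Monte Carlo sketch. All the numbers below (building count, baseline use, retrofit savings fractions) are illustrative assumptions, not figures from his work; the point is only how sampling uncertain inputs yields an uncertainty interval on portfolio-wide savings.

```python
import random

random.seed(1)

N_BUILDINGS = 50      # assumed portfolio size
N_TRIALS = 10_000     # Monte Carlo samples

def portfolio_savings():
    """One sampled outcome of total annual savings across the portfolio."""
    total = 0.0
    for _ in range(N_BUILDINGS):
        baseline = random.gauss(500_000, 75_000)  # kWh/yr, assumed spread
        frac_saved = random.uniform(0.10, 0.25)   # assumed retrofit range
        total += baseline * frac_saved
    return total

samples = sorted(portfolio_savings() for _ in range(N_TRIALS))
low, mid, high = (samples[int(q * N_TRIALS)] for q in (0.05, 0.50, 0.95))
print(f"median savings: {mid:,.0f} kWh/yr "
      f"(90% interval: {low:,.0f} to {high:,.0f})")
```

A financier could compare the low end of that interval against the retrofit cost to judge the downside risk, which is exactly the decision-support role Sohn describes.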
Or, in another project I've worked on, we tried to determine the optimal placement of air-monitoring sensors in buildings. We wanted to know: where do I place the sensors to maximize the probability of detecting some unforeseen chemical spill? This, too, turns out to be an uncertainty assessment problem, requiring us to assign probabilities to spills that might occur and to assess the quality of the measurements the sensors might return. In this project, we placed a sensor near a location that was highly likely to have a spill, but the sensors at that location returned highly variable, noisy measurements. My colleagues and I had to develop an uncertainty assessment algorithm, based on Bayesian statistics, to maximize the likelihood of detecting spills with one, two, three, or many sensors.
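To see the basic mechanism behind such an algorithm, here is a minimal Bayes-rule sketch for a single noisy sensor. The alarm probabilities and prior are illustrative assumptions, not values from the actual project; they show how a sequence of noisy readings updates our belief that a spill has occurred.

```python
def posterior_spill(prior, p_alarm_given_spill, p_alarm_given_clear):
    """P(spill | observation) via Bayes' rule for a binary sensor reading."""
    p_obs = (p_alarm_given_spill * prior
             + p_alarm_given_clear * (1 - prior))
    return p_alarm_given_spill * prior / p_obs

p = 0.10  # assumed prior probability of a spill at this location
for alarm in (True, True, False):  # a hypothetical sequence of readings
    if alarm:
        p = posterior_spill(p, 0.8, 0.3)  # noisy sensor: many false alarms
    else:
        p = posterior_spill(p, 0.2, 0.7)  # no alarm is weak evidence too
    print(f"updated P(spill) = {p:.3f}")
```

With a noisy sensor, no single reading is decisive; belief moves gradually, which is why combining several sensors (and choosing where to put them) becomes an optimization problem.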
A: Bayesian statistics is a rather old statistical concept that has gained a lot of traction in the past few decades because of the wide availability of fast computers. The concept is powerful for solving decision-analysis problems. These days, most engineering and statistics departments have an expert in Bayesian statistics, whereas in the past only certain universities were known for it. A key benefit of this kind of analysis is that it provides a transparent and quantitative approach to analyzing noisy and sometimes erroneous physical data. It's quite the cat's meow lately. It's applied to many diverse topics (web search engines, predicting election outcomes), but of course my interest is in applying these methods to understand physical and energy systems.
In my graduate studies, I lost interest in developing the more complex environmental models of the time, because I saw that these models were getting more and more difficult to verify or refute. I realized that an important gap in the field was the need for methods to compute how little one knows about the physics and to determine what data are needed to remedy that lack of knowledge. I also saw that in many cases these complex models were not needed for decision making, and that a transparent, tractable model was often better, or at least good enough, provided we could also supply an assessment of uncertainty in the model predictions. Of course, complex models are extremely important in certain domains of study, just not for all analyses.
I also became very interested in probabilistic risk assessments. I realized that, with faster computers becoming available, I could compute uncertainty for much more complex models and much larger amounts of experimental data. These interests led me to a research area called "value of information"—a fairly old but underutilized field in economics, and perhaps underappreciated in the physical sciences. For example, do I spend a lot of money to get one precise measurement, or a whole lot of coarse measurements? What's the tradeoff there?
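The precise-versus-coarse tradeoff has a simple quantitative core. For independent measurements of the same quantity with Gaussian noise, the variance of the averaged estimate is sigma²/n, so the two strategies can be compared directly. The costs and noise levels below are illustrative assumptions, not data from any study.

```python
def estimate_variance(sigma, n):
    """Variance of the mean of n independent measurements with noise
    standard deviation sigma (classical result: sigma^2 / n)."""
    return sigma ** 2 / n

precise = estimate_variance(sigma=1.0, n=1)   # one expensive, precise reading
coarse = estimate_variance(sigma=4.0, n=25)   # many cheap, noisy readings

print(f"one precise measurement: variance = {precise:.2f}")
print(f"25 coarse measurements:  variance = {coarse:.2f}")
```

Under these assumed numbers the many-coarse strategy gives the tighter estimate (0.64 versus 1.00); value-of-information analysis formalizes this comparison by attaching costs to each strategy and payoffs to reduced uncertainty.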
In the work I do now, considering retrofits of energy-efficient equipment in buildings, I try to create ways to answer the questions that a building operator or owner might have: How much energy will I save if I change my HVAC equipment, and how much will I have to pay to do it? Is it worthwhile to replace the equipment, given my expected energy savings? If there is great uncertainty in the energy savings, I want to assess the risk of spending that money.
A: People are bandying the term "big data" around right now, but what does it mean? In the energy world, it can mean measuring everything about energy and buildings. Some people think "big data" is going to be the solution for many of our energy-analysis questions, at the building or national scale. I'm cautious. I think we first need to get our arms around "big data," and an important research area will be explaining what big data means for the applied energy fields.
A: I consider myself a scientist, and the "science" of the field is still developing. There is still great need to develop algorithms and computer systems, to make them approachable for non-technical users, and to apply big-data questions to current issues.
I'm also an engineer, focused on applied energy problems. I see myself working more toward policy issues, energy distribution at the grid level, and understanding the interplay between water and electricity demands. We're also looking at the potential impacts of climate change. There is great uncertainty, and many unknowns, about future climates and how they will affect the need for energy and water, which means we need to make decisions now that plan against possible consequences. I suppose these applications have a recurring theme: we use measurements to understand and forecast the future of physical systems. I'm interested in what measurements we can use, and what measurements we need, to make sound forecasts, so that decision makers have the best information possible when making important decisions.