Nest Thermostat Disaster, When IoT Goes Wrong

The Nest thermostat has been an iconic example of an Internet of Things device, but today it is the epitome of IoT failures. Nest devices all across the world had an epic meltdown that left homeowners out in the cold… literally. As the New York Times reports, “The problems with the much-hyped thermostat…affected an untold number of customers when the device went haywire across America.” The connected thermostat started shutting off, leaving many without heat in their homes.

As the temperature dropped, tempers flared as users took to Twitter to add fuel to the fire (which I suppose is what they were frantically trying to do at home too!).

With temperatures approaching 0 degrees Fahrenheit in parts of the United States, an unheated house can be perilous for the very old and the very young. I would imagine Nest never included “…and possibly death” in the “Risks and Mitigation” section of their original business plan. That’s a bit tongue in cheek, but the reality is that this is a serious problem that affects people far more drastically than if, say, their iPad quit working overnight. We’re so used to mechanical devices working, and to the familiar ways they fail, that we need to look hard at the new failure modes we introduce as we build out the Internet of Things. Mechanical devices certainly fail, but connected devices carry the risk of failing all at once, which is a much bigger problem.

So, how did this happen? Internet of Things devices like the Nest thermostat run embedded software, which we call firmware. Most of these devices can update that firmware remotely, which is what happened with the Nest several weeks ago. Unfortunately, that firmware had a bug that drained the battery, apparently over the course of “several weeks”. Over the last several days, users have started reporting that their thermostats shut off as a result of that battery drain and were no longer heating their homes.

As an IoT software developer, Geisel Software works with our clients to help prevent exactly this kind of failure. There are two major preventive measures here that we could all learn from. The first is to roll out your firmware in waves. No matter how good you think your new firmware is, you should always roll it out to a portion of your customer base first. That way, if there is an epic failure like the one with the Nest, it doesn’t affect all of your users, and you have time to provide a fix before 100% of your customer base has been affected. This is similar to what Apple does with the iPhone, and what Sony and Microsoft do with the PS4 and Xbox One, as examples. It isn’t clear whether Nest took this step. If they did, things would certainly have been worse without it; if they didn’t, a staged rollout would have limited the damage.
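To make the idea of waves concrete, here is a minimal sketch of one common way to do it: hash each device ID into a stable bucket from 0 to 99, then widen the eligible range wave by wave. The wave percentages, the function names, and the release string are all made up for illustration; this is not how Nest or any particular vendor does it.

```python
import hashlib

# Hypothetical wave schedule: cumulative percent of the fleet eligible at each wave.
ROLLOUT_WAVES = [1, 5, 25, 100]


def device_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket from 0 to 99 for a given release.

    Mixing the release name into the hash means a different set of devices
    goes first on each release, so the same customers are not always the
    guinea pigs.
    """
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_eligible(device_id: str, release: str, wave: int) -> bool:
    """Return True if the device should receive the release in this wave (0-based)."""
    return device_bucket(device_id, release) < ROLLOUT_WAVES[wave]
```

Because the bucket is stable, a device that qualified in an early wave still qualifies in every later one, and you only advance to the next wave once the devices already updated have stayed healthy for long enough.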

The second lesson is the same one that caught the Spirit rover on Mars. The system needs to be tested, and it needs to be tested for as long as it will actually run in the field. The Spirit rover had a memory-related problem that developed only in certain conditions, which eventually led to a failure. A similar issue happened with the Nest, where the battery was slowly drained, but the issue didn’t become apparent for several weeks.
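A long-running test is most useful when it tracks trends rather than waiting for an outright failure. As a rough sketch, assuming your test rig logs battery level as (day, percent) samples, you could fit a straight line to the readings and project how long the battery would last. The function names, the sample format, and the cutoff value are assumptions for illustration, and real battery discharge is not perfectly linear.

```python
def drain_rate_per_day(samples):
    """Least-squares drain rate in percent per day (positive means draining).

    samples: list of (day, battery_percent) readings from a long-running test.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return -num / den


def projected_days_to_cutoff(samples, cutoff_percent):
    """Days from the last sample until the battery reaches cutoff_percent."""
    rate = drain_rate_per_day(samples)
    if rate <= 0:
        return float("inf")  # battery is holding steady or charging
    _, latest_percent = samples[-1]
    return max(0.0, (latest_percent - cutoff_percent) / rate)


# Hypothetical readings: losing 2% per day looks harmless on any single day.
readings = [(0, 100.0), (1, 98.0), (2, 96.0), (3, 94.0)]
days_left = projected_days_to_cutoff(readings, cutoff_percent=20.0)
```

With the made-up readings above, a few days of data already point to a shutdown a little over a month out, which is the kind of slow failure a short functional test would never show.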

In the case of the Nest, it’s pretty clear they’ll make sure they check for battery drain in future firmware releases. However, looked at closely, this problem could have been anticipated early. For example, the Nest disconnects itself from Wi-Fi if it ever loses battery power, which means an obvious and immediate loss of the ability to update firmware. When you’re creating a connected device, one of the first things you need to assess is what could cause the device to fail, and what the consequences of each of those failures would be. Once you realize that battery loss is a weak link in the update process, you need to make sure battery behavior is a well-established part of your testing routine before firmware is deployed.

Today Nest brings us a firm reminder that things can go wrong with our IoT devices. It’s vitally important that we understand what could go wrong and how devastating those failures could be so we can properly protect against their failure and test releases to keep them running smoothly.
