Implementing a Watchdog Timer in Java

Watchdog timers are useful well beyond embedded systems and operating systems. I’ve come across several scenarios where I needed a watchdog for an application, but I’ve always found Java’s built-in APIs and third-party libraries to be lacking in this area.

Luckily, implementing your own watchdog is actually pretty simple. In this post I’ll present a basic framework that can easily be expanded or customized to meet your needs.

Functional Requirements

What we want to implement is a timer set to a particular fixed or programmable interval. This timer needs to run in the background and expose a means for the application to reset it. If the timer does elapse, it needs to trigger some event or set some state variable accessible to the application. We also need the ability to shut down the timer entirely when the application is shutting down.

Seeing "timer" in the description may lead you to consider java.util.Timer. However, Timer has serious drawbacks compared to ScheduledThreadPoolExecutor (check out Java Concurrency in Practice, section 6.2.5, for more). Besides, we can use a BlockingQueue and the producer-consumer design pattern to get the same effect without having to schedule separate background tasks.

Basic Implementation

For our basic implementation, let’s consider a watchdog that will be used to monitor the connection status of some remote device. We’ll assume the device is supposed to send a status message on some regular interval, and if we don’t hear from it for over a minute, we want to get a notification.

First, a simple application class that will kick everything off:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class Application
{
    private final ExecutorService executorService = ...;

    private final ConnectionWatchdog connectionWatchdog = new ConnectionWatchdog();

    private Future<?> watchdogTask;

    public void start()
    {
        watchdogTask = executorService.submit(connectionWatchdog);
    }

    public void onDeviceStatusReceived()
    {
        connectionWatchdog.feed();
    }

    public void shutdown()
    {
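        // cancel(true) interrupts the watchdog thread; guard against shutdown() before start()
        if (watchdogTask != null)
        {
            watchdogTask.cancel(true);
        }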
    }
}

There are a few minor details to note here. Our watchdog will implement Runnable to allow us to execute it in a background thread. Using the executor framework also makes it easy to stop our watchdog when the application is shutting down via its Future. The Application class also exposes a hook for handling incoming status messages from our remote device. Every time we receive a status message, we want to reset or "feed" our watchdog timer.

Next, the watchdog in its simplest form:

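import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public final class ConnectionWatchdog implements Runnable
{
    // Each element is one "feeding"; only its presence matters, not its value.
    private final BlockingQueue<Boolean> watchdogFood = new LinkedBlockingQueue<>();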

    @Override
    public void run()
    {
        Thread.currentThread().setName("Connection Watchdog");
        try
        {
            while (true)
            {
                if (watchdogFood.poll(1, TimeUnit.MINUTES) == null)
                {
                    System.out.println("Device disconnected!");
                }
            }
        } catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    public void feed()
    {
        watchdogFood.add(true);
    }
}

The heart of this implementation is a BlockingQueue. The run() method polls the queue, waiting up to our specified timeout (in this case 1 minute) for an element to become available. Note that poll() is a blocking operation: the thread is parked while it waits, so we are not burning CPU cycles until something is added to the queue or the timeout elapses.

When we feed the watchdog, we add an element to the queue for the run loop to consume. When a non-null value gets polled from the head of the queue, the run loop simply restarts the 1-minute poll.

The timed poll() method on a BlockingQueue returns null if the waiting time elapses before an element is available. In this case, the run loop enters the if block and prints the message indicating that the device is no longer connected. Because the loop then polls again, the message repeats every minute for as long as the device stays silent.
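To see the two outcomes of the timed poll() in isolation, here is a minimal sketch. The PollTimeoutDemo class and its wasFed helper are names made up for this illustration, and the 100 ms timeout is only there to keep the demo quick.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class PollTimeoutDemo
{
    // Returns true if an element arrived before the timeout, false if poll() timed out.
    static boolean wasFed(BlockingQueue<Boolean> queue, long timeoutMillis) throws InterruptedException
    {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS) != null;
    }

    public static void main(String[] args) throws InterruptedException
    {
        BlockingQueue<Boolean> watchdogFood = new LinkedBlockingQueue<>();

        watchdogFood.add(true);                        // feed once
        System.out.println(wasFed(watchdogFood, 100)); // element is waiting: prints true right away
        System.out.println(wasFed(watchdogFood, 100)); // queue is empty: prints false after the timeout
    }
}
```

The second call is the interesting one: nothing was added, so poll() gives up after the timeout and hands back null, which is exactly the signal the watchdog treats as "not fed in time".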

Other Implementation Notes

If you (or your IDE) are worried that the while loop cannot complete without throwing an exception, that’s by design! The timed poll() call on the BlockingQueue declares a checked InterruptedException, which is thrown if the thread is interrupted while waiting. What we want is a watchdog that runs forever, or at least until our application shuts down. Whether we shut down the entire executor service with shutdownNow() or cancel the watchdog’s Future directly, as the example Application class does, the poll operation is interrupted and we break out of the while loop. The only thing left to do in the catch block is to restore the interruption status so that code higher up the call stack can deal with it.
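The following sketch shows that cancelling the Future really does end a loop of this shape. CancelDemo, cancelStopsLoop and the two latches are hypothetical scaffolding added only so the demo can observe when the loop starts and exits; they are not part of the watchdog itself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class CancelDemo
{
    // Starts a watchdog-style loop, cancels it, and reports whether the loop exited.
    static boolean cancelStopsLoop() throws InterruptedException
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        BlockingQueue<Boolean> food = new LinkedBlockingQueue<>();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch exited = new CountDownLatch(1);

        Future<?> task = executor.submit(() ->
        {
            started.countDown();
            try
            {
                while (true)
                {
                    food.poll(1, TimeUnit.MINUTES); // blocks until fed, timed out, or interrupted
                }
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt(); // restore the flag, then fall out of the loop
            }
            finally
            {
                exited.countDown();
            }
        });

        started.await();   // make sure the task is actually running
        task.cancel(true); // true = interrupt the thread running the task
        boolean stopped = exited.await(5, TimeUnit.SECONDS);
        executor.shutdown();
        return stopped;
    }

    public static void main(String[] args) throws InterruptedException
    {
        System.out.println("Loop exited after cancel: " + cancelStopsLoop());
    }
}
```

Note that cancel(false) would not interrupt a task that is already running, so the true argument matters here.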

You’ll notice that I am setting the name of the watchdog thread at the start of the run() method. Since this thread will be running in the background until the end of the application, I find it useful to rename it so that it is easy to spot in profilers and thread dumps.
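One thing to keep in mind is that setName() inside run() renames a pooled thread, and the name sticks if the executor later reuses that thread for other work. An alternative, sketched below, is to give the watchdog a dedicated executor whose ThreadFactory names the thread up front. WatchdogExecutors and newWatchdogExecutor are hypothetical names for this illustration, and dedicating an executor to the watchdog is an assumption about how your application is wired.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class WatchdogExecutors
{
    // Creates a single-thread executor whose only thread is named for the watchdog.
    static ExecutorService newWatchdogExecutor()
    {
        ThreadFactory factory = runnable -> new Thread(runnable, "Connection Watchdog");
        return Executors.newSingleThreadExecutor(factory);
    }

    public static void main(String[] args) throws Exception
    {
        ExecutorService executor = newWatchdogExecutor();
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        System.out.println(executor.submit(whoAmI).get()); // prints "Connection Watchdog"
        executor.shutdown();
    }
}
```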

I’ve also chosen to use an unbounded implementation for my BlockingQueue. I can’t imagine a situation where we feed the watchdog faster than it can consume its food, but if you are worried about it you can use a bounded queue. In that case, use offer() in the feed method and simply ignore the return value: if the queue happens to be full, the watchdog already has food waiting, so failing to add another element does no harm.
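Here is a minimal sketch of the bounded variant. The BoundedFeedDemo class, its pending() helper, and the capacity of 1 are assumptions for illustration; a single pending element is already enough to restart the timer. In this sketch feed() returns the result of offer() purely so the demo can print it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedFeedDemo
{
    // Capacity of 1 is an assumption: one pending element is enough to reset the timer.
    private final BlockingQueue<Boolean> watchdogFood = new ArrayBlockingQueue<>(1);

    boolean feed()
    {
        // offer() returns false instead of throwing when the queue is full.
        return watchdogFood.offer(true);
    }

    int pending()
    {
        return watchdogFood.size();
    }

    public static void main(String[] args)
    {
        BoundedFeedDemo demo = new BoundedFeedDemo();
        System.out.println(demo.feed());    // true: the queue was empty
        System.out.println(demo.feed());    // false: the queue is full, so this feeding is dropped
        System.out.println(demo.pending()); // 1
    }
}
```

Using add() here instead of offer() would throw an IllegalStateException once the queue is full, which is why the switch to offer() matters for a bounded queue.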

Customizations

There are many ways to extend and customize our basic implementation. Here are a few ideas.

Programmable Time Intervals

If you wanted a more general-purpose watchdog, you could allow clients to set the timeout value in the constructor instead of relying on a hard-coded value, and then pass those fields to poll() in the run loop.

private final long timeout;
private final TimeUnit unit;

public ConnectionWatchdog(long timeout, TimeUnit unit)
{
    this.timeout = timeout;
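    // checkNotNull is presumably a static import such as Guava's Preconditions.checkNotNull;
    // java.util.Objects.requireNonNull(unit) does the same job without a dependency.
    this.unit = checkNotNull(unit);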
}
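Putting the pieces together, a configurable watchdog might look roughly like the sketch below. The ConfigurableWatchdog name and the onTimeout callback are assumptions added for this illustration (the callback stands in for the hard-coded println), and Objects.requireNonNull is used so the example needs nothing beyond the JDK.

```java
import java.util.Objects;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class ConfigurableWatchdog implements Runnable
{
    private final BlockingQueue<Boolean> watchdogFood = new LinkedBlockingQueue<>();
    private final long timeout;
    private final TimeUnit unit;
    private final Runnable onTimeout; // hypothetical callback, not part of the original class

    ConfigurableWatchdog(long timeout, TimeUnit unit, Runnable onTimeout)
    {
        this.timeout = timeout;
        this.unit = Objects.requireNonNull(unit);
        this.onTimeout = Objects.requireNonNull(onTimeout);
    }

    @Override
    public void run()
    {
        try
        {
            while (true)
            {
                // The constructor arguments replace the hard-coded one-minute timeout.
                if (watchdogFood.poll(timeout, unit) == null)
                {
                    onTimeout.run();
                }
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    public void feed()
    {
        watchdogFood.add(true);
    }

    public static void main(String[] args) throws InterruptedException
    {
        ConfigurableWatchdog watchdog = new ConfigurableWatchdog(
            200, TimeUnit.MILLISECONDS, () -> System.out.println("Device disconnected!"));
        Thread thread = new Thread(watchdog, "Connection Watchdog");
        thread.start();
        Thread.sleep(500);  // never fed, so the callback fires at least once
        thread.interrupt(); // stop the watchdog
        thread.join();
    }
}
```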

State-based Timeout

If your application didn’t need to be notified at the exact moment your watchdog timer elapsed, you could store the state in an instance variable within the watchdog itself. With a public accessor, your application can check the state whenever it needs to.

For our connection watchdog example, we can use a boolean to track whether the device is connected. Since we will be updating the value from our timer thread and accessing it from a separate application thread, I’m using an AtomicBoolean to ensure thread-safety.

private final AtomicBoolean isConnected = new AtomicBoolean(false);

@Override
public void run()
{
    Thread.currentThread().setName("Connection Watchdog");
    try
    {
        while (true)
        {
            isConnected.set(watchdogFood.poll(1, TimeUnit.MINUTES) != null);
        }
    } catch (InterruptedException e)
    {
        Thread.currentThread().interrupt();
    }
}

public boolean isConnected()
{
    return isConnected.get();
}
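As a rough illustration of how an application thread might read that state, here is a self-contained sketch. StateWatchdogDemo and its timeoutMillis constructor parameter are assumptions made so the demo finishes in under a second instead of waiting a full minute.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class StateWatchdogDemo implements Runnable
{
    private final BlockingQueue<Boolean> watchdogFood = new LinkedBlockingQueue<>();
    private final AtomicBoolean isConnected = new AtomicBoolean(false);
    private final long timeoutMillis; // shortened timeout, an assumption to keep the demo fast

    StateWatchdogDemo(long timeoutMillis)
    {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run()
    {
        try
        {
            while (true)
            {
                isConnected.set(watchdogFood.poll(timeoutMillis, TimeUnit.MILLISECONDS) != null);
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    public void feed()
    {
        watchdogFood.add(true);
    }

    public boolean isConnected()
    {
        return isConnected.get();
    }

    public static void main(String[] args) throws InterruptedException
    {
        StateWatchdogDemo watchdog = new StateWatchdogDemo(300);
        Thread thread = new Thread(watchdog, "Connection Watchdog");
        thread.start();

        watchdog.feed();
        Thread.sleep(100);
        System.out.println("Shortly after feeding: " + watchdog.isConnected());
        Thread.sleep(600);
        System.out.println("After a silent period: " + watchdog.isConnected());

        thread.interrupt();
        thread.join();
    }
}
```

Two details follow from the code: the state starts out as false until the first feeding is consumed, and it only drops back to false once a full timeout passes without food.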

Early Exit

In the basic implementation, our watchdog will continue polling even after a timeout event is issued. If we wanted our watchdog to expire once the first timeout is reached, we could set it up with a do-while loop.

try
{
    Boolean food;
    do
    {
        food = watchdogFood.poll(1, TimeUnit.MINUTES);
    } while (food != null);
    System.out.println("Device disconnected!");
} catch (InterruptedException e)
{
    Thread.currentThread().interrupt();
}
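One consequence of this variant is that run() returns after the first timeout, so the Future returned by the executor completes and can double as an expiry signal. The sketch below illustrates that idea; OneShotWatchdog and its timeoutMillis parameter are hypothetical names, and the short timeout is only there to keep the demo quick.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class OneShotWatchdog implements Runnable
{
    private final BlockingQueue<Boolean> watchdogFood = new LinkedBlockingQueue<>();
    private final long timeoutMillis; // shortened timeout, an assumption for the demo

    OneShotWatchdog(long timeoutMillis)
    {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run()
    {
        try
        {
            Boolean food;
            do
            {
                food = watchdogFood.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } while (food != null);
            System.out.println("Device disconnected!");
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    public void feed()
    {
        watchdogFood.add(true);
    }

    public static void main(String[] args) throws Exception
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        OneShotWatchdog watchdog = new OneShotWatchdog(200);
        Future<?> task = executor.submit(watchdog);

        watchdog.feed(); // one feeding, then silence
        task.get();      // blocks until the watchdog expires and run() returns
        System.out.println("Watchdog finished: " + task.isDone());
        executor.shutdown();
    }
}
```

An application that prefers not to block could check task.isDone() periodically instead of calling get().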