Time Series Forecasting Using Past and Future External Data with Darts (2024)

Building models that are able to capture external data is often a key aspect of time series forecasting projects. For instance:

  • Recently-observed activity on an e-commerce website can help predict future sales.
  • Observed rainfalls and known weather forecasts can help to predict hydro and solar electricity production.
  • Making the model aware of upcoming holidays can help sales forecasting.
  • Knowing that some intervention is ongoing on a system can be helpful for correcting forecasting / outage detection.
  • etc…

In fact, more often than not, strictly relying on the history of a time series to predict its future is missing a lot of valuable information.

Darts is an open source Python library whose primary goal is to smooth the time series forecasting experience in Python. Out of the box it provides a variety of models, from ARIMA to deep learning models, which can all be used in a similar, straightforward way using fit() and predict(). In this post, we’ll show how Darts can be used to easily take “covariates” (other time series providing useful information) into account. First, let us quickly explain a subtle-yet-important distinction between “past” and “future” covariates.

Past and Future Covariates

We define two kinds of time series which can be used for forecasting:

  • Past covariates are time series whose past values are known at prediction time. Those series often contain values that have to be observed to be known.
  • Future covariates are time series whose future values are known at prediction time. More precisely, for a prediction made at time t for a forecast horizon n, the values at times t+1, …, t+n are known. Often, the past values (for times t-k, t-k+1, …, t for some lookback window k) of future covariates are known as well. Future covariate series contain, for instance, calendar information or weather forecasts.

Note that in general future covariates can also be used as past covariates, whereas the reverse is not true.
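To make the two definitions concrete, here is a minimal sketch in plain Python (not the Darts API; the function name available_windows is hypothetical). It lists which time indices of each covariate kind a model may read for a forecast made at time t, with lookback window k and horizon n. It assumes the history of the future covariate is available too.

```python
def available_windows(t, k, n):
    """Return the time indices a model may read for a forecast made at time t.

    k is the lookback window and n the forecast horizon, as in the text above.
    """
    past = list(range(t - k, t + 1))        # t-k, ..., t (already observed)
    future = list(range(t + 1, t + n + 1))  # t+1, ..., t+n (known in advance)
    return {
        "past_covariates": past,             # only observed values
        "future_covariates": past + future,  # history plus the forecast horizon
    }


windows = available_windows(t=100, k=3, n=2)
print(windows["past_covariates"])    # [97, 98, 99, 100]
print(windows["future_covariates"])  # [97, 98, 99, 100, 101, 102]
```

The past window is a subset of what is known for a future covariate, which is why a future covariate can be used as a past covariate but not the other way around.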

Past and Future Covariates in Darts

Darts differentiates models that make use of past and future covariates:

  • Past covariates models: The fit() and predict() methods of these models accept only a past_covariates argument (specifying one or a sequence of TimeSeries). These models will look only at past values of the covariate series when making a prediction. Past covariates models: BlockRNNModel, NBEATSModel, TCNModel, TransformerModel, RegressionModel (incl. LinearRegressionModel and RandomForest).

Depiction of the inputs/outputs for “Past Covariates models” at prediction time.

  • Future covariates models: The fit() and predict() methods of these models accept only a future_covariates argument. The training procedure will look at future values of the covariates (and possibly at historic values too), and future values will have to be provided at prediction time. Global future covariates models: RNNModel, RegressionModel (incl. LinearRegressionModel and RandomForest). Local future covariates models: ARIMA, VARIMA, AutoARIMA.

“You shouldn’t be too worried about making a mistake when employing past and future covariates, because Darts will complain if you try providing the wrong kind of covariates to the wrong model or if your covariates are not known sufficiently into the future (or into the past). In addition, it takes care of slicing the covariates and targets for you automatically, even if they are not aligned (as long as the time axes of the series are correct).”

Note that RegressionModel (incl. LinearRegressionModel and RandomForest) supports both past_covariates and future_covariates. In the rest of the article, we’ll see how to fit some RNN-based models using either past covariates or future covariates, and then we’ll fit a RegressionModel using both past and future covariates.

A Toy Example: Forecasting a River Flow

As a toy example, let’s assume we want to forecast the flow of a river. We’ll be using synthetic time series data (created with Darts as well) to demonstrate how past and future covariates can be used. What we’ll do here is only meant to demonstrate how covariates can be used, and by no means represents a good (or realistic) way to forecast an actual river flow 😉

You can reproduce this example by installing Darts as follows:

pip install darts

The entire code is also available in a notebook here.

A Simplistic River Model

We assume that the flow of our river on day t depends on two factors:

  • The melting rate of an upstream glacier 5 days earlier (on day t – 5).
  • The rainfalls during the last 5 days (from t – 4 to t).

We want to forecast the flow 10 days in advance. Furthermore, we assume that:

  • The glacier’s melting rate is not known in advance because we have to measure it directly in order to know it; it is thus a past covariate.
  • The rainfall is known 10 days in advance from weather forecasts. It is thus a future covariate. It is also known in the past.

We start by generating some synthetic daily time series to create a problem instance. Darts’ global models (such as neural networks and regression models) can easily be trained on multiple time series (for instance by just calling model.fit([series1, series2, ...], past_covariates=[covariate1, covariate2, ...])), so we could simulate several rivers and train one model on all these data. But here we will focus on showing how to use past and future covariates using only one target series.

In the code below, melting is our past glacier melting covariate series, rainfalls is the future rainfall covariate series, and flow is the target river flow (which we want to forecast):

Code gist: https://gist.github.com/hrzn/dd81ce4770626527a33dd3308cc02827

Our synthetic daily dataset representing the flow as a sum of lagged glacier melting and rainfalls. The dataset starts in January 2000 and lasts 3 years.
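As a rough illustration of such a dataset, here is a NumPy-only sketch of the data-generating idea described above (flow as a sum of lagged melting and recent rainfalls, plus noise). It does not use Darts and is not the original notebook code: the seasonal shapes, the weights W_MELT and W_RAIN, and the noise levels are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 3 * 365                      # three years of daily data
days = np.arange(n_days)

# Covariates: a seasonal pattern plus Gaussian noise (assumed shapes).
melting = 0.9 * np.sin(2 * np.pi * days / 365) + 0.1 * rng.normal(size=n_days)
rainfalls = 0.5 + 0.3 * np.sin(2 * np.pi * days / 30) + 0.1 * rng.normal(size=n_days)
rainfalls = np.clip(rainfalls, 0.0, None)   # rainfall cannot be negative

# Hypothetical weights and noise level, not taken from the article.
W_MELT, W_RAIN, NOISE_STD = 0.5, 0.1, 0.1

# flow[t] = W_MELT * melting[t-5] + W_RAIN * sum(rainfalls[t-4..t]) + noise
flow = np.full(n_days, np.nan)        # the first 5 days lack a full history
for t in range(5, n_days):
    flow[t] = W_MELT * melting[t - 5] + W_RAIN * rainfalls[t - 4 : t + 1].sum()
flow[5:] += NOISE_STD * rng.normal(size=n_days - 5)

print(flow[5:10])
```

The first five values of flow are left undefined because the melting value from five days earlier does not exist yet; a real pipeline would simply drop them.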

Evaluating Models

Now that we have our data, we can already think about how we want to evaluate and compare the different models we’ll build. Below we write a small function which performs backtesting and evaluates the accuracy of 10-day-ahead predictions over the last 20% of the flow series, using RMSE:
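The evaluation idea can be sketched independently of any forecasting library. The helper below is a hypothetical, NumPy-only illustration of a rolling-origin backtest, not the function from the original post: for every time step in the last 20% of the series, it forecasts from the history ending 10 steps earlier and scores only the 10th step. The names backtest_rmse and naive_forecast are assumptions.

```python
import numpy as np


def backtest_rmse(series, forecast_fn, horizon=10, start_frac=0.8):
    """Rolling-origin backtest: RMSE of the `horizon`-step-ahead forecast.

    forecast_fn(history, horizon) must return `horizon` predicted values
    for the steps that follow `history`.
    """
    series = np.asarray(series, dtype=float)
    first_target = int(start_frac * len(series))   # first evaluated index
    errors = []
    for target in range(first_target, len(series)):
        cutoff = target - horizon + 1               # history is series[:cutoff]
        forecast = forecast_fn(series[:cutoff], horizon)
        errors.append(forecast[horizon - 1] - series[target])
    return float(np.sqrt(np.mean(np.square(errors))))


def naive_forecast(history, horizon):
    """Baseline for the usage example: repeat the last observed value."""
    return np.full(horizon, history[-1])


toy = np.sin(2 * np.pi * np.arange(400) / 50)
print(backtest_rmse(toy, naive_forecast))
```

A covariate-aware model fits the same interface if forecast_fn closes over the covariate arrays and slices them using the length of the history it receives.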

First Model: No Covariate

Let’s first create a BlockRNNModel. These models support past_covariates, but here, in order to get a first benchmark, we’ll fit it on the target only and see what we get. We somewhat arbitrarily select an input_chunk_length of 30 (this corresponds to the lookback window of the model), and we set the output_chunk_length to 10, as this is the horizon we want to forecast:

Block RNN model without covariate. Backtest RMSE = 0.194

Second Model: Using Past Melting Data

Let’s now try to provide the melting series as past_covariates to the model’s fit() method. Doing this means that the model will look at the past 30 time steps of melting (in addition to the past 30 time steps of the target) when producing a forecast.

Block RNN model with melting as a past covariate. Backtest RMSE = 0.172

This already improved the RMSE from 0.194 to 0.172, which is not bad; looking at the past melting helps because it determines part of the current flow.
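To picture what "looking at the past 30 time steps of melting in addition to the target" means, here is a NumPy-only sketch of how a series and its past covariate could be sliced into training pairs for a block model. This is an illustration of the idea, not how Darts builds its datasets internally; make_block_samples and the placeholder arrays are hypothetical.

```python
import numpy as np


def make_block_samples(target, past_covariates, input_len=30, output_len=10):
    """Slice aligned series into (input window, output window) training pairs."""
    target = np.asarray(target, dtype=float)
    cov = np.asarray(past_covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]                        # one covariate -> one column
    features = np.column_stack([target, cov])     # shape (T, 1 + n_covariates)
    inputs, outputs = [], []
    for start in range(len(target) - input_len - output_len + 1):
        split = start + input_len
        inputs.append(features[start:split])                # past target + covariates
        outputs.append(target[split : split + output_len])  # future target only
    return np.stack(inputs), np.stack(outputs)


flow = np.arange(100, dtype=float)        # placeholder series for a shape check
melting = np.arange(100, dtype=float) * 0.1
X, Y = make_block_samples(flow, melting)
print(X.shape, Y.shape)  # (61, 30, 2) (61, 10)
```

Each input window holds 30 rows with one column per series, and each output window holds the 10 target values that follow. Passing a 2-D covariate array with several columns works the same way and simply widens the input.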

Third Model: Using Past Melting and Past Rainfall Data

We can seamlessly extend this to use both the past melting and past rainfall data. The rainfall is known in advance, but here we specify it as a past_covariates, which means that the model will only look at past rainfalls.

In the following snippet, melting.stack(rainfalls) produces one multivariate TimeSeries containing two dimensions: the melting and the rainfall. This is the series we use as a past covariate.

Adding past rainfalls helps too, reducing the error further from 0.172 to 0.169. Each day’s rainfall impacts the flow of that day and the following 4 days, and so past rainfalls provide some amount of signal to predict the next 10 days’ flow. The impact is still somewhat limited, though, because this model is only looking at past rainfalls and not at the actual future rainfalls happening during the 10 days for which we want to predict the flow.
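The limited gain can be checked with a little arithmetic. Under the river model above (flow on a day depends on rainfall from 4 days before up to that day), the sketch below counts, for each step of the 10-day horizon, how many of the relevant rainfall days have already been observed at forecast time t. The function name is hypothetical.

```python
HORIZON = 10       # days to forecast
RAIN_WINDOW = 5    # flow on day d depends on rainfall from d-4 to d


def observed_rain_days(step):
    """Count rainfall days relevant to day t+step that are already observed (<= t)."""
    relevant_offsets = range(step - RAIN_WINDOW + 1, step + 1)  # relative to t
    return sum(1 for offset in relevant_offsets if offset <= 0)


counts = [observed_rain_days(step) for step in range(1, HORIZON + 1)]
print(counts)  # [4, 3, 2, 1, 0, 0, 0, 0, 0, 0]
```

From the fifth forecast day onward, none of the rainfall that drives the flow lies in the past, so a model restricted to past rainfalls can only rely on whatever regularity the rainfall series itself has.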

Fourth Model: Using Future Rainfalls

Let’s now try to use future rainfalls as a covariate. This might help us because a model using future_covariates will be able to look at the next 10 days’ rainfalls (in addition to past rainfalls) in order to predict the next 10 days’ flow. To do this, we’ll use an RNNModel, which is a “pure RNN” implementation that is able to use future_covariates (our RNNModel is similar to DeepAR).

RNNModel using the rainfalls as a future covariate. Backtest RMSE = 0.158

It seems that it’s working: letting the model see the rainfalls for the next n=10 days brings the RMSE down to 0.158. Again, this makes sense as the recent rainfalls make up a large component of the flow.

Note that we cannot use the melting as a future covariate, because it is not known in advance, and so we wouldn’t be able to provide it at prediction time (Darts would complain if you tried to call predict() with a future_covariates series that doesn’t extend at least 10 time points in the future further than the target).
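The kind of check described here can be sketched in a few lines of standard-library Python. This is only an illustration of the rule for daily data, not Darts' own validation code; check_future_covariates and the dates are hypothetical.

```python
from datetime import date, timedelta


def check_future_covariates(target_end, covariate_end, horizon):
    """Raise if a daily future covariate does not cover the forecast horizon."""
    required_end = target_end + timedelta(days=horizon)
    if covariate_end < required_end:
        raise ValueError(
            f"future covariates end on {covariate_end}, "
            f"but must extend to {required_end} for horizon {horizon}"
        )
    return required_end


target_end = date(2002, 12, 20)
# Rainfall forecasts reach 10 days past the target: accepted.
print(check_future_covariates(target_end, date(2002, 12, 30), horizon=10))
# Melting is only observed up to the target's last day: rejected.
try:
    check_future_covariates(target_end, target_end, horizon=10)
except ValueError as err:
    print("rejected:", err)
```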

Fifth Model: Using Past Melting and Future Rainfalls

Finally, we will now use a RegressionModel in order to be able to specify both a past_covariates and a future_covariates. RegressionModel in Darts is a wrapper around any “scikit-learn like” regression model, and by default it will use a linear regression. It can predict future values of the target series as a function of any combination of lagged values of the target, past and future covariates.

The lags of the target and past covariates have to be strictly negative (in the past), whereas the lags of the future covariates can also be zero or positive (at the predicted time step or later). For instance, a lag value of -5 means that the value at time t-5 is used to predict the target at time t, and a lag of 0 means that the future covariate value at time t is used to predict the target at time t. In the code below, we specify past covariate lags as [-5, -4, -3, -2, -1], which means that the model will look at the last 5 past_covariates values (we could also have specified lags_past_covariates=5 instead). Similarly, we specify the future covariate lags as [-4, -3, -2, -1, 0], which means that the model will look at the last 4 historic values (lags -4 to -1) and the current value (lag 0) of the future_covariates (we could also have specified lags_future_covariates=(4,1) instead). Note that we do not specify any lags for the target itself here, which means that this model won’t look at past values of the target at all: it will look at covariates only.
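The lag mechanics can be illustrated without Darts. The sketch below builds a lagged design matrix by hand with the lags discussed above and fits it by ordinary least squares in NumPy. It is a one-step-ahead illustration under assumed data (hypothetical weights 0.5 and 0.1, no noise on the flow), not a description of how RegressionModel is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 3 * 365
days = np.arange(n_days)
melting = np.sin(2 * np.pi * days / 365) + 0.1 * rng.normal(size=n_days)
rainfalls = 0.5 + 0.1 * rng.normal(size=n_days)

# Assumed generating process (hypothetical weights), without noise on the flow.
flow = np.full(n_days, np.nan)
for t in range(5, n_days):
    flow[t] = 0.5 * melting[t - 5] + 0.1 * rainfalls[t - 4 : t + 1].sum()

past_lags = [-5, -4, -3, -2, -1]     # lags of the past covariate (melting)
future_lags = [-4, -3, -2, -1, 0]    # lags of the future covariate (rainfalls)

times = np.arange(5, n_days)         # time steps with a full set of lags
X = np.column_stack(
    [melting[times + lag] for lag in past_lags]
    + [rainfalls[times + lag] for lag in future_lags]
    + [np.ones(len(times))]          # intercept column
)
y = flow[times]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))
print("max abs error:", np.abs(X @ coef - y).max())
```

Each row of X holds the covariate values at the chosen lags for one target time step. Because the lags match the generating process exactly and the flow carries no noise in this sketch, least squares recovers the assumed weights and the fit is essentially exact.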

RegressionModel (using a linear regression) predicting the flow as a function of the past 5 melting values and the past 4 and current rainfall values. Backtest RMSE = 0.102.

This model drastically improves the RMSE, down to 0.102. So once again, linear regression wins! In fact, if we kept some additive noise on the covariates but removed the additive noise on the flow, we would find that this model produces perfect forecasts. To be fair, this was expected because the target is built as a linear combination of the covariates to begin with, and we built our RegressionModel specifying exactly the right lags to capture the data generation process. Still, we expect these regression models to be very useful in practice, due to their speed, their versatility in capturing both past and future covariates with precise lags, and the fact that, similar to neural networks, they can be trained on multiple series while requiring less tuning.

Conclusions

Past and future covariates often play an important role in forecasting problems, but they can be hard to handle and reason about. One goal of Darts is to make this experience easier and less error prone: using covariates with Darts boils down to providing your external time series data as past_covariates or future_covariates arguments to the fit() and predict() methods of the models. In our river flow example, we observed that knowing past glacier melting and future rainfalls can each improve forecasting to different extents, and that a simple model based on linear regression capturing both obtains the best results in this case.

If you have any feedback on Darts, or if you have forecasting challenges you’d like to tell us about, feel free to reach out to us.

Article information

Author: Frankie Dare
