Time-series Forecasting using Conv1D-LSTM: Multiple Timesteps into the Future
Generally, there are many time-series forecasting methods such as ARIMA, SARIMA and Holt-Winters, but with the advent of deep learning many have started using LSTMs for time-series forecasting. So why do we need Conv1D-LSTM/RNN for time series? Some of the reasons I can think of are listed below.
- The Conv1D layer smooths out the input time series, so we don't have to add rolling-mean or rolling-standard-deviation values to the input features.
- LSTMs can model problems with multiple input variables. We feed the LSTM a 3D input of shape (samples, timesteps, features).
- This is a great benefit in time-series forecasting, where classical linear methods can be difficult to adapt to multivariate or multiple-input forecasting problems. (A side note on multivariate forecasting: keep in mind that when we use multivariate data for forecasting, we also need "future" values of those input variables to predict the future outcome! The two methods discussed below help mitigate this.)
- Flexibility to use several combinations of seq2seq LSTM models to forecast a time series: a many-to-one model (useful when we want a prediction at the current timestep given all the previous inputs), a many-to-many model (useful when we want to predict multiple future timesteps at once given all the previous inputs), and several other variations on these.
In this post, I would like to focus on the many-to-many model. In this case we can solve the problem in two different ways.
- Iterated forecasting, or the auto-regressive method: create a look-back window containing the previous timesteps, predict the value at the current step, then append that prediction back into the window to predict the next timestep, and so on. This method is relatively easy but accumulates error at every timestep, so the predictions are not very accurate.
- Direct forecasting, or single-shot prediction: create a look-back window containing the previous timesteps and predict a value K steps into the future directly. The value of K, i.e. the number of timesteps we want to predict into the future, must be given in advance. (A minimal sketch of both strategies follows this list.)
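To make the difference concrete, here is a minimal sketch of both strategies. The model and the one-feature window shape are placeholders for illustration, not the code used later in this post.

import numpy as np

# Iterated (auto-regressive) forecasting: predict one step, feed the
# prediction back into the window, repeat. Errors accumulate.
def iterated_forecast(model, history, n_past, k):
    window = list(history[-n_past:])
    preds = []
    for _ in range(k):
        x = np.array(window[-n_past:]).reshape(1, n_past, 1)
        yhat = model.predict(x)[0, 0]  # one step ahead
        preds.append(yhat)
        window.append(yhat)            # feed the prediction back in
    return preds

# Direct (single-shot) forecasting: the model is trained so that one
# window maps straight to the value K steps ahead; nothing is fed back.
def direct_forecast(model, history, n_past):
    x = np.array(history[-n_past:]).reshape(1, n_past, 1)
    return model.predict(x)[0, 0]      # value K steps into the future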
The core idea and the mathematical equation have been taken from this research paper. I have tried to implement this direct forecasting technique to predict the Global active power values 30 days into the future. The dataset is taken from the UCI Machine Learning Repository and can be accessed here.
Okay, let's do some coding!
# Read the data
import pandas as pd
import numpy as np

df = pd.read_csv('/content/sample_data/household_power_consumption.txt',
                 parse_dates={'dt': ['Date', 'Time']},
                 sep=";", infer_datetime_format=True,
                 low_memory=False, na_values=['nan', '?'], index_col='dt')
df.head()  # the first five lines of df are shown below

dataset_train_actual = df.copy()  # used for plotting at the end
dataset_train = df.copy()         # copy for further processing
Now create training_set, which is a 2D NumPy array.
# Select the features (columns) to be involved in training and predictions
dataset_train = dataset_train.reset_index()
cols = list(dataset_train)[1:8]

# Extract dates (will be used in visualization)
datelist_train = list(dataset_train['dt'])

# Keep only the feature columns (using dataset_train.values here would
# also pull in the 'dt' column and break the scalers below)
training_set = dataset_train[cols].values
Create two scalers: one for the input features and the other for the target that has to be predicted. Note that the Global_active_power column is present in the input features too.
# Feature Scaling
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
training_set_scaled = sc.fit_transform(training_set)

# Separate scaler for the target column (Global_active_power is column 0)
sc_predict = StandardScaler()
sc_predict.fit(training_set[:, 0:1])
Create the data structure for training:
# Creating a data structure with 72 timesteps and 1 output
X_train = []
y_train = []

n_future = 30  # Number of days we want to predict into the future
n_past = 72    # Number of past days we use to predict the future

for i in range(n_past, len(training_set_scaled) - n_future + 1):
    X_train.append(training_set_scaled[i - n_past:i, 0:training_set_scaled.shape[1]])
    y_train.append(training_set_scaled[i + n_future - 1:i + n_future, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

print('X_train shape == {}.'.format(X_train.shape))
print('y_train shape == {}.'.format(y_train.shape))
Explanation:
If the input features come from rows [0:72] (all input columns), then the target that is learned is row [72+30-1 : 72+30] of the target column, i.e. row 101. Since we are predicting 30 values directly into the future, we make the model learn so that for every block of input features (our look-back value is 72), the target is the value 30 timesteps ahead of the last input row.
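A quick sanity check of this indexing on toy data (the numbers below are just row indices, not real power values):

import numpy as np

n_past, n_future = 72, 30
data = np.arange(200)               # stand-in for one scaled column

i = n_past                          # first training window
inputs = data[i - n_past:i]         # rows 0 .. 71
target = data[i + n_future - 1]     # row 101 = 30 steps after row 71

print(inputs[0], inputs[-1], target)  # 0 71 101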
Now let’s create the model for training.
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Causal padding keeps the convolution from looking into the future
    tf.keras.layers.Conv1D(filters=32, kernel_size=3,
                           strides=1, padding="causal",
                           activation="relu",
                           input_shape=[None, 7]),  # 7 input features
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=False)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 200)
])

# lr_schedule = tf.keras.callbacks.LearningRateScheduler(
#     lambda epoch: 1e-8 * 10**(epoch / 20))

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mse"])
I chose learning_rate=1e-5 after running the model above with the lr_schedule callback enabled and plotting learning rate vs. loss, as shown below.
import matplotlib.pyplot as plt

# Plot loss against learning rate (requires a run with lr_schedule enabled)
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30])
Now let's calculate the predictions into the future.
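One missing piece first: datelist_future, the date index used for the future predictions below, is never defined in the post. A minimal sketch, assuming one prediction per day starting right after the last training date:

# Build the n_future dates following the last date in the training data
# (assumes the data has already been resampled to daily frequency)
datelist_future = pd.date_range(start=datelist_train[-1] + pd.Timedelta(days=1),
                                periods=n_future, freq='D').tolist()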
# Perform predictions into the future
predictions_future = model.predict(X_train[-n_future:])

# Predictions on the training data, for plotting purposes
predictions_train = model.predict(X_train[n_past:])

# Inverse-transform back to the original scale
y_pred_future = sc_predict.inverse_transform(predictions_future)
y_pred_train = sc_predict.inverse_transform(predictions_train)

# Construct two different dataframes for plotting
PREDICTIONS_FUTURE = pd.DataFrame(y_pred_future, columns=['Global_active_power']).set_index(pd.Series(datelist_future))
PREDICTION_TRAIN = pd.DataFrame(y_pred_train, columns=['Global_active_power']).set_index(pd.Series(datelist_train[2 * n_past + n_future - 1:]))
Let’s visualize the predictions.
# Set plot size
plt.rcParams['figure.figsize'] = 14, 5

# Plot parameters
START_DATE_FOR_PLOTTING = '2009-06-07'

# Plot the target column in the PREDICTIONS_FUTURE dataframe
plt.plot(PREDICTIONS_FUTURE.index,
         PREDICTIONS_FUTURE['Global_active_power'],
         color='r', label='Predicted Global Active power')

# Plot the target column in the PREDICTION_TRAIN dataframe
plt.plot(PREDICTION_TRAIN.loc[START_DATE_FOR_PLOTTING:].index,
         PREDICTION_TRAIN.loc[START_DATE_FOR_PLOTTING:]['Global_active_power'],
         color='orange', label='Training predictions')

# Plot the target column in the input dataframe
plt.plot(dataset_train_actual.loc[START_DATE_FOR_PLOTTING:].index,
         dataset_train_actual.loc[START_DATE_FOR_PLOTTING:]['Global_active_power'],
         color='b', label='Actual Global Active power')

plt.axvline(x=min(PREDICTIONS_FUTURE.index), color='green', linewidth=2, linestyle='--')
plt.grid(which='major', color='#cccccc', alpha=0.5)
plt.legend(shadow=True)
plt.title('Predictions and Actual Global Active power values', family='Arial', fontsize=12)
plt.xlabel('Timeline', family='Arial', fontsize=10)
plt.ylabel('Global Active power', family='Arial', fontsize=10)
A note on preprocessing: the raw data is sampled at one data point per minute; I resampled the dataset to a frequency of 24 hours (one day).
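The resampling step itself isn't shown above; a minimal pandas sketch, which would go right after reading the data:

# Downsample from one reading per minute to daily means
df = df.resample('D').mean()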
The complete code is available at the following GitHub link.
Please feel free to share your comments!
Thanks for reading!