I want to apply the Transformer to time series. I have read some articles and short notes on how to apply Transformer networks to time series data.
Some researchers used positional encoding, while others used Time2Vec to encode time series features. Which one is suitable for time series? I read the paper “Time2Vec: Learning a Vector Representation of Time,” and for the implementation of Time2Vec on time series data I followed the short note https://towardsdatascience.com/time2vec-for-time-series-features-encoding-a03a4f3f937e. From these two articles I gained some theoretical understanding and a basic idea of the implementation.
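For reference, this is roughly how I understood Time2Vec from the paper: one linear ("trend") component plus k periodic (sine) components, with learnable frequencies and phases. The sketch below is my own attempt at a Keras layer (the class and variable names are mine, not taken from the article), so please correct me if I got the idea wrong:

```python
import tensorflow as tf

class Time2Vec(tf.keras.layers.Layer):
    """My own sketch of a Time2Vec layer: 1 linear component + k sine components."""
    def __init__(self, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.k = kernel_size  # number of periodic components

    def build(self, input_shape):
        # weights for the single linear ("trend") component
        self.w_linear = self.add_weight(name="w_linear", shape=(input_shape[-1], 1),
                                        initializer="uniform", trainable=True)
        self.b_linear = self.add_weight(name="b_linear", shape=(1,),
                                        initializer="uniform", trainable=True)
        # weights for the k periodic (sine) components
        self.w_periodic = self.add_weight(name="w_periodic", shape=(input_shape[-1], self.k),
                                          initializer="uniform", trainable=True)
        self.b_periodic = self.add_weight(name="b_periodic", shape=(self.k,),
                                          initializer="uniform", trainable=True)

    def call(self, inputs):
        # inputs: (batch, seq_len, n_feature)
        linear = tf.matmul(inputs, self.w_linear) + self.b_linear                 # (batch, seq_len, 1)
        periodic = tf.sin(tf.matmul(inputs, self.w_periodic) + self.b_periodic)   # (batch, seq_len, k)
        return tf.concat([linear, periodic], axis=-1)                             # (batch, seq_len, k + 1)
```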
On the other side, some researchers applied positional encoding to time series for Transformer-based models. Could someone please explain the difference in detail, ideally with an example implementation? That would be greatly appreciated.
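By positional encoding I mean the standard sinusoidal encoding from “Attention Is All You Need,” which is added to the (projected) inputs before the encoder. The function below is only my own illustration of it, written in NumPy for clarity:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, np.newaxis]       # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                      # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dimensions -> sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dimensions  -> cosine
    return pe

# e.g. pe = sinusoidal_positional_encoding(seq_len=96, d_model=64)
```

My question is essentially whether this fixed encoding or the learned Time2Vec embedding is the better choice for time series inputs.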
Second question: I am a beginner with Transformers, so I have a question about the input_shape of time series data for the Transformer. I followed some YouTube videos on implementing Transformers; in NLP the input shape for a Transformer is [batch_size, sequence_length, d_model], while for time series I have seen it written as [batch_size, sequence_length, n_feature].
I understand that batch_size is the number of samples we feed to the model for training, and that sequence_length (as I understand it) is the number of features or dimensions in our dataset. My question is about the last component, n_feature: does it refer to the number of features we want to predict? In a univariate time series we would set it to one, and in a multivariate time series to greater than one.
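To make my question concrete, this is how I am currently building toy inputs (the numbers are just placeholders, and the comments reflect my current guess about what the last axis means, which is exactly what I would like confirmed or corrected):

```python
import numpy as np

batch_size, sequence_length = 32, 24   # placeholder numbers

# my current guess: univariate series -> last axis is 1
x_uni = np.random.rand(batch_size, sequence_length, 1)

# my current guess: multivariate series with 3 variables -> last axis is 3
x_multi = np.random.rand(batch_size, sequence_length, 3)

print(x_uni.shape)    # (32, 24, 1)
print(x_multi.shape)  # (32, 24, 3)
```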