Rethinking Temporal Dependencies in Multiple Time Series: A Use Case in Financial Data

Complex systems now yield copious time series data, which makes it necessary to understand how these series are co-generated, a question often assessed through pairwise comparisons. However, pairwise comparison scales poorly and does not handle temporal dynamics. In this paper, we advocate using a temporal graph to capture contiguous effects among multiple time series efficiently. Our two-step approach identifies patterns and temporal influences with low execution time, showcasing its potential in financial system incident prediction.


I. INTRODUCTION
Complex systems generate huge amounts of temporal data.
Examples include climate systems and financial systems that generate or record data of multiple, often similar or correlated, variables sequentially to form datasets of multiple time series.
From a data perspective, this represents co-evolving data based on their time dependency. For example, data consisting of different weather stations' recordings of the rainfall volume are multiple (co-evolving) time series. Identifying dependencies in time series is crucial in areas such as financial markets [1] and climate science [2].
Dependency in time series data refers to the relationship(s) between variables or factors in time. To identify these dependencies, a pairwise comparison such as the Granger causality test can be performed to measure the relationship between the past values of the series and determine its directionality. Dependency strength can also be captured simply by looking at how the series are correlated [3].
To motivate this research, we present a case study involving two time series obtained from the Forex market, illustrated in Fig. 1. In the top plot, EURUSD displays a clear downward trend that declines sharply right before the vertical black boundary. Conversely, CADEUR maintains a consistent horizontal fluctuation until the boundary. Starting from the boundary, CADEUR is on a declining trend while EURUSD remains relatively stable with some minor fluctuations. The question we ask is whether one time series has an effect on the other beyond this vertical boundary. The four plots below illustrate how the two time series can influence each other, which showcases the complexities and interconnections that can arise in time series analysis. We return to this example in the subsequent sections.
Despite some advances in identifying dependencies, progress in analyzing multiple time series data is rare. Existing methods mainly concentrate on autocorrelation, failing to capture the propagation of dependencies over time. Moreover, these approaches assume stationarity, limiting their ability to detect long-duration dependencies. In this paper, we address these challenges by learning directional dependencies in the co-evolution behaviour of multiple time series data. We also simplify our model by assuming that these time series can be regenerated using a finite number of linear models. We use a sequence of bipartite graphs to represent the dependencies observed at contiguous time intervals.
Our major contributions are as follows:
• We suggest an approach that detects latent dependencies in multiple time series data by approximating them using finite linear models based on identified patterns.
• We illustrate temporal dependencies using a bipartite causal graph to model changes across multiple time series by linking repetitive and new dependencies at varying time durations in sub-series.
• To validate our approach, we experiment with real financial market datasets. We show the benefits of the proposed method by predicting the subsequent dependencies occurring at time frames ahead.
This paper includes the following sections: related work, background and definitions, methodology, experiments, and conclusion.

II. RELATED WORK
This study builds on recent progress in comprehending temporal data dependencies and causality in finance. Traditional correlation analysis [4] underpins dependency detection, with financial research exploring the dynamic nature of market price correlations, especially during crises [5]. Furthermore, the GARCH framework [6] is used to gauge interdependence in financial market movements, highlighting potential drawbacks in relying solely on conditional correlation coefficients. In [7], copula functions were employed to model temporal dependencies in daily stock market returns, uncovering heightened interconnections among European markets in response to shared directional shifts, leading to market crashes or booms.
Identifying dependencies through representation, particularly in the context of causality, has been explored by Eichler [8]. That study demonstrates how to transform temporal data into graphs using Granger causality, leveraging time order to establish causal relationships among variables, albeit under certain assumptions, including non-instantaneous dependencies. Tian et al. [9] introduced a method for identifying causal dependencies by detecting structural changes based on local, spontaneous alterations in the underlying data-generating model.
In this context, an important issue is how dependency between multiple time series data in finance can be identified, represented and predicted over time.

III. PRELIMINARIES
We rely on several concepts and definitions for learning the parameters and identifying representative patterns/behaviours to represent temporal dependencies and predict causal dependencies. In what follows, we describe the important concepts used throughout this work.
Multiple time series: Given a time series $S^i$, we call multiple time series a set of synchronously generated time series. The notation $MS = \{S^i \mid i = 1, 2, \ldots, n\}$ denotes a set of $n$ series.
Time interval: A time interval is a finite set of time points. For a series of time points $(t_1, t_2, \ldots, t_{100})$, we use the notation $T_1$ to mean a time interval of duration $|T_1| = 100$. In the same way, any time interval $T_j$ of duration $d$, $d \in \mathbb{N}^*$ (i.e., $|T_j| = d$), corresponds to the series $(t_j, t_{j+1}, \ldots, t_{j+d-1})$.
Sub-time-series data: A sub-time-series is a section of temporal data observed within a time interval. The notation $S^i_j$ denotes the portion of the time series $S^i$ observed within the time interval $T_j$.
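As a minimal sketch of the notation above, the following Python snippet extracts sub-time-series from a toy multiple time series. The function name `sub_time_series`, the dictionary layout and the values are illustrative assumptions, not part of the original work; time points are taken to be 1-based, matching $(t_j, \ldots, t_{j+d-1})$.

```python
def sub_time_series(series, j, d):
    """Return the values of `series` at time points t_j, ..., t_{j+d-1} (1-based j)."""
    if j < 1 or j - 1 + d > len(series):
        raise ValueError("interval T_j falls outside the series")
    return series[j - 1:j - 1 + d]


# A toy multiple time series MS with n = 2 synchronously recorded series (made-up values).
ms = {
    "S1": [0.1, 0.4, 0.3, 0.8, 0.7, 0.9],
    "S2": [1.0, 0.9, 0.7, 0.6, 0.2, 0.1],
}

# Sub-time-series of every series within T_2 of duration d = 3, i.e. (t_2, t_3, t_4).
window = {name: sub_time_series(values, j=2, d=3) for name, values in ms.items()}
print(window)  # {'S1': [0.4, 0.3, 0.8], 'S2': [0.9, 0.7, 0.6]}
```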
In identifying dependencies among multiple time series data, accounting for dynamic and temporal aspects is essential, as traditional lagged approaches may prove ineffective [10]. Our approach emphasizes the significance of varying intervals in establishing time-dependent dependencies, as elaborated below.
Definition III.1 (Pattern). We call a pattern the values of the set of parameters $\Theta^i = \{e^i_0, e^i_1, \ldots, e^i_p\}$, $p \in \mathbb{N}^*$, of an autoregression model enabling the regeneration of a sub-time-series of $S^i$. That is, for each time point $t \in T_j$, the observed data $y(t) \in S^i_j$ can be approximated as
$$y(t) \approx e^i_0 + \sum_{k=1}^{p} e^i_k \, y(t-k).$$
For simplicity, in the rest of the paper we use $f(T_j \mid \Theta^i, S^i)$ to denote the above autoregression function enabling the regeneration of the sub-time-series $S^i_j$ via the pattern $\Theta^i$.
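To make Definition III.1 concrete, here is a minimal sketch of how a pattern could be estimated and used to regenerate a sub-time-series. It assumes ordinary least squares solved through the normal equations, which the text does not specify; the names `fit_ar` and `regenerate` are hypothetical.

```python
def fit_ar(sub_series, p):
    """Least-squares estimate of a pattern (e_0, e_1, ..., e_p) for an AR(p) model."""
    # One regression row [1, y(t-1), ..., y(t-p)] per time point that has p predecessors.
    rows = [[1.0] + [sub_series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(sub_series))]
    targets = sub_series[p:]
    m = p + 1
    # Normal equations (X^T X) theta = X^T y, stored as an augmented matrix.
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(m)]
           + [sum(r[a] * y for r, y in zip(rows, targets))]
           for a in range(m)]
    # Gauss-Jordan elimination with partial pivoting (assumes X^T X is non-singular).
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def regenerate(sub_series, theta):
    """One-step-ahead regeneration of y(t) for t = p, ..., len(sub_series) - 1."""
    p = len(theta) - 1
    return [theta[0] + sum(theta[k] * sub_series[t - k] for k in range(1, p + 1))
            for t in range(p, len(sub_series))]
```

A pattern can then be judged by how closely `regenerate(window, theta)` matches `window[p:]`, for example through a mean squared error with a tolerance chosen by the user.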
From Definition III.1, at each time interval $T_j$, for each sub-time-series $S^i_j$, there could be a corresponding pattern $\Theta^i$ enabling its regeneration. This means that for $m$ time intervals, there can be a total of $m \times n$ distinct patterns. However, if all patterns are distinct, no dependency between the multiple time series can be captured. We assume that if there are dependencies in the evolving process of multiple time series, then there should be a much smaller number of common patterns $K \ll m \times n$ that approximately regenerate the sub-time-series.
Definition III.2 (Co-evolving Time Series). A multiple time series data set $MS = \{S^i \mid i = 1, \ldots, n\}$ is said to co-evolve across time intervals $T_1, \ldots, T_m$ if there exists a small set of patterns $\Theta = \{\Theta^\kappa \mid \kappa = 1, \ldots, K\}$, with $K \ll m \times n$, such that each series $S^i$ is approximately regenerated by concatenating, with the concatenation operator $\odot$, the sub-time-series $f(T_j \mid \Theta^\kappa, S^i)$ produced by patterns of $\Theta$ over $T_1, \ldots, T_m$.
The identification of the optimal number of patterns is detailed in Section V-B. As suggested in Definition III.2, if we have multiple time series that co-evolve, then there should be similar (if not equivalent) patterns enabling the regeneration of these time series at different time intervals. This yields the following definition.
Definition III.3 (Pattern Similarity). Two patterns $\Theta^1$ and $\Theta^2$ extracted from co-evolving multiple time series $MS$ are said to be similar ($\Theta^1 \approxeq \Theta^2$) if there exists a sub-time-series $S^i_j$ that is well regenerated by both patterns.
IV. PROBLEM STATEMENT
Given a set of multiple time series $MS$:
(1) We search for a (small) number of behaviours $B = \{B_\kappa, \kappa = 1, \ldots, K\}$ in terms of regenerative autoregression models.
(2) Based on the identified behaviours $B$, we want to learn the transition mechanism that captures the temporal dependency among the set of series that evolve together over time.

V. METHODOLOGY
A. Overview of the proposed approach
Graphs are commonly used to uncover co-dependencies in temporal data, with nodes representing series and edges denoting relationships. However, for a large number of co-evolving series, this approach can lead to either dense or sparse graphs. A denser graph limits its utility in understanding inter-series effects, and we aim to avoid pairwise comparisons.
To streamline comparisons and handle multiple time series efficiently, we create a pattern graph (see Section III), where each node denotes a potential behaviour exhibited by a series. Each node is associated with an $n$-dimensional attribute variable, taking values in $\{0, 1\}^n$. Specifically, the attribute's value at position $i$ is 1 (0) if series $S^i$ exhibits (does not exhibit) the represented behaviour. For instance, suppose we have four series, where $S^1$, $S^2$ and $S^4$ display behaviour $B_1$ while $S^3$ does not. Instead of using four separate nodes, we employ a single node to represent behaviour $B_1$ and its corresponding attribute to indicate which series exhibit the behaviour. See Fig. 2(a) for a visual representation.
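The four-series example above can be written out as a short sketch; the helper name `behaviour_attribute` is hypothetical and series are indexed from 1 as in the text.

```python
def behaviour_attribute(n_series, members):
    """Binary attribute in {0, 1}^n: position i is 1 if series S^i exhibits the behaviour."""
    return [1 if i in members else 0 for i in range(1, n_series + 1)]


# S^1, S^2 and S^4 display behaviour B_1, while S^3 does not.
b1_attribute = behaviour_attribute(4, {1, 2, 4})
print(b1_attribute)  # [1, 1, 0, 1]
```

A single node tagged $B_1$ with this attribute replaces four separate series nodes, so the graph grows with the number of behaviours rather than with the number of series.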
To get an overview of the proposed approach, consider Fig. 2(b)-(d). In Fig. 2(b), we have a multiple time series made of six series. Our approach aims to identify four main behaviours ($B_1$ to $B_4$), represented by grey, yellow, green, and blue colours over time. To achieve this, we start by setting a time interval $T_1$ and extracting the co-evolving sub-series ($S^1_1$ to $S^6_1$). We then estimate optimal parameters for linear models to regenerate these sub-series, using a clustering strategy. Initially, we obtain parameters for three linear models associated with the yellow, grey, and green behaviours.
Next, we move forward using a fixed stride, checking whether the existing models can still regenerate the sub-series. If they cannot, we identify new linear models for regeneration. In our example, this process continues until we reach time interval $T_2$, where we discover a new behaviour tagged with the colour blue. This process repeats until the latest observable time point of the co-evolving time series, as shown in Fig. 2(c).
Based on the identified models, we build a bipartite graph relating the effects series may have on each other at each contiguous time interval. Fig. 2(d) depicts the series of bipartite graphs. From $T_1$ we can note that nodes tagged with behaviours $B_2$ and $B_3$ are connected to nodes tagged with behaviours $B_1$, $B_2$ and $B_4$ at time $T_2$. For example, the relation $B_1 \rightarrow B_1$ is observed because series $S^2$ presents the same behaviour during the two time intervals. In the same way, the relation $B_2 \rightarrow B_1$ is observed because series $S^1$ switches its behaviour from $B_2$ at $T_1$ to $B_1$ at $T_2$. Recall that each node carries attributes.
In brief, we suggest a two-step method to detect and model evolving effects in multiple time series. First, we identify patterns to recreate series variation over various time intervals. Second, using these patterns, we reconstruct the ensemble of series variation across different time intervals to analyze their mutual effects. We use a temporal bipartite graph to depict these incidents chronologically. We now describe each step of our approach in more detail.

B. Step-1: Pattern identification
Pattern identification is a core step of our approach. Algorithm 1 lists the steps by which we identify the patterns from a given multiple time series.
Given $MS = \{S^i \mid i = 1, 2, \ldots, n\}$, at a first time interval $T_1$ of duration $d$, through a linear autoregressive model $f()$, we initially identify the set $\Theta$ of patterns regenerating the sub-time-series observed in $T_1$. The set $\Theta$ is later clustered into an ensemble of homogeneous subsets, each represented by one pattern $\Theta^\kappa \in \Theta$. It is worth noting that all patterns falling within the same cluster enable the regeneration of the same time series within a given time interval.
Moving across the series, we repeat the process at each subsequent time interval $T_j$ if and only if there exists some sub-time-series whose values none of the discovered patterns can regenerate at that time interval. At the end of the series coverage, we obtain a final set of patterns $\Theta = \{\Theta^\kappa \mid \kappa = 1, \ldots, K\}$ from which we can regenerate the overall set of co-evolving time series.
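The procedure described in this subsection can be sketched as follows. This is a deliberately simplified illustration, not Algorithm 1 itself: it assumes a lag-1 autoregression with a closed-form fit, and it replaces the clustering step with a greedy rule that reuses an existing pattern whenever its mean squared error on a window stays below a tolerance. All names (`fit_ar1`, `mse`, `identify_patterns`, `tol`) are hypothetical.

```python
def fit_ar1(x):
    """Closed-form least squares for y(t) = e0 + e1 * y(t-1)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    var = sum((a - mean_prev) ** 2 for a in prev)
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    e1 = cov / var if var else 0.0
    return (mean_curr - e1 * mean_prev, e1)


def mse(x, theta):
    """Mean squared one-step regeneration error of pattern `theta` on window `x`."""
    e0, e1 = theta
    return sum((b - (e0 + e1 * a)) ** 2 for a, b in zip(x[:-1], x[1:])) / (len(x) - 1)


def identify_patterns(ms, d, stride, tol=1e-6):
    """Slide a window of duration d over all series; add a pattern only when none fits."""
    patterns = []   # discovered patterns, in order of discovery
    labels = []     # labels[j][i]: index of the pattern regenerating series i in window j
    length = min(len(s) for s in ms)
    for start in range(0, length - d + 1, stride):
        row = []
        for s in ms:
            window = s[start:start + d]
            # Best existing pattern for this window, or None when no pattern exists yet.
            best = min(range(len(patterns)),
                       key=lambda k: mse(window, patterns[k]), default=None)
            if best is None or mse(window, patterns[best]) > tol:
                patterns.append(fit_ar1(window))   # a new behaviour is discovered
                best = len(patterns) - 1
            row.append(best)
        labels.append(row)
    return patterns, labels
```

The returned `labels` table records which pattern each series follows in each window, which is the information needed to build the graph of Step-2.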

C. Step-2: Causal dependencies
Based on the set of patterns, we build a temporal graph $SG = \{G_{1,2}, G_{2,3}, \ldots\}$ where each $G_{j,j+1} = (B, E_{j,j+1}, A_j, A_{j+1})$ is a bipartite, attributed and directed graph, with $E_{j,j+1}$ the set of oriented edges relating the behaviour transitions a series may adopt in consecutive time intervals, and $A_j$ and $A_{j+1}$ the sets of outgoing and incoming node attributes.
In the rest of the paper, for mathematical calculations, the sets $A_j$ (resp. $A_{j+1}$) and $E_{j,j+1}$ should be viewed as an attribute matrix in $\{0, 1\}^{K \times n}$ and an adjacency matrix in $\{0, 1\}^{K \times K}$, respectively.
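As an illustration of these matrix views, the sketch below builds the attribute and adjacency matrices of one snapshot from per-series behaviour labels at two consecutive intervals. The label lists are made-up toy data, behaviours are indexed from 0, and the function names are hypothetical.

```python
def attribute_matrix(labels, k):
    """K x n binary matrix: entry [b][i] is 1 if series i exhibits behaviour b."""
    return [[1 if lab == b else 0 for lab in labels] for b in range(k)]


def adjacency_matrix(labels_j, labels_next, k):
    """K x K binary matrix: entry [a][b] is 1 if some series moves from behaviour a to b."""
    edges = [[0] * k for _ in range(k)]
    for a, b in zip(labels_j, labels_next):
        edges[a][b] = 1
    return edges


# Toy labels for n = 4 series and K = 4 behaviours (index 0 stands for B_1, and so on).
labels_t1 = [1, 0, 2, 1]   # S^1: B_2, S^2: B_1, S^3: B_3, S^4: B_2
labels_t2 = [0, 0, 3, 1]   # S^1: B_1, S^2: B_1, S^3: B_4, S^4: B_2

a_1 = attribute_matrix(labels_t1, 4)
a_2 = attribute_matrix(labels_t2, 4)
e_12 = adjacency_matrix(labels_t1, labels_t2, 4)
print(e_12)  # [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
```

In this toy snapshot, the entry for $B_2 \rightarrow B_1$ is set because the first series switches behaviour, and the entry for $B_1 \rightarrow B_1$ is set because the second series keeps its behaviour.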

VI. PREDICTING TEMPORAL DEPENDENCIES
Knowing how to identify dependencies from temporal data, we now want to capture the hidden logic explaining the behavioural changes that time series undergo from one time interval to another.

A. Learning process
We leverage the temporal bipartite graph to learn link dynamics, using a graph neural network for encoding and a reconstructor neural network for decoding. This constitutes a graph autoencoder, explained further in the following sections.
1) Encoder: To learn the causal dependencies, we exploit a graph-based neural network for encoding the previous behaviour transitions into a latent space. Because the snapshots $G_{j,(j+1)} \in SG$ of our temporal graph are built separately, for an order $p \in \mathbb{N}^+$, we separately encode the $p+1$ consecutive graphs $G_{(j-p-1),(j-p)}, \ldots, G_{(j-1),j}$ into latent data $Y_{(j-p-1),(j-p)} \in \mathbb{R}^{2K \times q}, \ldots, Y_{(j-1),j} \in \mathbb{R}^{2K \times q}$, $q \ll n$. Each latent data $Y_{(j-p-1+l),(j-p+l)}$ is calculated using the $l$-th ($0 \le l \le p$) topology adaptive graph convolutional neural network ($TAGCN_l()$). In addition to encoding each snapshot with a TAGCN, we aggregate the first $p$ encoded graphs through a perceptron. Using equations (6) and (7), we thus define our encoder, where $W_{enc}$ are the encoder parameters.
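For intuition about the encoding step, the sketch below implements the propagation rule usually associated with a TAGCN layer, $\sum_{k} \hat{A}^k X W_k$ with $\hat{A}$ the normalised adjacency matrix, in plain Python. It is an assumption-laden illustration rather than the encoder used here: it omits bias, activation and training, assumes a symmetric adjacency matrix, and the names `matmul`, `normalise` and `tag_layer` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalise(adj):
    """Symmetric normalisation D^{-1/2} A D^{-1/2}; zero-degree nodes stay isolated."""
    deg = [sum(row) for row in adj]
    size = len(adj)
    return [[adj[i][j] / (deg[i] * deg[j]) ** 0.5
             if adj[i][j] and deg[i] * deg[j] > 0 else 0.0
             for j in range(size)] for i in range(size)]


def tag_layer(adj, features, weights):
    """Return sum_k A_norm^k X W_k for k = 0, ..., len(weights) - 1."""
    a_norm = normalise(adj)
    hop = features                                    # A_norm^0 X
    out = [[0.0] * len(weights[0][0]) for _ in features]
    for w_k in weights:
        term = matmul(hop, w_k)
        out = [[o + t for o, t in zip(o_row, t_row)] for o_row, t_row in zip(out, term)]
        hop = matmul(a_norm, hop)                     # next power of the normalised adjacency
    return out
```

Applied to a snapshot, `adj` would be the $2K \times 2K$ adjacency of the bipartite graph and `features` the stacked node attributes, yielding a $2K \times q$ output when each weight matrix has $q$ columns.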
2) Decoder: In contrast to the encoder, which ingests $p + 1$ graphs, the decoder takes as input the $(p + 1)$-th embeddings and attempts to reconstruct the corresponding $(p + 1)$-th graph, that is, the adjacency matrix and the attribute matrices. The decoder is parameterized by $W^{r}_{attr}$ and $W^{r}_{edge}$, the decoder neural network parameters.
3) Learning: Given $W_{enc}$, the parameters of the encoder, we train it in such a way that the latent data $\hat{Y}_{(j-1),j}$ calculated via Eq. (7) is close to the value generated by $TAGCN_{p+1}()$, the $(p + 1)$-th TAGCN network. For the decoder, we calculate the loss in reconstructing the attribute matrices as well as the adjacency matrix, the latter denoted $L^{edge}_{dec}$. Using equations (10), (11) and (12), we then calculate the whole loss of our autoencoder.

B. Prediction
Given the temporal dependencies $SG = \{G_{1,2}, \ldots, G_{(m-1),m}\}$ covering time intervals ranging from $T_1$ to $T_m$, we want to predict the plausible dependencies we may observe at the next contiguous time intervals $T_m$ and $T_{m+1}$ (i.e., the graph $G_{m,(m+1)}$). For this, we exploit our trained graph autoencoder. Here, we particularly make use of the MLP component of the encoder and the whole decoder component.
Recall that for a fixed lag $p$, the encoder ingests $p + 1$ consecutive snapshots of the temporal graph and reconstructs the $(p + 1)$-th graph. Knowing the latest $p + 1$ snapshots, we generate the corresponding embeddings using Eq. (8). To predict the next embeddings at the contiguous time intervals $T_m$ and $T_{m+1}$, we use the relation in Eq. (7), to which we pass the new list of $p$ embeddings $Y_{(m-p),(m-p+1)}, \ldots, Y_{(m-1),m}$. From the aggregated embeddings, we then use the decoder given in Eq. (9) to reconstruct the node attributes as well as the adjacency matrix. Formally, to predict the next dependencies $G_{m,(m+1)}$, we only need to estimate the attribute matrix $A_{m+1}$ and the adjacency matrix $E_{m,(m+1)}$.

VII. EXPERIMENTS
In this section, we evaluate the proposed approach:
1) For identifying causal dependencies in multiple time series using a linear regression model. We also present how patterns are repeatedly exhibited over time across the input multiple data.
2) For learning the underlying mechanism of causal dependencies that explains the co-evolutionary aspect of the input multiple data.
3) For predicting causal dependencies in subsequent remaining series at times ahead.
We utilize the FINCH clustering algorithm [11] to determine initial clusters in a scalable manner, eliminating the need for predefined cluster numbers. For addressing time series data representation challenges, we turn to Topology Adaptive Graph Convolutional Networks (TAGCNs) [12], renowned for their interpretable node representations, scalability, and performance in time series analysis, particularly in forecasting. We leverage TAGCN to perform convolutional operations, enabling us to aggregate local neighbourhood information effectively.

A. Dataset description
Our model was tested with a diverse range of real-world financial market data, including stocks, currencies, investment funds, and major indices from Yahoo Finance. This data offered unique insights into market behaviours. For stocks, we focused on 74 out of 78 large-cap stocks in the Global Industry Classification Standard - Information Technology sector index as of March 2022, spanning from 2017-01-01 to 2022-01-01. Additionally, we utilized Exchange-Traded Funds (ETFs) data, representing 15 series of the most liquid and traded ETFs, suitable for global macro investment strategies. This dataset encompassed US and global equities, emerging markets, US Treasuries, bonds, commodities, real assets, and the US dollar index. For major indices, we sourced data from Yahoo Finance, covering a series range from 2009-01-01 to 2014-03-28, including 35 out of 36 series available as of June 2023. Lastly, we analyzed forex cross-currency rates, incorporating all major currencies and others, utilizing 26 out of 29 cross-currency series as of June 2023 from Yahoo Finance, between 2006-01-01 and 2021-12-31. We excluded 3 cross-currency series, 4 large-cap stocks, and 1 world index due to null values or data unavailability.
Table I summarizes the datasets of multiple series of returns.Each set exhibits observable temporal dependencies, as shown in Fig. 1.

B. Causal dependencies
To begin, let's examine the potential patterns that can arise from using linear autoregressive models of the form $y(t) = a_0 + a_1 y(t-1) + a_2 y(t-2)$ (i.e., lag 2). Fig. 3 depicts the persistence of identified patterns in each of the datasets. Each dataset comprises two components: the left side is the total number of series generated by the identified model, and the right side is the lifespan (duration) of these models.
As shown in Fig. 3, some patterns persist throughout the entire time intervals, indicating inter-series influences through generative models. This observation aligns with the findings in Fig. 4, detailed in Section V-C, where the causal graph represents patterns as nodes. Notably, each sub-graph maintains a constant number of nodes, predetermined by the total pattern count, yet the scalability of nodes and edges depends on the initial $T_1$ size. These co-evolving dependencies underscore the existence of inter-series dependencies, with a significant impact from the initial settings at $T_1$ and the chosen stride value. Table II shows the execution time taken to identify dependencies with our approach, the Granger causality test and the PCMCI model [10]. We maintained the same data settings on all models.

C. Predicting causal dependencies
In this section, we present how the temporal graph can be exploited for predicting subsequent dependencies in multiple time series having a co-evolving aspect. For this, we first define the following settings:
(1) For each dataset, we take all the historical temporal graphs except the two latest time intervals as knowledge-based information. The two latest time intervals are assumed to be unknown information.
(2) From the knowledge-based information, we then train our graph autoencoder.
(3) Once the neural network is trained, we predict the dependencies at the subsequent time intervals.
Training phase: We employ a consistent neural network architecture across all our datasets, consisting of three graph neural networks (GNNs) with two hidden layers of TAGCN type. These hidden layers are configured with 18 and 25 neurons, respectively. Within the encoder, we incorporate a custom binary activation function, which transforms all values below a predefined threshold of 0.5 to 1 and sets the rest to 0.
Prediction phase: Predicting subsequent dependencies can also be likened to link prediction. We compare our approach to three recent deep-based models for temporal graphs: RGNN [13], DCGR [14], and TGCN [15], using the same architecture settings as our proposed approach. Table IV displays accuracy results, with bold indicating the best values for each dataset.

VIII. CONCLUSION
In summary, our novel two-tier dependency model surpasses conventional techniques like correlation and autoregression, enhancing interpretability by uncovering dependencies within sub-time series through parameter-based pattern exploration. It consistently outperforms traditional and recent models in detecting temporal dependencies, as demonstrated in Table II. Future works will expand the dataset to include more system data.

Fig. 1: Example of two time series from the cross-rate Forex market. The y-axes display scaled values, and the x-axes show time points. We analyze the impact of one series on another (four lower plots), focusing on the time after the black line in the top plot. Solid lines indicate the ground-truth values, and dashed lines represent the generated values.

Fig. 3: Example of patterns identified using a linear autoregression model with lag 2. Figures illustrate how the identified patterns persist over time.

TABLE I: Summary of Data Used

TABLE II: Compared Execution time of models.

TABLE III: Performance of the approach in predicting causal

TABLE IV: Accuracy of models.