Master 2
High Performance Computing and
Simulation
Similarity Computation for Complex Sequences:
An Adaptation of Drop-DTW Dissimilarity
Student: Marc WENG
Supervisors: Mostafa BAMHA, Patrick MARCEL, Sophie ROBERT, Soraya ZERTAL
September 5, 2025
UVSQ - Université Paris-Saclay
Similarity of Complex Sequences
Abstract
This work addresses the comparison and clustering of irregular, multivariate patient follow-up sequences drawn from a registry of ALS patients (CHU of Tours). We propose a variant of Drop-DTW adapted to clinical data: the local cost is defined per attribute (thresholds, weights), which improves the robustness and interpretability of the alignments. The clustering pipeline combines this dissimilarity with DTW Barycenter Averaging to estimate centroids under temporal alignment, followed by K-means. We normalize the components (values and time) to stabilize the hyperparameters and to prevent any single scale from dominating the distance. The evaluation relies on internal criteria (Silhouette) and on clinical differentiation via Kaplan–Meier curves and the log-rank test. On the dataset studied, the method produces internally coherent partitions and alignments that are explainable per attribute. We discuss limitations and directions for methodological improvement (Sakoe–Chiba windows, wavefront parallelism), as well as the need for external validation by experts and on independent cohorts.
Keywords: temporal sequences, Drop-DTW, elastic alignment, clustering, DBA, Kaplan–Meier, log-rank, ALS.
Table of Contents
Abstract
List of Abbreviations
1 Introduction
1.1 Related Work
1.2 Motivation and Goals
1.3 Organization of the report
2 Background
2.1 Time Series and Timed Sequences
2.2 DTW and Drop-DTW
2.2.1 Adaptive DTW for timed sequences
2.2.2 Drop-DTW
2.2.3 Cost definitions
2.2.3.1 Alignment cost
2.2.3.2 Drop cost
2.3 Clustering
2.3.1 K-means
2.3.2 K-means++ initialization
3 Contribution: Drop-DTW with Partial Drop
3.1 Extended Drop-DTW
3.1.1 Drop cost
3.1.2 Partial drop
3.1.3 DTW Barycenter Averaging
4 Dataset and Settings
4.1 Clinical Context and Dataset
4.2 Preprocessing
4.3 Evaluation metrics
4.3.1 Clustering quality
4.3.2 Clinical differentiation
5 Experiments and Results
5.1 Effect of hyperparameters
5.1.1 Selecting the number of clusters k
5.1.2 pt/pe ratio
5.1.3 Impact of τ in Drop-DTW
5.1.4 Impact of c in partial drop
6 Discussion
6.1 Alternative clustering and shapelets
6.2 Limitations and future work
7 Conclusion
8 References
Appendices
List of Abbreviations
ALS: Amyotrophic Lateral Sclerosis
ALSFRS-R: ALS Functional Rating Scale – Revised
CHU: Centre Hospitalier Universitaire
CNN: Convolutional Neural Network
CQA: Code Quality Analyzer
DBA: DTW Barycenter Averaging
DECAN: DECremental ANalysis
DTW: Dynamic Time Warping
FVC: Forced Vital Capacity
HAC: Hierarchical Agglomerative Clustering
iSAX: indexable Symbolic Aggregate approXimation
LCSS: Longest Common Subsequence
LIFO: Laboratoire d'Informatique Fondamentale d'Orléans
LProf: Lightweight Profiler
MAQAO: Modular Assembly Quality Analyzer and Optimizer
PAA: Piecewise Aggregate Approximation
PAM: Partitioning Around Medoids
RF: Random Forest
SAX: Symbolic Aggregate approXimation
SBD: Shape-Based Distance
VProf: Value Profiler
1 Introduction
Timed-sequence analysis is used across many fields[7], [20], including medicine (patient stratification,
gene alignment), the social sciences (semantic trajectories), and data science (building and recommending
analysis pipelines).
Patient sequences differ from regularly sampled time series. Visits occur at irregular times (aperiodic),
several measurements are recorded at each visit (multivariate), and some values may be missing or recorded
unevenly across patients. In such conditions, standard methods for clustering, classification, or prediction
often need careful adaptation, where expert validation is essential, so methods should be configurable and
interpretable.
To compare two sequences, we need a suitable dissimilarity[27]. Dynamic Time Warping (DTW)[31] aligns
sequences by allowing local stretching and compression of time. Drop-DTW[11], [13] extends DTW by
skipping outliers along the alignment path so that only the common signal is matched.
For clustering, DTW Barycenter Averaging (DBA)[29] computes centroids under temporal alignment,
allowing the use of K-means[14], [23], [24] on time-warping distances.
In this report, we analyze patient follow-up sequences collected in clinical practice and cluster them using
an unsupervised method. Building on prior work, we propose a modified Drop-DTW for multivariate
clinical data, in which the cost is defined per attribute and can incorporate expert-specified thresholds,
weights, and penalties. The goal is to improve robustness and interpretability for irregular, multivariate
patient trajectories.
We evaluate our method on Amyotrophic Lateral Sclerosis (ALS)[37] patient data from the Centre Hospitalier Universitaire (CHU) of Tours, which includes measurements such as body weight, ALS Functional
Rating Scale – Revised (ALSFRS-R)[6], Forced Vital Capacity (FVC)[10], and survival. We report clustering quality and runtime, and examine survival separation across clusters.
Contributions
• We introduce a configurable similarity based on a modified Drop-DTW that incorporates per-attribute clinical thresholds, weights, and penalties for irregular, multivariate patient sequences. It improves interpretability by reporting the contribution of each attribute to the alignment cost and by providing practical parameter settings agreed with domain experts.
• We pair this similarity with DBA to compute representative centroids under temporal alignment,
which enables K-means to operate with time-warping dissimilarities.
• We conduct a reproducible empirical study on a dataset from the CHU of Tours, comparing our
method against Drop-DTW baselines and analyzing the accuracy–runtime trade-off.
1.1 Related Work
In the literature, a simple way to estimate the dissimilarity between two sequences is to use standard
distances such as the Euclidean distance. This, however, assumes that the sequences have the same
length and are already time-aligned, which is rarely true for patient follow-up data. Other commonly
used dissimilarities include the Manhattan distance, which sums absolute pointwise deviations and shares
the same alignment/equal-length assumptions; the Shape-Based Distance (SBD)[26], which compares z-normalized series via the maximum normalized cross-correlation over circular shifts to capture shape
similarity up to phase; and Longest Common Subsequence (LCSS)[36], which matches points only within
value/time tolerances and treats the rest as gaps, thereby tolerating outliers and missing segments.
A very popular alternative is to allow time deformations, for example with DTW, which aligns sequences
of possibly different lengths, by permitting local stretching and compression of time. The downside is that
the computation is expensive and subject to constraints such as end-point matching. Moreover, DTW
tends to force matches even when some correspondences are outliers.
As noted above, DTW can be costly. A common remedy is to constrain the warping path with a Sakoe–Chiba[31]
or Itakura[15] window to reduce the number of candidate alignments. There are also approximate variants,
such as FastDTW[32], [33], that speed up computation by returning an approximation of the exact value.
These dissimilarities are central to clustering methods such as K-means or Hierarchical Agglomerative
Clustering (HAC)[16], [17], [19]. Yet clustering also requires a notion of an average sequence. One approach is DBA, which constructs a barycenter sequence by minimizing the sum of squared DTW distances between
the centroid and the series in the cluster.
To avoid matching outliers, variants such as Drop-DTW modify the alignment path by allowing certain
pairings to be skipped (dropped) when their local cost is too high. The algorithm can therefore ignore
outliers anywhere in the sequences while still optimizing the final alignment.
At very large scale, one typically combines dimensionality reduction and indexing: first apply Piecewise
Aggregate Approximation (PAA)[18], which replaces a series with the means of a few equal-length segments, then discretize these values with Symbolic Aggregate approXimation (SAX)[22] to obtain a short
symbolic word. The resulting representation can be indexed with indexable Symbolic Aggregate approXimation (iSAX)[8], [34], a multi-resolution tree[39] that prunes candidates quickly.
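As an illustration only, PAA and SAX can be sketched in a few lines of Python. This is a simplified sketch, not the indexed iSAX variant; the function names are ours, and the 4-symbol breakpoints are the standard-normal quartiles conventionally assumed by SAX on z-normalized data.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each of n_segments
    equal-length slices of the series (length must divide evenly here)."""
    seg = len(series) // n_segments
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]

# Breakpoints for a 4-symbol alphabet: quartiles of the standard normal,
# as used by SAX on z-normalized data (values rounded to 4 decimals).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]

def sax_word(series, n_segments, breakpoints=BREAKPOINTS):
    """Map a (z-normalized) series to a short symbolic word via PAA + binning."""
    symbols = "abcd"
    word = ""
    for mean in paa(series, n_segments):
        # number of breakpoints strictly below the segment mean = symbol index
        idx = sum(1 for b in breakpoints if mean > b)
        word += symbols[idx]
    return word
```

For instance, `paa([1, 2, 3, 4, 5, 6], 3)` yields the three segment means `[1.5, 3.5, 5.5]`, which SAX then discretizes into one symbol each.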
An alternative approach accelerates sliding-window correlation[40] by mapping each series to compact,
incrementally updatable sketches. Low-dimensional sub-sketches are indexed on regular grids to propose
a small set of candidate pairs, which are then verified with an exact measure. This strategy scales to very
large collections while preserving high precision, trading a modest loss in recall for substantial gains in
runtime and memory.
Another approach uses shapelets[41], which are short discriminative subsequences whose presence is quantified as the minimum distance to any window of a series, and then used to compare or classify time series.
They capture local motifs rather than full-sequence alignments.
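A minimal sketch of the shapelet distance described above (the function name is ours): the presence of a shapelet is quantified as the minimum Euclidean distance to any same-length window of the series.

```python
def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    window of the series of the same length as the shapelet."""
    m = len(shapelet)
    best = float("inf")
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = sum((w - s) ** 2 for w, s in zip(window, shapelet)) ** 0.5
        best = min(best, d)
    return best
```

A distance of zero means the shapelet occurs verbatim somewhere in the series.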
Many studies leverage machine learning at various stages of the pipeline, for example, learned symbolic representations for multivariate time series to improve supervised classification with Random Forest (RF)[2],
or deep neural networks such as 1D Convolutional Neural Networks (CNNs)[12] that learn discriminative
representations end-to-end and achieve strong results.
1.2 Motivation and Goals
The internship was carried out within the PAMDA team at Laboratoire d’Informatique Fondamentale
d’Orléans (LIFO). As part of the team’s ongoing work[35], a PhD project on prognostic stratification in
ALS is currently underway. The subject is to introduce a new statistical approach that groups patients by
their rate of clinical deterioration and shows that biological parameters play an important role in functional
decline. In addition to this line of work, our goal is to investigate an alternative similarity metric to open
an additional research direction and to obtain more robust evidence in this study.
1.3 Organization of the report
The remainder of the report is organized as follows. We start with a review of related work covering time-warping alignment methods, robust variants, indexing, and sequence averaging. Section 2 defines DTW, and Section 3 presents our modified Drop-DTW with attribute-wise costs. Section 4 details datasets,
metrics, and baselines, and Section 5 presents the results on clustering quality, survival separation, and
runtime. Section 6 discusses limitations and future work, and Section 7 presents the conclusion.
2 Background
2.1 Time Series and Timed Sequences
Time series
A univariate time series is a sequence of values x = (x1 , . . . , xN ) observed at regular intervals.
In the multivariate setting, a time series with d variables is represented by a matrix X ∈ R^{N×d}, where X_{i,j} denotes the value of variable j at the i-th time point; all variables are measured at the same times and thus share the same length N.
Timed sequences
A univariate timed sequence is an ordered list of time-stamped events with a potentially irregular time
step:
X = ⟨(t1, x1), . . . , (tN, xN)⟩,
where the timestamps satisfy t1 < t2 < · · · < tN .
In the multivariate case, each observation is a vector xi = (x_i^1, . . . , x_i^d) of d attributes measured at time ti:

X = ⟨(t1, ⟨x_1^1, . . . , x_1^d⟩), . . . , (tN, ⟨x_N^1, . . . , x_N^d⟩)⟩.
A time series is a special case of a timed sequence in which the timestamps are regular.
Example 1. Aperiodic multivariate timed sequence.
Time in days since the first visit; attributes {weight, ALSFRS-R, FVC}.
Time (days)   Weight (kg)   ALSFRS-R   FVC (%)
0             72.3          38         82
23            71.8          37         80
103           70.6          35         78
The associated sequence is:
X = (0, ⟨72.3, 38, 82⟩), (23, ⟨71.8, 37, 80⟩), (103, ⟨70.6, 35, 78⟩) .
With this notation, both univariate and multivariate aperiodic sequences can be represented. In this
section, observations are assumed to be complete.
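As a concrete illustration, the sequence of Example 1 can be held as a plain list of (timestamp, attribute-vector) pairs. The representation and helper names below are ours, not part of the formalism.

```python
# A timed sequence as a list of (timestamp, attribute-vector) pairs;
# attribute order here is (weight, ALSFRS-R, FVC), as in Example 1.
X = [
    (0,   (72.3, 38, 82)),
    (23,  (71.8, 37, 80)),
    (103, (70.6, 35, 78)),
]

def timestamps(seq):
    """Timestamps of the sequence, in order."""
    return [t for t, _ in seq]

def attribute(seq, k):
    """Values of the k-th attribute across the sequence."""
    return [x[k] for _, x in seq]
```

The timestamps are strictly increasing, matching the constraint t1 < t2 < · · · < tN.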
2.2 DTW and Drop-DTW
Dynamic Time Warping
The DTW[31] distance is a dissimilarity measure between two time series that aligns sequences of possibly different lengths (Fig. 1).
Consider X = ⟨x1, . . . , xN⟩ ∈ R^N and Z = ⟨z1, . . . , zK⟩ ∈ R^K, and a user-specified alignment cost d(zi, xj).
Denote by W = ((i1, j1), . . . , (iL, jL)) an admissible alignment path; the DTW is then defined by

DTW(Z, X) = min_{W ∈ W} Σ_{(i,j)∈W} d(zi, xj),
where W is the set of paths satisfying:
• End-point matching: (i1 , j1 ) = (1, 1) and (iL , jL ) = (K, N )
• Monotonicity: i and j cannot decrease along W
• Continuity: (∆i, ∆j) ∈ {(1, 0), (0, 1), (1, 1)}, i.e., one moves by at most one cell horizontally and/or
vertically at each step
We introduce the cumulative cost matrix D ∈ R^{K×N}:

D(i, j) = d(i, j) + min{ D(i−1, j), D(i, j−1), D(i−1, j−1) }

The final distance is DTW(Z, X) = D(K, N).
Example 2 — Computing the DTW distance
Consider two univariate series
A = ⟨2, 2, 3, 4⟩ and B = ⟨1, 2, 3⟩,
and the cost d(i, j) = (b_i − a_j)². The cumulative cost matrix D (rows indexed by B, columns by A) is:

        a=2   a=2   a=3   a=4
b=1      1     2     6    15
b=2      1     1     2     6
b=3      2     2     1     2
Figure 1: Example of two-sequence alignment made with the DTW algorithm
Source: Wikipedia
An optimal path is
(1, 1) → (2, 2) → (3, 3) → (3, 4)
DTW(B, A) = D(3, 4) = 2
Sakoe–Chiba window
To reduce the cost, the alignment path can be restricted to a diagonal band[15], [31] of width w:
Ww = {(i, j) | |i − j| ≤ w}
that is, only cells with j ∈ [i − w, i + w] are explored.
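The recurrence above, with an optional Sakoe–Chiba band, can be sketched as follows. This is an illustrative implementation, not the one used in this work; the function name and signature are ours.

```python
def dtw(z, x, cost, window=None):
    """Cumulative-cost DTW between sequences z (length K) and x (length N).
    If window is given, only cells with |i - j| <= window are explored
    (Sakoe-Chiba band)."""
    INF = float("inf")
    K, N = len(z), len(x)
    D = [[INF] * (N + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for i in range(1, K + 1):
        for j in range(1, N + 1):
            if window is not None and abs(i - j) > window:
                continue  # outside the band: cell stays infinite
            D[i][j] = cost(z[i - 1], x[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[K][N]
```

On Example 2, `dtw([1, 2, 3], [2, 2, 3, 4], lambda b, a: (b - a) ** 2)` returns 2, matching D(3, 4) above; a band of width 1 does not change the result here because the optimal path stays near the diagonal.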
2.2.1 Adaptive DTW for timed sequences
To use the timed-sequence notation, we adapt the DTW computation accordingly[13].
We consider two timed sequences:

X = ⟨(t_1^x, x1), . . . , (t_N^x, xN)⟩,   Z = ⟨(t_1^z, z1), . . . , (t_K^z, zK)⟩

of lengths N and K, respectively, where t_i^x, t_j^z ∈ R are timestamps and xi, zj ∈ R^d are d-dimensional attribute vectors.
Let M ∈ R^{K×N} be the alignment matrix between Z and X, where

M_{i,j} = 1 if zi can be matched to xj, and 0 otherwise.
We also define M*, the optimal alignment matrix, which contains the optimal correspondence path between points of Z and X.
The goal is to find the optimal alignment M* that minimizes the global matching cost:

M* = arg min_{M∈M} ⟨M, C⟩ = arg min_{M∈M} Σ_{i∈[K], j∈[N]} M_{i,j} C_{i,j}

where ⟨M, C⟩ is the Frobenius inner product between M and the cost matrix C, and M is the set of alignment matrices that satisfy the usual constraints: monotonicity, continuity, and end-point matching.
2.2.2 Drop-DTW
Drop-DTW[11], [13] is a variant of DTW in which some constraints have been removed, in particular
the end-point matching constraint. What distinguishes drop-DTW from DTW is the ability to “drop” a
point when the two points are not alignable. To implement this strategy, we split the matrix D into three
matrices.
• D⁺ ∈ R^{(K+1)×(N+1)}, the matching matrix
• D⁻ ∈ R^{(K+1)×(N+1)}, the drop matrix
• D ∈ R^{(K+1)×(N+1)}, the optimal matrix
The drop cost is represented by vectors d^x ∈ R^N and d^z ∈ R^K, where a cost is assigned to each event of a sequence.
We therefore use a new objective function:

M* = arg min_{M∈M} ⟨M, C⟩ + Pz(M) · d^z + Px(M) · d^x

where:
• M is the set of all alignment matrices that satisfy only the monotonicity constraint
• Px(M) ∈ {0,1}^N is a vector such that Px(M)_j = 1 if Σ_{i=1}^{K} M_{i,j} = 0, and 0 otherwise
• Pz(M) ∈ {0,1}^K is a vector such that Pz(M)_i = 1 if Σ_{j=1}^{N} M_{i,j} = 0, and 0 otherwise
Figure 2: Timed sequences (left) and the cost matrix for the alignment between s1 and s2 according to
the drop-DTW (right). Source: Clustering of timed sequences—Application to the analysis of care pathways[13]
Example 3 — Drop-DTW
Fig 2 shows three timed sequences (left) and the Drop-DTW cost matrix for aligning s1 and s2 (right).
• Alignment: gray bars indicate the selected pairings between events.
• Admissible matches: colored cells in the matrix denote valid matches.
• Dropped events: red crosses mark events P and S that are dropped.
• Temporal constraint: a cell (i, j) is valid only if |t_i^z − t_j^x| ≤ τ; otherwise the cell is dark (forbidden).
• Symbols: S = surgery, C = consultation, P = physiotherapy, R = radiotherapy.
2.2.3 Cost definitions
2.2.3.1 Alignment cost
As noted above, the alignment cost matrix C ∈ R^{K×N} is defined by a local distance d(zi, xj) between events. Several choices are possible, for example a symmetric function such as

C^s_{i,j} = 1 − cos(zi, xj)    (1)

or a mixed attribute–time Euclidean cost

C^s_{i,j} = sqrt( pe · ∥zi − xj∥₂² + pt · (t_i^z − t_j^x)² )    (2)

Here, zi and xj are the attribute vectors of the events from Z and X, t_i^z and t_j^x are their timestamps, and pt, pe > 0 weight the time and attribute terms. Unlike Equation 1, where the timestamp term is optional, Equation 2 explicitly incorporates the timestamp difference.
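A minimal sketch of building C from Equation 2, assuming each event is a (timestamp, attribute-vector) pair; the function name is ours.

```python
import math

def mixed_cost_matrix(Z, X, pe=1.0, pt=1.0):
    """Cost matrix C for the mixed attribute-time Euclidean cost of Eq. (2).
    Z and X are lists of (timestamp, attribute-vector) pairs; pe and pt
    weight the attribute and time terms."""
    C = []
    for tz, z in Z:
        row = []
        for tx, x in X:
            attr = sum((zk - xk) ** 2 for zk, xk in zip(z, x))
            row.append(math.sqrt(pe * attr + pt * (tz - tx) ** 2))
        C.append(row)
    return C
```

Increasing pt relative to pe makes temporally distant events more expensive to match, regardless of how similar their attributes are.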
2.2.3.2 Drop cost
Global drop cost
Similarly, for the drop cost, one approach is to choose a global cost[11], [13] for all events in X and Z with d^x = d^z = δ, for example by taking a given percentile p of the cost matrix C:

δ = percentile({C_{i,j} | 1 ≤ i ≤ K, 1 ≤ j ≤ N}, p)
Let C ∈ R^{K×N} and v = ⟨v1, . . . , vm⟩ be the entries of C, flattened and sorted, with m = K × N. For a percentile p ∈ [0, 100], we set

q = p/100,   h = 1 + (m − 1)q,   k = ⌊h⌋,   γ = h − k.

Then the p-th percentile is

percentile(C, p) = v1 if p = 0;   vm if p = 100;   (1 − γ)·vk + γ·v(k+1) otherwise.

Example
Let

C = [1 3 2; 8 4 6],   v = (1, 2, 3, 4, 6, 8),   m = 6

For p = 50:

q = 0.5,   h = 1 + (6 − 1)·0.5 = 3.5,   k = 3,   γ = 0.5

percentile(C, 50) = 0.5·v3 + 0.5·v4 = 0.5·3 + 0.5·4 = 3.5
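The percentile definition above translates directly to Python; the sketch below follows the linear-interpolation convention just given.

```python
import math

def percentile(C, p):
    """p-th percentile (linear interpolation) of the entries of matrix C,
    following the definition above; v is 1-indexed in the formula, so the
    list indices are shifted by one."""
    v = sorted(x for row in C for x in row)
    m = len(v)
    if p == 0:
        return v[0]
    if p == 100:
        return v[-1]
    h = 1 + (m - 1) * p / 100   # 1-indexed fractional rank
    k = math.floor(h)
    gamma = h - k
    return (1 - gamma) * v[k - 1] + gamma * v[k]
```

On the example matrix it returns 3.5, as computed above.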
Learnable drop costs
An alternative to a fixed drop penalty is to learn the drop costs. In particular, per-element drop vectors are produced from the opposite sequence via a feed-forward network:

d^x = X fωx(Z̄),   d^z = Z fωz(X̄)

where Z̄ and X̄ are sequence means, and fωx, fωz are learnable functions parametrized by ω. To make the method differentiable, the hard min in the dynamic programming is replaced by a smooth minimum,

smoothMin(x; γ) = x · softmax(−x/γ)

so that the training loss is the (smoothed) optimal alignment cost:

L_DTW(Z, X) = Drop-DTW(Z, X) = D_{K,N}(Z, X)
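A sketch of the smooth minimum above (shifted by min(xs) for numerical stability, which leaves the softmax weights unchanged):

```python
import math

def smooth_min(xs, gamma):
    """Differentiable surrogate for min: x . softmax(-x / gamma).
    Shifting by min(xs) avoids overflow without changing the result.
    As gamma -> 0 it approaches min(xs); as gamma grows it approaches
    the plain average."""
    m = min(xs)
    exps = [math.exp(-(x - m) / gamma) for x in xs]
    total = sum(exps)
    return sum(x * e for x, e in zip(xs, exps)) / total
```

With a small γ, `smooth_min([1.0, 2.0], 0.01)` is indistinguishable from the hard minimum 1.0, while a large γ blends the two values.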
2.3 Clustering
To use the K-means[14], [23], [24] algorithm, we need a way to compute an average of several sequences.
An existing well-established algorithm called DBA[29] computes a DTW barycenter by iteratively aligning
each timed sequence to a current prototype via DTW and then averaging matched points to update the
prototype until convergence.
2.3.1 K-means
K-means partitions n points x1 , . . . , xn ∈ Rd into k clusters by choosing centroids µ1 , . . . , µk that minimize
the intra-cluster sum of squares. The standard algorithm alternates two steps until convergence:
• Step 1: assign each point to its nearest centroid
• Step 2: update each centroid to the mean of the points currently assigned to it
The procedure is fast but can get stuck in local minima and is sensitive to the initial centroids. Here, we cluster time-stamped sequences with a K-means scheme adapted to an elastic dissimilarity, Drop-DTW.
2.3.2 K-means++ initialization
K-means++[1] is an initialization scheme that improves the starting centroids for K-means. It selects the
first centroid uniformly at random, then chooses each subsequent centroid with probability proportional to
the squared distance D(x)² to the nearest already-chosen centroid. This spreads seeds across the dataset and typically yields better final clusters.
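A minimal sketch of K-means++ seeding for an arbitrary dissimilarity (the names are ours; for timed sequences, dist would be the Drop-DTW dissimilarity and the subsequent update step would use DBA instead of the mean):

```python
import random

def kmeans_pp_init(points, k, dist, rng):
    """K-means++ seeding: first centroid uniform at random, each
    subsequent centroid drawn with probability proportional to D(x)^2,
    the squared distance to the nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(dist(p, c) ** 2 for c in centroids) for p in points]
        total = sum(d2)
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(p)
                break
    return centroids
```

Because already-chosen centroids have D(x)² = 0, they carry no probability mass, so the seeds spread out over the data.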
3 Contribution: Drop-DTW with Partial Drop
3.1 Extended Drop-DTW
Given the different ways to assign costs to matches and drops, the original authors opted for a symmetric
local cost and for learning the drop penalty. On their datasets, they reported over 70% accuracy[11].
However, learning the drop penalty does not provide a clear link between the method and domain experts.
This creates an interpretability gap: it may be difficult to understand why one sequence receives a strong
alignment score compared to another.
For this reason, we introduce an additional type of drop whose costs explicitly allow experts to adjust the impact of each attribute type. The choice among alternatives still follows a minimum-cost rule; our contribution is to shape the local costs so that, when events satisfy expert-specified criteria, one option is favored over the other.
3.1.1 Drop cost
For the drop value, we set a fixed cost for all events of X (length N) and Z (length K):

δ_j^x = δ_i^z = δ,   i ∈ {1, . . . , K}, j ∈ {1, . . . , N}

We initialize its value from the cost matrix C:

δ = percentile({C_{i,j} | 1 ≤ i ≤ K, 1 ≤ j ≤ N}, p)
3.1.2 Partial drop
The costs d^x and d^z are defined to incorporate an expert-defined condition: depending on the difference between the values of the same attribute for two events from different sequences, we add either a penalty or a reward. To this end, we define the partial event-drop matrix S.
To allow different event types to have a validity threshold, we add a condition c_k for each type, imposing a penalty when:

∥x_j^k − z_i^k∥ > c_k

To distinguish C and S, the weight vector w used for the distance between two events is redefined. Unlike in C, where the weights remain equal to 1, the matrix S uses weights computed as follows:

w_k = exp(∥x_j^k − z_i^k∥ − c_k)

With this choice, w_k < 1 when the difference is within the tolerance, ∥x_j^k − z_i^k∥ ≤ c_k, which makes the partial-drop cost smaller (S_{i,j} < C_{i,j}) and thus favors a partial drop; on the other hand, w_k > 1 when ∥x_j^k − z_i^k∥ > c_k, discouraging the partial-drop branch.
We then use the following distance:

d(zi, xj) = sqrt( pe · Σ_{k=1}^{d} w_k · (x_j^k − z_i^k)² + pt · (t_i^z − t_j^x)² )
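The weights and the weighted local cost can be sketched as follows (the names are ours; passing w = 1 for every attribute recovers the cost used for C):

```python
import math

def partial_weights(z, x, c):
    """Attribute weights for the partial-drop matrix S:
    w_k = exp(|x_k - z_k| - c_k); w_k < 1 inside the tolerance c_k."""
    return [math.exp(abs(xk - zk) - ck) for zk, xk, ck in zip(z, x, c)]

def local_cost(z, x, tz, tx, w, pe=1.0, pt=1.0):
    """Weighted attribute-time cost used for both C (all weights 1) and S."""
    attr = sum(wk * (xk - zk) ** 2 for wk, zk, xk in zip(w, z, x))
    return math.sqrt(pe * attr + pt * (tz - tx) ** 2)
```

When the attribute difference stays within its tolerance, the weight drops below 1 and the partial-drop cost undercuts the plain alignment cost, exactly as described above.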
In Algorithm 1, dynamic programming[3], [4], [5] computes the minimum-cost alignment between the sequences. Following the original paper[11], we further simplify by allowing drops only on events of X; a more detailed algorithm (Algorithm 4) is provided in the appendix.
By default, Algorithm 2 reconstructs the best path starting from D_{K+1,N+1}. With the temporal constraint, however, this entry may not be computed and thus corresponds to an impossible alignment. In that case, we take the first finite distance found on the last row or last column, giving priority to the one that traverses the smallest proportion of the sequence length, regardless of the cell's value.
3.1.3 DTW Barycenter Averaging
We adapt Algorithm 3 to our notation, while restricting drops to a single sequence; the full version is given in Appendix 5. In short, at each alignment step we append the attribute values and the timestamp
to the appropriate lists according to the label M . We distinguish three cases:
• Match: add to H the attribute values from X and Z, together with the timestamps of both events
in τ
• Drop: depending on which side is removed, append only the event from X or only the one from Z; it is also possible to append nothing
• Partial drop: as in a match, aggregate the two events, but weight their contributions to remain
consistent with the alignment computation
At the end of this process, we obtain the centroids of the clusters for the K-means step.
Algorithm 1: Sequence alignment with partial Drop-DTW
Input: ta and tb, the timestamp vectors of the timed sequences;
C ∈ R^{K×N}, alignment cost matrix; S ∈ R^{K×N}, partial-alignment cost matrix;
δ, drop cost; τ, temporal constraint
Output: dist, the final distance; M*, the optimal alignment matrix; D, the cumulative cost matrix
(1) D⁺_{1,1} ← 0; D⁺_{i,1} ← ∞; D⁺_{1,j} ← ∞; i ∈ [2, K+1], j ∈ [2, N+1]
(2) D⁻_{1,1} ← 0; D⁻_{i,1} ← ∞; D⁻_{1,j} ← Σ_{k=1}^{j−1} δ; i ∈ [2, K+1], j ∈ [2, N+1]
(3) Dᵖ_{1,1} ← 0; Dᵖ_{i,1} ← ∞; Dᵖ_{1,j} ← ∞; i ∈ [2, K+1], j ∈ [2, N+1]
(4) D_{1,1} ← 0; D_{i,1} ← D⁻_{i,1}; D_{1,j} ← D⁻_{1,j}; i ∈ [2, K+1], j ∈ [2, N+1]
(5) for i ← 2 to K+1 do
(6)   for j ← 2 to N+1 do
(7)     if ∥t_{a,i} − t_{b,j}∥ > τ then
(8)       D⁺_{i,j} ← ∞; D⁻_{i,j} ← ∞; Dᵖ_{i,j} ← ∞; D_{i,j} ← ∞
(9)     else
(10)      D⁺_{i,j} ← C_{i,j} + min{D_{i−1,j−1}, D_{i,j−1}, D_{i−1,j}}
(11)      D⁻_{i,j} ← δ + D_{i,j−1}
(12)      Dᵖ_{i,j} ← S_{i,j} + min{D_{i−1,j−1}, D_{i,j−1}, D_{i−1,j}}
(13)      D_{i,j} ← min{D⁺_{i,j}, D⁻_{i,j}, Dᵖ_{i,j}}
(14)      M_{i,j} is assigned a value based on D_{i,j}
(15) dist, M* ← traceback(D, M)
(16) return dist, M*, D
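A sketch of the forward pass of Algorithm 1 in Python (drops on events of X only, traceback omitted; D⁺, D⁻, and Dᵖ are named Dmatch, Ddrop, and Dpart below, and the names are ours):

```python
def partial_drop_dtw(ta, tb, C, S, delta, tau):
    """Forward pass of the partial Drop-DTW recurrence (drops on X only).
    ta, tb: timestamp lists of Z (length K) and X (length N);
    C, S: K x N alignment and partial-alignment cost matrices;
    delta: drop cost; tau: temporal constraint.
    Returns the combined cumulative cost D[K][N] (inf if forbidden)."""
    INF = float("inf")
    K, N = len(ta), len(tb)
    Dmatch = [[INF] * (N + 1) for _ in range(K + 1)]
    Ddrop = [[INF] * (N + 1) for _ in range(K + 1)]
    Dpart = [[INF] * (N + 1) for _ in range(K + 1)]
    D = [[INF] * (N + 1) for _ in range(K + 1)]
    Dmatch[0][0] = Dpart[0][0] = D[0][0] = 0.0
    for j in range(1, N + 1):
        Ddrop[0][j] = j * delta      # drop the first j events of X
        D[0][j] = Ddrop[0][j]
    for i in range(1, K + 1):
        for j in range(1, N + 1):
            if abs(ta[i - 1] - tb[j - 1]) > tau:
                continue             # temporal constraint: cell stays infinite
            best_prev = min(D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])
            Dmatch[i][j] = C[i - 1][j - 1] + best_prev
            Ddrop[i][j] = delta + D[i][j - 1]
            Dpart[i][j] = S[i - 1][j - 1] + best_prev
            D[i][j] = min(Dmatch[i][j], Ddrop[i][j], Dpart[i][j])
    return D[K][N]
```

On a toy pair where matching the second events would cost 9, the partial-drop branch (local cost 2 here) is taken instead, so the final distance is 2 rather than 9.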
Figure 3: A set of time series to be averaged (left), and the averaged sequence after DBA (right)
Source: A global averaging method for dynamic time warping, with applications to clustering[29]
Algorithm 2: Traceback for Drop-DTW
Input: cumulative cost matrix D, alignment matrix M
Output: dist, the final distance; optimal alignment matrix M*
(1) initialize M* to zeros
(2) if D[N, M] is finite then
(3)   i ← N; j ← M
(4) else
(5)   (i, j) ← last finite entry on the border
(6)   if none then
(7)     return ∞, M*
(8) dist ← D[i, j]
(9) while i > 0 or j > 0 do
(10)  M*_{i,j} ← M_{i,j}
(11)  diag ← D[i−1][j−1]
(12)  left ← D[i][j−1]
(13)  top ← D[i−1][j]
(14)  if left < diag then
(15)    j ← j − 1
(16)  else if top < diag then
(17)    i ← i − 1
(18)  else
(19)    i ← i − 1; j ← j − 1
(20) return dist, M*
Algorithm 3: Construction of an average timed sequence with partial DBA
Input: S = {s1, s2, . . . , sm}: set of timed sequences; maxit ∈ N: maximum number of iterations
Output: C^(k): average timed sequence
(1) C^(0) ← random(S)
(2) k ← 0
(3) while k < maxit do
(4)   E ← [∅, . . . , ∅]
(5)   T ← [∅, . . . , ∅]
(6)   for i ← 1 to m do
(7)     p ← length of C^(k); q ← length of si
(8)     init D ∈ R^{p×q}, the cumulative cost matrix
(9)     (dist, M, D) ← DropDTW(C^(k), si)   // from last finite value on the border
(10)    foreach (r, s) ∈ M do
(11)      state ← M_{r,s}
(12)      if Match then
(13)        append C^(k)[r].d and si[s].d to Er
(14)        append C^(k)[r].t and si[s].t to Tr
(15)      else if Drop si then
(16)        append C^(k)[r].d to Er; append C^(k)[r].t to Tr
(17)      else if Partial Drop then
(18)        foreach attribute k do
(19)          wk ← exp(|C^(k)[r].dk − si[s].dk| − ck)
(20)        append C^(k)[r].d to Er with weight vector w
(21)        append si[s].d to Er with weight vector w
(22)        append C^(k)[r].t and si[s].t to Tr
(23)      move to the next cell with the minimum value among {diag, top, left} in D
(24)   τ ← (τj : ∀ j ∈ |C^(k−1)|, τj = (1/|Tj|) Σ_{t∈Tj} t, |Tj| ≠ 0)
(25)   H ← (hj : ∀ j ∈ |C^(k−1)|, hj = (1/|Ej|) Σ_{d∈Ej} d, |Ej| ≠ 0)
(26)   k ← k + 1
(27)   C^(k) ← (H, τ)
(28) return C^(k)
4 Dataset and Settings
4.1 Clinical Context and Dataset
Amyotrophic Lateral Sclerosis (ALS)[37] is a progressive neurodegenerative disease that primarily affects
motor neurons, leading to muscle weakness, loss of function, and, ultimately, respiratory failure.
We use a real-world hospital dataset from CHU of Tours comprising 353 patients with ALS and 43,029
time-stamped measurements collected between 2004 and 2023. Each patient has multiple visits recorded
as time-stamped events; for alignment, we normalize each patient’s timeline so that the first visit defines
time zero.
We consider a subset of clinically relevant features, including the ALSFRS-R[6], FVC[10], and other measures such as WEIGHT/BMI. The ALSFRS-R is a 12-item functional scale covering four domains: bulbar, fine motor, gross motor, and respiratory. Each item is scored from 0 to 4, yielding a total score from 0 to 48, with higher scores indicating better function.
4.2 Preprocessing
Z-normalization
We normalize all components so they lie on comparable scales, preventing the distance from being dominated by raw units or follow-up length, and keeping hyperparameters such as pt/pe and τ stable across patients.
While z-normalization is a strong default, the choice of scaling can affect similarity. A large empirical comparison[21] reported that maximum-absolute scaling is often competitive for Euclidean-distance methods, whereas mean normalization can be reasonable for deep models. We therefore apply z-normalization to remove offset and scale, so that each set of values has mean 0 and variance 1.
For a finite set V = {v1 , . . . , vn }, compute

μ = (1/n) Σ_{i=1}^{n} v_i,    σ = √( (1/n) Σ_{i=1}^{n} (v_i − μ)² ),

then rescale each element as

ṽ_i = (v_i − μ) / σ.
In our data, we apply this transformation per feature: for each feature k, compute µk and σk over its
observed values and replace each value by (v − µk )/σk . If σk = 0 (constant feature), set all normalized
values to 0 for that feature.
For a multivariate, timed sequence X, the normalization is

x̃_{i,d} = (x_{i,d} − μ_d) / σ_d   (d = 1, . . . , D),    t̃_i = (t_i − μ_t) / σ_t,

X̃ = ((t̃_1 , x̃_1 ), . . . , (t̃_N , x̃_N )).
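As a minimal sketch of the per-feature rule above, including the σ_k = 0 case (illustrative code, not the thesis pipeline's implementation):

```python
import numpy as np

def z_normalize(X):
    """Per-feature z-normalization; constant features (sigma_k = 0) map to 0."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)                  # population std, matching the 1/n formula
    safe = np.where(sigma == 0, 1.0, sigma)
    Z = (X - mu) / safe
    Z[:, sigma == 0] = 0.0                 # constant feature -> all zeros
    return Z

# Rows are visits, columns are features (e.g., an ALSFRS-R-like score and a constant one).
X = np.array([[48.0, 90.0],
              [40.0, 90.0],
              [32.0, 90.0]])
Z = z_normalize(X)
```

After the transform, each non-constant column has mean 0 and variance 1, so no single feature dominates the alignment cost by scale alone.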
4.3 Evaluation metrics
In the absence of truth labels, we rely on internal clustering quality and clinical differentiation via Kaplan–Meier survival curves with a global log-rank p-value.
4.3.1 Clustering quality
We report the Silhouette[30] computed from the Drop-DTW dissimilarity. For each series i, let a(i) be
the average dissimilarity to the other members of its cluster and b(i) the minimum average dissimilarity
to any other cluster. The pointwise score is
s(i) = (b(i) − a(i)) / max{a(i), b(i)},
the cluster score is the mean of s(i) within the cluster, and the overall Silhouette is the mean over all
series.
a(i) captures intra-cluster cohesion (lower means a denser cluster), while b(i) captures separation from the nearest other cluster (higher means better separation). The Silhouette s(i) contrasts these:
• Values close to 1 indicate compact, well-separated clusters
• Values near 0 indicate overlapping boundaries
• Negative values suggest misclustered points
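The definition above can be computed directly from a precomputed dissimilarity matrix; the following is an illustrative sketch (function name is ours, not the pipeline's code):

```python
import numpy as np

def silhouette_from_dissimilarity(D, labels):
    """Mean Silhouette from a precomputed pairwise dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():                 # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].mean()              # intra-cluster cohesion
        b = min(D[i, labels == c].mean()   # nearest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two tight, well-separated pairs -> Silhouette close to 1.
D = np.array([[0, 1, 9, 9],
              [1, 0, 9, 9],
              [9, 9, 0, 1],
              [9, 9, 1, 0]], dtype=float)
score = silhouette_from_dissimilarity(D, [0, 0, 1, 1])  # (9 - 1) / 9
```

Because it only needs the pairwise matrix, the same routine works for the Drop-DTW dissimilarity or any other elastic distance.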
4.3.2 Clinical differentiation
The data may include right-censoring, meaning that for some individuals the event is not observed within
the follow-up period (e.g., due to study end or loss to follow-up). We record a time T and an indicator
x ∈ {0, 1}, where x = 1 for an observed event and x = 0 if censored. Censored individuals contribute to
the risk set up to their censoring time but do not count as events thereafter.
We estimate a survival function per cluster using the Kaplan–Meier estimator. At distinct event times t_i, with d_{gi} events and n_{gi} individuals at risk just before t_i in cluster g, the estimator is

Ŝ_g(t) = ∏_{t_i < t} (n_{gi} − d_{gi}) / n_{gi}.
When there is no censoring, n_{gi} is simply the number alive just before t_i. If censoring occurs at t_i, n_{gi} excludes those censored at t_i. Additionally, to test whether survival differs across clusters, we use the log-rank test, which contrasts the observed number of events with that expected under the null of equal hazards across groups and yields a χ² statistic with K − 1 degrees of freedom and a p-value[9], [25], [28], [38]. If the p-value exceeds a pre-specified significance level (typically α = 0.05), we do not reject the null hypothesis, i.e., there is no statistically significant difference between the survival curves; conversely, p < α indicates evidence that the curves differ.
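As an illustrative sketch of the estimator above (assuming follow-up times and 0/1 event indicators as in the text; this is not the analysis code used in the study):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct observed event time.

    times  : follow-up times T
    events : 1 if the event was observed, 0 if right-censored
    Returns (event_times, survival), with S evaluated just after each event time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    s, out_t, out_s = 1.0, [], []
    for t in np.unique(times[events == 1]):
        n_at_risk = int(np.sum(times >= t))            # at risk just before t
        d = int(np.sum((times == t) & (events == 1)))  # events at t
        s *= (n_at_risk - d) / n_at_risk
        out_t.append(t)
        out_s.append(s)
    return np.array(out_t), np.array(out_s)

# Censored subjects (events = 0) leave the risk set without counting as events.
t, s = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 0, 1])
```

Here the subjects censored at times 3 and 5 still contribute to the risk set at earlier event times, which is exactly the behavior described above.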
5 Experiments and Results
5.1 Effect of hyperparameters
Hyperparameters are typically chosen with domain expertise. In this study, however, we did not have
access to the additional clinical guidance needed to fine-tune them for ALS. We selected the configuration
via internal validation, choosing the setting that maximized our evaluation metric while respecting clinical
constraints.
Concretely, the best-performing configuration used:
Selected hyperparameters
• Temporal constraint: τ = 1024.000.
• Trade-off weights: pt = 511.979, pe = 1.054.
• Partial-drop thresholds (per attribute):
– ALSFRS-R: cALSFRS-R = 14.600
– BMI: cBMI = 21.433
– FVC: cFVC = 53.793
– WEIGHT: cWEIGHT = 65.589
• Drop penalty percentile: p = 20%.
• Drop budget: δ = 10%.
Evaluation summary
• Silhouette: 0.455 — moderate cohesion/separation.
• Log-rank p-value: 0.182 — no statistically significant survival difference across clusters.
5.1.1 Selecting the number of clusters k
We varied k ∈ {2, 3, 4, 5} and evaluated each solution using our quality metrics. Points denote the mean across runs; error bars show the standard deviation.
Silhouette decreases as k increases (Fig. 4b), favoring more compact partitions at k = 2. The p-score shows no consistent improvement beyond k = 2 and higher variability for k ≥ 3 (Fig. 4a). Balancing internal quality and stability, we select k = 2 for the main analysis.
5.1.2 pt /pe ratio
The ratio ρ = pt/pe [13] sets the trade-off between the temporal misalignment penalty pt and the event mismatch penalty pe. A
small ratio makes time shifts cheap, so the path can slide in time to keep the same events matched. A large
ratio makes time shifts expensive, so the path keeps timestamps close even if that means mismatching
events.
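For illustration only (the actual cost follows [13]; this exact additive form is an assumption of ours), a local cost mixing the two penalties shows how the ratio acts:

```python
import numpy as np

def local_cost(t_z, x_z, t_x, x_x, pt, pe):
    """Illustrative local cost: pt * |t_z - t_x| + pe * ||x_z - x_x||_1.

    Only meant to show how the ratio pt/pe shifts the trade-off between
    temporal misalignment and event mismatch; not the pipeline's exact cost."""
    time_gap = abs(t_z - t_x)
    event_gap = float(np.abs(np.asarray(x_z, float) - np.asarray(x_x, float)).sum())
    return pt * time_gap + pe * event_gap

# Same pair of identical events, 2 time units apart, under two regimes:
cheap_time = local_cost(0.0, [40.0], 2.0, [40.0], pt=0.1, pe=1.0)   # time shifts cheap
dear_time  = local_cost(0.0, [40.0], 2.0, [40.0], pt=10.0, pe=1.0)  # time shifts expensive
```

With a small ratio the path can slide in time almost for free to keep matching the same events; with a large ratio even a small timestamp gap dominates the cost.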
We varied pt (with pe fixed) and pe (with pt fixed) and evaluated each setting. Points show the mean
across runs, error bars are standard deviations.
(a) p-score vs. k (lower is better)
(b) Silhouette vs. k (larger is better)
Figure 4: Comparison of survival separation (p-score) and internal quality (Silhouette) w.r.t. k
(a) p-score vs. pt
(b) Silhouette vs. pt
Figure 5: Effect of the temporal weight pt (with pe fixed)
(a) p-score vs. pe
(b) Silhouette vs. pe
Figure 6: Effect of the event weight pe (with pt fixed)
Across wide ranges of pt (Fig. 5) and pe (Fig. 6), Silhouette varies moderately, whereas the p-score shows
higher variance at the extremes.
5.1.3 Impact of τ in Drop-DTW
In Drop-DTW, τ limits admissible matches to pairs whose timestamps satisfy |tzi − txj | ≤ τ . Too small a
τ over-constrains the path, too large a τ makes warping nearly unconstrained.
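The constraint can be realized by masking inadmissible cells before the DP fills its matrices; a minimal numpy sketch (helper name is ours):

```python
import numpy as np

def admissible_pairs(t_z, t_x, tau):
    """Boolean mask of (i, j) pairs whose timestamps satisfy |t_z[i] - t_x[j]| <= tau.

    Cells outside this band receive an infinite cost in the Drop-DTW DP,
    which is how the temporal constraint enters the recurrence."""
    gap = np.abs(np.subtract.outer(np.asarray(t_z, float), np.asarray(t_x, float)))
    return gap <= tau

# Visits at t = 0, 10, 20 vs. t = 0, 12, 40: only close-in-time pairs survive.
mask = admissible_pairs([0, 10, 20], [0, 12, 40], tau=5)
```

The last series element at t = 40 has no admissible partner at all, so it can only be dropped, which is the intended interaction between τ and the drop mechanism.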
(a) p-score vs. τ
(b) Silhouette vs. τ
Figure 7: Effect of the temporal constraint τ on survival separation and internal clustering quality. Points
denote means across runs, error bars show standard deviations
Across the range we explored (Fig. 7), both metrics improve as τ increases from very small values, where
alignments are overly constrained, then stabilize; variability is highest near the boundaries. We therefore
select a τ that balances flexibility with constraint, as reported in the hyperparameter summary, rather
than pushing to the largest feasible window.
Silhouette     0.7087  0.7287  0.7707  0.7460  0.7460  0.6637
Ratio C1/C2    0.0322  0.0352  0.0382  0.0398  0.0413  0.1210

Table 1: Mean Silhouette w.r.t. the size ratio between clusters C1 and C2
Table 1 reports the cluster-size ratio between the smallest cluster and the largest. Despite relatively high
mean Silhouette values, the very small ratios r ∈ [0.032, 0.121] reveal severe imbalance; the smallest cluster
contains only 3–12% as many patients as the largest. Such imbalance can inflate the global Silhouette
(e.g., a tiny compact cluster far from a large diffuse one) and is undesirable for clinical interpretation.
5.1.4 Impact of c in partial drop
As noted above, the partial-drop mechanism is controlled by a threshold c, which in turn determines the
weight w. In the absence of precise clinical guidance, we selected c by validation, choosing the values that
maximized our internal evaluation metrics. In our implementation of S, increasing c relaxes the condition,
so partial drops occur more frequently; conversely, smaller c tightens the condition and makes partial
drops rarer.
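A hypothetical sketch of the per-attribute weighting, using the weight form printed in Algorithm 5, w_k = exp(|Δ_k| − c_k); the exact gating function used in the pipeline may differ:

```python
import math

def partial_drop_weights(z_attrs, x_attrs, c):
    """Per-attribute weights for a partial drop (illustrative, following the
    form in Algorithm 5). With this form, an attribute gap below its
    threshold c_k yields a weight < 1, i.e., the attribute is down-weighted."""
    return [math.exp(abs(z - x) - ck) for z, x, ck in zip(z_attrs, x_attrs, c)]

# Two attributes with gaps 10 and 1, against thresholds from Section 5.1
# (e.g., c_ALSFRS-R ~ 14.6, c_FVC ~ 53.8): both gaps are sub-threshold.
w = partial_drop_weights([40.0, 80.0], [30.0, 79.0], c=[14.6, 53.8])
```

Raising a threshold c_k therefore shrinks the weight for a given gap, consistent with the text: larger c relaxes the condition and makes partial drops more frequent.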
Figure 8 shows that about two thirds of steps keep some form of match (align 37.1% + drop_part 29.5%), while hard drops account for the remaining third (drop_x 16.6%, drop_z 14.2%, and very rare drop_all 2.6%). This is consistent with our goal: the algorithm mostly aligns events, resorting to partial drops to attenuate attribute-level noise, and using full drops only when a pairing is clearly inconsistent.

Figure 8: Alignment types — percentage.

Type:    align  drop_x  drop_z  drop_all  drop_part
Count:    2410    1082     926       167       1918

Table 2: Frequency of alignment types, n = 6503
6 Discussion
6.1 Alternative clustering and shapelets
While K-means with DBA provides a simple and effective baseline under elastic distances, other clustering
methods may better accommodate irregular, multivariate clinical trajectories. A first option is k-medoids
(Partitioning Around Medoids (PAM)), which optimizes representative observed sequences instead of
means; this avoids barycenter artifacts and only requires a pairwise distance matrix. HAC with different
types of linkage[16], [17], [19] is another natural choice when a dendrogram and multi-scale views are
desired; cutting the tree at different heights yields clusterings at varying granularities.
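Since both alternatives only need a pairwise dissimilarity matrix, a minimal PAM-style k-medoids illustrates the idea (an illustrative sketch of ours, not tuned for the clinical data, and usable with the Drop-DTW matrix unchanged):

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Minimal PAM-style k-medoids on a precomputed dissimilarity matrix D.

    Representatives are observed sequences (medoids), so no barycenter
    such as DBA is needed."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)      # assign to nearest medoid
        new = np.array([
            # within each cluster, pick the member minimizing total dissimilarity
            np.flatnonzero(labels == c)[
                np.argmin(D[np.ix_(labels == c, labels == c)].sum(axis=0))]
            for c in range(k)
        ])
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

# Two tight, well-separated pairs end up in separate clusters.
D = np.array([[0, 1, 8, 8],
              [1, 0, 8, 8],
              [8, 8, 0, 1],
              [8, 8, 1, 0]], dtype=float)
medoids, labels = k_medoids(D, k=2)
```

HAC with a chosen linkage would likewise consume the same matrix and additionally provide a dendrogram for multi-scale inspection.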
6.2 Limitations and future work
A weighted DBA replaces the simple mean of matched points with a weighted mean, allowing noisy
matches to be down-weighted and batch-wise partial averages to be merged, which stabilizes the centroid
and speeds computation without changing the final result.
Hyperparameters (e.g., pt, pe and the partial-drop threshold c) were chosen by internal validation given limited clinical guidance. A more principled approach is to elicit expert ranges per attribute, regularize toward these priors, and report the sensitivity of cluster assignments, Silhouette, and survival separation. The partial-drop matrix S already exposes per-attribute contributions; the weighting function could be further refined.
Clustering quality was assessed with Silhouette, Kaplan–Meier, and log-rank separation. Future evaluations should include external validation with clinical expert review and testing on independent ALS
cohorts.
Further Drop-DTW speedups include constraining the warping path with a Sakoe–Chiba band and using
wavefront parallelism in the DP to accelerate pairwise distances.
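For illustration, a Sakoe–Chiba band on classic DTW looks as follows; the same banding idea would apply to Drop-DTW's larger DP (this sketch is ours, not the thesis implementation):

```python
import numpy as np

def sakoe_chiba_dtw(x, z, band):
    """Classic DTW restricted to a Sakoe-Chiba band of half-width `band`.

    Cells outside the band stay at +inf, so only O(n * band) cells are
    filled instead of the full O(n * m) matrix."""
    n, m = len(x), len(z)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - band), min(m, i + band)   # cells inside the band
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - z[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

d = sakoe_chiba_dtw([0, 1, 2, 3], [0, 1, 2, 3], band=1)
```

Wavefront parallelism is complementary: within each anti-diagonal of the (banded) matrix, cells depend only on the two previous anti-diagonals and can be filled concurrently.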
7 Conclusion
This report introduced a configurable Drop-DTW variant for irregular, multivariate patient follow-up
sequences. By using attribute-wise costs with expert-specified thresholds, weights, and partial drops, the
similarity aims to preserve clinically meaningful alignments while down-weighting mismatches. Combined
with DBA for centroid estimation and K-means for clustering, the approach yields an interpretable pipeline
for clinical sequence clustering.
While hyperparameters were selected by internal validation due to limited prior guidance, the attribute-wise formulation exposes per-feature contributions to the alignment cost, improving transparency and
facilitating expert review.
Methodologically, constraining the warping path (e.g., with a Sakoe–Chiba band) and adopting wavefront
parallelism could improve scalability. Clinically, we plan to establish an expert-informed priority over
attribute types and to define plausible per-attribute ranges; we will use these as priors to regularize the
model—weighting attributes by priority and shrinking estimates toward the specified ranges—and we will
run sensitivity analyses to assess robustness. Finally, external validation—via expert review and testing
on independent ALS cohorts—will be essential to confirm cluster stability and clinical relevance.
8 References
[1] D. Arthur and S. Vassilvitskii, “K-means++: The advantages of careful seeding,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.
[2] M. G. Baydogan and G. Runger, “Learning a symbolic representation for multivariate time series classification,” Data Mining and Knowledge Discovery, 2015.
[3] R. Bellman, “On the theory of dynamic programming,” Proceedings of the National Academy of Sciences, 1952.
[4] R. Bellman, “The theory of dynamic programming,” Bulletin of the American Mathematical Society, 1954.
[5] R. Bellman, Dynamic Programming. 1957.
[6] J. M. Cedarbaum and N. Stambler, “The ALSFRS-R: A revised ALS functional rating scale that incorporates assessments of respiratory function,” Journal of the Neurological Sciences, 1999.
[7] G. Chatzigeorgakidis, K. Patroumpas, D. Skoutas, S. Athanasiou, and S. Skiadopoulos, “Visual exploration of geolocated time series with hybrid indexing,” Big Data Research, 2019.
[8] M. P. da Conceição Monteiro Anacleto, “MSAX: Multivariate symbolic aggregate approximation for time series classification,” M.S. thesis, Universidade de Lisboa, 2019.
[9] D. R. Cox, “Regression models and life-tables (with discussion),” Journal of the Royal Statistical Society: Series B, 1972.
[10] A. Czaplinski, A. A. Yen, and S. H. Appel, “Forced vital capacity (FVC) as an indicator of survival and disease progression in an ALS clinic population,” Journal of Neurology, Neurosurgery & Psychiatry, 2006.
[11] N. Dvornik, I. Radosavovic, K. G. Derpanis, A. Garg, and A. D. Jepson, “Drop-DTW: Aligning common signal between sequences while dropping outliers,” in Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
[12] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Deep learning for time series classification: A review,” Data Mining and Knowledge Discovery, 2019.
[13] T. Guyet, G. Pinson, and A. Gesny, “Clustering of timed sequences—application to the analysis of care pathways,” Data & Knowledge Engineering, 2025.
[14] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Journal of the Royal Statistical Society. Series C (Applied Statistics), 1979.
[15] F. Itakura, “Minimum prediction residual principle applied to speech recognition,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 1975.
[16] S. C. Johnson, “Hierarchical clustering schemes,” Psychometrika, 1967.
[17] J. H. Ward Jr., “Hierarchical grouping to optimize an objective function,” Journal of the American Statistical Association, 1963.
[18] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time series databases,” Knowledge and Information Systems, 2001.
[19] G. N. Lance and W. T. Williams, “A general theory of classificatory sorting strategies: I. Hierarchical systems,” The Computer Journal, 1967.
[20] S. Lhermitte, J. Verbesselt, W. W. Verstraeten, and P. Coppin, “A comparison of time series similarity measures for classification and change detection of ecosystem dynamics,” Remote Sensing of Environment, 2011.
[21] F. T. Lima and V. M. A. Souza, “A large comparison of normalization methods on time series,” Big Data Research, 2023.
[22] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A symbolic representation of time series, with implications for streaming algorithms,” in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 2003.
[23] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, 1982.
[24] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967.
[25] N. Mantel, “Evaluation of survival data and two new rank order statistics arising in its consideration,” Cancer Chemotherapy Reports, 1966.
[26] J. Paparrizos and L. Gravano, “k-Shape: Efficient and accurate clustering of time series,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015.
[27] J. Paparrizos, H. Li, F. Yang, K. Wu, J. E. D’Hondt, and O. Papapetrou, “A survey on time-series distance measures,” arXiv, 2024.
[28] A. Perry et al., “Unsupervised cluster analysis of patients with recovered left ventricular ejection fraction identifies unique clinical phenotypes,” PLOS ONE, 2021.
[29] F. Petitjean, A. Ketterlin, and P. Gançarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognition, 2011.
[30] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, 1987.
[31] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978.
[32] S. Salvador and P. Chan, “FastDTW: Toward accurate dynamic time warping in linear time and space,” in Proceedings of the ACM SIGKDD Workshop on Mining Temporal and Sequential Data, 2004.
[33] S. Salvador and P. Chan, “Toward accurate dynamic time warping in linear time and space,” Intelligent Data Analysis, 2007.
[34] J. Shieh and E. Keogh, “iSAX: Indexing and mining terabyte sized time series,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
[35] G. Tejedor, N. Labroche, P. Marcel, V. Peralta, H. Blasco, and H. Alarcan, “Stratification pour le pronostic de patients atteints de la sclérose latérale amyotrophique,” 2024.
[36] M. Vlachos, G. Kollios, and D. Gunopulos, “Discovering similar multidimensional trajectories,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2002.
[37] L. C. Wijesekera and P. N. Leigh, “Amyotrophic lateral sclerosis,” Orphanet Journal of Rare Diseases, 2009.
[38] J. Wu et al., “Unsupervised clustering of quantitative image phenotypes reveals breast cancer subtypes with distinct prognoses and molecular pathways,” Clinical Cancer Research, 2017.
[39] D. E. Yagoubi, R. Akbarinia, F. Masseglia, and T. Palpanas, “DPiSAX: Massively distributed partitioned iSAX,” in 2017 IEEE International Conference on Data Mining (ICDM), 2017.
[40] D. E. Yagoubi, R. Akbarinia, F. Masseglia, T. Palpanas, and R. Cole, “ParCorr: Identifying similar time series pairs across sliding windows,” Data Mining and Knowledge Discovery, 2018.
[41] M. Zakaria, A. Mueen, and E. Keogh, “Accelerating the discovery of time series shapelets,” Data Mining and Knowledge Discovery, 2015.
Appendices
Algorithm 4: Subsequence Alignment with Drop-DTW
Input: ta and tb, the timestamp vectors of the timed sequences;
C ∈ R^{K×N}, alignment cost matrix; S ∈ R^{K×N}, partial-drop cost matrix;
δ, drop cost, and τ, temporal constraint
Output: D_{K,N}, the distance, and M, the optimal alignment matrix
(1) D^{zx}_{1,1} ← 0, D^{zx}_{i,1} ← ∞, D^{zx}_{1,j} ← ∞; i ∈ [2, K + 1], j ∈ [2, N + 1];
(2) D^{z−}_{1,1} ← 0, D^{z−}_{i,1} ← ∞, D^{z−}_{1,j} ← Σ_{k=1}^{j} δ; i ∈ [2, K + 1], j ∈ [2, N + 1];
(3) D^{−x}_{1,1} ← 0, D^{−x}_{i,1} ← Σ_{k=1}^{i} δ, D^{−x}_{1,j} ← ∞; i ∈ [2, K + 1], j ∈ [2, N + 1];
(4) D^{−−}_{1,1} ← 0, D^{−−}_{i,1} ← D^{−x}_{i,1}, D^{−−}_{1,j} ← D^{z−}_{1,j}; i ∈ [2, K + 1], j ∈ [2, N + 1];
(5) D_{1,1} ← 0, D_{i,1} ← D^{−x}_{i,1}, D_{1,j} ← D^{z−}_{1,j}; i ∈ [2, K + 1], j ∈ [2, N + 1];
(6) D^{p}_{1,1} ← 0, D^{p}_{i,1} ← D^{−x}_{i,1}, D^{p}_{1,j} ← D^{z−}_{1,j}; i ∈ [2, K + 1], j ∈ [2, N + 1];
(7) for i ← 2 to K + 1 do
(8)   for j ← 2 to N + 1 do
(9)     if ∥t_{a,i} − t_{b,j}∥ > τ then
(10)      D^{zx}_{i,j}, D^{z−}_{i,j}, D^{−x}_{i,j} ← ∞;
(11)      D^{p}_{i,j}, D^{−−}_{i,j}, D_{i,j} ← ∞;
(12)    else
(13)      diag ← {D^{zx}_{i−1,j−1}, D^{z−}_{i−1,j−1}, D^{−x}_{i−1,j−1}, D^{−−}_{i−1,j−1}, D^{p}_{i−1,j−1}};
(14)      left_z ← {D^{zx}_{i,j−1}, D^{−x}_{i,j−1}, D^{p}_{i,j−1}};
(15)      top_x ← {D^{zx}_{i−1,j}, D^{z−}_{i−1,j}, D^{p}_{i−1,j}};
(16)      left_{−z} ← {D^{−x}_{i,j−1}, D^{−−}_{i,j−1}};
(17)      top_{−x} ← {D^{z−}_{i−1,j}, D^{−−}_{i−1,j}};
(18)      D^{zx}_{i,j} ← C_{i,j} + min(diag ∪ left_z ∪ top_x);
(19)      D^{p}_{i,j} ← S_{i,j} + min(diag ∪ left_z ∪ top_x);
(20)      D^{z−}_{i,j} ← δ + min(left_z);
(21)      D^{−x}_{i,j} ← δ + min(top_x);
(22)      D^{−−}_{i,j} ← min((top_{−x} ⊕ δ) ∪ (left_{−z} ⊕ δ));
(23)      D_{i,j} ← min{D^{zx}_{i,j}, D^{p}_{i,j}, D^{z−}_{i,j}, D^{−x}_{i,j}, D^{−−}_{i,j}};
(24) (dist, M) ← traceback(D);
(25) return dist, M
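To make the recurrence's shape concrete, the following deliberately simplified sketch collapses the state matrices into a single DP with match-or-drop moves; it omits the temporal constraint, the partial-drop state, and the traceback, so it is an illustration of ours, not Algorithm 4 itself:

```python
import numpy as np

def drop_dtw_simplified(C, delta):
    """Simplified Drop-DTW distance: each z_i / x_j is either matched
    (local cost C[i, j], consuming both) or dropped (cost delta).

    Illustrative only: real Drop-DTW tracks separate state matrices so that
    DTW-style stretching moves remain available after drops."""
    K, N = C.shape
    D = np.full((K + 1, N + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(K + 1):
        for j in range(N + 1):
            if i > 0 and j > 0:
                D[i, j] = min(D[i, j], D[i - 1, j - 1] + C[i - 1, j - 1])  # match
            if i > 0:
                D[i, j] = min(D[i, j], D[i - 1, j] + delta)               # drop z_i
            if j > 0:
                D[i, j] = min(D[i, j], D[i, j - 1] + delta)               # drop x_j
    return D[K, N]

# A perfectly matchable pair: the diagonal is cheaper than any drop.
C = np.array([[0.0, 5.0],
              [5.0, 0.0]])
dist = drop_dtw_simplified(C, delta=1.0)
```

Raising the mismatch costs in C relative to delta makes dropping the cheaper move, which is exactly the outlier-rejection behavior the full algorithm formalizes.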
Algorithm 5: Construction of an average timed sequence with DBA
Input: S = {s1 , s2 , . . . , sm }: set of timed sequences;
maxit ∈ N: maximum number of iterations;
Output: C^(k): average timed sequence
(1) C^(0) ← random(S);
(2) k ← 0;
(3) while k < maxit do
(4)   E ← [∅, . . . , ∅];
(5)   T ← [∅, . . . , ∅];
(6)   for i ← 1 to m do
(7)     p ← length of C^(k);
(8)     q ← length of si;
(9)     init D ∈ R^{p×q}, the cumulative cost matrix;
(10)    (dist, M, D) ← DropDTW(C^(k), si);   // from last finite value on the border
(11)    foreach (r, s) ∈ M do
(12)      state ← M_{r,s};
(13)      if Match then
(14)        append C^(k)[r].d to Er; append si[s].d to Er;
(15)        append C^(k)[r].t and si[s].t to Tr;
(16)      else if Drop si then
(17)        append C^(k)[r].d to Er; append C^(k)[r].t to Tr;
(18)      else if Drop C^(k) then
(19)        append si[s].d to Er; append si[s].t to Tr;
(20)      else if Drop C^(k) and si then
(21)        continue;
(22)      else if Partial Drop then
(23)        foreach attribute k do
(24)          wk ← exp(|C^(k)[r].dk − si[s].dk| − ck);
(25)        append C^(k)[r].d to Er with weight vector w;
(26)        append si[s].d to Er with weight vector w;
(27)        append C^(k)[r].t and si[s].t to Tr;
(28)      move to next cell with the minimum value among {diag, top, left} in D;
(29)   τ ← [τj ∀ j ∈ |C^(k−1)|, τj = (1/|Tj|) Σ_{t∈Tj} t, |Tj| ≠ 0];
(30)   H ← [hj ∀ j ∈ |C^(k−1)|, hj = (1/|Ej|) Σ_{d∈Ej} d, |Ej| ≠ 0];
(31)   k ← k + 1;
(32)   C^(k) ← (H, τ);
(33) return C^(k);