Quantitative gait analysis and prediction using artificial intelligence for patients with gait disorders | Scientific Reports – Nature.com

Data acquisition

This study was carried out in accordance with the tenets of the Declaration of Helsinki and with the approval of the Brest, France hospitals (CHRUs) Ethics Committee. Patients had also signed an informed consent. Our work was conducted between 2021 and 2022. Data collected between June 2006 and June 2021 from 734 patients (115 adults and 619 children) who had undergone clinical 3D gait analysis were used. Their identities were preserved by respecting medical secret and protecting patient confidentiality. All data were recorded using the same motion analysis system (Vicon MX, Oxford Metrics, UK) and four force platforms (Advanced Mechanical Technology, Inc., Watertown, MA, USA) in the same motion laboratory (CHU Brest) between 2006 and 2022. The data collected by the 15 infrared cameras (sampling rate of 100 or 120Hz) were synchronized with the ground reaction forces recorded by the force platforms (1000Hz or 1200Hz). The 16 markers were placed according to the protocol by Kadaba et al.11. Marker trajectories and ground reaction forces were dual-pass filtered with a low-pass Butterworth filter at a cut-off frequency of 6 Hz. After an initial calibration in the standing position, all patients were asked to walk at a self-selected speed along a 10m walkway.

Gait kinematics were processed using the Vicon Plug-in Gait model. Kinematics were time-normalized to stride duration, from 0 to 100% from initial contact (IC) to the next IC of the ipsilateral foot. Nine gait joint angles (kinematic gait variables) were used: anteversion/retroversion of the pelvis, rotation of the pelvis, pelvic tilt, flexion/extension of the hip, abduction/adduction of the hip, internal/external rotation of the hip, flexion/extension of the knee, plantar/dorsiflexion of the ankle, and the foots angle of progression. As a result, a gait cycle yielded 101 (times) 9 measurements. Let (E_{p,d}) denote the gait session of patient p at datetime d. It can be written as follows:

$$begin{aligned} E_{p,d} = left{ {C_{ E_{p,d}}}^{1}, {C_{ E_{p,d}}}^{2}, ldots , {C_{ E_{p,d}}}^{K} right} end{aligned}$$

(1)

where ({C_{ E_{p,d}}}^{k}) is the k-th gait cycle of a gait session (E_{p,d}) and K the total number of gait cycles. Let (c_{t,n}^{E_{p,d}^{k}}) denote the gait cycle ({C_{E_{p,d}}}^{k}) value at time step t and joint angle n. To keep notations simple, (c_{t,n}^{E_{p,d}^{k}}) is referred to as (c_{t,n}) in what follows. ({C_{E_{p,d}}}^{k}) can simply be represented with a matrix of 101 lines and 9 columns, as follows:

$$begin{aligned} {C_{ E_{p,d}}}^{k} = begin{bmatrix} c_{1,1} &{} c_{1,2} &{}cdots &{} c_{1,9} \ c_{2,1} &{} c_{2,2} &{}cdots &{} c_{2,9}\ vdots &{} &{} &{} \ c_{101,1} &{} c_{101,2} &{}cdots &{} c_{101,9}\ end{bmatrix} end{aligned}$$

(2)

The Gait Profile Score (GPS), a walking behavior score, was computed for each gait cycle from the previously described joint angles12,13,14. The GPS is a single index measure that summarizes the overall deviation of kinematic gait data relative to normative data. It can be decomposed to provide Gait Variable Scores (GVS) for nine key component kinematic gait variables, which are presented as a Movement Analysis Profile (MAP). The GVS corresponding to the n-th kinematic variable, GVS(_{textrm{n}}), is given by15,16,17:

$$begin{aligned} GVS_n = sqrt{frac{1}{T}sum _{t=1}^{T}(c_{t,n} - c_{t,n} ^{ref})^{2}} end{aligned}$$

(3)

where t is a specific point in the gait cycle, T its total number of points (typically equal to 10118,19), (c_{t,n}) the value of the kinematic variable n at point t, and (c_{t,n}^{textrm{ref}}) is its mean on the reference population (physiological normative). The GPS is obtained from the GVS scores15,17 as follows:

$$begin{aligned} GPS = sqrt{frac{1}{N}sum _{n=1}^{N}GVS_n^{2}} end{aligned}$$

(4)

where N is the total number of kinematic variables (equal to 9 by definition).

We had a total of 1459 gait sessions from 734 patients (115 adults and 619 children). Each patient had an average of 1.988 gait sessions with a standard deviation of 1.515. 53,693 gait cycles were collected. Their average number per gait session is equal to 18 with a standard deviation of 6. Neurological conditions, notably cerebral palsy, are the most frequent etiologies, as we can see in Fig.1.

The average patient age within the first gait session is equal to 14years, with a standard deviation of 16years. The time delay between the first and last gait session (for patients with more than one gait session, i.e., 319) is equal to 3.92years on average with a standard deviation of 3.24years. Directly consecutive gait sessions are, on average, separated by approximately 740days, with a standard deviation of 577days. The shortest (resp. longest) time delay was equal to 4 (resp. 4438) days. We had 1384 pairs of directly consecutive gait sessions belonging to 319 patients (the remaining patients were removed since they had only one gait session). Involved gait conditions are various: without any equipment, with a cane, with a rollator, with an orthosis, with a prosthesis.. Only pairs of gait sessions without equipment were selected in order to be in the same condition (79% of all available pairs, i.e. 1152). The first gait sessions in these pairs were used for training. Models were fed the gait cycles of these first gait sessions (i.e., 21,167 gait cycles in total).

GPS variation prediction is similar enough to a Time Series Classification (TSC) issue that its proposed popular architectures should be adopted. Consecutive gait session pairs ((E_{p,d}, E_{p,d+Delta d})) were considered. For each gait cycle ({C_{ E_{p,d}}}^{k}) of the current gait session (E_{p,d}), a GPS variation (Delta {}GPS) was computed using:

$$begin{aligned} Delta {}GPS({C_{ E_{p,d}}}^{k}) = GPS_{avg}( E_{p,d+Delta d}) - GPS({C_{ E_{p,d}}}^{k}) end{aligned}$$

(5)

where (GPS_{avg}(E_{p,d+Delta d})) is the average GPS per cycle of (E_{p,d+Delta d}) and (GPS({C_{ E_{p,d}}}^{k})) the GPS of the current gait cycle ({C_{E_{p,d}}}^{k}). The average GPS per cycle (GPS_{average}(E_{p,d})) of a gait session (E_{p,d}) is simply equal to:

$$begin{aligned} GPS_{avg}(E_{p,d}) = frac{sum _{k=1}^{K} GPS({C_{ E_{p,d}}}^{k}) }{K} end{aligned}$$

(6)

(Delta {})

GPS was ranked in a binary fashion. Either it is negative, in which case the patients gait improves (class 1), or it is positive, in which case the patients gait worsens (class 0). The metric used is the Area Under the Curve (AUC).

The distribution of patients between training, validation, and test groups is provided in Table1. Such a split put 73%, 12%, and 14% of total gait cycles within the training, validation, and test groups, respectively.

To be exhaustive, one MLP, one recurrent neural network (LSTM), one hybrid architecture (Encoder), several CNN architectures (FCN, ResNet, t-LeNet), and a one-dimensional Transformer20 were included. The MLP and LSTM were designed and developed from scratch. Their hyper-parameters were optimized manually. FCN, ResNet, Encoder, and t-LeNet are among the most effective end-to-end discriminative architectures regarding the TSC state-of-the-art10. These methods were also compared to the Transformer, a more recent and popular architecture. The Transformer does not suffer from long-range context dependency issues compared to LSTM21. In addition, it is notable for requiring less training. The Adam optimizer22 and binary cross-entropy loss were employed23.

For MLP, gait cycles were flattened so that the input length was equal to 909 time steps. The number of neurons was the same across all the fully connected layers. Many values of this number were tested to find the best structure for our task. In the same way, the number of layers was optimized. The corresponding architecture is shown in Fig.2.

MLP architecture for prediction.

LSTM layers were stacked, and a dropout was added before the last layer to avoid overfitting. The corresponding architecture is shown in Fig.3.

LSTM architecture for prediction.

For FCN, ResNet, Encoder and t-LeNet, the architectures proposed in Ref.10 were considered. They are shown in Figs. 4, 5, 6 and 7, respectively. We followed an existing implementation24 to set up the Transformer.

FCN architecture for prediction.

ResNet architecture for prediction.

Encoder architecture for prediction.

t-LeNet architecture for prediction.

Different techniques of data augmentation were tested as a pre-processing step to avoid overfitting: jittering, scaling, window warping, permutation, and window slicing. Their hyperparameters were empirically optimized for each model. These are among the TSC literatures most frequently utilized techniques, particularly when it comes from sensor data10.

Image-based time series representation initiated a new branch of deep learning approaches that consider image transformation as an innovative pre-processing of feature engineering25. In an attempt to reveal features and patterns less visible in the one-dimensional sequence of the original time series, many transformation methods were developed to encode time series as input images.

In our study, sensor modalities are transformed to the visual domain using 2D FFT in order to utilize a set of pre-trained CNN models for transfer learning on the converted imagery data. The full workflow of our framework is represented in Fig.8.

Proposed (Delta GPS) prediction workflow for the image-based approach.

2D FFT is used to work in the frequency domain or Fourier domain because it efficiently extracts features based on the frequency of each time step in the time series. It can be defined as:

$$F(u,v) = frac{1}{{T.N}}sumlimits_{{t = 0}}^{T} {sumlimits_{{n = 0}}^{N} {c_{{t,n}} } } exp left( { - j2pi left( {frac{{ut}}{T} + frac{{vn}}{N}} right)} right)$$

(7)

where F(u,v) is the direct Fourier transform of the gait cycle. It is a complex function that shows the phase and magnitude of the signal in the frequency domain. u and v are the frequency space coordinates. The magnitude of the 2D FFT |F(u,v)|, also known as the spectrum, is a two-dimensional signal that represents frequency information. Because the 2D FFT has translation and rotation attributes, the zero-frequency component can be moved to the center of |F(u,v)| without losing any information, making the spectrum image more visible. The centralized FFT spectrums were computed and fed to the proposed deep learning models. A centralized FFT spectrum for a given gait cycle is represented in Fig.9.

2D FFT for a given gait cycle. (a) The gait cycle; (b) FFT spectrum of the gait cycle; (c) Centralized FFT spectrum of the gait cycle.

The Timm librarys26 pre-trained VGG16, ResNet34, EfficientNet_b0, and the Vision Transformer vit_base_patch16_224 were investigated. They were pre-trained on a large collection of images, in a supervised fashion. For the Transformer, the pre-training was at a resolution of (224 times 224) pixels. Its input images were considered as a sequence of fixed-size patches (resolution (16 times 16)), which were linearly embedded.

Converting our grayscale images to RGB images was not necessary because Timms implementations support any number of input channels. The models minimum input size for VGG16 is (32 times 32). The images width dimension (N) equals 9, which is less than 32. In order to fit the minimum needed size, 2D FFT images were repeated 4 times in this width dimension. Transfer learning with fine-tuning methods was employed. One neurons final fully connected layer was used. In the same way that the top layers were trainable, all convolutional blocks were.

The pre-trained Timm models are deep and sophisticated, with many layers. As a result, a CNN model with fewer parameters, designed from scratch, was conceived. The number of used two-dimensional convolutional layers was a hyper-parameter to optimize in a finite range of values {1, 2, 3, 4, 5}. After the convolutional block, a dropout function was applied. Following that, two-dimensional max-pooling (MaxPooling2D) and batch normalization were used. The flattened output of the batch normalization was then fed to a dense layer of a certain number of neurons to tune. In order to predict the (Delta GPS), our model had a dense output layer with a single neuron. The corresponding architecture is shown in Fig.10.

Tailored 2D CNN for prediction.

The following are all of the architecture hyper-parameters to tune: the number of convolutional layers (num_layers), the number of filters for each convolution layer (num_filters), the kernel size of each convolution layer (kernel_size), the dropout rate (dropout), the pooling size of the MaxPooling2D (pool_size), the number of neurons in the dense layer (units), and the learning rate (lr). Five models with a varying number of convolutional layers (from 1 to 5) were tested. For each of them, the rest of the hyper-parameters were tuned using KerasTuner9 to maximize the validation AUC.

Originally posted here:
Quantitative gait analysis and prediction using artificial intelligence for patients with gait disorders | Scientific Reports - Nature.com

Related Posts

Comments are closed.