Cross-Domain Spatial-Temporal GCN Model for Micro-Expression Recognition
Abstract
Although significant progress has been made in micro-expression recognition, effectively modeling the intricate spatial-temporal dynamics of micro-expressions remains a persistent challenge owing to their brief duration and subtle, complex facial movements. Furthermore, existing methods often generalize poorly because they are trained on single, small-sample datasets. To address these two issues, this paper proposes the cross-domain spatial-temporal graph convolutional network (CDST-GCN) model, which comprises two primary components: a Siamese attention spatial-temporal branch (SASTB) and a global-aware dynamic spatial-temporal branch (GDSTB). Specifically, SASTB uses a contrastive learning strategy to project macro- and micro-expressions into a shared, aligned feature space, directly mitigating cross-domain discrepancies. It also integrates an attention-gated mechanism that generates adaptive adjacency matrices to flexibly model collaborative patterns among facial landmarks. While largely preserving the structural paradigm of SASTB, GDSTB enriches the feature representation with global context extracted by a pretrained model. Through this dual-branch architecture, CDST-GCN captures both global and local spatial-temporal features. Experimental results on the CASME II and SAMM datasets demonstrate that the proposed model achieves competitive performance; in particular, on the more challenging 5-class task, it attains an accuracy of 80.5% on CASME II.
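To make the two core mechanisms concrete, the sketch below illustrates (i) an InfoNCE-style contrastive loss that pulls paired macro- and micro-expression embeddings into a shared space, and (ii) an attention-gated adaptive adjacency over facial landmarks. This is a minimal PyTorch sketch under assumed conventions: the class and function names (AttentionGatedAdjacency, alignment_loss), the sigmoid gating form, the temperature, and all tensor sizes are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def alignment_loss(z_macro, z_micro, temperature=0.1):
    """InfoNCE-style loss pulling paired macro/micro embeddings together.

    A common choice for cross-domain alignment; the paper's exact
    contrastive objective may differ.
    """
    z_macro = F.normalize(z_macro, dim=1)
    z_micro = F.normalize(z_micro, dim=1)
    logits = z_macro @ z_micro.t() / temperature
    targets = torch.arange(z_macro.size(0), device=z_macro.device)
    return F.cross_entropy(logits, targets)


class AttentionGatedAdjacency(nn.Module):
    """Attention-gated adaptive adjacency for a landmark graph (sketch).

    A fixed adjacency encodes static landmark connectivity; a
    data-dependent attention term is gated into it so edge weights
    adapt to each input sequence.
    """

    def __init__(self, num_nodes: int, in_channels: int, embed_dim: int = 16):
        super().__init__()
        # Fixed skeleton over facial landmarks (initialized uniform here;
        # in practice this would follow anatomical connectivity).
        self.register_buffer("A_static",
                             torch.ones(num_nodes, num_nodes) / num_nodes)
        self.theta = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        # Learnable scalar gate balancing static structure and attention.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, nodes)
        q = self.theta(x).mean(dim=2)  # pool over time -> (B, E, N)
        k = self.phi(x).mean(dim=2)    # (B, E, N)
        # Score every landmark pair and normalize row-wise.
        attn = torch.softmax(torch.einsum("ben,bem->bnm", q, k), dim=-1)
        g = torch.sigmoid(self.gate)
        # Gated mixture of fixed connectivity and adaptive attention.
        return (1 - g) * self.A_static + g * attn  # (B, N, N)


if __name__ == "__main__":
    # One graph-convolution step: propagate landmark features over the
    # adaptive adjacency (68 landmarks, 32 frames, 3 input channels).
    x = torch.randn(2, 3, 32, 68)
    adj = AttentionGatedAdjacency(num_nodes=68, in_channels=3)(x)
    out = torch.einsum("bctn,bnm->bctm", x, adj)
    print(out.shape)  # torch.Size([2, 3, 32, 68])
```

In this reading, the gate lets the branch fall back on the static facial-landmark graph when the attention scores are uninformative, which matches the abstract's description of flexibly modeling collaborative patterns among landmarks.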