Cross-Domain Spatial-Temporal GCN Model for Micro-Expression Recognition
Abstract
Although significant progress has been made in micro-expression recognition, effectively modeling the intricate spatial-temporal dynamics of micro-expressions remains a persistent challenge owing to their brief duration and subtle, complex facial movements. Furthermore, existing methods often generalize poorly because they are trained on single, small-sample datasets. To address these two issues, this paper proposes the cross-domain spatial-temporal graph convolutional network (CDST-GCN) model, which comprises two primary components: a Siamese attention spatial-temporal branch (SASTB) and a global-aware dynamic spatial-temporal branch (GDSTB). Specifically, SASTB uses a contrastive learning strategy to project macro- and micro-expressions into a shared, aligned feature space, directly mitigating cross-domain discrepancies. It also integrates an attention-gated mechanism that generates adaptive adjacency matrices to flexibly model collaborative patterns among facial landmarks. While largely preserving the structural paradigm of SASTB, GDSTB enriches the feature representation with global context extracted by a pretrained model. Through this dual-branch architecture, CDST-GCN captures both global and local spatial-temporal features. Experimental results on the CASME II and SAMM datasets demonstrate that the proposed model achieves competitive performance; in particular, on the more challenging 5-class task, it attains an accuracy of 80.5% on CASME II.
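To make the two core mechanisms concrete, the sketch below illustrates (i) an InfoNCE-style contrastive loss that pulls paired macro- and micro-expression embeddings into a shared space, and (ii) an attention-gated adaptive adjacency over facial landmarks. This is a minimal PyTorch sketch under assumed conventions: the class and function names (AttentionGatedAdjacency, alignment_loss), the sigmoid gating form, the temperature, and all tensor sizes are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def alignment_loss(z_macro, z_micro, temperature=0.1):
    """InfoNCE-style loss pulling paired macro/micro embeddings together.

    A common choice for cross-domain alignment; the paper's exact
    contrastive objective may differ.
    """
    z_macro = F.normalize(z_macro, dim=1)
    z_micro = F.normalize(z_micro, dim=1)
    logits = z_macro @ z_micro.t() / temperature
    targets = torch.arange(z_macro.size(0), device=z_macro.device)
    return F.cross_entropy(logits, targets)


class AttentionGatedAdjacency(nn.Module):
    """Attention-gated adaptive adjacency for a landmark graph (sketch).

    A fixed adjacency encodes static landmark connectivity; a
    data-dependent attention term is gated into it so edge weights
    adapt to each input sequence.
    """

    def __init__(self, num_nodes: int, in_channels: int, embed_dim: int = 16):
        super().__init__()
        # Fixed skeleton over facial landmarks (initialized uniform here;
        # in practice this would follow anatomical connectivity).
        self.register_buffer("A_static",
                             torch.ones(num_nodes, num_nodes) / num_nodes)
        self.theta = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        # Learnable scalar gate balancing static structure and attention.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, nodes)
        q = self.theta(x).mean(dim=2)  # pool over time -> (B, E, N)
        k = self.phi(x).mean(dim=2)    # (B, E, N)
        # Score every landmark pair and normalize row-wise.
        attn = torch.softmax(torch.einsum("ben,bem->bnm", q, k), dim=-1)
        g = torch.sigmoid(self.gate)
        # Gated mixture of fixed connectivity and adaptive attention.
        return (1 - g) * self.A_static + g * attn  # (B, N, N)


if __name__ == "__main__":
    # One graph-convolution step: propagate landmark features over the
    # adaptive adjacency (68 landmarks, 32 frames, 3 input channels).
    x = torch.randn(2, 3, 32, 68)
    adj = AttentionGatedAdjacency(num_nodes=68, in_channels=3)(x)
    out = torch.einsum("bctn,bnm->bctm", x, adj)
    print(out.shape)  # torch.Size([2, 3, 32, 68])
```

In this reading, the gate lets the branch fall back on the static facial-landmark graph when the attention scores are uninformative, which matches the abstract's description of flexibly modeling collaborative patterns among landmarks.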