Feedback Graph Convolutional Network for Skeleton-Based Action Recognition

IEEE Trans Image Process. 2022;31:164-175. doi: 10.1109/TIP.2021.3129117. Epub 2021 Dec 2.

Abstract

Skeleton-based action recognition has attracted considerable attention since skeleton data is more robust to dynamic circumstances and complicated backgrounds than other modalities. Recently, many researchers have used the Graph Convolutional Network (GCN) to model spatial-temporal features of skeleton sequences through end-to-end optimization. However, conventional GCNs are feedforward networks, so the shallower layers have no access to the semantic information captured in the high-level layers. In this paper, we propose a novel network, named Feedback Graph Convolutional Network (FGCN). This is the first work to introduce a feedback mechanism into GCNs for action recognition. Compared with conventional GCNs, FGCN has the following advantages: (1) A multi-stage temporal sampling strategy is designed to extract spatial-temporal features for action recognition in a coarse-to-fine process; (2) A Feedback Graph Convolutional Block (FGCB) is proposed to introduce dense feedback connections into GCNs. It transmits high-level semantic features to the shallower layers and conveys temporal information stage by stage to model video-level spatial-temporal features for action recognition; (3) The FGCN model provides predictions on-the-fly. In the early stages, its predictions are relatively coarse; these coarse predictions are treated as priors to guide the feature learning in later stages, yielding more accurate predictions. Extensive experiments on three datasets, NTU-RGB+D, NTU-RGB+D120 and Northwestern-UCLA, demonstrate that the proposed FGCN is effective for action recognition, achieving state-of-the-art performance on all three datasets.
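The abstract's coarse-to-fine feedback idea can be sketched in a few lines of plain Python. This is a hypothetical toy, not the authors' implementation: `split_stages`, `stage_features`, and `feedback_forward` are illustrative names, the per-stage "feature" is just a mean (standing in for graph convolutions over skeleton joints), and the scalar `feedback_weight` stands in for the FGCB's dense feedback connections. What it does show is the control flow described above: the sequence is divided into temporal stages, each stage fuses its input with the state fed back from the previous stage, a prediction is emitted at every stage, and the final output fuses all stage-wise predictions.

```python
# Toy sketch (NOT the authors' code) of multi-stage temporal sampling
# with a feedback connection between stages, as described in the abstract.

def split_stages(sequence, num_stages):
    """Divide a frame sequence into equal-length temporal stages."""
    stage_len = len(sequence) // num_stages
    return [sequence[i * stage_len:(i + 1) * stage_len]
            for i in range(num_stages)]

def stage_features(frames):
    """Toy per-stage feature: the mean frame value. A real model would
    apply spatial-temporal graph convolutions over the skeleton joints."""
    return sum(frames) / len(frames)

def feedback_forward(sequence, num_stages=4, feedback_weight=0.5):
    """Process stages in order; the hidden state of stage t is fed back
    and fused with stage t+1's input (a stand-in for the FGCB's dense
    feedback connections). Each stage emits an on-the-fly prediction."""
    hidden = 0.0
    predictions = []
    for frames in split_stages(sequence, num_stages):
        feat = stage_features(frames)
        hidden = feat + feedback_weight * hidden  # fuse feedback signal
        predictions.append(hidden)  # coarse prediction, refined later
    # final prediction fuses the stage-wise outputs
    final = sum(predictions) / len(predictions)
    return predictions, final

# Example: 8 frames, 4 stages of 2 frames each.
preds, final = feedback_forward(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], num_stages=4)
```

Each entry of `preds` depends on all earlier stages through the fed-back state, mirroring how early coarse predictions act as priors for later, more accurate ones.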