Attention‐aware spatio‐temporal learning for multi‐view gait‐based age estimation and gender classification

Bibliographic Details
Published in: IET Computer Vision, Vol. 19, No. 1
Main Authors: Huang, Binyuan; Luo, Yongdong; Xie, Jiahui; Pan, Jiahui; Zhou, Chengju
Format: Journal Article
Language: English
Published: 01.01.2025
Abstract
Recently, gait‐based age and gender recognition have attracted considerable attention in the fields of advertisement marketing and surveillance retrieval due to the unique advantage that gaits can be perceived at a long distance. Intuitively, age and gender can be recognised by observing people's static shape (e.g. different hairstyles between males and females) and dynamic motion (e.g. different walking velocities between the elderly and the young). However, most existing gait‐based age and gender recognition methods are based on the Gait Energy Image (GEI), which cannot explicitly model temporal dynamic information and is not robust to the multi‐view recognition that inevitably arises in real applications. Therefore, in this study, an Attention‐aware Spatio‐Temporal Learning (ASTL) framework is proposed, which takes a silhouette sequence as input to learn essential and invariant spatio‐temporal gait representations. More specifically, a Multi‐Scale Temporal Aggregation (MSTA) module provides an effective scheme for dynamic gait description by exploring and aggregating multi‐scale temporal interval information, which is a core supplement to the spatial representation. Then, a Multiple Attention Aggregation (MAA) module is designed to help the network focus on the most discriminative information along the temporal, spatial and channel dimensions. Finally, a Multimodal Collaborative Learning (MCL) block fully exploits the advantages of different modal features through a multimodal cooperative learning strategy. The mean absolute error (MAE) for age estimation and the correct classification rate (CCR) for gender classification on OU‐MVLP reach 6.68 years and 97%, respectively, demonstrating the superiority of the method. Ablation experiments and visualisation results also confirm the effectiveness of the three individual modules in the framework.
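The abstract only names the MSTA and MAA modules at a high level. The following minimal, speculative sketch (PyTorch-style, not the authors' released code) illustrates the two ideas it describes: aggregating frame-level features over several temporal scales, and re-weighting features along the temporal, spatial and channel dimensions. All module names, dilation rates and tensor shapes here are illustrative assumptions.

```python
# Speculative sketch only: multi-scale temporal aggregation plus
# channel/temporal/spatial attention, as described in the abstract.
# Module names, dilation rates and shapes are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleTemporalAggregation(nn.Module):
    """Aggregate per-frame features with 1-D convolutions at several temporal scales."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d)
             for d in dilations]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) -- frame descriptors stacked along time.
        outs = [F.relu(branch(x)) for branch in self.branches]
        return torch.stack(outs, dim=0).sum(dim=0) + x  # fuse scales, keep a residual path


class TripleAttention(nn.Module):
    """Re-weight a feature map along its channel, temporal and spatial axes."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape
        # Channel attention: global average over time and space, then gate channels.
        chn = torch.sigmoid(self.channel_fc(x.mean(dim=(2, 3, 4))))      # (b, c)
        x = x * chn.view(b, c, 1, 1, 1)
        # Temporal attention: one normalised weight per frame.
        tmp = torch.softmax(x.mean(dim=(1, 3, 4)), dim=1)                # (b, t)
        x = x * tmp.view(b, 1, t, 1, 1)
        # Spatial attention: one gate per pixel location, shared across frames.
        spa = torch.sigmoid(x.mean(dim=(1, 2), keepdim=True))            # (b, 1, 1, h, w)
        return x * spa


if __name__ == "__main__":
    feats = torch.randn(2, 64, 30, 16, 11)        # hypothetical 30-frame silhouette features
    attended = TripleAttention(64)(feats)         # attention over channels, frames, pixels
    pooled = attended.mean(dim=(3, 4))            # collapse space -> (batch, channels, frames)
    fused = MultiScaleTemporalAggregation(64)(pooled)
    print(fused.shape)                            # torch.Size([2, 64, 30])
```

In this reading, the multi-scale temporal branches supply the dynamic description that a single GEI would average away, while the three attention maps select which frames, pixel locations and channels carry the most discriminative cues for age and gender.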
Authors and Affiliations
– Huang, Binyuan (School of Software, South China Normal University, Guangzhou, China)
– Luo, Yongdong (School of Software, South China Normal University, Guangzhou, China)
– Xie, Jiahui (School of Software, South China Normal University, Guangzhou, China)
– Pan, Jiahui (School of Software, South China Normal University, Guangzhou, China)
– Zhou, Chengju, ORCID 0000-0003-4948-0909 (School of Software, South China Normal University, Guangzhou, China)
DOI: 10.1049/cvi2.12165
Discipline: Applied Sciences
EISSN: 1751-9640
ISSN: 1751-9632
Open Access Link: https://onlinelibrary.wiley.com/doi/pdf/10.1049/cvi2.12165