Attention‐aware spatio‐temporal learning for multi‐view gait‐based age estimation and gender classification

Bibliographic Details
Published in: IET Computer Vision, Vol. 19, No. 1
Main Authors: Huang, Binyuan; Luo, Yongdong; Xie, Jiahui; Pan, Jiahui; Zhou, Chengju
Format: Journal Article
Language: English
Published: 01.01.2025
Abstract
Recently, gait‐based age and gender recognition have attracted considerable attention in the fields of advertisement marketing and surveillance retrieval due to the unique advantage that gaits can be perceived at a long distance. Intuitively, age and gender can be recognised by observing people's static shape (e.g. different hairstyles between males and females) and dynamic motion (e.g. different walking velocities between the elderly and the young). However, most existing gait‐based age and gender recognition methods are based on the Gait Energy Image (GEI), which cannot explicitly model temporal dynamic information and is not robust to the multi‐view recognition that inevitably arises in real applications. Therefore, in this study, an Attention‐aware Spatio‐Temporal Learning (ASTL) framework is proposed, which takes a silhouette sequence as input to learn essential and invariant spatio‐temporal gait representations. More specifically, a Multi‐Scale Temporal Aggregation (MSTA) module provides an effective scheme for dynamic gait description by exploring and aggregating multi‐scale temporal interval information, which is a core supplement to the spatial representation. Then, a Multiple Attention Aggregation (MAA) module is designed to help the network focus on the most discriminative information along the temporal, spatial and channel dimensions. Finally, a Multimodal Collaborative Learning (MCL) block fully exploits the advantages of different modal features through a multimodal cooperative learning strategy. The mean absolute error (MAE) for age estimation and the correct classification rate (CCR) for gender classification on OU‐MVLP reach 6.68 years and 97%, respectively, demonstrating the superiority of the method. Ablation experiments and visualisation results also confirm the effectiveness of the three individual modules in the framework.
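The abstract only names the MSTA and MAA modules at a high level. The following minimal, speculative sketch (PyTorch-style, not the authors' released code) illustrates the two ideas it describes: aggregating frame-level features over several temporal scales, and re-weighting features along the temporal, spatial and channel dimensions. All module names, dilation rates and tensor shapes here are illustrative assumptions.

```python
# Speculative sketch only: multi-scale temporal aggregation plus
# channel/temporal/spatial attention, as described in the abstract.
# Module names, dilation rates and shapes are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleTemporalAggregation(nn.Module):
    """Aggregate per-frame features with 1-D convolutions at several temporal scales."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d)
             for d in dilations]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) -- frame descriptors stacked along time.
        outs = [F.relu(branch(x)) for branch in self.branches]
        return torch.stack(outs, dim=0).sum(dim=0) + x  # fuse scales, keep a residual path


class TripleAttention(nn.Module):
    """Re-weight a feature map along its channel, temporal and spatial axes."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape
        # Channel attention: global average over time and space, then gate channels.
        chn = torch.sigmoid(self.channel_fc(x.mean(dim=(2, 3, 4))))      # (b, c)
        x = x * chn.view(b, c, 1, 1, 1)
        # Temporal attention: one normalised weight per frame.
        tmp = torch.softmax(x.mean(dim=(1, 3, 4)), dim=1)                # (b, t)
        x = x * tmp.view(b, 1, t, 1, 1)
        # Spatial attention: one gate per pixel location, shared across frames.
        spa = torch.sigmoid(x.mean(dim=(1, 2), keepdim=True))            # (b, 1, 1, h, w)
        return x * spa


if __name__ == "__main__":
    feats = torch.randn(2, 64, 30, 16, 11)        # hypothetical 30-frame silhouette features
    attended = TripleAttention(64)(feats)         # attention over channels, frames, pixels
    pooled = attended.mean(dim=(3, 4))            # collapse space -> (batch, channels, frames)
    fused = MultiScaleTemporalAggregation(64)(pooled)
    print(fused.shape)                            # torch.Size([2, 64, 30])
```

In this reading, the multi-scale temporal branches supply the dynamic description that a single GEI would average away, while the three attention maps select which frames, pixel locations and channels carry the most discriminative cues for age and gender.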
Authors and Affiliations
– Huang, Binyuan (School of Software, South China Normal University, Guangzhou, China)
– Luo, Yongdong (School of Software, South China Normal University, Guangzhou, China)
– Xie, Jiahui (School of Software, South China Normal University, Guangzhou, China)
– Pan, Jiahui (School of Software, South China Normal University, Guangzhou, China)
– Zhou, Chengju, ORCID 0000-0003-4948-0909 (School of Software, South China Normal University, Guangzhou, China)
DOI: 10.1049/cvi2.12165
Discipline: Applied Sciences
EISSN: 1751-9640
ISSN: 1751-9632
Open Access Link: https://onlinelibrary.wiley.com/doi/pdf/10.1049/cvi2.12165