Toward Foundation Models for Inclusive Object Detection: Geometry- and Category-Aware Feature Extraction Across Road User Categories

Bibliographic Details
Published in: IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 54, No. 11, pp. 6570-6580
Main Authors: Meng, Zonglin; Xia, Xin; Ma, Jiaqi
Format: Journal Article
Language: English
Published: IEEE, 01.11.2024
Abstract The safety of different categories of road users, comprising motorized vehicles and vulnerable road users (VRUs) such as pedestrians and cyclists, is one of the priorities of automated driving and smart infrastructure services. Three-dimensional (3-D) LiDAR-based object detection has been a promising approach to perceiving road users. Despite providing accurate 3-D geometry information, the point cloud from LiDAR is usually nonuniform, and learning effective abstract point cloud representations for diverse road users remains challenging for 3-D object detection, particularly for small objects such as VRUs. For inclusive object detection (IDetect), we propose a general foundation convolution component, called geometry-aware convolution (GA Conv), toward a foundation feature extraction model, to serve as the basic convolution operation of the neural network for inclusive 3-D object detection. The GA Conv operations are then used as elementary feature extraction layers to build a novel, elegant pyramid network for IDetect. It learns effective geometry-related features from unstructured point cloud data by implicitly learning the distribution properties and geometry of different categories of road users, in particular VRUs. The proposed IDetect is comprehensively evaluated on the large-scale Waymo Open Dataset benchmark with all categories of road users. The qualitative and quantitative experimental results demonstrate that IDetect can effectively handle nonuniformly distributed point clouds and learn geometric features that aid detection of different categories of road users. In addition, GA Conv has been integrated with other state-of-the-art neural networks and shown to boost VRU detection performance, demonstrating its foundation functionality and making it a general component of future inclusive 3-D object detection foundation models.
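The abstract describes GA Conv only at a conceptual level. Purely as a hedged illustration of the general idea of fusing an explicit local-geometry cue with learned per-point features inside a convolution-style layer, the sketch below assumes a PyTorch environment; it is not the authors' GA Conv implementation, and the class and variable names are invented for illustration.

```python
# Minimal, hypothetical sketch of a "geometry-aware" point feature layer.
# NOT the paper's GA Conv; it only illustrates concatenating an explicit
# geometric cue (offset from the local centroid) with learned per-point
# features before a shared linear transform. Assumes PyTorch is installed.
import torch
import torch.nn as nn


class GeometryAwarePointLayer(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # +3 input channels for the (dx, dy, dz) offset to the cloud centroid.
        self.mlp = nn.Sequential(
            nn.Linear(in_channels + 3, out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feats:  (B, N, C)  learned per-point features
        # coords: (B, N, 3)  raw xyz coordinates of the points
        centroid = coords.mean(dim=1, keepdim=True)           # (B, 1, 3)
        offsets = coords - centroid                           # explicit geometry cue
        return self.mlp(torch.cat([feats, offsets], dim=-1))  # (B, N, out_channels)


# Toy usage: two point clouds of 1024 points with 16-D features each.
layer = GeometryAwarePointLayer(in_channels=16, out_channels=32)
out = layer(torch.randn(2, 1024, 16), torch.randn(2, 1024, 3))
print(out.shape)  # torch.Size([2, 1024, 32])
```

In the actual IDetect pipeline such layers are stacked into a pyramid feature extractor; the sketch above omits voxelization, sparsity handling, and the category-aware behavior described in the abstract.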
Author Meng, Zonglin
Ma, Jiaqi
Xia, Xin
Author_xml – sequence: 1
  givenname: Zonglin
  orcidid: 0000-0002-0592-0135
  surname: Meng
  fullname: Meng, Zonglin
  organization: Department of Civil and Environmental Engineering, University of California, Los Angeles, CA, USA
– sequence: 2
  givenname: Xin
  orcidid: 0000-0002-5108-7578
  surname: Xia
  fullname: Xia, Xin
  email: x35xia@ucla.edu
  organization: Department of Civil and Environmental Engineering, University of California, Los Angeles, CA, USA
– sequence: 3
  givenname: Jiaqi
  orcidid: 0000-0002-8184-5157
  surname: Ma
  fullname: Ma, Jiaqi
  organization: Department of Civil and Environmental Engineering, University of California, Los Angeles, CA, USA
CODEN ITSMFE
ContentType Journal Article
DOI 10.1109/TSMC.2024.3385711
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library Online
CrossRef
DatabaseTitle CrossRef
Discipline Engineering
EISSN 2168-2232
EndPage 6580
ExternalDocumentID 10_1109_TSMC_2024_3385711
10507865
Genre orig-research
GrantInformation_xml – fundername: RIMI: Infrastructure-Assisted Automation
– fundername: Federal Highway Administration Center of Excellence on New Mobility and Automated Vehicle Program
– fundername: Cooperative Perception and Control for Freeway Traffic System Operations
– fundername: Federal Highway Administration Exploratory Advanced Research (EAR) Program
– fundername: California Statewide Transportation Research Program (SB 1) Program
ISSN 2168-2216
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
ORCID 0000-0002-0592-0135
0000-0002-5108-7578
0000-0002-8184-5157
PageCount 11
ParticipantIDs crossref_primary_10_1109_TSMC_2024_3385711
ieee_primary_10507865
PublicationCentury 2000
PublicationDate 2024-Nov.
PublicationDateYYYYMMDD 2024-11-01
PublicationDate_xml – month: 11
  year: 2024
  text: 2024-Nov.
PublicationDecade 2020
PublicationTitle IEEE transactions on systems, man, and cybernetics. Systems
PublicationTitleAbbrev TSMC
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID crossref
ieee
SourceType Aggregation Database
Publisher
StartPage 6570
SubjectTerms 3-dimensional (3-D) object detection
automated driving
Convolution
Feature extraction
geometry-aware convolution (GA Conv)
Kernel
Object detection
Pedestrians
Point cloud compression
Roads
vulnerable road user (VRU) perception
Title Toward Foundation Models for Inclusive Object Detection: Geometry- and Category-Aware Feature Extraction Across Road User Categories
URI https://ieeexplore.ieee.org/document/10507865
Volume 54