Toward Foundation Models for Inclusive Object Detection: Geometry- and Category-Aware Feature Extraction Across Road User Categories
The safety of different categories of road users comprising motorized vehicles and vulnerable road users (VRUs) such as pedestrians and cyclists is one of the priorities of automated driving and smart infrastructure services. Three-dimensional (3-D) LiDAR-based object detection has been a promising...
Published in | IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 54, no. 11, pp. 6570–6580 |
Main Authors | Meng, Zonglin; Xia, Xin; Ma, Jiaqi |
Format | Journal Article |
Language | English |
Published | IEEE, 01.11.2024 |
Subjects | |
Abstract | The safety of all categories of road users, comprising motorized vehicles and vulnerable road users (VRUs) such as pedestrians and cyclists, is a priority of automated driving and smart infrastructure services. Three-dimensional (3-D) LiDAR-based object detection is a promising approach to perceiving road users. Despite providing accurate 3-D geometry, the point cloud from a LiDAR is usually nonuniform, and learning effective abstract point cloud representations for diverse road users remains challenging for 3-D object detection, particularly for small objects such as VRUs. For inclusive object detection (IDetect), we propose a general foundation convolution component, called geometry-aware convolution (GA Conv), to serve as the basic convolution operation of the neural network for inclusive 3-D object detection. The GA Conv operations are then used as elementary feature extraction layers to build a novel, elegant pyramid network for IDetect. It learns effective geometry-related features from unstructured point cloud data by implicitly learning the distribution properties and geometry-related features of different categories of road users, in particular VRUs. The proposed IDetect is comprehensively evaluated on the large-scale Waymo Open Dataset benchmark with all categories of road users. The qualitative and quantitative experimental results demonstrate that IDetect can effectively handle nonuniformly distributed point clouds and learn geometric features that assist detection across road user categories. In addition, the GA Conv has been integrated into other state-of-the-art neural networks and shown to boost VRU detection performance, demonstrating its foundation functionality and making it a general component for a future inclusive 3-D object detection foundation model. |
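The abstract describes a convolution that extracts geometry-related features from nonuniform point clouds. As an illustration only — the paper's actual GA Conv design is not reproduced in this record — a minimal NumPy sketch of a geometry-aware point convolution might look like the following. The function name `ga_conv_sketch`, the radius neighborhood, the offset-concatenation encoding, and the random stand-in weights are all assumptions, not the authors' method:

```python
import numpy as np

def ga_conv_sketch(points, features, centers, radius=1.0, out_dim=8, rng=None):
    """Hypothetical geometry-aware point convolution (illustrative only).

    For each query center, gather points within `radius`, encode each
    neighbor's relative xyz offset (the geometric cue) alongside its input
    features, apply a linear map + ReLU, and max-pool over the neighborhood.
    Weights are random stand-ins for what would be learned parameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    in_dim = features.shape[1] + 3            # input features + relative xyz offset
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    out = np.zeros((len(centers), out_dim))
    for i, c in enumerate(centers):
        d = np.linalg.norm(points - c, axis=1)
        idx = np.where(d <= radius)[0]
        if idx.size == 0:                     # empty neighborhood: feature stays zero
            continue
        rel = points[idx] - c                 # geometric offsets to the center
        x = np.concatenate([features[idx], rel], axis=1)
        out[i] = np.maximum(x @ W, 0.0).max(axis=0)  # ReLU, then max-pool neighbors
    return out
```

Because the neighborhood is gathered per center rather than on a fixed grid, a construction of this kind adapts to nonuniform point densities — sparse VRU regions and dense vehicle regions are pooled over whatever points actually fall in the radius.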
Author | Meng, Zonglin; Ma, Jiaqi; Xia, Xin |
Author_xml | – sequence: 1 givenname: Zonglin orcidid: 0000-0002-0592-0135 surname: Meng fullname: Meng, Zonglin organization: Department of Civil and Environmental Engineering, University of California, Los Angeles, CA, USA – sequence: 2 givenname: Xin orcidid: 0000-0002-5108-7578 surname: Xia fullname: Xia, Xin email: x35xia@ucla.edu organization: Department of Civil and Environmental Engineering, University of California, Los Angeles, CA, USA – sequence: 3 givenname: Jiaqi orcidid: 0000-0002-8184-5157 surname: Ma fullname: Ma, Jiaqi organization: Department of Civil and Environmental Engineering, University of California, Los Angeles, CA, USA |
CODEN | ITSMFE |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION |
DOI | 10.1109/TSMC.2024.3385711 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library Online CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEL url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 2168-2232 |
EndPage | 6580 |
ExternalDocumentID | 10_1109_TSMC_2024_3385711 10507865 |
Genre | orig-research |
GrantInformation_xml | – fundername: RIMI: Infrastructure-Assisted Automation – fundername: Federal Highway Administration Center of Excellence on New Mobility and Automated Vehicle Program – fundername: Cooperative Perception and Control for Freeway Traffic System Operations – fundername: Federal Highway Administration Exploratory Advanced Research (EAR) Program – fundername: California Statewide Transportation Research Program (SB 1) Program |
GroupedDBID | 0R~ 6IK 97E AAJGR AASAJ ABQJQ ACGFS ACIWK AKJIK ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ EBS EJD HZ~ IFIPE IPLJI JAVBF M43 O9- OCL PQQKQ RIA RIE RIG RNS AAYXX CITATION |
IEDL.DBID | RIE |
ISSN | 2168-2216 |
IngestDate | Wed Oct 23 14:17:12 EDT 2024 Wed Oct 23 05:52:15 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Language | English |
LinkModel | DirectLink |
ORCID | 0000-0002-0592-0135 0000-0002-5108-7578 0000-0002-8184-5157 |
PageCount | 11 |
ParticipantIDs | crossref_primary_10_1109_TSMC_2024_3385711 ieee_primary_10507865 |
PublicationCentury | 2000 |
PublicationDate | 2024-Nov. |
PublicationDateYYYYMMDD | 2024-11-01 |
PublicationDate_xml | – month: 11 year: 2024 text: 2024-Nov. |
PublicationDecade | 2020 |
PublicationTitle | IEEE Transactions on Systems, Man, and Cybernetics: Systems |
PublicationTitleAbbrev | TSMC |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0001286306 |
SourceID | crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 6570 |
SubjectTerms | 3-dimensional (3-D) object detection automated driving Convolution Feature extraction geometry-aware convolution (GA Conv) Kernel Object detection Pedestrians Point cloud compression Roads vulnerable road user (VRU) perception |
Title | Toward Foundation Models for Inclusive Object Detection: Geometry- and Category-Aware Feature Extraction Across Road User Categories |
URI | https://ieeexplore.ieee.org/document/10507865 |
Volume | 54 |