How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
Published in | Proceedings of the ... IEEE/RSJ International Conference on Intelligent Robots and Systems pp. 7391 - 7398 |
---|---|
Main Authors | Jin, Shutong; Wang, Ruiyu; Zahid, Muhammad; Pokorny, Florian T. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 14.10.2024 |
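Subjects | Adaptation models; Data models; Image color analysis; Intelligent robots; Physics; Robot learning; Shape; Source coding; Trajectory; Transformers |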
ISSN | 2153-0866 |
DOI | 10.1109/IROS58592.2024.10802583 |
Abstract | As model and dataset sizes continue to scale in robot learning, the need to understand how the composition and properties of a dataset affect model performance becomes increasingly urgent to ensure cost-effective data collection and model performance. In this work, we empirically investigate how physics attributes (color, friction coefficient, shape) and scene background characteristics, such as the complexity and dynamics of interactions with background objects, influence the performance of Video Transformers in predicting planar pushing trajectories. We investigate three primary questions: How do physics attributes and background scene characteristics influence model performance? What kind of changes in attributes are most detrimental to model generalization? What proportion of fine-tuning data is required to adapt models to novel scenarios? To facilitate this research, we present CloudGripper-Push-1K, a large real-world vision-based robot pushing dataset comprising 1278 hours and 460,000 videos of planar pushing interactions with objects with different physics and background attributes. We also propose Video Occlusion Transformer (VOT), a generic modular video-transformer-based trajectory prediction framework which features 3 choices of 2D-spatial encoders as the subject of our case study. The dataset and source code are available at https://cloudgripper.org. |
---|---|
Author | Zahid, Muhammad; Jin, Shutong; Pokorny, Florian T.; Wang, Ruiyu |
Author_xml | 1. Jin, Shutong (shutong@kth.se), KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science; 2. Wang, Ruiyu, KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science; 3. Zahid, Muhammad, KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science; 4. Pokorny, Florian T. (fpokorny@kth.se), KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science |
ContentType | Conference Proceeding |
DOI | 10.1109/IROS58592.2024.10802583 |
Discipline | Engineering Physics |
EISBN | 9798350377705 |
EISSN | 2153-0866 |
EndPage | 7398 |
ExternalDocumentID | 10802583 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
PageCount | 8 |
PublicationCentury | 2000 |
PublicationDate | 2024-Oct.-14 |
PublicationDateYYYYMMDD | 2024-10-14 |
PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-14 day: 14 |
PublicationDecade | 2020 |
PublicationTitle | Proceedings of the ... IEEE/RSJ International Conference on Intelligent Robots and Systems |
PublicationTitleAbbrev | IROS |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 7391 |
SubjectTerms | Adaptation models; Data models; Image color analysis; Intelligent robots; Physics; Robot learning; Shape; Source coding; Trajectory; Transformers |
Title | How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing |
URI | https://ieeexplore.ieee.org/document/10802583 |