How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing

Bibliographic Details
Published in Proceedings of the ... IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 7391-7398
Main Authors Jin, Shutong, Wang, Ruiyu, Zahid, Muhammad, Pokorny, Florian T.
Format Conference Proceeding
Language English
Published IEEE 14.10.2024
Subjects
ISSN 2153-0866
DOI 10.1109/IROS58592.2024.10802583

Abstract As model and dataset sizes continue to scale in robot learning, the need to understand how the composition and properties of a dataset affect model performance becomes increasingly urgent to ensure cost-effective data collection and model performance. In this work, we empirically investigate how physics attributes (color, friction coefficient, shape) and scene background characteristics, such as the complexity and dynamics of interactions with background objects, influence the performance of Video Transformers in predicting planar pushing trajectories. We investigate three primary questions: How do physics attributes and background scene characteristics influence model performance? What kind of changes in attributes are most detrimental to model generalization? What proportion of fine-tuning data is required to adapt models to novel scenarios? To facilitate this research, we present CloudGripper-Push-1K, a large real-world vision-based robot pushing dataset comprising 1278 hours and 460,000 videos of planar pushing interactions with objects with different physics and background attributes. We also propose Video Occlusion Transformer (VOT), a generic modular video-transformer-based trajectory prediction framework which features 3 choices of 2D-spatial encoders as the subject of our case study. The dataset and source code are available at https://cloudgripper.org.
Author Zahid, Muhammad
Jin, Shutong
Pokorny, Florian T.
Wang, Ruiyu
Author_xml – sequence: 1
  givenname: Shutong
  surname: Jin
  fullname: Jin, Shutong
  email: shutong@kth.se
  organization: KTH Royal Institute of Technology,School of Electrical Engineering and Computer Science
– sequence: 2
  givenname: Ruiyu
  surname: Wang
  fullname: Wang, Ruiyu
  organization: KTH Royal Institute of Technology,School of Electrical Engineering and Computer Science
– sequence: 3
  givenname: Muhammad
  surname: Zahid
  fullname: Zahid, Muhammad
  organization: KTH Royal Institute of Technology,School of Electrical Engineering and Computer Science
– sequence: 4
  givenname: Florian T.
  surname: Pokorny
  fullname: Pokorny, Florian T.
  email: fpokorny@kth.se
  organization: KTH Royal Institute of Technology,School of Electrical Engineering and Computer Science
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/IROS58592.2024.10802583
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Physics
EISBN 9798350377705
EISSN 2153-0866
EndPage 7398
ExternalDocumentID 10802583
Genre orig-research
IEDL.DBID RIE
IngestDate Wed Aug 27 02:29:41 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_10802583
PublicationCentury 2000
PublicationDate 2024-Oct.-14
PublicationDateYYYYMMDD 2024-10-14
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-14
  day: 14
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE/RSJ International Conference on Intelligent Robots and Systems
PublicationTitleAbbrev IROS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0001079896
Score 2.2713387
SourceID ieee
SourceType Publisher
StartPage 7391
SubjectTerms Adaptation models
Data models
Image color analysis
Intelligent robots
Physics
Robot learning
Shape
Source coding
Trajectory
Transformers
Title How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
URI https://ieeexplore.ieee.org/document/10802583
hasFullText 1
inHoldings 1
isFullTextHit
isPrint