Do Vision and Language Models Share Concepts? A Vector Space Alignment Study

Bibliographic Details
Published in: Transactions of the Association for Computational Linguistics, Vol. 12, pp. 1232–1249
Main Authors: Li, Jiaang; Kementchedjhieva, Yova; Fierro, Constanza; Søgaard, Anders
Format: Journal Article
Language: English
Published: Cambridge, Massachusetts: MIT Press, 30.09.2024
Online Access: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00698
Open Access: https://doaj.org/article/ee8d0e6d68d9476e85abf2f725793cda
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00698


Abstract: Large-scale pretrained language models (LMs) are said to “lack the ability to connect utterances to the world” (Bender and Koller, 2020), because they do not have “mental models of the world” (Mitchell and Krakauer, 2023). If so, one would expect LM representations to be unrelated to representations induced by vision models. We present an empirical evaluation across four families of LMs (BERT, GPT-2, OPT, and LLaMA-2) and three vision model architectures (ResNet, SegFormer, and MAE). Our experiments show that LMs partially converge towards representations isomorphic to those of vision models, subject to dispersion, polysemy, and frequency. This has important implications for both multi-modal processing and the LM understanding debate (Mitchell and Krakauer, 2023).
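
The alignment machinery behind the abstract's "isomorphic" claim is not spelled out in this record, but the article's citation of Schönemann (1966) points to the classical orthogonal Procrustes solution: given paired concept embeddings X (from a language model) and Y (from a vision model), find the rotation W minimizing ||XW − Y||_F. The sketch below is a minimal toy illustration of that alignment test, not the authors' released code; procrustes_align and the synthetic matrices are ours.

    import numpy as np

    def procrustes_align(X, Y):
        # Orthogonal Procrustes (Schonemann, 1966): the orthogonal W that
        # minimizes ||X @ W - Y||_F is U @ Vt, where U, Vt come from the
        # SVD of X^T Y. Rows of X and Y must be paired by concept.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    # Toy check: a randomly rotated copy of X is perfectly "isomorphic"
    # to X, so alignment should recover the rotation almost exactly.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))                   # stand-in LM embeddings
    R, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal map
    Y = X @ R                                        # stand-in vision embeddings
    W = procrustes_align(X, Y)
    print(np.allclose(X @ W, Y))                     # True: spaces align exactly

On real LM and vision embeddings the fit is only partial, which is what the abstract's "partially converge" and its caveats about dispersion, polysemy, and frequency refer to; a standard way to quantify the degree of fit is nearest-neighbor retrieval precision after alignment.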
Authors:
1. Jiaang Li, University of Copenhagen, Denmark (jili@di.ku.dk)
2. Yova Kementchedjhieva, Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates (yova.kementchedjhieva@mbzuai.ac.ae)
3. Constanza Fierro, University of Copenhagen, Denmark (c.fierro@di.ku.dk)
4. Anders Søgaard, University of Copenhagen, Denmark (soegaard@di.ku.dk)
References
Abdou (2021). Can language models encode perceptual structure without grounding? A case study in color. Proceedings of the 25th Conference on Computational Natural Language Learning, p. 109. doi:10.18653/v1/2021.conll-1.9
Antonello (2022). Predictive coding or just feature discovery? An alternative account of why language models fit brain data. Neurobiology of Language, p. 1. doi:10.1162/nol_a_00087
Artetxe (2018). A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. ACL 2018. doi:10.18653/v1/P18-1073
Bender (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, p. 5185. doi:10.18653/v1/2020.acl-main.463
Bergsma (2011). Learning bilingual lexicons using the visual similarity of labeled web images. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Three, p. 1764.
Bird (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit.
Brendel (2004). Intuition pumps and the proper use of thought experiments. Dialectica, 58(1), p. 89. doi:10.1111/j.1746-8361.2004.tb00293.x
Butlin (2021). Sharing our concepts with machines. Erkenntnis, p. 1. doi:10.1007/s10670-021-00491-w
Cappelen (2021). Making AI Intelligible: Philosophical Foundations. doi:10.1093/oso/9780192894724.001.0001
Caron (2021). Emerging properties in self-supervised vision transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 9650. doi:10.1109/ICCV48922.2021.00951
Caucheteux (2022). Brains and algorithms partially converge in natural language processing. Communications Biology. doi:10.1038/s42003-022-03036-1
Caucheteux (2022). Long-range and hierarchical language predictions in brains and algorithms. Nature Human Behaviour. doi:10.48550/arXiv.2111.14232
Conneau (2018). Word translation without parallel data. Proceedings of ICLR 2018.
Devlin (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), p. 4171.
Dosovitskiy (2021). An image is worth 16x16 words: Transformers for image recognition at scale. 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, Austria, May 3–7, 2021.
Fellbaum (2010). WordNet. In Theory and Applications of Ontology: Computer Applications, p. 231. doi:10.1007/978-90-481-8847-5_10
Garneau (2021). Analogy training multilingual encoders. Proceedings of the AAAI Conference on Artificial Intelligence, 35(14), p. 12884. doi:10.1609/aaai.v35i14.17524
Glavaš (2020). Non-linear instance-based cross-lingual mapping for non-isomorphic embedding spaces. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, p. 7548. doi:10.18653/v1/2020.acl-main.675
Goldstein (2021). Thinking ahead: Spontaneous prediction in context as a keystone of language in humans and machines. bioRxiv. doi:10.1101/2020.12.02.403477
Halvagal (2022). The combination of Hebbian and predictive plasticity learns invariant object representations in deep sensory networks. bioRxiv. doi:10.1101/2022.03.17.484712
Hartmann (2018). Limitations of cross-lingual learning from image search. Proceedings of The Third Workshop on Representation Learning for NLP, p. 159. doi:10.18653/v1/W18-3021
Hartmann (2018). Why is unsupervised alignment of English embeddings from different algorithms so hard? Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, p. 582. doi:10.18653/v1/D18-1056
He (2016). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, p. 770. doi:10.1109/CVPR.2016.90
He (2022). Masked autoencoders are scalable vision learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 16000. doi:10.1109/CVPR52688.2022.01553
Hoshen (2018). An iterative closest point method for unsupervised word translation. CoRR, arXiv:1801.06126. doi:10.18653/v1/D18-1043
Huh (2024). The platonic representation hypothesis. arXiv preprint arXiv:2405.07987.
Kiela (2014). Learning image embeddings using convolutional neural networks for improved multi-modal semantics. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 36. doi:10.3115/v1/D14-1005
Kiela (2015). Visual bilingual lexicon induction with transferred ConvNet features. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, p. 148. doi:10.18653/v1/D15-1015
Lazaridou (2014). Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), p. 1403. doi:10.3115/v1/P14-1132
Li (2023). Structural similarities between language models and neural response measurements. NeurIPS 2023 Workshop on Symmetry and Geometry in Neural Representations.
Lodge (1998). Stepping back inside Leibniz's mill. The Monist, 81(4), p. 553. doi:10.5840/monist199881427
Mandelkern (2023). Do language models refer? doi:10.1162/coli_a_00522
Manning (2020). Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48), p. 30046. doi:10.1073/pnas.1907367117
Marconi (1997). Lexical Competence.
Marcus (2023). A sentence is worth a thousand pictures: Can large language models understand human language?
Merullo (2023). Linearly mapping from image to text space. The Eleventh International Conference on Learning Representations.
Minnema (2019). From brain space to distributional space: The perilous journeys of fMRI decoding. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, p. 155. doi:10.18653/v1/P19-2021
Mitchell (2023). The debate over understanding in AI's large language models. Proceedings of the National Academy of Sciences, 120(13), e2215907120. doi:10.1073/pnas.2215907120
Mollo (2023). The vector grounding problem.
Nakashole (2018). NORMA: Neighborhood sensitive maps for multilingual word embeddings. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, p. 512. doi:10.18653/v1/D18-1047
Navigli (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, p. 217. doi:10.1016/j.artint.2012.07.001
Orhan (2020). Self-supervised learning through the eyes of a child. Advances in Neural Information Processing Systems, p. 9960.
Paszke (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32, p. 8024.
Piantadosi (2022). Meaning without reference in large language models. NeurIPS 2022 Workshop on Neuro Causal and Symbolic AI (nCSI).
Radford (2019). Language models are unsupervised multitask learners.
Radford (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning, p. 8748.
Rapaport (2002). Holism, conceptual-role semantics, and syntactic semantics. Minds and Machines, 12(1), p. 3. doi:10.1023/a:1013765011735
Russakovsky (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), p. 211. doi:10.1007/s11263-015-0816-y
Sahlgren (2021). The singleton fallacy: Why current critiques of language models miss the point. Frontiers in Artificial Intelligence, 4:682578. doi:10.3389/frai.2021.682578
Sassenhagen (2020). Traces of meaning itself: Encoding distributional word vectors in brain activity. Neurobiology of Language, 1(1), p. 54. doi:10.1162/nol_a_00003
Schönemann (1966). A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1), p. 1. doi:10.1007/BF02289451
Schrimpf (2018). Brain-Score: Which artificial neural network for object recognition is most brain-like? bioRxiv. doi:10.1101/407007
Schrimpf (2021). The neural architecture of language: Integrative modeling converges on predictive processing. bioRxiv. doi:10.1073/pnas.2105646118
Searle (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3, p. 417. doi:10.1017/S0140525X00005756
Shea (2018). Representation in Cognitive Science. doi:10.1093/oso/9780198812883.001.0001
Søgaard (2018). On the limitations of unsupervised bilingual dictionary induction. doi:10.18653/v1/P18-1072
Teehan (2022). Emergent structures and training dynamics in large language models. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, p. 146. doi:10.18653/v1/2022.bigscience-1.11
Toneva (2019). Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). Advances in Neural Information Processing Systems, 32.
Touvron (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
Turc (2019). Well-read students learn better: On the importance of pre-training compact models. arXiv preprint arXiv:1908.08962v2.
Vulić (2016). Multi-modal representations for improved bilingual lexicon learning. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), p. 188. doi:10.18653/v1/P16-2031
Wei (2022). Emergent abilities of large language models. Transactions on Machine Learning Research.
Williams (2018). Predictive processing and the representation wars. Minds and Machines, 28(1), p. 141. doi:10.1007/s11023-017-9441-6
Wolf (2020). Transformers: State-of-the-art natural language processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, p. 38. doi:10.18653/v1/2020.emnlp-demos.6
Xie (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, p. 12077.
Zhang (2022). OPT: Open pre-trained transformer language models.
Zhao (2020). Non-linearity in mapping based cross-lingual word embeddings. Proceedings of the 12th Language Resources and Evaluation Conference, p. 3583.
Zhou (2017). Scene parsing through ADE20K dataset. Computer Vision and Pattern Recognition. doi:10.1109/CVPR.2017.544
Zhu (2015). Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. 2015 IEEE International Conference on Computer Vision (ICCV), p. 19. doi:10.1109/ICCV.2015.11
Zou (2023). Representation engineering: A top-down approach to AI transparency. arXiv preprint arXiv:2310.01405.