Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio

Bibliographic Details
Published in bioRxiv
Main Authors Peterson, Ralph E, Tanelus, Aramis, Ick, Christopher, Mimica, Bartul, Francis, Niegil, Ivan, Violet J, Choudhri, Aman, Falkner, Annegret L, Murthy, Mala, Schneider, David M, Sanes, Dan H, Williams, Alex H
Format Journal Article
Language English
Published United States: Cold Spring Harbor Laboratory, 21.09.2024
Abstract Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step to bridge this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments. Advances in deep learning methods for SSL are likely to help address these limitations; however, there are currently no publicly available models, datasets, or benchmarks to systematically evaluate SSL algorithms in the domain of bioacoustics. Here, we present the VCL Benchmark: the first large-scale dataset for benchmarking SSL algorithms in rodents. We acquired synchronized video and multi-channel audio recordings of 767,295 sounds with annotated ground truth sources across 9 conditions. The dataset provides benchmarks that evaluate SSL performance on real data, simulated acoustic data, and a mixture of real and simulated data. We intend for this benchmark to facilitate knowledge transfer between the neuroscience and acoustic machine learning communities, which have had limited overlap.
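The abstract contrasts deep learning approaches with classic signal-processing SSL. As context, a minimal sketch of one such classic method, GCC-PHAT time-delay estimation between a microphone pair, is shown below; the function, sample rate, and synthetic chirp are illustrative assumptions, not material from the paper or its dataset.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay of `sig` relative to `ref` (seconds)
    via the generalized cross-correlation with phase transform."""
    n = sig.shape[0] + ref.shape[0]
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    # Phase transform: discard magnitude, keep only phase information,
    # which sharpens the correlation peak in reverberant conditions.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Re-center so index 0 of `cc` corresponds to shift -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic check: the same chirp reaches "microphone A" 25 samples
# after "microphone B" (hypothetical ultrasonic-capable sample rate).
fs = 125_000
t = np.arange(4096) / fs
chirp = np.sin(2 * np.pi * (40_000 + 2e5 * t) * t)
delay = 25
mic_a = np.pad(chirp, (delay, 0))  # delayed copy
mic_b = np.pad(chirp, (0, delay))  # reference copy
tau = gcc_phat(mic_a, mic_b, fs)
print(round(tau * fs))  # estimated delay in samples
```

With delays estimated for several microphone pairs, the source position follows from multilateration; the benchmark's premise is that such geometric methods degrade for faint, high-frequency rodent vocalizations in reflective laboratory arenas, motivating learned alternatives.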
Author_xml – sequence: 1
  givenname: Ralph E
  surname: Peterson
  fullname: Peterson, Ralph E
  organization: Flatiron Institute, Center for Computational Neuroscience
– sequence: 2
  givenname: Aramis
  surname: Tanelus
  fullname: Tanelus, Aramis
  organization: Flatiron Institute, Center for Computational Neuroscience
– sequence: 3
  givenname: Christopher
  surname: Ick
  fullname: Ick, Christopher
  organization: NYU, Center for Data Science
– sequence: 4
  givenname: Bartul
  surname: Mimica
  fullname: Mimica, Bartul
  organization: Princeton Neuroscience Institute
– sequence: 5
  givenname: Niegil
  surname: Francis
  fullname: Francis, Niegil
  organization: NYU, Tandon School of Engineering
– sequence: 6
  givenname: Violet J
  surname: Ivan
  fullname: Ivan, Violet J
  organization: NYU, Center for Neural Science
– sequence: 7
  givenname: Aman
  surname: Choudhri
  fullname: Choudhri, Aman
  organization: Columbia University
– sequence: 8
  givenname: Annegret L
  surname: Falkner
  fullname: Falkner, Annegret L
  organization: Princeton Neuroscience Institute
– sequence: 9
  givenname: Mala
  surname: Murthy
  fullname: Murthy, Mala
  organization: Princeton Neuroscience Institute
– sequence: 10
  givenname: David M
  surname: Schneider
  fullname: Schneider, David M
  organization: NYU, Center for Neural Science
– sequence: 11
  givenname: Dan H
  surname: Sanes
  fullname: Sanes, Dan H
  organization: NYU, Center for Neural Science
– sequence: 12
  givenname: Alex H
  surname: Williams
  fullname: Williams, Alex H
  organization: Flatiron Institute, Center for Computational Neuroscience
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39345431 (View this record in MEDLINE/PubMed)
ContentType Journal Article
DBID NPM
7X8
5PM
DOI 10.1101/2024.09.20.613758
DatabaseName PubMed
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle PubMed
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 2692-8205
ExternalDocumentID 39345431
Genre Journal Article
Preprint
GrantInformation_xml – fundername: NIDCD NIH HHS
  grantid: R01 DC020279
– fundername: NIDA NIH HHS
  grantid: T90 DA059110
– fundername: NIDCD NIH HHS
  grantid: R01 DC018802
– fundername: NIDA NIH HHS
  grantid: R34 DA059513
GroupedDBID 8FE
8FH
AFKRA
ALMA_UNASSIGNED_HOLDINGS
BBNVY
BENPR
BHPHI
HCIFZ
LK8
M7P
NPM
NQS
PIMPY
PROAC
RHI
7X8
5PM
ISSN 2692-8205
IngestDate Mon Sep 30 11:48:16 EDT 2024
Thu Oct 24 02:23:48 EDT 2024
Sat Nov 02 12:25:41 EDT 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
License This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
LinkModel OpenURL
Notes ObjectType-Working Paper/Pre-Print-3
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC11430026
PMID 39345431
PQID 3111204872
PQPubID 23479
ParticipantIDs pubmedcentral_primary_oai_pubmedcentral_nih_gov_11430026
proquest_miscellaneous_3111204872
pubmed_primary_39345431
PublicationCentury 2000
PublicationDate 2024-Sep-21
PublicationDateYYYYMMDD 2024-09-21
PublicationDate_xml – month: 09
  year: 2024
  text: 2024-Sep-21
  day: 21
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle bioRxiv
PublicationTitleAlternate bioRxiv
PublicationYear 2024
Publisher Cold Spring Harbor Laboratory
Publisher_xml – name: Cold Spring Harbor Laboratory
SSID ssj0002961374
SourceID pubmedcentral
proquest
pubmed
SourceType Open Access Repository
Aggregation Database
Index Database
Title Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio
URI https://www.ncbi.nlm.nih.gov/pubmed/39345431
https://www.proquest.com/docview/3111204872
https://pubmed.ncbi.nlm.nih.gov/PMC11430026