Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio
Published in | bioRxiv |
---|---|
Main Authors | Ralph E Peterson, Aramis Tanelus, Christopher Ick, Bartul Mimica, Niegil Francis, Violet J Ivan, Aman Choudhri, Annegret L Falkner, Mala Murthy, David M Schneider, Dan H Sanes, Alex H Williams |
Format | Journal Article |
Language | English |
Published | United States: Cold Spring Harbor Laboratory, 21.09.2024 |
Online Access | Get full text |
Abstract | Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step to bridge this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments. Advances in deep learning methods for SSL are likely to help address these limitations; however, there are currently no publicly available models, datasets, or benchmarks to systematically evaluate SSL algorithms in the domain of bioacoustics. Here, we present the VCL Benchmark: the first large-scale dataset for benchmarking SSL algorithms in rodents. We acquired synchronized video and multi-channel audio recordings of 767,295 sounds with annotated ground truth sources across 9 conditions. The dataset provides benchmarks which evaluate SSL performance on real data, simulated acoustic data, and a mixture of real and simulated data. We intend for this benchmark to facilitate knowledge transfer between the neuroscience and acoustic machine learning communities, which have had limited overlap. |
---|---|
AbstractList | Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step to bridge this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments. Advances in deep learning methods for SSL are likely to help address these limitations, however there are currently no publicly available models, datasets, or benchmarks to systematically evaluate SSL algorithms in the domain of bioacoustics. Here, we present the VCL Benchmark: the first large-scale dataset for benchmarking SSL algorithms in rodents. We acquired synchronized video and multi-channel audio recordings of 767,295 sounds with annotated ground truth sources across 9 conditions. The dataset provides benchmarks which evaluate SSL performance on real data, simulated acoustic data, and a mixture of real and simulated data. We intend for this benchmark to facilitate knowledge transfer between the neuroscience and acoustic machine learning communities, which have had limited overlap. |
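The abstract frames sound source localization (SSL) as a classic signal-processing problem that deep learning methods may improve upon. For readers new to the area, the sketch below illustrates the classical baseline the paper is contrasted against: time-difference-of-arrival (TDOA) estimation between two microphones via generalized cross-correlation with phase transform (GCC-PHAT). This is a generic illustration of the technique, not code from the paper; the function name, sampling rate, and chirp signal are all illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref` (in seconds) using
    generalized cross-correlation with phase transform (GCC-PHAT)."""
    n = len(sig) + len(ref)             # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic check: an ultrasonic-range chirp delayed by 25 samples
# between two hypothetical "microphones".
fs = 125_000                            # illustrative sampling rate (Hz)
t = np.arange(4096) / fs
chirp = np.sin(2 * np.pi * (30_000 + 2e6 * t) * t)
delay = 25
mic_far = np.concatenate((np.zeros(delay), chirp))  # exact delayed copy
tdoa = gcc_phat(mic_far, chirp, fs)
print(round(tdoa * fs))                 # → 25 (recovered delay in samples)
```

In practice, the TDOA from each microphone pair constrains the source to a hyperbola; intersecting the constraints from several pairs yields a position estimate. The paper's motivation is that this classical pipeline degrades badly for faint, high-frequency rodent vocalizations in reverberant laboratory arenas, which is what the benchmark is designed to measure.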
Author | Peterson, Ralph E Mimica, Bartul Williams, Alex H Ivan, Violet J Falkner, Annegret L Ick, Christopher Choudhri, Aman Sanes, Dan H Schneider, David M Murthy, Mala Tanelus, Aramis Francis, Niegil |
Author_xml | – sequence: 1 givenname: Ralph E surname: Peterson fullname: Peterson, Ralph E organization: Flatiron Institute, Center for Computational Neuroscience – sequence: 2 givenname: Aramis surname: Tanelus fullname: Tanelus, Aramis organization: Flatiron Institute, Center for Computational Neuroscience – sequence: 3 givenname: Christopher surname: Ick fullname: Ick, Christopher organization: NYU, Center for Data Science – sequence: 4 givenname: Bartul surname: Mimica fullname: Mimica, Bartul organization: Princeton Neuroscience Institute – sequence: 5 givenname: Niegil surname: Francis fullname: Francis, Niegil organization: NYU, Tandon School of Engineering – sequence: 6 givenname: Violet J surname: Ivan fullname: Ivan, Violet J organization: NYU, Center for Neural Science – sequence: 7 givenname: Aman surname: Choudhri fullname: Choudhri, Aman organization: Columbia University – sequence: 8 givenname: Annegret L surname: Falkner fullname: Falkner, Annegret L organization: Princeton Neuroscience Institute – sequence: 9 givenname: Mala surname: Murthy fullname: Murthy, Mala organization: Princeton Neuroscience Institute – sequence: 10 givenname: David M surname: Schneider fullname: Schneider, David M organization: NYU, Center for Neural Science – sequence: 11 givenname: Dan H surname: Sanes fullname: Sanes, Dan H organization: NYU, Center for Neural Science – sequence: 12 givenname: Alex H surname: Williams fullname: Williams, Alex H organization: Flatiron Institute, Center for Computational Neuroscience |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39345431$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
DBID | NPM 7X8 5PM |
DOI | 10.1101/2024.09.20.613758 |
DatabaseName | PubMed MEDLINE - Academic PubMed Central (Full Participant titles) |
DatabaseTitle | PubMed MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic PubMed |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Biology |
EISSN | 2692-8205 |
ExternalDocumentID | 39345431 |
Genre | Journal Article Preprint |
GrantInformation_xml | – fundername: NIDCD NIH HHS grantid: R01 DC020279 – fundername: NIDA NIH HHS grantid: T90 DA059110 – fundername: NIDCD NIH HHS grantid: R01 DC018802 – fundername: NIDA NIH HHS grantid: R34 DA059513 |
GroupedDBID | 8FE 8FH AFKRA ALMA_UNASSIGNED_HOLDINGS BBNVY BENPR BHPHI HCIFZ LK8 M7P NPM NQS PIMPY PROAC RHI 7X8 5PM |
ID | FETCH-LOGICAL-p1558-cec520f639c402ba135625ac7ee4bd16884d48be5b79d901d9b81cac60dacdd33 |
ISSN | 2692-8205 |
IngestDate | Mon Sep 30 11:48:16 EDT 2024 Thu Oct 24 02:23:48 EDT 2024 Sat Nov 02 12:25:41 EDT 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
License | This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-p1558-cec520f639c402ba135625ac7ee4bd16884d48be5b79d901d9b81cac60dacdd33 |
Notes | ObjectType-Working Paper/Pre-Print-3 ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC11430026 |
PMID | 39345431 |
PQID | 3111204872 |
PQPubID | 23479 |
ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_11430026 proquest_miscellaneous_3111204872 pubmed_primary_39345431 |
PublicationCentury | 2000 |
PublicationDate | 2024-Sep-21 20240921 |
PublicationDateYYYYMMDD | 2024-09-21 |
PublicationDate_xml | – month: 09 year: 2024 text: 2024-Sep-21 day: 21 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | bioRxiv |
PublicationTitleAlternate | bioRxiv |
PublicationYear | 2024 |
Publisher | Cold Spring Harbor Laboratory |
Publisher_xml | – name: Cold Spring Harbor Laboratory |
SSID | ssj0002961374 |
Score | 1.9321067 |
Snippet | Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in... |
SourceID | pubmedcentral proquest pubmed |
SourceType | Open Access Repository Aggregation Database Index Database |
Title | Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio |
URI | https://www.ncbi.nlm.nih.gov/pubmed/39345431 https://www.proquest.com/docview/3111204872 https://pubmed.ncbi.nlm.nih.gov/PMC11430026 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | ProQuest |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Vocal+Call+Locator+Benchmark+%28VCL%29+for+localizing+rodent+vocalizations+from+multi-channel+audio&rft.jtitle=bioRxiv&rft.au=Peterson%2C+Ralph+E&rft.au=Tanelus%2C+Aramis&rft.au=Ick%2C+Christopher&rft.au=Mimica%2C+Bartul&rft.date=2024-09-21&rft.issn=2692-8205&rft.eissn=2692-8205&rft_id=info:doi/10.1101%2F2024.09.20.613758&rft.externalDBID=NO_FULL_TEXT |