Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1086-1099
Main Authors: Gebru, Israel D.; Ba, Sileye; Li, Xiaofei; Horaud, Radu
Format: Journal Article
Language: English
Published: United States, IEEE, 01.05.2018

Abstract Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes, in a principled way, speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset that contains audio-visual training data, as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
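The abstract describes casting diarization as exact inference in a discrete latent-variable temporal model: at each time slice the latent state is the active-speaker index, the audio-visual association step provides per-frame likelihoods, and speech-turn dynamics provide the transition model. A minimal sketch of that idea follows — this is not the authors' implementation; the state space, likelihood values, and "sticky" transition probabilities are illustrative assumptions, and Viterbi decoding stands in for the paper's exact inference procedure.

```python
# Hedged sketch: MAP inference in a discrete temporal model of speech turns.
# State 0 = silence, states 1..K-1 = visible persons. Per-frame likelihoods
# would come from the audio-visual association step; here they are invented.
import numpy as np

def viterbi(log_lik, log_trans, log_prior):
    """log_lik: (T, K) per-frame log-likelihoods; log_trans: (K, K) log
    transition probabilities; log_prior: (K,) initial log distribution.
    Returns the most probable state sequence (the inferred speech turns)."""
    T, K = log_lik.shape
    delta = log_prior + log_lik[0]          # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)      # best predecessor per state
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):            # backtrack the MAP path
        path[t - 1] = back[t, path[t]]
    return path

# Toy example: silence + 2 speakers, sticky dynamics favouring long turns.
K = 3
trans = np.full((K, K), 0.05)
np.fill_diagonal(trans, 0.9)
log_trans = np.log(trans)
log_prior = np.log(np.full(K, 1.0 / K))
# Invented AV-association likelihoods: speaker 1 talks, then speaker 2.
lik = np.array([[0.1, 0.8, 0.1]] * 4 + [[0.1, 0.1, 0.8]] * 4)
path = viterbi(np.log(lik), log_trans, log_prior)
print(path)  # → [1 1 1 1 2 2 2 2]
```

The sticky diagonal of the transition matrix is what encodes turn persistence: switching speakers pays a one-off log-penalty, so brief likelihood glitches do not fragment a speech turn.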
Author Li, Xiaofei
Ba, Sileye
Horaud, Radu
Gebru, Israel D.
Author_xml – sequence: 1
  givenname: Israel D.
  surname: Gebru
  fullname: Gebru, Israel D.
  email: israel-dejene.gebru@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
– sequence: 2
  givenname: Sileye
  surname: Ba
  fullname: Ba, Sileye
  email: sileye.ba@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
– sequence: 3
  givenname: Xiaofei
  surname: Li
  fullname: Li, Xiaofei
  email: xiaofei.li@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
– sequence: 4
  givenname: Radu
  surname: Horaud
  fullname: Horaud, Radu
  email: radu.horaud@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
BackLink https://www.ncbi.nlm.nih.gov/pubmed/28103192 (view this record in MEDLINE/PubMed)
https://inria.hal.science/hal-01413403 (view this record in HAL)
CODEN ITPIDJ
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1109/TPAMI.2017.2648793
Discipline Engineering
Computer Science
EISSN 2160-9292
1939-3539
EndPage 1099
ExternalDocumentID oai_HAL_hal_01413403v1
28103192
10_1109_TPAMI_2017_2648793
7807334
Genre orig-research
Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: European Union FP7 ERC Advanced
  grantid: VHIA (#340113)
– fundername: XEROX University Affairs Committee (UAC)
  grantid: 2015-2017
ISSN 0162-8828
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords speaker diarization
audio-visual tracking
dynamic Bayesian network
sound source localization
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-5232-024X
OpenAccessLink https://inria.hal.science/hal-01413403
PMID 28103192
PQID 2174509576
PQPubID 85458
PageCount 14
PublicationCentury 2000
PublicationDate 2018-05-01
PublicationDateYYYYMMDD 2018-05-01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on pattern analysis and machine intelligence
PublicationTitleAbbrev TPAMI
PublicationTitleAlternate IEEE Trans Pattern Anal Mach Intell
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
StartPage 1086
SubjectTerms Audio data
Audio equipment
audio-visual tracking
Bayesian analysis
Cameras
Clustering
Computer Science
Computer Vision and Pattern Recognition
dynamic Bayesian network
Face
Feature extraction
Mel frequency cepstral coefficient
Microphones
Optical tracking
Signal processing
Sound
sound source localization
Speaker diarization
Speech
Visual signals
Visualization
Title Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
URI https://ieeexplore.ieee.org/document/7807334
https://www.ncbi.nlm.nih.gov/pubmed/28103192
https://www.proquest.com/docview/2174509576
https://www.proquest.com/docview/1861582148
https://inria.hal.science/hal-01413403
Volume 40