Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1086-1099
Main Authors: Gebru, Israel D.; Ba, Sileye; Li, Xiaofei; Horaud, Radu
Format: Journal Article
Language: English
Published: United States, IEEE, 01.05.2018

Abstract Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes, in a principled way, speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset that contains audio-visual training data, as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
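The abstract describes casting diarization as exact inference in a discrete latent-variable temporal model: at each time slice the latent state is the active-speaker index, the audio-visual association step provides per-frame likelihoods, and speech-turn dynamics provide the transition model. A minimal sketch of that idea follows — this is not the authors' implementation; the state space, likelihood values, and "sticky" transition probabilities are illustrative assumptions, and Viterbi decoding stands in for the paper's exact inference procedure.

```python
# Hedged sketch: MAP inference in a discrete temporal model of speech turns.
# State 0 = silence, states 1..K-1 = visible persons. Per-frame likelihoods
# would come from the audio-visual association step; here they are invented.
import numpy as np

def viterbi(log_lik, log_trans, log_prior):
    """log_lik: (T, K) per-frame log-likelihoods; log_trans: (K, K) log
    transition probabilities; log_prior: (K,) initial log distribution.
    Returns the most probable state sequence (the inferred speech turns)."""
    T, K = log_lik.shape
    delta = log_prior + log_lik[0]          # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)      # best predecessor per state
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):            # backtrack the MAP path
        path[t - 1] = back[t, path[t]]
    return path

# Toy example: silence + 2 speakers, sticky dynamics favouring long turns.
K = 3
trans = np.full((K, K), 0.05)
np.fill_diagonal(trans, 0.9)
log_trans = np.log(trans)
log_prior = np.log(np.full(K, 1.0 / K))
# Invented AV-association likelihoods: speaker 1 talks, then speaker 2.
lik = np.array([[0.1, 0.8, 0.1]] * 4 + [[0.1, 0.1, 0.8]] * 4)
path = viterbi(np.log(lik), log_trans, log_prior)
print(path)  # → [1 1 1 1 2 2 2 2]
```

The sticky diagonal of the transition matrix is what encodes turn persistence: switching speakers pays a one-off log-penalty, so brief likelihood glitches do not fragment a speech turn.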
Author Li, Xiaofei
Ba, Sileye
Horaud, Radu
Gebru, Israel D.
Author_xml – sequence: 1
  givenname: Israel D.
  surname: Gebru
  fullname: Gebru, Israel D.
  email: israel-dejene.gebru@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
– sequence: 2
  givenname: Sileye
  surname: Ba
  fullname: Ba, Sileye
  email: sileye.ba@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
– sequence: 3
  givenname: Xiaofei
  surname: Li
  fullname: Li, Xiaofei
  email: xiaofei.li@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
– sequence: 4
  givenname: Radu
  surname: Horaud
  fullname: Horaud, Radu
  email: radu.horaud@inria.fr
  organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
BackLink https://www.ncbi.nlm.nih.gov/pubmed/28103192 (view this record in MEDLINE/PubMed)
https://inria.hal.science/hal-01413403 (view this record in HAL)
CODEN ITPIDJ
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1109/TPAMI.2017.2648793
Discipline Engineering
Computer Science
EISSN 2160-9292
1939-3539
EndPage 1099
ExternalDocumentID oai_HAL_hal_01413403v1
28103192
10_1109_TPAMI_2017_2648793
7807334
Genre orig-research
Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: European Union FP7 ERC Advanced
  grantid: VHIA (#340113)
– fundername: XEROX University Affairs Committee (UAC)
  grantid: 2015-2017
ISSN 0162-8828
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords speaker diarization
audio-visual tracking
dynamic Bayesian network
sound source localization
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-5232-024X
OpenAccessLink https://inria.hal.science/hal-01413403
PMID 28103192
PQID 2174509576
PQPubID 85458
PageCount 14
PublicationCentury 2000
PublicationDate 2018-05-01
PublicationDateYYYYMMDD 2018-05-01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on pattern analysis and machine intelligence
PublicationTitleAbbrev TPAMI
PublicationTitleAlternate IEEE Trans Pattern Anal Mach Intell
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
StartPage 1086
SubjectTerms Audio data
Audio equipment
audio-visual tracking
Bayesian analysis
Cameras
Clustering
Computer Science
Computer Vision and Pattern Recognition
dynamic Bayesian network
Face
Feature extraction
Mel frequency cepstral coefficient
Microphones
Optical tracking
Signal processing
Sound
sound source localization
Speaker diarization
Speech
Visual signals
Visualization
Title Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
URI https://ieeexplore.ieee.org/document/7807334
https://www.ncbi.nlm.nih.gov/pubmed/28103192
https://www.proquest.com/docview/2174509576
https://www.proquest.com/docview/1861582148
https://inria.hal.science/hal-01413403
Volume 40