Multiple Linear Regression: Bayesian Inference for Distributed and Big Data in the Medical Informatics Platform of the Human Brain Project

Bibliographic Details
Published in bioRxiv
Main Authors Melie-Garcia, Lester; Draganski, Bogdan; Ashburner, John; Kherif, Ferath
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 05.01.2018
Cold Spring Harbor Laboratory
Edition 1.1
ISSN 2692-8205
DOI 10.1101/242883

Abstract We propose a Multiple Linear Regression (MLR) methodology for the analysis of distributed and Big Data in the framework of the Medical Informatics Platform (MIP) of the Human Brain Project (HBP). MLR is a versatile model and is considered one of the workhorses for estimating dependences between clinical, neuropsychological and neurophysiological variables in the field of neuroimaging. One of the main concepts behind the MIP is to federate data that are stored locally at geographically distributed sites (hospitals, customized databases, etc.) around the world. We refrain from using a single federation node for two main reasons: first, to maintain data privacy, and second, to manage large volumes of data efficiently in terms of the latency and storage resources needed at the federation node. Under these conditions, and given the distributed nature of the data, MLR cannot be estimated in the classical way, so the standard algorithms must be modified. We use the Bayesian formalism, which provides the tools necessary to implement the MLR methodology for distributed Big Data. It allows us to account for the heterogeneity of the possible mechanisms that explain data sets across sites, expressed through different models of explanatory variables. This approach enables the integration of highly heterogeneous data coming from different subjects and hospitals across the globe. Additionally, it offers general and sophisticated methods, extendable to other statistical models, that suit high-dimensional and distributed multimodal data. This work forms part of a series of papers on the methodological developments embedded in the MIP.
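
As a rough illustration of why a linear regression can be fitted without pooling subject-level records, the sketch below may help. It is not the authors' algorithm: it assumes the simplest conjugate case, a zero-mean Gaussian prior on the coefficients with known noise precision, in which the posterior depends on the data only through the sums X^T X and X^T y. Each site can then share those aggregates instead of raw rows. The names `site_statistics` and `pooled_posterior` and the parameters `alpha` and `beta` are hypothetical and chosen for this example only.

```python
import numpy as np

def site_statistics(X, y):
    """Run locally at one site; only these aggregates leave the site."""
    return X.T @ X, X.T @ y

def pooled_posterior(stats, alpha=1.0, beta=1.0):
    """Combine per-site aggregates into a Gaussian posterior over coefficients.

    Assumption (not from the paper): prior w ~ N(0, I / alpha) and a known
    noise precision beta shared by all sites.
    """
    p = stats[0][0].shape[0]
    precision = alpha * np.eye(p)   # prior precision
    rhs = np.zeros(p)
    for xtx, xty in stats:          # each site contributes additively
        precision += beta * xtx
        rhs += beta * xty
    cov = np.linalg.inv(precision)
    mean = np.linalg.solve(precision, rhs)
    return mean, cov

# Simulated data for three sites of different sizes (illustrative only).
rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])
sites = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 3))
    y = X @ w_true + rng.normal(size=n)
    sites.append((X, y))

# Distributed fit: sites send aggregates only.
mean_dist, cov_dist = pooled_posterior([site_statistics(X, y) for X, y in sites])

# Centralized fit on the stacked data, for comparison.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
mean_cent, cov_cent = pooled_posterior([site_statistics(X_all, y_all)])
```

Under these assumptions the distributed and centralized posteriors coincide up to floating-point error, because the sufficient statistics add across sites. Handling site-specific models of explanatory variables, as the abstract describes, would need more than this sketch.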
Copyright 2018. This article is published under http://creativecommons.org/licenses/by/4.0/ (the License). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2018, Posted by Cold Spring Harbor Laboratory
Discipline Biology
Genre Working Paper/Pre-Print
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Keywords Bayesian linear model
Bayesian linear regression
parallel computation
model averaging
linear regression
distributed computation
variational Bayes
Human Brain Project
general linear model
MLR
Bayesian modeling
linear model
multiple linear regression
License This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-7605-2518
0000-0001-5602-8916
0000-0002-5159-5919
0000-0001-5698-0413
PageCount 18
SubjectTerms Bayesian analysis
Big Data
Brain
Health informatics
Hospitals
Informatics
Latency
Mathematical models
Neuroimaging
Neuroscience
Regression analysis
Statistical analysis