Bayesian Transfer Learning (2312.13484v1)
Abstract: Transfer learning is a burgeoning concept in statistical machine learning that seeks to improve inference and/or predictive accuracy on a domain of interest by leveraging data from related domains. While the term "transfer learning" has garnered much recent interest, its foundational principles have existed for years under various guises. Prior literature reviews in computer science and electrical engineering have sought to bring these ideas into focus, primarily surveying general methodologies and works from these disciplines. This article highlights Bayesian approaches to transfer learning, which have received relatively limited attention despite their innate compatibility with the notion of drawing upon prior knowledge to guide new learning tasks. Our survey encompasses a wide range of Bayesian transfer learning frameworks applicable to a variety of practical settings. We discuss how these methods address the problem of finding the optimal information to transfer between domains, which is a central question in transfer learning. We illustrate the utility of Bayesian transfer learning methods via a simulation study where we compare performance against frequentist competitors.
Authors: Piotr M. Suder, Jason Xu, David B. Dunson