
Bayesian Transfer Learning (2312.13484v1)

Published 20 Dec 2023 in stat.ML and cs.LG

Abstract: Transfer learning is a burgeoning concept in statistical machine learning that seeks to improve inference and/or predictive accuracy on a domain of interest by leveraging data from related domains. While the term "transfer learning" has garnered much recent interest, its foundational principles have existed for years under various guises. Prior literature reviews in computer science and electrical engineering have sought to bring these ideas into focus, primarily surveying general methodologies and works from these disciplines. This article highlights Bayesian approaches to transfer learning, which have received relatively limited attention despite their innate compatibility with the notion of drawing upon prior knowledge to guide new learning tasks. Our survey encompasses a wide range of Bayesian transfer learning frameworks applicable to a variety of practical settings. We discuss how these methods address the problem of finding the optimal information to transfer between domains, which is a central question in transfer learning. We illustrate the utility of Bayesian transfer learning methods via a simulation study where we compare performance against frequentist competitors.

Authors: Piotr M. Suder, Jason Xu, David B. Dunson
