Learning Shared Kernel Models: the Shared Kernel EM algorithm (2205.09041v1)

Published 15 May 2022 in cs.LG

Abstract: Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution. It works by introducing "hidden" or "latent" variables via Baum's auxiliary function $Q$ that allow the joint data likelihood to be expressed as a product of simple factors. The relevance of EM has increased since the introduction of the variational lower bound (VLB): the VLB differs from Baum's auxiliary function only by the entropy of the PDF of the latent variables $Z$. We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking, using $K$-valued scalar data association hypotheses rather than the usual binary indicator vectors. The same method is then applied to a little-known but much more general type of supervised EM algorithm for shared kernel models, related to probabilistic radial basis function networks. We address a number of shortcomings in the derivations that have been published previously in this area. In particular, we give theoretically rigorous derivations of (i) the complete data likelihood; (ii) Baum's auxiliary function (the E-step); and (iii) the maximisation (M-step) in the case of Gaussian shared kernel models. The resulting algorithm, called shared kernel EM (SKEM), is then applied to a digit recognition problem using a novel 7-segment digit representation. Variants of the algorithm that use different numbers of features and different EM algorithm dimensions are compared in terms of mean accuracy and mean IoU. A simplified classifier is proposed that decomposes the joint data PDF as a product of lower-order PDFs over non-overlapping subsets of variables. The effect of different numbers of assumed mixture components $K$ is also investigated. High-level source code for the data generation and SKEM algorithm is provided.
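The relationship between Baum's auxiliary function $Q$, the entropy of the latent-variable PDF, and the VLB can be checked numerically. The following is a minimal sketch of standard (unsupervised) EM for a one-dimensional Gaussian mixture, written with NumPy; it is not the paper's SKEM algorithm or its published source code. All names (`log_joint`, `e_step`, `m_step`, `fit_em`), the quantile-based initialisation, and the small variance floor are assumptions made for illustration. Each data point carries a scalar label $z_n \in \{1, \dots, K\}$, and the E-step computes the posterior over that label.

```python
import numpy as np

def log_joint(x, weights, means, variances):
    # log[pi_k * N(x_n | mu_k, var_k)] for every point n and component k.
    x = x[:, None]
    return (np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x - means) ** 2 / variances)

def e_step(x, weights, means, variances):
    # Posterior p(z_n = k | x_n) over the K-valued scalar label z_n,
    # computed with a log-sum-exp for numerical stability.
    lj = log_joint(x, weights, means, variances)
    m = lj.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))
    resp = np.exp(lj - log_norm)
    return resp, lj, float(log_norm.sum())

def m_step(x, resp):
    # Closed-form maximisation of Q for a Gaussian mixture.
    nk = resp.sum(axis=0)
    weights = nk / x.size
    means = (resp * x[:, None]).sum(axis=0) / nk
    variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    # Tiny floor (an assumption of this sketch) to avoid zero variance.
    return weights, means, variances + 1e-9

def fit_em(x, k=2, iters=50):
    # Illustrative initialisation: equal weights, quantile-spaced means.
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    history = []
    for _ in range(iters):
        resp, lj, loglik = e_step(x, weights, means, variances)
        q = float((resp * lj).sum())                              # Baum's Q
        entropy = float(-(resp * np.log(resp + 1e-300)).sum())    # H of p(Z | X)
        history.append((loglik, q, entropy))
        weights, means, variances = m_step(x, resp)
    return weights, means, variances, history
```

With the posterior from the E-step plugged in, `q + entropy` is the VLB, and it coincides with the data log-likelihood recorded at the same iteration; the log-likelihood in `history` should also be non-decreasing across iterations, which is the usual EM guarantee.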

Citations (1)

Authors (1)
