2000 character limit reached
True Online Emphatic TD($λ$): Quick Reference and Implementation Guide (1507.07147v1)
Published 25 Jul 2015 in cs.LG
Abstract: This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the generality of general value functions, as well as the emphasis algorithm's notion of "interest".
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.