True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide (1507.07147v1)

Published 25 Jul 2015 in cs.LG

Abstract: This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions. The algorithm combines the emphasis idea (Sutton, Mahmood & White 2015) with the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the generality of general value functions, as well as the emphasis algorithm's notion of "interest".


