Speaker: Constantine Caramanis (University of Texas at Austin)
Title: Contextual Reinforcement Learning when we don't know the contexts
Date: 19-9-2022, 17:00
Room: Room 1.1.31, old building of the School of Electrical and Computer Engineering (ECE), NTUA (also streamed live via Webex *)
Abstract
Contextual bandits, and more generally contextual reinforcement learning, study problems in which the learner relies on revealed contexts, or labels, to adapt its learning and optimization strategies. What can we do when those contexts are missing?
Statistical learning with missing or hidden information is ubiquitous in both theoretical and applied problems. A simple yet fundamental setting is that of mixtures, where each data point is generated by one of several possible (unknown) processes. In this talk, we are interested in the dynamic decision-making version of this problem. At the beginning of each (finite-length, typically short) episode, we interact with an MDP drawn from a set of M possible MDPs. The identity of the MDP for each episode -- the context -- is unknown.
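To make the episodic setting described above concrete, here is a minimal Python sketch. It is not taken from the talk: the names `sample_episode` and `TOY_MDPS`, the fixed start state 0, the tabular representation, and the toy transition and reward tables are all illustrative assumptions. It only shows the interaction protocol: a context is drawn at the start of each episode and is never revealed to the policy.

```python
import random

# Hypothetical toy instance with M = 2 MDPs, 2 states and 2 actions.
# P[s][a] lists next-state probabilities; R[s][a] is the reward.
# Both MDPs share dynamics and differ only in which action is rewarded.
TOY_MDPS = [
    {"P": [[[0, 1], [1, 0]], [[0, 1], [1, 0]]], "R": [[1.0, 0.0], [1.0, 0.0]]},
    {"P": [[[0, 1], [1, 0]], [[0, 1], [1, 0]]], "R": [[0.0, 1.0], [0.0, 1.0]]},
]


def sample_episode(mdps, weights, policy, horizon, rng):
    """Roll out one episode; the policy sees states only, never the context."""
    # Draw the hidden context for this episode (assumed mixing weights).
    m = rng.choices(range(len(mdps)), weights=weights, k=1)[0]
    P, R = mdps[m]["P"], mdps[m]["R"]
    s = 0  # assumed fixed start state
    trajectory = []
    for t in range(horizon):
        a = policy(s, t)
        trajectory.append((s, a, R[s][a]))
        # Sample the next state from the chosen MDP's transition row.
        s = rng.choices(range(len(P[s][a])), weights=P[s][a], k=1)[0]
    # m is returned for inspection only; a learner would not observe it.
    return trajectory, m


rng = random.Random(0)
trajectory, hidden_context = sample_episode(
    TOY_MDPS, [0.5, 0.5], lambda s, t: 0, horizon=5, rng=rng
)
```

In this toy instance the observed rewards immediately reveal the context; the hard cases are those where short episodes do not carry enough information to identify it.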
We review the basic setting of MDPs and Reinforcement Learning, and explain in that framework why this class of problems is both important and challenging. Then, we outline several of our recent results in this area, as time permits. We first show that without additional assumptions, the problem is statistically hard in the number of different Markov chains: finding an epsilon-optimal policy requires exponentially (in M) many episodes. We then study several special and natural classes of latent MDPs (LMDPs). We show how ideas from the method of moments, in addition to the principle of optimism, can be applied here to derive new, sample-efficient RL algorithms in the presence of latent contexts.
Speaker bio
Constantine Caramanis is a Professor in the ECE department of The University of Texas at Austin. He received a PhD in EECS from The Massachusetts Institute of Technology, in the Laboratory for Information and Decision Systems (LIDS), and an AB in Mathematics from Harvard University. His current research centers on decision-making in large-scale complex systems, with a focus on statistical learning and optimization.
* Meeting link: https://centralntua.webex.com/centralntua/j.php?MTID=mb481f20664ab7ae97c8bc8e83d20be7c
Meeting number: 2730 257 0833
Password: NvX5iDdBW37
Host key: 937351