Talk title:

Toward Safe Reinforcement Learning: A Quasi-optimal Approach

Speaker:

Prof. Ruoqing Zhu (University of Illinois Urbana-Champaign)

Time:

Venue:

Probability and Statistics Teaching and Research Office, Laowai Building (老外楼)

Abstract:

Determining the optimal treatment or dose level is an essential goal in personalized medicine. When many decision points are involved, the problem falls into the reinforcement learning setting, where stochastic policies are often considered. However, existing approaches, in both discrete and continuous action spaces, may assign risky treatments that lead to poor outcomes, especially when data are collected offline. It is important to ensure safety and control such behavior in the estimation procedure. We develop a novel quasi-optimal learning framework that can be easily estimated in off-policy settings with guaranteed performance. The key idea is to constrain the estimated action to a subspace that yields only near-optimal Q-function values. We evaluate our algorithm with comprehensive simulation experiments and a real-world dose-suggestion application to the Ohio Type 1 Diabetes dataset.