We ask whether reinforcement learning can find theoretically optimal algorithms for online optimization problems, and we introduce a novel learning framework in this setting. By "algorithms" we mean uniform, "pen-and-paper" algorithms that work for inputs of any length and come with worst-case guarantees. To answer this question, we adapt several key ideas from traditional algorithms and complexity theory. In particular, we introduce adversarial distributions (universal and high-entropy training sets), which encourage the learner to find algorithms that perform well in the worst case. We present results for three representative problems -- the AdWords problem (also known as online Budgeted Allocation), the online Knapsack problem, and the Secretary problem. In each case, the learned models exhibit behavior consistent with the well-known optimal algorithms derived using the online primal-dual framework.
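For concreteness, one of the classic baselines referenced above is the optimal stopping rule for the Secretary problem: observe roughly the first n/e candidates without committing, then accept the first candidate who beats everything seen so far. The sketch below only illustrates that well-known baseline behavior; it is not code from the paper, and the function name and interface are my own.

    import math
    import random

    def classic_secretary(values):
        """Classic 1/e stopping rule (illustrative sketch, not the paper's code):
        reject the first ~n/e candidates, then accept the first candidate
        better than all seen so far; take the last one if forced."""
        n = len(values)
        cutoff = int(n / math.e)
        best_seen = max(values[:cutoff], default=float("-inf"))
        for v in values[cutoff:]:
            if v > best_seen:
                return v          # accept this candidate
        return values[-1]         # no better candidate appeared; accept the last

    # Sanity check: the rule hires the best candidate about 1/e (~37%) of the time.
    trials = 10000
    wins = 0
    for _ in range(trials):
        seq = random.sample(range(1000), 50)
        wins += classic_secretary(seq) == max(seq)
    print(f"hired the best candidate in {wins / trials:.2%} of trials")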
Based on the paper "A new dog learns old tricks: RL finds classic optimization algorithms" (ICLR 2019); joint work with Weiwei Kong, Christopher Liaw, and D. Sivakumar.