We consider experimental design in dynamic systems where the treatment and control can change the state of the system (e.g., the pricing algorithm of a ridesharing system, the matching algorithm of a delivery system, or the reserve price in ad auctions). In such systems, naively applying a classical A/B test that randomizes units to treatment or control and compares the sample averages of the two groups suffers from *temporal interference*: the application of policy A to some units affects the state of the system as seen by policy B on other units.
Motivated by this problem, our key innovation is to model it as estimating the difference in steady-state reward between two policies in a Markov chain with unknown transitions and rewards that depend on the actions (treatment or control). We characterize the adaptive experimental design and estimator that are consistent and efficient in this setting. For practical implementation, we propose alternative estimators and corresponding optimal policies, and study the efficiency gap of these designs.
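To make the estimand concrete, the following is a minimal sketch of the quantity being estimated: the difference of steady-state average rewards between two policies, each inducing its own Markov chain. The transition matrices and reward vectors here are hypothetical toy values, not from the paper, and in the actual problem they are unknown and must be learned from adaptively collected data.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic transition matrix P,
    solving pi P = pi subject to sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical two-state system with action-dependent transitions and rewards.
P_treat = np.array([[0.9, 0.1],
                    [0.3, 0.7]])   # chain induced by the treatment policy
P_ctrl  = np.array([[0.5, 0.5],
                    [0.4, 0.6]])   # chain induced by the control policy
r_treat = np.array([1.0, 0.2])     # per-state reward under treatment
r_ctrl  = np.array([0.8, 0.3])     # per-state reward under control

# Estimand: difference of the two steady-state average rewards.
delta = stationary(P_treat) @ r_treat - stationary(P_ctrl) @ r_ctrl
print(round(delta, 4))  # → 0.2778
```

A naive A/B estimate averages rewards over a mixed trajectory in which both policies perturb the shared state, which is exactly the temporal interference the proposed designs avoid.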
Joint work with Peter Glynn and Mohammad Rasouli.