Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning
This paper is an extended version of our previous work, “Federated Reinforcement Learning with Proxy Experience Memory.” In that work, we proposed a distributed reinforcement learning (RL) framework based on a proxy experience replay memory (ProxRM), termed federated reinforcement distillation (FRD). To construct the ProxRM, RL agents average their experience replay memories, and this averaging may blur the stored experiences. To tackle this problem, we propose mixup augmented FRD (MixFRD), an improved version of FRD in which RL agents interpolate the ProxRM using the mixup data augmentation method. MixFRD achieves learning performance comparable to other legacy distributed RL schemes, while its communication cost is far lower than that of any of the other schemes.
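To make the averaging step concrete, here is a minimal Python sketch of how one agent could build a ProxRM from its own replay memory. It assumes that the proxy states are a fixed, hypothetical set of centroids, that each actual state is assigned to its nearest centroid, and that each replay entry stores a state together with an action-probability vector. The names `nearest_proxy` and `build_proxrm` are illustrative and not taken from the paper. The usage lines also show the blurring effect: two distinct policies mapped to the same proxy state collapse into a single averaged policy.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def nearest_proxy(state: Sequence[float], proxies: Sequence[Sequence[float]]) -> int:
    """Index of the proxy state closest to `state` in squared Euclidean distance."""
    return min(
        range(len(proxies)),
        key=lambda k: sum((s - p) ** 2 for s, p in zip(state, proxies[k])),
    )


def build_proxrm(
    replay: Sequence[Tuple[Sequence[float], Sequence[float]]],
    proxies: Sequence[Sequence[float]],
) -> Dict[int, List[float]]:
    """Map each proxy-state index to the mean action-probability vector of its states."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for state, policy in replay:
        k = nearest_proxy(state, proxies)  # cluster the actual state
        acc = sums.setdefault(k, [0.0] * len(policy))
        for a, p in enumerate(policy):
            acc[a] += p  # accumulate per-action probabilities
        counts[k] += 1
    # Local averaging: all states in a cluster share one averaged policy.
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}


# Usage with hypothetical 2-D proxy states and a three-step replay memory.
proxies = [(0.0, 0.0), (1.0, 1.0)]
replay = [
    ((0.1, -0.1), (0.8, 0.2)),
    ((-0.2, 0.1), (0.6, 0.4)),  # same cluster as the first entry
    ((0.9, 1.2), (0.1, 0.9)),
]
rm = build_proxrm(replay, proxies)
# rm[0] is approximately [0.7, 0.3]: the two distinct policies are blurred together.
```

Only the averaged vectors per proxy state would need to be exchanged, rather than every raw state and action, which is the source of FRD's communication savings.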
Figures: Mission completion time comparisons among MixFRD, policy distillation, and federated reinforcement learning (left), and uplink and downlink payload size comparison for two agents with the server (right).
Abstract: Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience replay memory (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes: vanilla FRD, federated reinforcement learning (FRL), and policy distillation (PD).
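MixFRD's interpolation step can be sketched in the same spirit. The block below assumes the standard mixup recipe, in which a coefficient λ is drawn from Beta(α, α) and both inputs and targets are convexly combined. Here it is applied to pairs of ProxRM entries of the form (proxy state, averaged policy). The value α = 0.2, the pairwise sampling scheme, and the names `mixup_pair` and `mixup_augment` are illustrative assumptions, not the paper's exact settings.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical ProxRM entry: (proxy state, averaged action-probability vector).
Entry = Tuple[List[float], List[float]]


def mixup_pair(a: Entry, b: Entry, lam: float) -> Entry:
    """Convexly interpolate two entries: lam * a + (1 - lam) * b for state and policy."""
    (xa, ya), (xb, yb) = a, b
    x = [lam * u + (1 - lam) * v for u, v in zip(xa, xb)]
    y = [lam * u + (1 - lam) * v for u, v in zip(ya, yb)]
    return x, y


def mixup_augment(
    proxrm: Sequence[Entry],
    n_samples: int,
    alpha: float = 0.2,  # assumed mixup hyperparameter, not from the paper
    rng: Optional[random.Random] = None,
) -> List[Entry]:
    """Draw n_samples synthetic entries by mixing random pairs of distinct ProxRM entries."""
    rng = rng or random.Random()
    out: List[Entry] = []
    for _ in range(n_samples):
        a, b = rng.sample(list(proxrm), 2)  # two distinct entries
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        out.append(mixup_pair(a, b, lam))
    return out


# Usage: mix two hypothetical entries with a fixed coefficient.
a = ([0.0, 0.0], [1.0, 0.0])
b = ([4.0, 8.0], [0.0, 1.0])
mixed = mixup_pair(a, b, 0.25)  # ([3.0, 6.0], [0.25, 0.75])
```

Because each mixed policy is a convex combination of two probability vectors, it remains a valid action distribution, so the augmented entries could serve as distillation targets without renormalization.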
Authors: Han Cha, Jihong Park, Hyesung Kim, Mehdi Bennis, and Seong-Lyun Kim, “Proxy experience replay: federated distillation for distributed reinforcement learning,” accepted to IEEE Intelligent Systems, May 2020.