Taming Stragglers in Parallel Split Learning
On 3. 4. 2021, Jihong Park gave a presentation on ongoing work on communication-efficient parallel split learning at the MIT workshop ‘Split Learning for Distributed Machine Learning (SLDML’21)’.
Title: Taming Stragglers in Parallel Split Learning
Speakers: Jihong Park, Seungeun Oh, Hyelin Nam, Seong-Lyun Kim, Mehdi Bennis (Deakin Univ, Yonsei Univ, Univ. of Oulu)
With respect to ‘communicating to learn’, several problems arise: a lack of samples at any single location, data privacy, and limited communication and computing resources.
From this perspective, we focus on communication-efficient distributed learning, especially split learning.
While designing communication-efficient split learning, we identified several issues and propose a study for each of them.
To control the impact of stragglers, we can adopt manifold mixup and apply average pooling over the worker models’ cut-layer outputs.
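As a rough illustration of this idea (not the exact method from the talk), the sketch below averages the cut-layer activations of several workers and blends in a manifold-mixup-style convex combination of a random worker pair; the mixing ratio, the 50/50 blend, and the array shapes are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_worker_outputs(activations, alpha=2.0):
    """Combine cut-layer activations from multiple workers.

    `activations` is a list of arrays, one per worker, each of shape
    (batch, features). The plain mean over all workers (average pooling)
    is blended with a manifold-mixup-style convex combination of a
    random worker pair, so no single worker dominates the server input.
    """
    stacked = np.stack(activations)            # (workers, batch, features)
    mean_out = stacked.mean(axis=0)            # average pooling over workers
    i, j = rng.choice(len(activations), size=2, replace=False)
    lam = rng.beta(alpha, alpha)               # mixup coefficient in (0, 1)
    mixed = lam * activations[i] + (1 - lam) * activations[j]
    return 0.5 * (mean_out + mixed)            # assumed 50/50 blend

# toy usage: three workers, batch of 4, 8 cut-layer features each
acts = [rng.standard_normal((4, 8)) for _ in range(3)]
server_input = mix_worker_outputs(acts)
print(server_input.shape)  # (4, 8)
```

The server then continues the forward pass from `server_input` instead of from any individual worker's output.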
To perform split learning, the system must support periodic communication with low latency. Increasing the batch size reduces latency significantly, since the Shannon capacity is approached as the packet length goes to infinity.
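The capacity argument can be made concrete with the finite-blocklength normal approximation of Polyanskiy et al. for a real AWGN channel: the achievable rate is roughly C − √(V/n)·Q⁻¹(ε), which climbs toward the Shannon capacity C as the blocklength n (here, proportional to the batch size) grows. The SNR, error probability, and blocklengths below are illustrative assumptions, not figures from the talk.

```python
from math import e, log2, sqrt
from statistics import NormalDist

def achievable_rate(snr, n, eps=1e-3):
    """Normal-approximation achievable rate (bits/channel use) at
    blocklength n and block error probability eps, real AWGN channel."""
    C = 0.5 * log2(1 + snr)                                # Shannon capacity
    V = (log2(e) ** 2) * snr * (snr + 2) / (2 * (snr + 1) ** 2)  # dispersion
    qinv = NormalDist().inv_cdf(1 - eps)                   # Q^{-1}(eps)
    return max(C - sqrt(V / n) * qinv, 0.0)

snr = 10.0  # assumed SNR for the example
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(achievable_rate(snr, n), 4))
# the rate increases with n toward C = 0.5 * log2(11) ≈ 1.73 bits/use,
# so larger batches (longer packets) need fewer channel uses per bit
```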
Finally, a communication issue remains. When communication between the server and a straggler fails, the server cannot compute the straggler’s gradient. To overcome this problem, we propose federated split distillation: the entire model is viewed as a teacher and a single chunk as a student. The server (teacher model) stores the workers’ outputs and trains locally with them, while each worker (student model) updates itself locally by running knowledge distillation against the server’s output.
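A minimal sketch of the distillation step a worker could run locally, assuming standard temperature-scaled knowledge distillation (Hinton et al.) and randomly generated logits in place of real model outputs; the exact loss and temperature used in federated split distillation may differ.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, computed stably."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    ez = np.exp(z)
    return ez / ez.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) with temperature T, scaled by T^2 —
    the standard knowledge-distillation objective."""
    p = softmax(teacher_logits, T)   # soft targets from the server (teacher)
    q = softmax(student_logits, T)   # local worker (student) predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# toy example: when the straggler's link fails, the worker minimizes this
# loss against the server's last cached output instead of waiting for
# back-propagated gradients
rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 10))   # cached server-side logits
student = rng.standard_normal((4, 10))   # local worker logits
print(distillation_loss(student, teacher))
```

The worker would back-propagate this loss through its own chunk only, so no gradient exchange with the server is required while the link is down.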