Online Learning of Time-Varying Unbalanced Networks In Non-Convex Environments: A Multi-Armed Bandit Approach

Research output: Contribution to journalArticle

Abstract

This study discusses how agents in a time-varying distributed network can converge to the global minimizer of a time-varying graph network. Each agent knows only the local loss of its observation and must cooperate constructively with other agents to find the global minimizer of the network. Unlike most existing works in the literature that consider a convex loss function, this study assumes a generalized local Lipschitz loss function for each agent, which can be convex or non-convex. We propose a multi-armed bandit algorithm CD EXP3 where each agent does not know its loss function but only observes its losses. Through simulations using two different time-varying graph topologies, we show that the algorithm helps all agents converge to the minimizer of the network. In addition, we discuss the effects of the two different topologies and various simulation parameters on convergence. We obtain an upper bound on the expected regret and compare it with the sublinearity of the regret bounds of well-known online distributed algorithms.
Original languageEnglish
Pages (from-to)15567-15577
JournalIEEE Access
Volume11
StatePublished - 2023

Fingerprint

Dive into the research topics of 'Online Learning of Time-Varying Unbalanced Networks In Non-Convex Environments: A Multi-Armed Bandit Approach'. Together they form a unique fingerprint.

Cite this