Artificial intelligence researchers are tackling the long-standing problem of “data heterogeneity” in federated learning
Researchers at North Carolina State University have developed a new approach to federated learning that allows them to train accurate artificial intelligence (AI) models more quickly. The work addresses a long-standing problem in federated learning that arises when there is significant heterogeneity in the different datasets used to train the AI.
Federated learning is an AI training technique that allows AI systems to improve their performance by learning from multiple sets of data without compromising the privacy of that data. For example, federated learning could be used to leverage privileged patient data from multiple hospitals to improve AI diagnostic tools, without hospitals having access to each other’s patient data.
Federated learning is a form of machine learning involving multiple devices, called clients. The clients and a centralized server all start with a base model designed to solve a specific problem. From this starting point, each client trains its local model on its own data, modifying the model to improve its performance. The clients then send these “updates” to the centralized server. The server relies on these updates to create a hybrid model, with the goal of making the hybrid model perform better than any of the individual local models. The server then sends the hybrid model back to each of the clients. This process is repeated until system performance has been optimized or reaches an agreed-upon level of accuracy.
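The loop described above can be sketched in a few lines of Python. This is a minimal illustration using a linear model and plain federated averaging; the synthetic datasets, model, and hyperparameters are illustrative assumptions, not the setup from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])  # hypothetical ground-truth weights

def make_client_data(n=50, dim=3):
    """Synthetic (X, y) pairs for one client: y = X @ w_true + noise."""
    X = rng.normal(size=(n, dim))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

clients = [make_client_data() for _ in range(4)]
w_global = np.zeros(3)  # base model shared by server and clients

for _round in range(100):            # communication rounds
    updates = []
    for X, y in clients:
        w = w_global.copy()          # each client starts from the global model
        for _ in range(5):           # a few local gradient steps on local data
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        updates.append(w)
    # The server averages the clients' updated weights into a hybrid model
    w_global = np.mean(updates, axis=0)

# w_global should now be close to w_true
```

Each round here costs one exchange of model weights per client; the heterogeneity problem the researchers describe arises when the clients' local steps pull in conflicting directions.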
“However, sometimes the nature of a given client’s data results in changes to the local model that work well for that client’s own data, but do not work well when applied to other datasets,” says Chau-Wai Wong, corresponding author of a paper on the new technique and an assistant professor of electrical and computer engineering at NC State. “In other words, if there’s enough heterogeneity in the clients’ data, a client can sometimes change its local model in a way that actually hurts the performance of the hybrid model.”
“Our new approach allows us to address the heterogeneity problem more efficiently than previous techniques, while preserving privacy,” says Kai Yue, first author of the paper and a Ph.D. student at NC State. “What’s more, if there is enough heterogeneity in the client data, it can be effectively impossible to develop an accurate model using traditional federated learning approaches. But our new approach allows us to develop an accurate model regardless of how heterogeneous the data are.”
In the new approach, the updates that clients send to the centralized server are reformatted in a way that preserves data privacy but gives the server more information about the data characteristics that are relevant to model performance. Specifically, each client sends its information to the server in the form of Jacobian matrices. The server then plugs these matrices into an algorithm that produces an improved model and distributes the new model back to the clients. This process is repeated, with each iteration yielding model updates that improve system performance.
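The server-side step can be sketched as follows. This is only an illustrative Python sketch under simplifying assumptions: a linear model f(x) = w · x (whose per-sample Jacobian is simply x) and synthetic client data, not the paper’s actual algorithm or datasets. The point it demonstrates is that one upload of per-sample Jacobians lets the server run many update steps locally, replacing many communication rounds.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth weights

def client_payload(n=50, dim=3):
    """One client's upload: per-sample Jacobian rows and targets."""
    X = rng.normal(size=(n, dim))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y                        # for f(x) = w . x, the Jacobian row is x

# The server stacks the payloads from all clients into one system
payloads = [client_payload() for _ in range(4)]
J = np.vstack([X for X, _ in payloads])        # stacked Jacobians (200 x 3)
y = np.concatenate([t for _, t in payloads])

K = J @ J.T                                    # empirical kernel matrix

# Evolve the model's predictions with kernel gradient flow, entirely on
# the server -- these 500 steps need no further client communication.
f = np.zeros_like(y)
for _ in range(500):
    f -= 0.01 * K @ (f - y) / len(y)

# Map the converged predictions back to model weights (least squares)
w_global, *_ = np.linalg.lstsq(J, f, rcond=None)
```

For a linear model the kernel dynamics on the predictions are equivalent to gradient descent on the weights, which is why the server can recover an improved global model without explicitly performing gradient descent at the clients.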
“One of the central ideas is to avoid iteratively training the local model at each client, and instead let the server directly produce an improved hybrid model based on the clients’ Jacobian matrices,” says Ryan Pilgrim, a co-author of the paper and a former graduate student at NC State. “By doing so, the algorithm not only avoids multiple communication rounds, but also keeps divergent local updates from degrading the model.”
The researchers tested their new approach against industry-standard datasets used to assess federated learning performance and found that the new technique was able to match or surpass the accuracy of federated averaging, the benchmark algorithm for federated learning. What’s more, the new approach matched that accuracy while reducing the number of communication rounds between server and clients by an order of magnitude.
“For example, it takes federated averaging 284 communication rounds to reach 85% accuracy on one of the test datasets,” Yue says. “We were able to reach 85% accuracy in 26 rounds.”
“This is a novel alternative approach to federated learning, which makes this work exploratory,” says Wong. “We are effectively repurposing analytical tools for practical problem solving. We look forward to feedback from industry and the broader federated learning research community on its potential.”
The paper, “Federated Learning Reinforced by the Neural Tangent Kernel,” will be presented at the 39th International Conference on Machine Learning (ICML), which is being held in Baltimore, Maryland, July 17-23. The paper was co-authored by Richeng Jin, a former postdoctoral researcher at NC State; Dror Baron, associate professor of electrical and computer engineering at NC State; Huaiyu Dai, professor of electrical and computer engineering at NC State; and Ryan Pilgrim, a former NC State graduate student.
Note to Editors: The summary of the study follows.
“Federated Learning Reinforced by the Neural Tangent Kernel”
Authors: Kai Yue, Richeng Jin, Chau-Wai Wong, Dror Baron and Huaiyu Dai, North Carolina State University; Ryan Pilgrim, independent researcher
Presented at: 39th International Conference on Machine Learning (ICML), Baltimore, Maryland, July 17-23
Abstract: Federated learning (FL) is a privacy-preserving paradigm in which multiple participants jointly solve a machine learning problem without sharing raw data. Unlike traditional distributed learning, a defining feature of FL is statistical heterogeneity: data distributions across participants differ from one another. Meanwhile, recent advances in the interpretation of neural networks have seen wide use of neural tangent kernels (NTKs) for convergence analyses. In this paper, we propose a new FL paradigm reinforced by the NTK framework. The paradigm addresses the challenge of statistical heterogeneity by transmitting update data that is more expressive than that of conventional FL paradigms. Specifically, participants upload per-sample Jacobian matrices rather than model weights or gradients. The server then constructs an empirical kernel matrix to update a global model without explicitly performing gradient descent. We further develop a variant with improved communication efficiency and enhanced privacy. Numerical results show that the proposed paradigm can achieve the same accuracy while reducing the number of communication rounds by an order of magnitude compared with federated averaging.