Microsoft AI researchers develop ‘Ekya’ to solve the data drift problem on edge compute boxes, allowing retraining and inference to coexist on them
This article is based on the research paper 'Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers'. All credit for this research goes to the paper's researchers from Microsoft, UC Berkeley and the University of Chicago.
Deep neural network (DNN) models for object detection and classification, such as YOLO, ResNet, and EfficientNet, power video analytics applications such as urban mobility and smart cars. Edge computing and video analytics have a symbiotic relationship: live video analytics has been called the “killer app” for edge computing. Video analytics deployments typically send video to on-premises edge servers. Edge computing is preferred for video analytics because it avoids the expensive network links needed to stream video to the cloud while also preserving video privacy. However, although edge devices come in different sizes and designs, they remain resource constrained compared to the cloud (e.g., they often have weak GPUs). This problem is made worse by the mismatch between how quickly models' compute demands are growing and how quickly available processor cycles are growing. Model compression is therefore widely used in edge deployments. The paper tackles the difficulty of supporting inference and retraining on edge servers at the same time, which requires navigating a fundamental trade-off between the accuracy of the retrained model and the accuracy of ongoing inference.
Because resources at the edge are limited, lightweight machine learning (ML) models are needed. Using model specialization and compression techniques, the community has built edge models with much smaller compute and memory footprints (96x smaller for object detection models). Such models are well suited to edge deployment.
Video analytics deployments commonly analyze video on on-premises edge servers (for example, edge servers offered by AWS or Azure). A typical edge server can handle dozens of video streams, such as those from construction-site cameras, and each stream is analyzed by its own specialized model (see Figure 1). Edge computing is used in video analytics applications for three reasons. 1) Edge deployments are common where the uplink to the cloud is too expensive for sending continuous video streams, such as on oil rigs that rely on costly satellite networks or in smart cars with data-limited cellular connections. 2) Network links out of edge sites can fail. Processing at the edge ensures that data is not lost and service is not disrupted during a cloud outage. 3) Videos often contain sensitive and personal information that users do not want uploaded to the cloud (for example, many EU cities legally require traffic videos to be processed on-site). For reasons of both network cost and video privacy, it is therefore better to run both inference and retraining on the edge compute device rather than relying on the cloud. With typical network bandwidths, cloud-based solutions are also slower and less accurate than edge deployments.
So far so good, but data drift is the villain of the story! Data drift is the phenomenon where live data from the field differs significantly from the training data. Continuous model retraining, or continuous learning, is a promising strategy for dealing with it: the edge DNNs are incrementally retrained on new video samples while some previously learned knowledge is retained. Continuous learning techniques retrain DNNs periodically; the interval between two retrainings is called the “retraining window”, and a sample of the data accumulated during each window is used for retraining. This continuous learning helps compressed models maintain high accuracy.
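The retraining-window loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ekya's code: the function name `continuous_learning`, the `(timestamp, frame)` stream format, uniform random sampling within each window, and the `retrain` callback are all hypothetical stand-ins.

```python
import random
from typing import Callable, Iterable, List, Tuple

# Hypothetical stream format: (timestamp in seconds, frame payload).
Frame = Tuple[float, object]


def continuous_learning(
    stream: Iterable[Frame],
    window_seconds: float,
    sample_fraction: float,
    retrain: Callable[[List[object]], None],
    seed: int = 0,
) -> int:
    """Retrain once per window on a random sample of that window's frames.

    Returns the number of retraining rounds performed. In a real system,
    `retrain` (hypothetical) would fine-tune the current model starting from
    its existing weights, so earlier knowledge is partly retained.
    """
    rng = random.Random(seed)
    window_start = None
    window_frames: List[object] = []
    rounds = 0

    def flush() -> None:
        nonlocal rounds
        if not window_frames:  # skip windows that received no frames
            return
        k = max(1, int(len(window_frames) * sample_fraction))
        retrain(rng.sample(window_frames, k))
        rounds += 1

    for ts, frame in stream:
        if window_start is None:
            window_start = ts
        # Close every window that ended before this frame arrived.
        while ts >= window_start + window_seconds:
            flush()
            window_frames = []
            window_start += window_seconds
        window_frames.append(frame)
    flush()  # retrain on the final, possibly partial, window
    return rounds
```

For example, ten frames at one-second intervals with a 5-second window and `sample_fraction=0.5` trigger two retraining rounds of two frames each. Passing `retrain` as a callback keeps the windowing logic independent of any particular training framework.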
Ekya is a solution for data drift on edge compute boxes. With limited edge resources, continuous learning requires making informed decisions about when to retrain each video stream's model, how many resources to allocate, and which configurations to use. Making these decisions poses two challenges. First, the decision space of multidimensional configurations and resource allocations is computationally harder than the multidimensional knapsack and multi-armed bandit problems, both of which are fundamentally difficult. To address this, Ekya uses a “thief” scheduler, a technique that makes joint scheduling of retraining and inference practical. Second, the scheduler needs to know each configuration's actual performance (in terms of resource usage and inference accuracy), and obtaining it directly would require retraining with every configuration. Ekya's micro-profiler solves this problem by retraining only a few selected configurations on a small subset of the data. The components of Ekya are shown in Figure 5.
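To make the idea of “stealing” resources concrete, here is a heavily simplified Python sketch of a thief-style scheduler. It is an illustration under stated assumptions, not Ekya's actual algorithm: every stream has one inference job and one retraining job; the `StreamProfile` numbers stand in for micro-profiler estimates; the accuracy model (a linear penalty for starved inference, the stale model serving until retraining finishes) is a toy; and the quantum size and greedy acceptance rule are hypothetical. It also omits choosing among retraining and inference configurations, which Ekya's scheduler handles as well.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Tuple


@dataclass
class StreamProfile:
    """Hypothetical micro-profiler estimates for one video stream."""
    acc_before: float           # accuracy of the current (possibly stale) model
    acc_after: float            # estimated accuracy once retraining finishes
    retrain_gpu_seconds: float  # estimated retraining cost in GPU-seconds
    infer_gpu_needed: float     # GPU share at which inference runs its best config


def window_accuracy(p: StreamProfile, infer_share: float,
                    retrain_share: float, window: float) -> float:
    """Toy estimate of one stream's average inference accuracy over a window."""
    # Starved inference falls back to cheaper configs; modeled as a linear penalty.
    infer_quality = min(1.0, infer_share / p.infer_gpu_needed)
    if retrain_share <= 0:
        return infer_quality * p.acc_before
    t_done = p.retrain_gpu_seconds / retrain_share
    if t_done >= window:  # retraining would not finish within this window
        return infer_quality * p.acc_before
    # Stale model serves until retraining finishes; the new model serves afterwards.
    avg = (p.acc_before * t_done + p.acc_after * (window - t_done)) / window
    return infer_quality * avg


Job = Tuple[str, str]  # (job kind, stream name)


def thief_schedule(profiles: Dict[str, StreamProfile], total_gpus: float,
                   window: float, quantum: float = 0.05,
                   max_rounds: int = 1000) -> Tuple[Dict[Job, float], float]:
    """Greedily move GPU quanta between jobs while mean accuracy improves."""
    jobs = [(kind, s) for s in profiles for kind in ("infer", "retrain")]
    total_units = round(total_gpus / quantum)
    # Start from a fair split, counted in integer quanta to avoid float drift.
    units = {j: total_units // len(jobs) for j in jobs}
    for j in jobs[: total_units % len(jobs)]:
        units[j] += 1

    def score(u: Dict[Job, int]) -> float:
        return sum(window_accuracy(p, u[("infer", s)] * quantum,
                                   u[("retrain", s)] * quantum, window)
                   for s, p in profiles.items()) / len(profiles)

    best = score(units)
    for _ in range(max_rounds):
        improved = False
        # Every job (the thief) tries to take one quantum from every other job.
        for thief, victim in permutations(jobs, 2):
            if units[victim] == 0:
                continue
            trial = dict(units)
            trial[victim] -= 1
            trial[thief] += 1
            s = score(trial)
            if s > best + 1e-12:  # keep the steal only if mean accuracy improves
                units, best, improved = trial, s, True
        if not improved:
            break
    return {j: n * quantum for j, n in units.items()}, best


if __name__ == "__main__":
    demo = {
        "drifted": StreamProfile(0.5, 0.9, 20.0, 0.2),
        "stable": StreamProfile(0.85, 0.86, 20.0, 0.2),
    }
    alloc, acc = thief_schedule(demo, total_gpus=1.0, window=200.0)
    print(alloc, round(acc, 3))
```

In the demo, one stream has drifted badly and one has not. Starting from an equal split of a single GPU, the steals shift retraining GPU time toward the drifted stream, where it buys the most accuracy, while each inference job keeps enough GPU to run its best configuration. Working in whole quanta keeps the search space discrete and guarantees the total allocation never changes.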
Ekya’s performance was evaluated, with the following key results: 1) On classification and detection, Ekya achieves up to 29% higher accuracy for compressed vision models than static retraining baselines; a baseline would need 4x more GPU resources to match Ekya’s accuracy. 2) Ekya’s gains come mainly from the micro-profiler and the thief scheduler. The micro-profiler, in particular, estimates accuracy with a low median error of 5.8%. 3) When scheduling 10 video streams across 8 GPUs with 18 configurations per model and a 200-second retraining window, the thief scheduler makes its decisions in 9.4 seconds. 4) Ekya achieves significantly higher accuracy than alternative approaches, such as reusing cached historical models trained on similar data or scenarios and retraining models in the cloud, without incurring the network costs of the latter.