Huawei AI Researchers Propose Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for Face Parsing

Source: https://arxiv.org/pdf/2203.14448v1.pdf
This research summary is based on the paper 'Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing'

Face parsing is a fine-grained semantic segmentation problem: it aims to assign a pixel-wise label to each facial component, such as the eyes, nose, and mouth. Many high-level applications, such as face swapping, face editing, and facial makeup, depend on this detailed analysis of facial semantic components. Methods based on fully convolutional networks (FCNs) have shown promising results on fully supervised face parsing, benefiting from the learning capacity of deep convolutional neural networks (CNNs) and the considerable effort invested in pixel-level annotations.

Nevertheless, because convolutional kernels are local, FCNs struggle to capture the global contextual information needed to semantically parse face components in an image. To address this, most region-based face parsing algorithms learn global information by incorporating CNN features into variants of conditional random fields (CRFs). These methods, however, do not take into account the correlations among different objects in the image.

Previously, the EAGRNet approach was introduced to model a region-level graph representation over a face image by propagating information across all vertices of the graph. Although EAGRNet achieves state-of-the-art performance by reasoning over non-local regions to obtain global dependencies between distant facial components, it still suffers from spatial inconsistency and boundary confusion. In EAGRNet, the PSP module uses average pooling to capture a global context prior, which results in an inconsistent spatial topology.

EAGRNet also integrates additional binary edge cues into its context embedding to improve parsing results. In crowded scenes, however, EAGRNet struggles to handle boundaries between highly irregular facial parts (such as hair and clothing) and to detect distinct boundaries between multiple face instances. Moreover, training a good face parsing model requires pixel-accurate annotations, yet careless manual labeling errors are inevitable in training datasets.

EAGRNet is trained with the typical fully supervised learning scheme, which treats all ground-truth pixels equally and therefore fails to identify label noise. Ignoring these flawed annotations limits the model's generalization and holds back further performance gains.

In a recent paper, Huawei researchers developed an end-to-end face parsing system based on Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR). Given an input face image, a ResNet-101 pre-trained on ImageNet serves as the backbone to extract features from different layers. The multi-task model then comprises three tasks: face parsing, binary edge detection, and category edge detection.

The three tasks share low-level weights from the backbone but have no high-level interactions. As a result, the auxiliary edge detection branches can be detached from the face parsing branch at the inference stage. To deal with the spatial inconsistency caused by pooling, the team designs a dynamic dual graph convolutional network (DDGCN) in the face parsing branch to capture long-range contextual information.
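The decoupling idea can be illustrated with a toy sketch. This is not the authors' implementation: the class `DecoupledMultiTaskModel`, the head names, and the lambda stand-ins for sub-networks are all hypothetical, and plain lists of floats stand in for feature maps. The only point shown is that heads which never read each other's outputs can be dropped at inference without changing the parsing result.

```python
from typing import Callable, Dict, List

Features = List[float]


class DecoupledMultiTaskModel:
    """Toy model: one shared backbone, independent task heads."""

    def __init__(self, backbone: Callable[[Features], Features],
                 heads: Dict[str, Callable[[Features], Features]]) -> None:
        self.backbone = backbone
        self.heads = heads

    def forward(self, x: Features, training: bool) -> Dict[str, Features]:
        shared = self.backbone(x)  # shared low-level features
        # Heads only read the shared features, never each other's outputs,
        # so the auxiliary edge heads can be skipped at inference.
        active = list(self.heads) if training else ["parsing"]
        return {name: self.heads[name](shared) for name in active}


# Hypothetical stand-ins for the real sub-networks (assumptions, not DML-CSR code).
model = DecoupledMultiTaskModel(
    backbone=lambda x: [2.0 * v for v in x],
    heads={
        "parsing": lambda f: [v + 1.0 for v in f],
        "binary_edge": lambda f: [float(v > 1.0) for v in f],
        "category_edge": lambda f: [v * 0.5 for v in f],
    },
)

train_out = model.forward([0.25, 1.0], training=True)   # all three heads run
infer_out = model.forward([0.25, 1.0], training=False)  # parsing head only
```

In this sketch the parsing output is identical in both modes, while the inference call does strictly less work.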

The proposed DDGCN contains no additional pooling operation, and it can dynamically fuse the global context extracted from GCNs in both the spatial and feature spaces. To resolve boundary confusion in single-face and multi-face scenarios, the proposed category-aware edge detection module exploits more semantic information than the binary edge detection module used in EAGRNet.
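To make the difference between binary and category-aware edges concrete, here is a minimal sketch that derives both kinds of edge map from a small label map. The rule used (a pixel is an edge if a 4-connected neighbour has a different label) and the function name `edge_maps` are assumptions for illustration; the paper's actual edge-label generation may differ.

```python
from typing import List, Tuple

LabelMap = List[List[int]]


def edge_maps(labels: LabelMap) -> Tuple[LabelMap, LabelMap]:
    """Return (binary_edges, category_edges) for a 2-D label map.

    A pixel counts as an edge pixel if any 4-connected neighbour has a
    different label. The binary map stores 1 there; the category map stores
    the pixel's own class id plus one, so 0 stays reserved for "no edge".
    """
    h, w = len(labels), len(labels[0])
    binary = [[0] * w for _ in range(h)]
    category = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    binary[y][x] = 1
                    category[y][x] = labels[y][x] + 1
                    break
    return binary, category


# Toy mask: class 0 surrounds a 2x2 block of class 1.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
binary, category = edge_maps(mask)
```

The binary map marks every boundary pixel with the same value, whereas the category map also records which class each boundary pixel belongs to, which is the extra semantic signal the paragraph above refers to.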

To address the problem caused by noisy labels in training datasets, the team presents a cyclical learning scheduler, inspired by self-training, that achieves cyclical self-regulation (CSR). CSR includes a self-ensemble strategy, which aggregates a series of historical models into a new, more reliable model, and a self-distillation method, which uses the soft labels produced by the aggregated model to guide subsequent training.

Finally, CSR alternates between these two techniques from one cycle to the next, improving the model's generalization by correcting noisy labels during training. The proposed CSR can improve both model and label reliability within a cyclical training schedule without adding computational cost.
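The two CSR ingredients can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's formulation: uniform parameter averaging stands in for the self-ensemble step, a linear blend with a hypothetical coefficient `alpha` stands in for building soft targets for self-distillation, and the checkpoint values are made up.

```python
from typing import Dict, List

Weights = Dict[str, float]


def self_ensemble(history: List[Weights]) -> Weights:
    """Average the parameters of several historical models (uniform weights)."""
    n = len(history)
    return {k: sum(w[k] for w in history) / n for k in history[0]}


def soften_labels(one_hot: List[float], ensemble_probs: List[float],
                  alpha: float) -> List[float]:
    """Blend a hard label with the ensemble's prediction.

    alpha is a hypothetical mixing coefficient: 0 keeps the original
    annotation, 1 trusts the ensemble prediction completely.
    """
    return [(1 - alpha) * y + alpha * p for y, p in zip(one_hot, ensemble_probs)]


# Three hypothetical checkpoints saved during one training cycle.
checkpoints = [{"w": 0.5, "b": 1.0}, {"w": 1.0, "b": 2.0}, {"w": 1.5, "b": 3.0}]
teacher = self_ensemble(checkpoints)

# A pixel annotated as class 0 that the ensemble believes is class 1.
target = soften_labels([1.0, 0.0], [0.25, 0.75], alpha=0.5)
```

With these toy numbers, the possibly mislabeled pixel no longer pushes the next cycle fully toward class 0, which is the intuition behind correcting noisy labels with the aggregated model's soft predictions.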

The approach achieves new state-of-the-art performance on the Helen (overall F1 score of 93.8%), LaPa (mean F1 score of 92.4%), and CelebAMask-HQ (mean F1 score of 86.1%) datasets. Because the edge prediction modules can be detached from the network at inference, the method also uses fewer computational resources than EAGRNet, reducing inference time from 89 ms to 31 ms while achieving better performance.
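For readers unfamiliar with the metric, the following sketch computes per-class and mean F1 over flattened pixel labels. It shows the generic definition only; each benchmark has its own conventions (for example, which classes are included in the average), and those are not reproduced here. The function names and the toy labels are illustrative assumptions.

```python
from typing import List, Sequence


def class_f1(pred: Sequence[int], truth: Sequence[int], cls: int) -> float:
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if the class is absent."""
    tp = sum(p == cls and t == cls for p, t in zip(pred, truth))
    fp = sum(p == cls and t != cls for p, t in zip(pred, truth))
    fn = sum(p != cls and t == cls for p, t in zip(pred, truth))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(pred: Sequence[int], truth: Sequence[int],
            classes: List[int]) -> float:
    """Unweighted average of the per-class F1 scores."""
    return sum(class_f1(pred, truth, c) for c in classes) / len(classes)


# Toy example: six pixels, three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
per_class = [class_f1(pred, truth, c) for c in (0, 1, 2)]
score = mean_f1(pred, truth, [0, 1, 2])
```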

Conclusion

Huawei researchers have published DML-CSR, a decoupled multi-task learning technique with cyclical self-regulation for face parsing. Extensive experiments on Helen, CelebAMask-HQ, and LaPa show that the proposed approach is effective, with DML-CSR outperforming the compared approaches on all three datasets. According to the researchers, DML-CSR is a valuable strategy for training a reliable face parsing model on large-scale datasets.

Paper: https://arxiv.org/pdf/2203.14448v1.pdf

Github: https://github.com/deepinsight/insightface/tree/master/parsing/dml_csr

James G. Williams