Candidate CVs are effectively impossible to degender, AI researchers say

Researchers at New York University have found that even very simple natural language processing (NLP) models are quite capable of determining a candidate’s gender from a “gender-stripped” resume, even in cases where machine learning methods were used to remove all gender indicators from the document.

Following a study that involved processing 348,000 well-matched male/female CVs, the researchers conclude:

“[There] is a significant amount of gender-specific information in resumes. Even after significant attempts to mask gender in resumes, a simple Tf-Idf model can learn to discriminate between [genders]. This empirically validates concerns about models learning to discriminate gender and propagate bias in downstream training data.”

The discovery matters not because it is realistic to hide gender during the selection and interview process (which it clearly is not), but rather because merely getting to that stage may involve an AI-based resume review without a human in the loop, and HR AI has earned a reputation tainted by gender bias in recent years.

The results of the researchers’ study demonstrate just how resistant gender is to attempts at obfuscation:

Findings from the NYU paper.

The results above use the Area Under the Receiver Operating Characteristic curve (AUROC) metric, which runs from 0 to 1, where 1 represents perfect gender identification and 0.5 is equivalent to random guessing. The table covers a range of eight experiments.
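To make the metric concrete, here is a minimal sketch of AUROC computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. It assumes binary labels (1 and 0) and is only an illustration; it is not the authors' evaluation code, and the scores below are invented.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores, purely for illustration.
labels = [1, 1, 1, 0, 0, 0]
perfect = auroc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])        # every positive ranked first
uninformative = auroc(labels, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5])  # model cannot tell classes apart
```

A model that ranks every positive above every negative scores 1, while a model that assigns the same score to everything scores 0.5, which is why values approaching 0.7 still indicate a real signal.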

Even in the worst-performing results (Experiments 7 and 8), where a CV was so severely stripped of gender-identifying information that it became unusable, a simple NLP model such as Word2Vec is still capable of identifying gender at a rate approaching 70%.

The researchers comment:

“In the context of algorithmic hiring, these results imply that unless the training data is perfectly unbiased, even simple NLP models will learn to discriminate gender from resumes and propagate biases downstream.”

The authors imply that there is no legitimate AI-based solution to “degender” resumes in a feasible hiring pipeline, and that machine learning techniques that actively enforce fair treatment are a better approach to the problem of gender bias in the labor market.

In AI terms, this amounts to “positive discrimination”, where gender-revealing resumes are accepted as unavoidable, but re-ranking is actively applied as an egalitarian measure. Approaches of this nature were proposed by LinkedIn in 2019, and by researchers from Germany, Italy and Spain in 2018.

The paper is titled Gendered Language in Resumes and its Implications for Algorithmic Bias in Hiring, and is authored by Prasanna Parasurama, from NYU Stern Business School’s Technology, Operations and Statistics department, and João Sedoc, assistant professor of Technology, Operations and Statistics at Stern.

Gender bias in hiring

The authors highlight the extent to which gender bias in hiring procedures is literally becoming systematized, with HR managers using advanced algorithmic, machine learning-based “screening” processes that amount to AI-driven rejection based on gender.

The authors cite the case of a hiring algorithm at Amazon that was revealed in 2018 to have been rejecting female candidates by rote, because it had learned that, historically, men were more likely to be hired:

“The model had learned from historical hiring data that men were more likely to be hired, and therefore rated male CVs higher than female CVs.

“Although the candidate’s gender was not explicitly included in the model, it learned to distinguish between male and female CVs based on the gendered information contained in the CVs – for example, men were more likely to use words such as ‘executed’ and ‘captured’.”

Additionally, a 2011 study found that job postings that implicitly seek men explicitly attract them, and also discourage women from applying for those positions. Digitization and big data schemes promise to further entrench these practices in automated systems, if the problem is not actively corrected.


NYU researchers trained a series of models to classify gender using predictive modeling. They also sought to establish how well the models’ ability to predict gender could survive the removal of increasing amounts of potentially gender-revealing information, while trying to preserve application-relevant content.

The dataset was drawn from a pool of candidate resumes from eight US-based IT companies, with each resume accompanied by details on name, gender, years of experience, area of expertise or study, and the target job posting for which the CV was submitted.

To extract deeper contextual information from this data in the form of a vector representation, the authors trained a Word2Vec model. The CVs were parsed into tokens and filtered, ultimately resolving to an embedded representation for each CV.
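One common way to turn word vectors into a single vector per document is to average the vectors of the tokens that survive filtering. The sketch below assumes that averaging strategy, which this article does not confirm as the authors' choice. The tiny vector table and stopword list are hypothetical stand-ins for a trained Word2Vec vocabulary.

```python
import re

# Hypothetical toy word vectors; a real pipeline would read these
# from a trained Word2Vec model instead.
TOY_VECTORS = {
    "managed": [0.2, 0.1, 0.7],
    "python": [0.9, 0.3, 0.0],
    "team": [0.1, 0.8, 0.2],
}
# Hypothetical stopword list used for the filtering step.
STOPWORDS = {"a", "an", "the", "and", "of", "in"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def embed_document(text, vectors):
    """Average the vectors of all known tokens; return None if none are known."""
    tokens = [t for t in tokenize(text) if t in vectors]
    if not tokens:
        return None
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(vectors[token]):
            total[i] += value
    return [x / len(tokens) for x in total]


cv_vector = embed_document("Managed a Python team", TOY_VECTORS)
```

The result is a fixed-length vector regardless of how long the CV is, which is what allows a simple classifier to be trained on top of it.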

The male and female samples were matched 1:1, and a subset was obtained by pairing the male and female applicants with the best objective fit for the job, allowing a margin of 2 years in terms of experience in their field. The resulting dataset consists of 174,000 male CVs and 174,000 female CVs.
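The matching step can be pictured with a small greedy sketch. It pairs each male applicant with the closest unused female applicant for the same job, provided their experience differs by no more than two years. The field names (`id`, `gender`, `job_id`, `years`) and the greedy strategy are assumptions made for illustration; the paper's exact matching procedure is not described here.

```python
from collections import defaultdict


def match_pairs(applicants, max_gap_years=2):
    """Greedily pair male and female applicants to the same job, 1:1.

    Assumes each applicant is a dict with 'id', 'gender' ('M' or 'F'),
    'job_id' and 'years' keys (hypothetical field names).
    """
    by_job = defaultdict(lambda: {"M": [], "F": []})
    for applicant in applicants:
        by_job[applicant["job_id"]][applicant["gender"]].append(applicant)

    pairs = []
    for groups in by_job.values():
        women = sorted(groups["F"], key=lambda a: a["years"])
        used = set()
        for man in sorted(groups["M"], key=lambda a: a["years"]):
            best = None  # (experience gap, index into women)
            for idx, woman in enumerate(women):
                if idx in used:
                    continue
                gap = abs(man["years"] - woman["years"])
                if gap <= max_gap_years and (best is None or gap < best[0]):
                    best = (gap, idx)
            if best is not None:
                used.add(best[1])
                pairs.append((man["id"], women[best[1]]["id"]))
    return pairs


# Invented applicants, purely for illustration.
applicants = [
    {"id": "m1", "gender": "M", "job_id": "j1", "years": 5},
    {"id": "f1", "gender": "F", "job_id": "j1", "years": 6},
    {"id": "m2", "gender": "M", "job_id": "j1", "years": 12},
    {"id": "f2", "gender": "F", "job_id": "j2", "years": 5},
]
pairs = match_pairs(applicants)
```

Unmatched applicants are simply dropped, which is why a matched dataset ends up with equal numbers of male and female CVs.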

Architecture and Libraries

The three models used for the classification task were Term Frequency-Inverse Document Frequency (TF-IDF) + Logistic Regression, Word Embeddings + Logistic Regression, and Longformer.

The first model provides a bag-of-words baseline that discriminates gender based on lexical differences. The second approach was used both with off-the-shelf word embeddings and with gender-debiased word embeddings.
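To show why a bag-of-words baseline can pick up lexical differences, here is a minimal, dependency-free sketch of TF-IDF features feeding a logistic regression trained by stochastic gradient descent. The smoothed IDF formula is one common variant, and the toy corpus, labels and hyperparameters are invented. None of this is the authors' code, and a real experiment would normally use an established library.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; out-of-vocabulary terms are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(features, labels, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(weights[t] * w for t, w in x.items())) - y
            bias -= lr * error
            for t, w in x.items():
                weights[t] -= lr * error * w
    return weights, bias


def predict_proba(x, weights, bias):
    return sigmoid(bias + sum(weights.get(t, 0.0) * w for t, w in x.items()))


# Invented toy corpus: class 1 and class 0 differ only in word choice.
docs = [
    ["executed", "project", "plan"],
    ["captured", "market", "project"],
    ["organized", "team", "plan"],
    ["supported", "team", "market"],
]
labels = [1, 1, 0, 0]

idf = fit_idf(docs)
weights, bias = train_logreg([tfidf(d, idf) for d in docs], labels)
p_class1 = predict_proba(tfidf(["executed", "captured", "report"], idf), weights, bias)
p_class0 = predict_proba(tfidf(["organized", "supported"], idf), weights, bias)
```

Because the classifier sees only word weights, any vocabulary that correlates with the label becomes a usable signal, even when explicit identifiers have been removed.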

The data was split 80/10/10 between training, validation and testing.
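An 80/10/10 split of this kind can be sketched as follows. The function name, the fixed seed and the use of a simple random shuffle are assumptions for illustration; the article does not say how the authors assigned CVs to each partition.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle the items reproducibly, then cut them into 80% / 10% / 10% partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    held_out = shuffled[n_train + n_val:]  # everything left over goes to the test set
    return train, val, held_out


train_set, val_set, test_set = split_80_10_10(range(1000))
```

Keeping the test partition untouched until the end is what makes the reported scores a fair estimate of how the models behave on unseen CVs.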

As seen in the results displayed above, the transformer-based Longformer model, notably more sophisticated than the earlier approaches, was nearly able to match its performance on a completely “unprotected” CV when detecting gender from documents that had been actively stripped of known gender identifiers.

The experiments included data ablation studies, where an increasing amount of gender-revealing information was removed from the CVs, and the models were tested against these sparser documents.

Information removed included hobbies (a criterion derived from Wikipedia’s definition of “hobby”), LinkedIn IDs, and URLs that could reveal gender. Additionally, terms such as “fraternity”, “waitress”, and “salesman” were dropped in these sparser versions.
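The ablation idea can be illustrated with a small scrubbing function that removes progressively more information from a CV. This is a sketch under stated assumptions: the URL pattern, the short term lists and the sample CV are invented, the hobby list is a hypothetical placeholder for the Wikipedia-derived list, and the authors' actual removal pipeline is not shown in this article.

```python
import re

# Matches http(s) links and bare www. addresses (a simplification).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Two of the gendered terms named in the article; a real list would be longer.
GENDERED_TERMS = frozenset({"fraternity", "waitress"})
# Hypothetical hobby terms standing in for a Wikipedia-derived list.
HOBBY_TERMS = frozenset({"knitting", "fishing"})


def scrub(text, drop_urls=True, drop_terms=frozenset()):
    """Return the text with URLs and any listed terms removed.

    Punctuation is discarded as a side effect of the simple tokenizer.
    """
    if drop_urls:
        text = URL_PATTERN.sub(" ", text)
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    return " ".join(t for t in tokens if t.lower() not in drop_terms)


# Invented sample CV text.
cv = (
    "Waitress 2015-2017. Fraternity treasurer. Enjoys fishing. "
    "https://www.linkedin.com/in/example Python developer."
)

level_1 = scrub(cv)                                           # URLs removed only
level_2 = scrub(cv, drop_terms=GENDERED_TERMS | HOBBY_TERMS)  # URLs, gendered terms, hobbies
```

Each level produces a sparser document, and the models are then evaluated on every level to see how much predictive signal survives.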

Additional Results

In addition to the findings discussed above, the NYU researchers found that debiased word embeddings did not reduce the models’ ability to predict gender. In the paper, the authors allude to the extent to which gender permeates written language, noting that these mechanisms and signifiers are not yet well understood.

James G. Williams