Learn how these Illinois Tech AI researchers extracted personal information from anonymous cellphone data using machine learning

Many people who regularly use social media, cell phones, home security cameras and location trackers don’t realize the full extent of the data they generate. They have no idea that information contained in that data can be reconstructed by others using machine learning (ML) techniques. As a result, users always run the risk of having their anonymized information de-anonymized by ML algorithms, even though most people value their online privacy, albeit to varying degrees.

Data security concerns have been raised after a team of researchers from the Illinois Institute of Technology used machine learning and artificial intelligence algorithms to extract personally identifiable information, including sensitive characteristics such as age and gender, from otherwise anonymous mobile phone data.

Although information such as age and gender may seem harmless at first glance, it is often exploited for malicious purposes. A variety of regulations are in place to protect minors, and they are breached when an attacker targets young children for any reason, from sales to sexual exploitation. At the other end of the age spectrum, older people are frequently targeted by sophisticated spam and phishing campaigns because of their vulnerability and access to savings.

By analyzing data from a Latin American cellphone provider, the researchers could easily estimate the gender and age of customers based on their text messages. The team’s neural network model estimated gender with 67% accuracy, far exceeding state-of-the-art methods including decision trees, random forests and gradient boosting models. The same model was also 78% accurate in estimating the age of users.
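To make the idea of predicting a demographic label from usage data concrete, here is a minimal sketch in plain Python. It is not the researchers' neural network and does not use their data: it trains a simple logistic regression on synthetic users, and the feature names (`texts_per_day`, `night_call_share`), the group differences and all parameters are assumptions invented for illustration.

```python
import math
import random


def make_users(n, rng):
    """Build n synthetic users with two invented usage features and a 0/1 label."""
    users = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label == 1 else -1.0
        # Assumption for this demo only: the two groups differ in average usage.
        texts_per_day = rng.gauss(shift, 1.0)           # standardized, hypothetical
        night_call_share = rng.gauss(0.5 * shift, 1.0)  # standardized, hypothetical
        users.append(((texts_per_day, night_call_share), label))
    return users


def predict_proba(weights, bias, features):
    """Logistic model: probability that a user belongs to group 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(users, epochs=50, learning_rate=0.05):
    """Fit the weights with plain stochastic gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in users:
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(weights, bias, users):
    """Share of users whose predicted group matches the true label."""
    hits = sum((predict_proba(weights, bias, f) >= 0.5) == (y == 1)
               for f, y in users)
    return hits / len(users)


rng = random.Random(42)            # fixed seed so the run is repeatable
train_users = make_users(400, rng)
test_users = make_users(200, rng)  # held-out users the model never saw
weights, bias = train(train_users)
test_accuracy = accuracy(weights, bias, test_users)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The point of the sketch is only that a model trained on labeled usage records can then guess the label for users it has never seen, and that accuracy is measured on those held-out users.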

The information was extracted using standard hardware and software. The neural network model was run on a Linux computer (Fedora) with 16 GB of RAM and an Intel i5-6200U processor with four processing cores.

The researchers have found that such attacks occur only rarely, and the data set used in the study has not been made available to the general public. They note, however, that an adversary could amass a comparable data set by intercepting communications on open Wi-Fi hotspots or hacking into service provider data centers.

The purpose of this work is to open a discussion on how recent developments in AI and machine learning have affected privacy laws. Since the United States lacks comprehensive privacy laws, the study’s authors examined how these methods undermine provisions of the EU’s General Data Protection Regulation, which protect European Union consumers against privacy breaches.

The rise of machine learning and automated decision-making in the commercial world is inevitable. The challenge is to find a legislative framework that protects personal information while also safeguarding social and commercial interests against fraud.

This can be done, for example, by giving users the choice not to share their data when installing an application (the “opt-out option”).

The researchers’ recommendations include updating existing non-compliance measures and encouraging the use of synthetic data, rather than data observed from real users, for machine learning models. According to them, generative adversarial networks (GANs) can be used to generate anonymous synthetic data. They also encourage data holders to collaborate with machine learning specialists to develop best practices. More broadly, much more effort is needed to fill policy voids and examine the ethics of AI.
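As a rough illustration of what synthetic data means here, the sketch below generates artificial records from a handful of made-up ones. It is deliberately much simpler than a GAN: it only counts how often each value occurs in each field and then samples every field independently. The field names and records are hypothetical, and this simple approach is not a privacy guarantee in itself.

```python
import random
from collections import Counter


def fit_marginals(records):
    """Count how often each value appears in each field, field by field."""
    fields = records[0].keys()
    return {field: Counter(r[field] for r in records) for field in fields}


def sample_synthetic(marginals, n, rng):
    """Draw n synthetic records, sampling every field independently."""
    synthetic = []
    for _ in range(n):
        row = {}
        for field, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            # Values that were more common in the input are drawn more often.
            row[field] = rng.choices(values, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic


# Hypothetical records invented for illustration only.
real_records = [
    {"age_band": "18-25", "plan": "prepaid"},
    {"age_band": "26-40", "plan": "postpaid"},
    {"age_band": "26-40", "plan": "prepaid"},
    {"age_band": "41-65", "plan": "postpaid"},
]

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic_records = sample_synthetic(fit_marginals(real_records), 5, rng)
for record in synthetic_records:
    print(record)
```

Because each field is sampled on its own, a synthetic record does not correspond to any single real user, but relationships between fields are lost as well. Methods such as GANs aim to keep more of that structure.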

This article is written as a research summary by Marktechpost staff based on the research paper 'Predicting age and gender from network telemetry: Implications for privacy and impact on policy'. All credit for this research goes to the researchers on this project. Check out the paper and reference article.



Tanushree Shenwai is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new technological advancements and applying them to real life.


James G. Williams