Data enrichment is big business these days. Data enrichment companies merge the data held by other companies with third-party data so that those companies can make better-informed decisions. Enriched data gives deeper insight into a company’s customers, which allows it to adapt its business model to the needs of its current or potential customers.

For this data to be relevant, the more information it contains the better; this is why data enrichment companies need to deal with databases holding tens of millions of records. This information can include email addresses, the number of children in a household, the value of the house, buying habits and more. Such personal information would be priceless to any cybercriminal looking to monetize it, so it is vital to protect it and keep it off the black market.

Over 1 billion people’s data leaked

The dangers inherent in data enrichment were put in the spotlight in the middle of October, when it was discovered that the personal data of 1.2 billion people had been exposed online. Bob Diachenko and Vinny Troia discovered an Elasticsearch server containing around 4 billion user accounts, roughly 4TB of data in total, spread across four datasets.

This data is believed to belong to two data enrichment companies. Three of the datasets were tagged with the name of one such company, “People Data Labs”, while the fourth was tagged “EXY”, which the security researchers believe could refer to Oxydata, another data enrichment firm.

Vinny Troia, chief of threat intelligence at Data Viper, explains: “A total count of unique people across all data sets reached more than 1.2 billion people, making this one of the largest data leaks from a single source organization in history. The leaked data contained names, email addresses, phone numbers, LinkedIn and Facebook profile information.”

The Elasticsearch server containing this data didn’t require a password or any other security measure to access it. However, it is not clear who is responsible for this server ending up exposed.
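
To illustrate what “didn’t require a password” means in practice, here is a minimal sketch of a check an administrator could run against their own server. It assumes the server speaks plain HTTP on the default Elasticsearch port 9200 and that a secured server answers unauthenticated requests with 401 or 403. The function names and verdict strings are hypothetical, chosen only for this example.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a verdict."""
    if status == 200:
        return "open"           # the server answered without credentials
    if status in (401, 403):
        return "auth-required"  # credentials demanded or access refused
    return "unknown"            # anything else needs a closer look


def check_endpoint(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to base_url and classify the reply."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx replies; the code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"    # connection refused, timeout, DNS failure...
```

Calling `check_endpoint("http://localhost:9200/")` on a machine you are authorised to test returns `"open"` if the cluster answers anyone who asks, which is the situation described above. A result of `"auth-required"` only shows that this one request was refused; it is not a full security audit.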

Elasticsearch servers and personal data

This is not the only data breach of the last few months involving an Elasticsearch server. At the beginning of November, security researchers discovered that the data of the clients of a hotel booking platform, Gekko Group, had been exposed on an unsecured Elasticsearch server.

The researchers discovered the database containing over 1TB of unencrypted data on the server in question. Gekko Group’s client list includes over 600,000 hotels worldwide. The exposed data included names, addresses and invoices containing unencrypted payment information.

The dangers of leaked data

Even if a data breach doesn’t contain payment information, it can still pose a serious risk for the people whose data is leaked. The main reason is that this data can be used for identity theft, which in turn can be the first step in cyberthreats such as spear phishing.

For a company that suffers a data breach, there can be serious consequences too. Under the GDPR, if the companies responsible for this breach handle the personal data of European citizens, they could be facing fines of up to €20 million or 4% of their annual global turnover, whichever is higher.
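
As a rough illustration of that ceiling, the sketch below computes the maximum fine for the most serious tier of GDPR infringements, which is the higher of €20 million and 4% of annual global turnover. The function name and the sample turnover figures are hypothetical, and the result is only an upper bound: the fine actually imposed is decided by the supervisory authority.

```python
FLAT_CAP_EUR = 20_000_000   # fixed ceiling for the upper tier of fines
TURNOVER_PERCENT = 4        # percentage of annual global turnover


def max_gdpr_fine(annual_global_turnover_eur: float) -> float:
    """Return the upper-tier ceiling: the higher of the two limits."""
    turnover_based = annual_global_turnover_eur * TURNOVER_PERCENT / 100
    return max(FLAT_CAP_EUR, turnover_based)


# A company turning over EUR 100 million is capped by the flat amount,
# because 4% of its turnover is only EUR 4 million.
small = max_gdpr_fine(100_000_000)

# A company turning over EUR 1 billion is capped by the percentage,
# because 4% of its turnover is EUR 40 million.
large = max_gdpr_fine(1_000_000_000)
```

In other words, the €20 million figure acts as a floor on the ceiling for smaller companies, while for large companies the turnover-based figure takes over.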

How to protect personal data

The best way to ensure that the personal data handled by your company is secure is to keep comprehensive controls on it. This way, you can know where it is at all times. Panda Data Control is a module of Panda Adaptive Defense created specifically to stop unauthorized access to, and modification and exfiltration of, the data stored by your company. It discovers and audits all unstructured personal data (PII) on all endpoints. This way, not only will you know what data you have and where you have it, but you’ll also know if someone accesses it or tries to modify it.

This data breach is one of the largest in history, but it will not be the last. Make sure your company isn’t the next to suffer a data breach with Panda Data Control.