How to build searches compatible with the Panda Data Control normalization process?

Panda Data Control

The data extracted from the files found on users' computers is normalized and then stored in a database on the computer itself. The normalization process differs depending on whether Panda Data Control classifies the data as a PII data type or as unidentified text.

The normalization process directly affects searches, because search parameters are compared with the data stored after normalization. That is, the search is performed on the normalized data and not on the original data contained in users' files. The following sections describe several aspects of the Panda Data Control normalization process:

Separating characters
Panda Data Control treats a group of special characters as separators between words. These characters can be completely removed or replaced by a single space. They are as follows:

  • Return: \r
  • Line break: \n
  • Tab key: \t
  • Characters: " : ; ! ? - + _ * = ( ) [ ] { } , . | % \ / ?
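As an illustration only, separator-based splitting could be sketched as follows. The function name `split_on_separators` is hypothetical, the character set is copied from the list above, and treating the space character as a separator is an assumption; this is not Panda Data Control's actual implementation.

```python
import re

# Separator characters taken from the list above (assumed set).
SEPARATORS = '":;!?-+_*=()[]{},.|%\\/'

# One or more separators, or return / line break / tab / space (space is assumed).
_SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + r"\r\n\t ]+")

def split_on_separators(text: str) -> list[str]:
    """Split text into words at every run of separator characters."""
    return [word for word in _SPLIT_RE.split(text) if word]

print(split_on_separators("house.forest"))   # ['house', 'forest']
print(split_on_separators("one--two\tthree"))  # ['one', 'two', 'three']
```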

Transformation of indexed character strings to lowercase
Regardless of whether the character string is recognized as a PII type or not, before it is stored in the database, it is transformed to lowercase. Administrator searches are also transformed to lowercase, so writing in uppercase or lowercase does not affect the search result.

General rules for normalizing data recognized as personal data

  • In PII types formed by numeric characters (telephone numbers, bank account numbers, etc.) separating characters are deleted and the resulting string is stored as a single entity. For example "" would be stored as PII type IDCARD "14265116C".
  • IP addresses and email addresses are stored as they are.
  • For First Names and Last Names and Addresses, each word is stored independently and those containing numbers are deleted. For example "25 Upper Nelson Mandela Boulevard" would be stored as "upper", "nelson", "mandela", "boulevard".
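The rules above can be approximated with a minimal sketch. Both function names are hypothetical, the separator set (including whitespace) is an assumption based on the separator list, and the numeric sample value is invented for illustration.

```python
import re

# Assumed separator set: the listed special characters plus whitespace.
_SEPARATOR_RE = re.compile(r'[\s":;!?\-+_*=()\[\]{},.|%\\/]+')

def normalize_numeric_pii(value: str) -> str:
    """Delete separators so the value is kept as a single entity."""
    return _SEPARATOR_RE.sub("", value)

def normalize_name_or_address(value: str) -> list[str]:
    """Lowercase, split into words, and drop words that contain digits."""
    words = _SEPARATOR_RE.split(value.lower())
    return [w for w in words if w and not any(ch.isdigit() for ch in w)]

print(normalize_numeric_pii("1234-5678 9012"))
# 123456789012
print(normalize_name_or_address("25 Upper Nelson Mandela Boulevard"))
# ['upper', 'nelson', 'mandela', 'boulevard']
```

IP addresses and email addresses are left out of the sketch because they are stored unchanged.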

General rules for normalizing data not recognized as personal data
  • Numerical and alphanumeric data (words formed by letters and numbers) that are not detected as PII are deleted in the normalization process, and therefore they do not return any results in searches.
  • Each separating character splits the character string into independent words, and the separator itself is not stored. For instance, the string "house.forest" is stored as "house" and "forest", and the separator character "." is deleted.
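To see why numeric and alphanumeric terms return no results, consider this small sketch of indexing and lookup for text not recognized as PII. The names `index_plain_text` and `search` are hypothetical, and exact word matching is an assumption made to keep the example short.

```python
import re

# Assumed separator set: the listed special characters plus whitespace.
_SEPARATOR_RE = re.compile(r'[\s":;!?\-+_*=()\[\]{},.|%\\/]+')

def index_plain_text(text: str) -> set[str]:
    """Return the words that would remain after normalization (a sketch)."""
    words = _SEPARATOR_RE.split(text.lower())
    # Words containing any digit are dropped, per the rule above.
    return {w for w in words if w and not any(c.isdigit() for c in w)}

def search(index: set[str], term: str) -> bool:
    """Lowercase the term and look it up among the stored words."""
    return term.lower() in index

stored = index_plain_text("house.forest Room42 2024")
print(sorted(stored))            # ['forest', 'house']
print(search(stored, "HOUSE"))   # True
print(search(stored, "room42"))  # False
```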

Tips for constructing searches that are compatible with the normalization process
  • It is preferable to use lowercase letters.
  • Numeric characters in strings that Panda Data Control does not identify as a supported PII type are deleted in the normalization process, and should therefore not be used in searches.
  • To search for bank account numbers, credit card numbers, ID card numbers, social security numbers, passport numbers, or driver's license numbers, don't use separating characters.
  • To search for IP addresses and email addresses, enter them as they are.
  • To search for phone numbers, remove any separating characters, and enter the country code if necessary without the "+" sign.
  • To find postal addresses or first and last names, don't use numeric characters.
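The tips above could be applied to a search term before it is entered, as in the following sketch. The helper `prepare_search_term` and its `kind` labels are hypothetical and are not part of Panda Data Control; the separator set and the joining of words with single spaces are assumptions.

```python
import re

# Assumed separator set: the listed special characters plus whitespace.
_SEPARATOR_RE = re.compile(r'[\s":;!?\-+_*=()\[\]{},.|%\\/]+')

def prepare_search_term(kind: str, term: str) -> str:
    """Rewrite a search term according to the tips above (illustrative only)."""
    if kind in ("ip", "email"):
        # Entered as they are.
        return term
    if kind in ("id_number", "phone"):
        # No separating characters; this also removes the "+" of a country code.
        return _SEPARATOR_RE.sub("", term)
    if kind in ("name", "address"):
        # Lowercase words only; words containing digits are left out.
        words = _SEPARATOR_RE.split(term.lower())
        return " ".join(w for w in words if w and not any(c.isdigit() for c in w))
    raise ValueError(f"unknown kind: {kind!r}")

print(prepare_search_term("phone", "+34 912-345-678"))
# 34912345678
print(prepare_search_term("address", "25 Upper Nelson Mandela Boulevard"))
# upper nelson mandela boulevard
```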
