Digital documents are now an important part of most business and government operations. Digitised information is easier and faster to process and ‘use’, creating a better experience for customers and employees.
One of the most popular digital document formats is PDF (Portable Document Format), which allows anyone to create a computerized version of a paper form – or any other document. These files can then be emailed or shared quickly and easily.
Sometimes, however, you may not want to share all of the information in a document, just a few of the details. When this is the case, we ‘redact’ the information that needs to be protected.
What is redaction?
Redaction simply means making some information on a document unreadable. Historically, information on a copy of a paper-based document would be redacted by drawing lines across the sensitive details with a thick black marker such as a Sharpie. The redacted document could then be shared safely.
Because protecting sensitive information is important, many PDF applications now include redaction features that perform a similar job. You can highlight a portion of sensitive text and mark it to be redacted – the PDF software then ‘hides’ the text so it cannot be read.
But researchers have recently discovered this isn’t always the case.
Some redaction tools simply do not work
During testing it was found that the redaction features in some PDF applications didn’t work as expected. Although portions of text could not be read in the original document, researchers were able to recover the hidden text by simply selecting the redacted section and then copying and pasting into a new document – the censored text was then clearly readable.
Fortunately, this flaw was only found in a handful of applications – and once alerted to the problem, the software developers have been working to create a fix.
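The copy-and-paste recovery described above is easier to picture with a toy model. The sketch below is not real PDF code and does not describe any particular application: the Page class and both redaction functions are hypothetical names invented for illustration. It assumes the common situation in which a weak redaction only covers the text on screen while the text itself stays stored in the file.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy stand-in for a page: stored text runs plus the runs hidden on screen."""
    runs: list[str]
    covered: set[int] = field(default_factory=set)

    def rendered(self) -> str:
        # What a reader sees: covered runs appear as a black box.
        return " ".join(
            "[REDACTED]" if i in self.covered else run
            for i, run in enumerate(self.runs)
        )

    def copy_paste(self) -> str:
        # What copy-and-paste sees: the stored text, ignoring any boxes.
        return " ".join(run for run in self.runs if run)


def cosmetic_redact(page: Page, index: int) -> None:
    # Weak redaction (assumed behaviour): hide the run but leave it stored.
    page.covered.add(index)


def destructive_redact(page: Page, index: int) -> None:
    # Stronger redaction: hide the run and also delete the stored text.
    page.covered.add(index)
    page.runs[index] = ""


page = Page(["Name:", "Jane", "Doe"])
cosmetic_redact(page, 1)
print(page.rendered())    # Name: [REDACTED] Doe
print(page.copy_paste())  # Name: Jane Doe

destructive_redact(page, 1)
print(page.copy_paste())  # Name: Doe
```

In this model the weak redaction looks identical on screen to the strong one, which is why the problem is easy to miss until someone tries to copy the text.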
Advanced analysis may defeat redaction
Continuing their investigation, the researchers then looked at other ways to try to recover redacted information. To see what was possible, they built a tool that could automatically analyze ‘glyphs’ – the physical size and shape of the letters in redacted words and phrases. What does that mean? Consider the letter ‘i’ and the letter ‘w’ – i is quite narrow and w is quite wide, so they take up different amounts of space on screen.
By analyzing the size and shape of redacted text blocks, the researchers were able to make quite accurate predictions about what the hidden text said. Their tests suggested that in 14% of cases they were able to recover people’s names from censored PDF files, for instance. This may not sound like a lot (86% of names could not be recovered), but it is still a concerning statistic when dealing with sensitive information.
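To make the width idea concrete, here is a minimal sketch of how a redacted box's width can narrow down a list of candidate names. It is not the researchers' tool: the glyph widths are made-up numbers in arbitrary units, and the function names and the candidate list are assumptions chosen for illustration. Real fonts define their own widths for each character.

```python
# Hypothetical glyph widths in arbitrary units (not taken from any real font).
GLYPH_WIDTHS = {"i": 3, "l": 3, "j": 3, "t": 4, "f": 4, "r": 4, "m": 9, "w": 9}
DEFAULT_WIDTH = 6  # assumed width for every character not listed above


def text_width(text: str) -> int:
    """Total width of a word, summing the assumed width of each letter."""
    return sum(GLYPH_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text.lower())


def candidates_matching(box_width: int, names: list[str], tolerance: int = 0) -> list[str]:
    """Return the names whose width is within `tolerance` of the redacted box."""
    return [n for n in names if abs(text_width(n) - box_width) <= tolerance]


names = ["Will", "Jill", "Lili", "Emma", "Owen"]
box_width = text_width("Jill")  # pretend this was measured from the black box

print(box_width)                              # 12
print(candidates_matching(box_width, names))  # ['Jill', 'Lili']
```

Here the width rules out three of the five names but cannot tell ‘Jill’ from ‘Lili’, which gives a feel for why this kind of analysis recovers some hidden text but not all of it.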
How to redact safely
According to the research, redacting PDF documents with the built-in tools does work in most cases. However, to best protect yourself, you should take a multi-level approach to redaction.
First, add another text layer on top of the sensitive details and type random text – even ‘zzzzzzzzz’ should work. Next, use the redaction tool as normal.
This approach ensures that the original text cannot be copied and pasted – all the hacker will see is the new text (zzzzzzzz). It also means that smart analysis will be unable to ‘guess’ which words have been censored because it cannot accurately assess the size and shape of the underlying letters.
Fortunately, PDF software vendors have been contacted about potential redaction issues and are working to develop fixes. As always, make sure that your software is kept fully up to date so that when new features and bug fixes become available, you are better protected against hackers and cybercriminals.