Tech

PDF and the Edge Case

by Conor Smith | October 31, 2017

The PDF specification, first as a published standard and then as an ISO standard has been freely available for years. Any programmer can read the PDF specification and start creating their own PDF API to sell as an SDK, or to power their own PDF software application.

However, the PDF specification is about 1,000 pages long, and also references a range of other standards such as image standards, XML standards, font standards and so on. Add to the mix that some areas of the PDF specification are ambiguous and you can see how ensuring your PDF engine can consume billions of different PDF files could be problematic as every PDF is unique in some way.

Due to these rigorous standards many startups and even larger companies, will first look to a commercially licensed SDK or an open source project.

 

What’s an Edge Case?

Edge cases are those occurrences that happen only a small number of times when performing any specific action. When generating PDFs from nothing programmers can create their own PDF parsing, rendering and generation engines quickly and simply.

The real challenge begins when you need to consume PDF files that other programs have generated. In the case of PDF generation, this could be something as small as a misaligned table when converting a file to PDF, or something bigger like mismatched fonts inconsistent with the original file or much more problematic issues.

A quick Google indicates that there are approximately 2,510,000,000 — that’s 2.51 billion — PDF files accessible from indexed websites, the surface web. But what about the deep web where content is not indexed by search engines for many different reasons?
Online Indexed PDF Files

The deep web includes many different types of documents like academic papers, medical records, legal documents, scientific reports, financial records; the list goes on. Billions upon billions of PDF files, created by thousands of different PDF generation tools, conversion tools, scanners and printer drivers. How do you ensure that all these documents render properly when opened as PDFs?

 

The Solution

I’ve seen first-hand the difference between working for a smaller company and a larger company in the PDF industry.

Smaller companies spend most of their time improving their products so they are competitive in the marketplace. Their time and resources are spent keeping up with individual customer demands and ensuring that the majority of documents generate, render and convert correctly. They don’t have the resources to test edge cases and to fully implement features to ensure that it will work for all use cases. They cannot use their limited resources to the same end as larger companies who have the time, resources and experience to do so.

This experience comes from knowledge and time spent in the PDF game. Here at Foxit we have the time, resources and long term vision to extensively test and improve our software — more than any other PDF engine. Larger companies, like us, can respond to real world needs and ensure features are fully designed, developed and can handle the majority of use cases and edge cases. Since we already solved a comprehensive list of edge cases, we can now focus on new features, platforms, speed and performance.