The creation of the PDF aimed to make the dream of the paperless office a reality. Developed by Adobe in the early 1990s, the format allows text and graphics to be sent electronically in the same document. These days a PDF can be viewed on any device, password-protected and printed locally. Another useful feature is the ability to add forms which can be completed and returned digitally.
PDFs are now easier than ever to produce. For example, a PDF can be created from within Microsoft Office 2007 products or Open Office. Various powerful free third-party PDF applications that integrate with older Microsoft Office products are available. Web pages can be converted to PDF using extensions and plugins that work with popular browsers such as Chrome and Firefox.
Using tags
Another useful feature is the ability to create a tagged PDF. The tags give meaning to the content and allow for the extraction of data. However, it is not easy to add tags. You need additional software, and the tags have to be added manually, which is a repetitive task. Furthermore, if the PDF is produced by an internal system, it may not be possible to add the tags when the PDF is generated.
The advantages of the PDF mean the format is regularly used to send information from business to business. However, as the content in the PDF has no meaning without tags, it usually needs to be copied and pasted from the document into the receiving business's internal system. The task is laborious and prone to error, wasting time and effort.
The solution is to send the data with meaning, usually in XML or JSON format. The file can then be dropped into an internal system that has suitable code to recognise and accurately extract all the information from the file. The problem with XML and JSON is that they are aimed at programmers and developers, not the average computer user. The following figures show how a PDF and an XML file may differ for ordering a pair of shoes:
PDF File:
Brand: Fly London
Design: Shard
Size: 39
Colour: black
XML File:
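As a rough illustration, the same order might be expressed in XML along the lines of the string in the sketch below. The element names (order, brand, design, size, colour) are assumptions made for this example, not a published schema, and parse_order is a hypothetical helper showing how an internal system could read such a file using Python's standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for the shoe order; the element names are
# illustrative assumptions, not a standard schema.
ORDER_XML = """
<order>
  <brand>Fly London</brand>
  <design>Shard</design>
  <size>39</size>
  <colour>black</colour>
</order>
"""


def parse_order(xml_text):
    """Return each child element of <order> as a field-name/value pair."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


order = parse_order(ORDER_XML)
print(order["brand"], order["size"])
```

Because every value sits inside a named element, the receiving code does not have to guess which piece of text is the brand and which is the size.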
It is unlikely that the average user would feel comfortable creating an XML file; most would rather continue to use PDFs, which have the benefit of password protection should they contain sensitive data. Consequently, there needs to be a way to extract data from PDFs other than direct copying and pasting.
How SwiftCase helps
Fortunately, there are a number of systems designed to automate extracting information from PDFs. The best ones combine different technologies to recognise the data. These technologies include OCR tools and word pattern recognition. Word pattern recognition is important if you want to recognise a chunk of text of varying length that sits between two different headings, for example.
Systems designed to extract information from PDFs automatically can be used by SwiftCase to automate your business workflows. As soon as a PDF is sent in, SwiftCase can pick up the relevant information, extract the data and insert it into the system. No more manual entry is required, and incoming work gets an instant response.
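To give a feel for word pattern recognition, here is a minimal sketch using regular expressions on text that has already been pulled out of a PDF (by OCR or a text-extraction tool). The sample text, the headings and the helper names text_between and labelled_value are all assumptions made for illustration; this is not how any particular product implements it:

```python
import re

# Hypothetical text as it might look after OCR or text extraction from a PDF.
PDF_TEXT = """Brand: Fly London
Design: Shard
Size: 39
Colour: black
Delivery notes:
Please leave with a neighbour
if nobody is at home.
Payment terms:
30 days"""


def text_between(text, start_heading, end_heading):
    """Return the text found between two headings, whatever its length."""
    pattern = re.escape(start_heading) + r"\s*(.*?)\s*" + re.escape(end_heading)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


def labelled_value(text, label):
    """Return the value that follows 'label:' on a single line."""
    pattern = rf"^{re.escape(label)}:[ \t]*(.+)$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


notes = text_between(PDF_TEXT, "Delivery notes:", "Payment terms:")
size = labelled_value(PDF_TEXT, "Size")
print(notes)
print(size)
```

The first helper copes with a block of text that may run to one line or several, because it matches on the headings either side rather than on a fixed length. Real documents are messier than this sample, so a production system would typically combine such patterns with OCR clean-up and validation.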
SwiftCase can automate a wide range of data-import processes, helping you focus on providing an excellent service to clients.
If you’re interested in a free, no-obligation demonstration, get in touch today.
