Create and Manage Training Sets

Training sets are a machine learning-powered text extraction feature that lets you extract fields from documents or bodies of text using natural language processing (NLP) and use those fields in a solution. When paired with optical character recognition (OCR), training sets let you extract important text even from read-only files like PDFs. Like most other fields, fields extracted using training sets can be used in module actions and triggers.

Because taking in data (both structured and unstructured) is a common part of many business processes, training sets are helpful in many situations. One common use case is combining OCR and training sets to extract important fields from forms submitted as PDFs.

For example, a module might monitor an email inbox for various kinds of forms included as PDF attachments. Using an OCR Conversation action, you can configure the module to convert the PDF into machine-readable text. You can then run that text through a training set that locates and extracts specific fields, such as the name of the form submitter, the date of submission, and other identifying information. Although they require some configuration, training sets can remove the need for a person to read through large volumes of PDF documents and pull out the information needed to continue another process.
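To make the extraction step above concrete, the following Python sketch shows the general idea of turning OCR output into named fields. It is an illustration only, not how training sets are implemented: simple regular expressions stand in for the trained model, and the names `ExtractedFields`, `extract_fields`, and the sample form text are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedFields:
    """Hypothetical container for the fields pulled from one form."""
    submitter_name: Optional[str]
    submission_date: Optional[str]


# Assumption: these patterns stand in for a trained extraction model.
# A real training set learns where fields appear instead of relying on fixed labels.
NAME_PATTERN = re.compile(r"^Name:\s*(.+)$", re.MULTILINE)
DATE_PATTERN = re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE)


def extract_fields(ocr_text: str) -> ExtractedFields:
    """Locate the submitter name and submission date in machine-readable text."""
    name = NAME_PATTERN.search(ocr_text)
    date = DATE_PATTERN.search(ocr_text)
    return ExtractedFields(
        submitter_name=name.group(1).strip() if name else None,
        submission_date=date.group(1) if date else None,
    )


# Hypothetical text, as it might look after OCR has converted a PDF form.
sample_text = (
    "Leave Request Form\n"
    "Name: Jordan Lee\n"
    "Date: 2024-03-15\n"
    "Reason: Family event\n"
)

fields = extract_fields(sample_text)
print(fields)
```

Fields that cannot be found come back as `None`, which mirrors the need to handle documents where an expected field is missing before passing the values on to later actions or triggers.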

Learn More