This article covers the use of Custom Document Classes in Classification Policies. A custom document class is a group of documents with similar content that you want to govern. While Egnyte provides the ability to find some common types of sensitive documents, you may want to create custom classes that reflect the types of documents you have in your organization. Examples of document classes might include:
- Internal presentations that follow a defined format
- Forms that contain the same fields but different values in those fields
- Contracts that contain similar boilerplate text with minor differences
Egnyte uses machine learning to train the engine to recognize documents that resemble a set of known samples, so you can discover them as part of a custom classification policy.
Beta Scope and Limitations
This capability is currently in beta. It is available to try for customers entitled to content classification. The main limitation with custom document classes is that you cannot edit a class once it has been created. If you need to retrain a class, you must delete and recreate the class with new training data. There may also be changes to the underlying model that affect the types of results you see for classes during the beta phase.
Prerequisites for Creating a Custom Document Class
To create a custom document class, you must be a Secure & Govern admin, and you need known samples of documents that fit into the class. This set of documents is known as a training set because the system uses it to ‘learn’ what patterns to look for. We recommend at least 20 documents for your training set, but you can start with as few as 5-10 samples.
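As a rough illustration of the sizing guidance above, the following Python sketch counts the files in a local folder of candidate samples and reports how the count compares with the recommendation. It is only a local pre-check run before uploading. The function names, the labels it returns, and the choice of 5 as the lower bound of the "5-10" range are assumptions made for this example and are not part of the product.

```python
from pathlib import Path

# Thresholds taken from the guidance above. Treating 5 as the lower bound
# of the "5-10" range is an assumption made for this sketch.
RECOMMENDED_MIN = 20
STARTING_MIN = 5


def assess_training_set(sample_count: int) -> str:
    """Return an illustrative label for a training set of the given size."""
    if sample_count < STARTING_MIN:
        return "too small"
    if sample_count < RECOMMENDED_MIN:
        return "usable"
    return "recommended"


def count_samples(folder: str) -> int:
    """Count the regular files directly inside a local folder of samples."""
    return sum(1 for path in Path(folder).iterdir() if path.is_file())
```

For example, `assess_training_set(count_samples("samples"))` would label a hypothetical local `samples` folder before you upload its contents.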
How to Create a Custom Document Class
1. Open Secure & Govern Settings and click Content Classification > Custom Document Classes.
2. Choose the option to Create a new class. You will need to provide a name and description of the class and upload known samples of documents in the class from your training set. Documents in the training set must be one of the following formats:
- PDF - .pdf
- Word Document - .docx
- Word Document - .doc
- Plain text - .txt
- Plain text - .text
3. Once the upload completes, you’ll be presented with a table showing each uploaded document, a similarity score, and a recommendation. The similarity score is an estimate of how similar the document's content is to that of the other documents in the set. Documents with high similarity scores are marked with a green tick, indicating that they are suitable candidates for training. For documents with relatively low similarity scores, the recommendation is to remove the document from the training set so that errant results are not discovered as part of the class.
4. Once the training set is finalized and documents with low similarity scores are removed per recommendations, choose the option to Create to generate the class. You can now use the class as part of a custom classification policy.
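To build intuition for the similarity score and recommendation described in step 3, here is a minimal Python sketch. It does not reproduce Egnyte's model, which is not documented here. It assumes a simple word-count cosine similarity, averages each document's similarity to the others in the set, and applies an arbitrary threshold of 0.3. All function names are hypothetical, and the input is assumed to be text already extracted from each file. The extension check mirrors the formats listed in step 2.

```python
import math
import re
from collections import Counter
from pathlib import PurePath
from typing import Dict

# File extensions listed in step 2 of the procedure above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".text"}


def is_supported(filename: str) -> bool:
    """Check a file name against the supported training-set formats."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_scores(docs: Dict[str, str]) -> Dict[str, float]:
    """Average each document's similarity to every other document."""
    vectors = {name: term_counts(text) for name, text in docs.items()}
    scores = {}
    for name, vec in vectors.items():
        others = [cosine(vec, v) for n, v in vectors.items() if n != name]
        scores[name] = sum(others) / len(others) if others else 0.0
    return scores


def recommend(scores: Dict[str, float], threshold: float = 0.3) -> Dict[str, str]:
    """Suggest 'keep' or 'remove' using an arbitrary illustrative threshold."""
    return {n: ("keep" if s >= threshold else "remove") for n, s in scores.items()}


docs = {
    "contract_a.txt": "this agreement is made between the supplier and the customer for services",
    "contract_b.txt": "this agreement is made between the vendor and the customer for goods",
    "picnic.txt": "quarterly picnic menu sandwiches lemonade and fruit salad",
}
print(recommend(similarity_scores(docs)))
```

In this toy set, the two contract-like texts share most of their wording and score well above the unrelated picnic menu, which falls below the threshold and is flagged for removal. The sketch also shows why outliers are worth removing in a small set: a single unrelated document lowers the average score of every other sample.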
How to Use Custom Document Classes in Custom Policies
1. To configure your Custom Policy using custom document classes, navigate to Settings > Content Classification > Policies > Add Custom Policy and then select Document Types. You will see any generated custom document classes as selectable options under CUSTOM DOCUMENT CLASSES. You may also edit an existing custom classification policy to add one or more custom document classes as criteria.
2. Choose the option to Save and then Save Policy. You will be able to see content that meets the policy in the Sensitive Content view.