AI pre-learning ability – use SQL database

Convincing clients to start using new software features can be challenging, and OCR solutions are no exception. Every customer wants an OCR solution that is precise out of the box. So how do you achieve this? This guide will show you:

  • How to use the AI pre-learning ability.
  • How to use rows from your database to achieve precise data extraction from day one.

Why should you use AI pre-learning ability?

In the previous guide, we integrated typless AI’s continuous learning ability into an existing workflow to support the customer’s work. One problem remained: extraction did not return any data at the beginning. So how do you improve the first experience for your customers? This is what pre-learning is for. It lets you train the AI before your customer has any interaction with it, so the accuracy of the extracted data is high enough to satisfy your customers from the beginning.

What do I need to do?

If you have read the guide on how to choose an OCR solution for ERP, you know that classic AI pre-learning can be costly and time-consuming. You want to invest as few resources as possible in training the AI, so it would be a lifesaver if you could use the data that is already in your database. This is something typless can do for you. So let’s take a look.

How to use AI’s pre-learning ability?

Prerequisites

  1. Register at typless
  2. Log in
  3. Click the Settings tab in the side navigation bar to get an API key
  4. The pre-learning ability is not enabled on the free plan; contact us to enable it for testing
  5. Create a new document type named “pre-learning-example”:
    • Click on the Documents tab in the side navigation bar
    • Click on the +New button
    • For the name set “pre-learning-example”
    • For the OCR language select “English”
    • Click on the Next button
    • Leave fields as they are and click on the Confirm button
  6. Get the code of the example project from GitHub:
				
					git clone https://github.com/typless/pre_learning.git
				
			

Set the API key:

				
					export API_KEY=YOUR_API_KEY_FROM_SETTINGS_TAB
				
			

Install the requirements:

				
					pip install -r requirements.txt
				
			

Using AI’s pre-learning ability

In the examples directory, you will find example invoices and the database. The example database contains tables for received invoices and suppliers. Each received-invoice record contains the following fields:

  • supplier – ID of the supplier,
  • invoice_number – invoice number,
  • issue_date – issue date of invoice,
  • total_amount – the total amount on an invoice,
  • file_path – path to the file inside the files directory.
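
To make the record layout concrete, here is a minimal sketch of a table with these columns, built in an in-memory SQLite database. The column types and the sample values are assumptions made for illustration; the actual schema of the example database may differ.

```python
import sqlite3

# In-memory database so the sketch does not touch examples.db.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

# Assumed column types; the real example database may define them differently.
conn.execute(
    """
    CREATE TABLE received_invoices (
        supplier INTEGER,
        invoice_number TEXT,
        issue_date TEXT,
        total_amount REAL,
        file_path TEXT
    )
    """
)

# Hypothetical sample record, not taken from the example data.
conn.execute(
    "INSERT INTO received_invoices VALUES (?, ?, ?, ?, ?)",
    (1, "INV-001", "2020-01-31", 120.5, "invoice_1.pdf"),
)

row = conn.execute("SELECT * FROM received_invoices").fetchone()
record = {key: row[key] for key in row.keys()}
print(record)
conn.close()
```

Setting row_factory to sqlite3.Row is what allows a column to be read as row['invoice_number'] instead of by position.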

To use typless AI’s pre-learning ability, we need to take the following steps:

  • read rows from the database,
  • for each row:
    • upload the file together with the data from that row.

The Python code that does this with the example database looks like this:

				
import json
import os
import sqlite3

import requests

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

# fail early, before any file is read or uploaded
API_KEY = os.getenv('API_KEY')
if API_KEY is None:
    raise Exception('YOU MUST SET API KEY!')

# open connection
conn = sqlite3.connect(os.path.join(BASE_DIR, 'examples', 'examples.db'))
conn.row_factory = sqlite3.Row  # access columns by name

cur = conn.cursor()
cur.execute("SELECT * FROM received_invoices")

# get all received invoices
rows = cur.fetchall()

for row in rows:
    # read the invoice file that belongs to this row
    with open(os.path.join(BASE_DIR, 'examples', 'files', row['file_path']), 'rb') as invoice_file:
        files = {
            "file": (row['file_path'].split('/')[-1], invoice_file.read(),),
        }
    # field values from the database row that the AI learns from
    request_data = {
        "document_type_name": 'pre-learning-example',
        "customer": 'me',
        "learning_fields": json.dumps(
            [
                {'name': 'supplier', 'value': row['supplier']},
                {'name': 'invoice_number', 'value': row['invoice_number']},
                {'name': 'issue_date', 'value': row['issue_date']},  # convert to YYYY-MM-DD string if your database has datetime type
                {'name': 'total_amount', 'value': '%.2f' % row['total_amount']},
            ]
        )
    }

    response = requests.post(
        'https://developers.typless.com/api/document-types/learn/',
        files=files,
        data=request_data,
        headers={'Authorization': f'Token {API_KEY}'}
    )
    print(response.text)

conn.close()

				
			
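The comment in the script notes that issue_date has to be a YYYY-MM-DD string. If your own database returns date or datetime objects, a small helper can normalize a row before it is uploaded. The following is a minimal sketch; the helper name build_learning_fields and the sample row are hypothetical and not part of the example project.

```python
import json
from datetime import date, datetime


def build_learning_fields(row):
    """Return the learning_fields JSON string for one database row."""
    issue_date = row["issue_date"]
    # datetime is a subclass of date, so one check covers both types
    if isinstance(issue_date, date):
        issue_date = issue_date.strftime("%Y-%m-%d")
    return json.dumps(
        [
            {"name": "supplier", "value": row["supplier"]},
            {"name": "invoice_number", "value": row["invoice_number"]},
            {"name": "issue_date", "value": issue_date},
            {"name": "total_amount", "value": "%.2f" % row["total_amount"]},
        ]
    )


# Hypothetical row; any mapping with these keys works, including sqlite3.Row.
sample_row = {
    "supplier": 1,
    "invoice_number": "INV-001",
    "issue_date": datetime(2020, 1, 31, 9, 30),
    "total_amount": 120.5,
}
fields = json.loads(build_learning_fields(sample_row))
print(fields)
```

The returned string can be passed as the learning_fields value of request_data in the script above. Date values that are already strings are left unchanged.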

Run it:

				
					python pre_learning.py
				
			

Congratulations, you’ve just used typless AI’s pre-learning ability.

You can use extract_data.py to try extracting data from one of the example invoices:

				
					python extract_data.py
				
			

That’s it! Happy OCRing!
