Continuously learning AI
AI is a living thing. It learns and makes mistakes. This guide will show you how to integrate continuously learning AI into the existing workflow of your customers.
Why do you need continuously learning AI?
Business processes vary from one company to another, even from one department to another inside the same company. For example, when you are on-boarding a new supplier, you need to add it into your ERP system. You need to define how to pay, when to pay, contact person, etc. With all these parameters, you will set the process of business for it. Will, you set the same process for all suppliers? Probably not. Some of the suppliers will send you invoices daily, some weekly, some monthly. Some invoices are paid with a credit card, others with bank transfer, etc. Will every company have the same process for the same supplier? Probably not. Can it change once it’s set? Of course!
How to integrate data extraction to fit those prerequisites?
Firstly, you need a solution that will learn directly from your system. You don’t want to add new steps to the existing process.
Secondly, you need continuously learning AI. You want it to reflect changes in your system as fast as possible
Lastly, you need a service that speaks your language. Your supplier A has an ID 1234 in your database. It would be nice if AI would tell you: “It is an invoice from supplier 1234.”. It would eliminate OCR errors and at least one database query.
All of that can be achieved with typless – the continuously learning AI for data extraction.
Let’s take a look at the example.
Prerequisites
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus le
To easily follow, we suggest that, before reading further, you do these following steps:
- Register at typless
- Log in
- Click the Settings tab in the side navigation bar to get API key
- Create a new document type named example:
- Click on the Documents tab in the side navigation bar
- Click on the +New button
- For the name set “example”
- For the OCR language select “English”
- Click on the Next button
- Leave fields as they are and click on the Confirm button
- Get code of the example Django project from this Github repository
git clone https://github.com/typless/continuous_learning.git
- Set the API key:
export API_KEY=YOUR_API_KEY_FROM_SETTINGS_TAB
- To install requirements:
pip install -r requirements.txt
- To run the development server:
python manage.py runserver
- To see the application, open localhost:8000.
Expenses tracking
Let’s say we, Awesome company Inc., track our expenses with a simple application which allows us to:
- see the list of our suppliers
- add a new supplier
- see the list of received invoices
- add a new received invoice
- see details of the selected received invoice
- confirm the received invoice
Although we are an Awesome company, we are still lazy. So we’ve decided to automate our data entry from received invoices. We want to keep our process the same, but we want to eliminate keystrokes. So what to do? We need a data extraction service. We’ve searched the web and decided to go with typless. So how to integrate it to meet the needs mentioned above?
Integrate into existing workflow
We don’t want to change our existing process. Nonetheless, we don’t want to mess our code too much. Before we added data extraction, our code in views looked like this:
# expenses/views.py
# ... other code ...
def received_invoices_view(request):
if request.method == 'GET':
received_invoices = models.ReceivedInvoice.objects.all()
context = {'received_invoices': received_invoices, 'form': CreateReceivedInvoiceForm()}
return render(request, 'expenses/received_invoices.html', context)
elif request.method == 'POST':
form = CreateReceivedInvoiceForm(request.POST, request.FILES)
if form.is_valid():
invoice = form.save()
# ###### Do data extraction with typless ######
# ##### End data extraction with typless ######
return redirect(f'/received-invoices/{invoice.id}')
else:
received_invoices = models.ReceivedInvoice.objects.all()
context = {'received_invoices': received_invoices, 'form': form}
return render(request, 'expenses/received_invoices.html', context)
else:
return HttpResponseNotAllowed(['GET', 'POST'])
# ... other code
Nothing special. When we added a newly received invoice file, we created a new ReceivedInvoice instance by calling form.save(). Then user was redirected to the details page to enter missing data and confirm the invoice. Therefore we’ve decided to added typless data extraction request after form.save(). Code now looks like this:
# expenses/views.py
# ... other code ...
def received_invoices_view(request):
if request.method == 'GET':
received_invoices = models.ReceivedInvoice.objects.all()
context = {'received_invoices': received_invoices, 'form': CreateReceivedInvoiceForm()}
return render(request, 'expenses/received_invoices.html', context)
elif request.method == 'POST':
form = CreateReceivedInvoiceForm(request.POST, request.FILES)
if form.is_valid():
invoice = form.save()
# ###### Do data extraction with typless ######
files = {
"file": (invoice.file.name, invoice.file.read(),),
}
request_data = {
"document_type_name": 'example',
"customer": 'test'
}
response = requests.post(
f'https://developers.typless.com/api/document-types/extract-data/',
files=files,
data=request_data,
headers={'Authorization': f'Token {os.getenv("API_KEY")}'}
)
fields = response.json()['extracted_fields']
supplier = [field for field in fields if field['name'] == 'supplier'][0]['values'][0]['value']
invoice_number = [field for field in fields if field['name'] == 'invoice_number'][0]['values'][0]['value']
issue_date = [field for field in fields if field['name'] == 'issue_date'][0]['values'][0]['value']
total_amount = [field for field in fields if field['name'] == 'total_amount'][0]['values'][0]['value']
invoice.typless_id = response.json()['object_id']
invoice.supplier_id = int(supplier) if supplier is not None else supplier
invoice.invoice_number = invoice_number
invoice.issue_date = datetime.datetime.strptime(issue_date, '%Y-%m-%d') if issue_date is not None else None
invoice.total_amount = float(total_amount) if total_amount is not None else None
invoice.save()
# ##### End data extraction with typless ######
return redirect(f'/received-invoices/{invoice.id}')
else:
received_invoices = models.ReceivedInvoice.objects.all()
context = {'received_invoices': received_invoices, 'form': form}
return render(request, 'expenses/received_invoices.html', context)
else:
return HttpResponseNotAllowed(['GET', 'POST'])
# ... other code
We’ve added a single API call in which we uploaded the invoice file and selected the document type. (We’ve created a document type named “example” following steps at the beginning of this guide.) After the response is returned, the ReceivedInvoice instance is updated with extracted data.
Test data extraction
If you haven’t already run the development server:
python manage.py runserver
and visit localhost:8000.
There are no suppliers yet. So add two of them:
- CircleCI with bank transfer as the payment method
- ScaleGrid with a credit card as the payment method
You will find example invoices in the folder named examples. Click on Received invoices link in the navigation bar. Choose file circleci1.pdf and click Submit.
After clicking Submit, you will land on the received invoice details page.
But hey, invoice data fields are still empty! Didn’t we just add data extraction? Of course, they are, typless has never been learning from your data. Let’s do it.
Continuously learning AI
typless needs to learn to extract data. Similar to data extraction, we can add continuous learning into our process. Before we added typless, code looked like this:
# expenses/views.py
# ... other code ...
def received_invoice_details(request, pk):
if request.method == 'GET':
invoice = models.ReceivedInvoice.objects.get(id=pk)
context = {'invoice': invoice, 'form': ReceivedInvoiceForm(instance=invoice)}
elif request.method == 'POST':
invoice = models.ReceivedInvoice.objects.get(id=pk)
form = ReceivedInvoiceForm(request.POST, instance=invoice)
if form.is_valid():
invoice = form.save()
# ###### Learn typless ######
# ###### Finish learning typless ######
context = {'invoice': invoice, 'form': ReceivedInvoiceForm(instance=invoice)}
else:
context = {'invoice': invoice, 'form': form}
else:
return HttpResponseNotAllowed(['GET', 'POST'])
return render(request, 'expenses/received_invoice_details.html', context)
Instance was just updated with the data from the request. As for data extraction, continuously learning of AI, needs a single REST API call. We’ve decided to insert it after instance update. Code now looks like this:
# expenses/views.py
# ... other code ...
def received_invoice_details(request, pk):
if request.method == 'GET':
invoice = models.ReceivedInvoice.objects.get(id=pk)
context = {'invoice': invoice, 'form': ReceivedInvoiceForm(instance=invoice)}
elif request.method == 'POST':
invoice = models.ReceivedInvoice.objects.get(id=pk)
form = ReceivedInvoiceForm(request.POST, instance=invoice)
if form.is_valid():
invoice = form.save()
# ###### Start typless learning######
if invoice.confirmed:
request_data = {
"document_type_name": 'example',
"customer": 'myself',
"learning_fields": [
'{"name": "supplier", "value": "%s"}' %invoice.supplier.id,
'{"name": "invoice_number", "value": "%s"}' % invoice.invoice_number,
'{"name": "issue_date", "value": "%s"}' % invoice.issue_date.strftime('%Y-%m-%d'),
'{"name": "total_amount", "value": "%.2f"}' % invoice.total_amount
],
}
requests.post(
"https://developers.typless.com/api/document-types/learn/",
data=request_data,
files={"document_object_id": (None, invoice.typless_id)},
headers={'Authorization': f'Token {os.getenv("API_KEY")}'}
)
# ###### Finish typless ######
context = {'invoice': invoice, 'form': ReceivedInvoiceForm(instance=invoice)}
else:
context = {'invoice': invoice, 'form': form}
else:
return HttpResponseNotAllowed(['GET', 'POST'])
return render(request, 'expenses/received_invoice_details.html', context)
If we take a look we can see:
- we added single API call
- we used data from our instance which is saved in the database
Confirm values and test extraction again
Now enter missing data into the form at received invoice details page and click on the Confirmbutton. Return to received invoices list by clicking on link Received invoices in the navigation bar. Choose file circleci2.pdf and click on the Submit button.
You should see that the extracted data are full and correct. Why is that? Because typless have learned from you.
You can try the same flow with scalegrid1.pdf and scalegrid2.pdf from example invoices. You can even use your invoices.
Conclusion
We have learned how to integrate typless, the continuously learning AI, into the existing workflow.
What we’ve achieved:
- workflow stays the same
- typless is learning from values in our database
- We have eliminated manual data entry
- typless has learned to return the ID of supplier for a direct database update
To do that you need to integrate the following steps:
- Request on data extraction endpoint when the user adds a new PDF invoice
- Save extracted data
- Save object_id from the response
- Request on learning endpoint when the user confirms data
- Use object_id from data extraction
That’s all! Happy OCRing!