AI’s precision – Every user wants to have precise predictions from AI engines. That is not a secret. Consequently, everyone is looking for the highest % numbers. The question is: Is an accuracy of 98% really better than 92%? In this article we will discuss the following topics:
- what accuracy means in terms of AI,
- how accuracy is defined,
- the business value of AI as a function of accuracy definition.
AI’s precision – evaluation of performance
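Whenever we build an AI, we need to evaluate its performance somehow. To measure it, we can use many different metrics. The most naive approach is to measure the share of correct predictions (strictly speaking, this ratio is called accuracy; this article uses “precision” and “accuracy” interchangeably for it):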
precision = correct predictions / all predictions
Let’s say we have two AI engines extracting the total amount from invoices. We will call them Willson and Berry. Measured this way, Willson’s precision is 98% and Berry’s precision is 90%. When they both process 100 invoices:
- Willson will predict 98 total amounts correctly.
- Berry will predict 90 total amounts correctly.
Which AI’s precision is better? Of course, it is Willson’s.
Knowing that this is a classification problem, we can acknowledge the problem of false positives and false negatives. For example:
- Willson extracted a 10x bigger number as the total amount of an invoice – false positive
- Berry didn’t extract the total amount from an invoice – false negative
Which AI’s precision is better from this perspective?
From a research perspective, Willson simply made a wrong prediction, one of only two on the set of 100 invoices, so it works as expected. From a business perspective, however, that wrong prediction can become a much bigger problem: a total amount that is 10x too large looks like a valid value and may go unnoticed, while a missing value is obvious to whoever checks the invoice. Seen this way, we can conclude that Willson is worse than Berry.
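The comparison can be sketched in a few lines of Python. This is only an illustration under loud assumptions: that all of Willson’s errors are wrong values and all of Berry’s errors are blanks, and that a wrong total costs the business 50 times as much as a blank field. The class name, the cost values and the error split are hypothetical and not taken from any real measurement.

```python
from dataclasses import dataclass


@dataclass
class EngineResult:
    name: str
    correct: int       # values extracted correctly
    wrong_values: int  # false positives: a wrong value was returned
    blanks: int        # false negatives: no value was returned

    @property
    def total(self) -> int:
        return self.correct + self.wrong_values + self.blanks

    def naive_precision(self) -> float:
        # correct predictions / all predictions
        return self.correct / self.total

    def business_cost(self, cost_wrong: float, cost_blank: float) -> float:
        # Weight each error type by what it costs the business.
        return self.wrong_values * cost_wrong + self.blanks * cost_blank


# Assumption: every Willson error is a wrong value, every Berry error is a blank.
willson = EngineResult("Willson", correct=98, wrong_values=2, blanks=0)
berry = EngineResult("Berry", correct=90, wrong_values=0, blanks=10)

# Assumption: hypothetical cost units, a wrong total is 50x worse than a blank.
COST_WRONG, COST_BLANK = 50.0, 1.0

for engine in (willson, berry):
    print(
        f"{engine.name}: precision={engine.naive_precision():.0%}, "
        f"cost={engine.business_cost(COST_WRONG, COST_BLANK)}"
    )
```

With these made-up costs, the engine with the higher naive precision ends up with the higher business cost. Different cost assumptions can flip the result, which is exactly why the context matters.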
Adding more context to AI’s precision
As acknowledged in the previous section, context is critical when evaluating AI. You need an AI that supports/simplifies your business.
Invoice data extraction
Let’s say that we are a company called Fast retail. We receive a considerable number of invoices from our suppliers every month. Therefore, we’ve decided to start automating our manual data entry. We need to extract the following data fields:
- invoice number
- issue date
- pay due date
- total amount
When we researched solutions, we found many different claims about accuracy. Some vendors claim to achieve 97% extraction accuracy, whatever that means. Others claim to reduce keystrokes by 80%. Confused by all of it, we decided to define our business process first.
How are we receiving invoices?
Let’s examine our process. We receive 70% of our invoices through email. We have an employee, let’s call him John, who every day downloads PDF files from the email inbox and scans the invoices delivered by regular mail. Then he opens our ERP software, enters the data from the invoices, and attaches the PDFs. After that, our CFO checks and confirms each invoice and executes the payment. Then our accountant does his job.
What are we automating?
Knowing our workflow, we’ve decided to automate manual data entry first. John has a lot of work to do. Consequently, reducing the time spent on manual data entry by 70% would mean he could do a lot more. Our CFO will still be confirming invoices and our accountant will still be doing the bookkeeping.
The question is – how to set up data extraction automation to gain the most out of it? To do that we need to:
- Reduce the number of invoices that need to be downloaded/scanned.
- Reduce the number of invoices that need to be seen by John.
- Visually communicate potentially wrong extractions.
How to evaluate AI’s precision in our context?
As we’ve seen in the first section, an AI’s accuracy can be evaluated in many different ways. Let’s evaluate it in our context.
97% correct extractions per field
Say we extract 5 fields in total. An AI accuracy of 97% certainly sounds great, but let’s dig deeper. If each of the 5 fields is extracted correctly in 97 out of 100 invoices, and errors in different fields are independent of each other, the probability that all data extracted from a single invoice is correct is 0.97 × 0.97 × 0.97 × 0.97 × 0.97 ≈ 0.86, or 86%. Roughly one invoice in seven will contain at least one error, and we do not know which one. Therefore, John will need to check all of the invoices to be sure the data reaching the CFO is correct. Using such an AI engine won’t decrease the number of invoices that need to be seen by John.
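The arithmetic above is easy to reproduce. The following minimal sketch assumes, as the multiplication implicitly does, that errors in different fields are independent; the function name is ours, chosen for illustration.

```python
def all_fields_correct_probability(per_field_accuracy: float, num_fields: int) -> float:
    """Probability that every field on one invoice is correct.

    Assumes errors in different fields are independent of each other.
    """
    return per_field_accuracy ** num_fields


p = all_fields_correct_probability(0.97, 5)
print(f"{p:.0%}")  # 86%
```

The more fields you extract, the faster this number drops, even when the per-field accuracy stays the same.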
Keystrokes reduced by 80%
A keystroke is a single press of a key on a keyboard. The number of keystrokes is a measure that is widely used when describing manual data entry. Therefore, keystroke reduction is used to measure an AI’s accuracy of data extraction. The problem with this metric is that it is misleading. Let’s take a look at an example.
Suppose there was an OCR error in the extracted invoice number: the extracted value was 101-1234-17 instead of 101-1234-11. You need 2 keystrokes to correct that mistake instead of the 11 needed for manual data entry. Will you spot an error like this? You will, but you would spot it much more easily if the input field was blank or even colored red. Knowing that keystrokes are reduced by 80% does not tell you how much time will be saved. John will still need to check every invoice, and he may be even more likely to miss errors.
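To see where a number like 80% can come from, here is a rough sketch of a keystroke-reduction calculation for the invoice number above. The counting model is a simplifying assumption of ours: each wrong character costs two keystrokes (delete and retype), and a value of the wrong length is retyped from scratch. Real vendors may count keystrokes differently.

```python
def correction_keystrokes(extracted: str, expected: str) -> int:
    """Keystrokes needed to fix an extracted value (simplified model)."""
    if len(extracted) != len(expected):
        # Simplification: retype the whole value when the lengths differ.
        return len(expected)
    mismatches = sum(a != b for a, b in zip(extracted, expected))
    # Delete + retype per wrong character, but never more than retyping it all.
    return min(2 * mismatches, len(expected))


def keystroke_reduction(extracted: str, expected: str) -> float:
    """Share of keystrokes saved compared with typing the value manually."""
    manual = len(expected)
    return 1 - correction_keystrokes(extracted, expected) / manual


fix = correction_keystrokes("101-1234-17", "101-1234-11")
saved = keystroke_reduction("101-1234-17", "101-1234-11")
print(fix, f"{saved:.0%}")
```

The value is wrong, yet the metric still reports a reduction of roughly 80%. Nothing in the number tells John that this particular field needs his attention.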
90% of invoices have all data correct
Now we are getting somewhere. It is a much better metric than an AI’s accuracy per field. It directly tells us that 90 out of 100 invoices will have all their data extracted correctly. The problem is that John still does not know which invoices need to be checked, so he will still check every one. As a result, the number of invoices seen by John won’t be reduced, although he will save time compared to pure manual entry.
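The difference between the per-field and the per-invoice view can be shown on a tiny, made-up data set. The invoices, field names and values below are purely hypothetical; the point is only how the two metrics are computed.

```python
def field_accuracy(extracted, ground_truth, field):
    """Share of invoices where one given field was extracted correctly."""
    hits = sum(got[field] == want[field] for got, want in zip(extracted, ground_truth))
    return hits / len(ground_truth)


def invoice_level_accuracy(extracted, ground_truth):
    """Share of invoices where every field was extracted correctly."""
    hits = sum(got == want for got, want in zip(extracted, ground_truth))
    return hits / len(ground_truth)


# Hypothetical ground truth for four invoices.
truth = [
    {"invoice_number": "A-1", "total_amount": "100.00"},
    {"invoice_number": "A-2", "total_amount": "250.00"},
    {"invoice_number": "A-3", "total_amount": "80.50"},
    {"invoice_number": "A-4", "total_amount": "19.99"},
]
# Hypothetical extraction results: one wrong number, one wrong total.
extracted = [
    {"invoice_number": "A-1", "total_amount": "100.00"},
    {"invoice_number": "A-7", "total_amount": "250.00"},
    {"invoice_number": "A-3", "total_amount": "805.00"},
    {"invoice_number": "A-4", "total_amount": "19.99"},
]

print(field_accuracy(extracted, truth, "invoice_number"))
print(field_accuracy(extracted, truth, "total_amount"))
print(invoice_level_accuracy(extracted, truth))
```

Each field is correct on three out of four invoices, but only two out of four invoices are completely correct, because the errors fall on different invoices.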
Business orientated AI
As we saw in the previous section, it is hard to fulfill a business’s needs in terms of AI’s precision. Although it is hard, it is not impossible. All the metrics we looked at shared at least one pitfall: the number of invoices seen by John did not decrease. To define the metrics that will show us the best solution, we need to put our business at the center of the measurement. Consequently, we need to stop chasing numbers. So, how do we measure an AI’s performance with regard to automation?
AI’s precision of 100% is a myth
Firstly, we must embrace the fact that nothing in this world is 100% correct. Not even John’s manual data entry. On the other hand, we can find a subset of our invoices in which data extraction will be at least as accurate as John’s typing into an ERP. An AI’s precision will never be 100%, but it can be good enough, so that it requires only the CFO to check it.
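One possible way to find such a subset is sketched below. It assumes that the extraction engine returns a confidence score with every invoice, which the text above does not state, and that we have a labeled test set telling us which extractions were correct. The target accuracy standing in for John’s typing (99%) and the sample data are hypothetical.

```python
def automation_rate(results, target_accuracy):
    """Find the largest high-confidence subset that meets the target accuracy.

    results: list of (confidence, is_correct) pairs from a labeled test set.
    Returns (confidence_threshold, share_of_invoices_automated),
    or (None, 0.0) if no subset is accurate enough.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    best = (None, 0.0)
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        # Only cut between distinct confidence values, never inside a tie.
        if i < len(ranked) and ranked[i][0] == confidence:
            continue
        if correct / i >= target_accuracy:
            best = (confidence, i / len(ranked))
    return best


# Hypothetical test set: (engine confidence, was the whole invoice correct?)
results = [
    (0.99, True), (0.98, True), (0.95, True),
    (0.90, False), (0.85, True), (0.60, False),
]

# Assumption: John's manual entry is correct on 99% of invoices.
threshold, share = automation_rate(results, target_accuracy=0.99)
print(threshold, share)
```

Invoices at or above the threshold can go straight to the CFO, and everything below it still goes to John. The useful metric is then the share of invoices that John no longer has to see.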
Integration into an existing workflow
We’ve fulfilled one requirement, but two are still waiting for us. So how do we reduce the number of invoices that need to be downloaded or scanned? To do that, we need to automate the reading of e-mails. That is a routine task for a developer. What is important in terms of AI is that files received from an e-mail or a scanner can be uploaded directly into the data extraction process. The best option is a well-documented, single-call REST API. As we can see, an AI must be evaluated from the developer’s perspective too. The easier the integration, the easier the automation.
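As a rough sketch of what such an integration could look like, the snippet below pulls PDF attachments out of an e-mail with Python’s standard library and prepares a single HTTP call for each file. The endpoint URL, the authorization scheme and the request format are placeholders we made up for illustration; they do not describe any particular vendor’s API, so check the documentation of the service you use.

```python
import urllib.request
from email.message import EmailMessage

# Hypothetical endpoint, used only for illustration.
EXTRACTION_URL = "https://api.example.com/extract"


def pdf_attachments(message: EmailMessage):
    """Yield (filename, content) for every PDF attached to an e-mail.

    When parsing raw e-mails, use email.message_from_bytes(raw,
    policy=email.policy.default) so that you get an EmailMessage.
    """
    for part in message.iter_attachments():
        if part.get_content_type() == "application/pdf":
            yield part.get_filename(), part.get_content()


def build_extraction_request(pdf_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Prepare one POST request that uploads a PDF (hypothetical API shape)."""
    return urllib.request.Request(
        EXTRACTION_URL,
        data=pdf_bytes,
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


# Build a sample e-mail so the sketch runs without a mailbox.
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("Please find the invoice attached.")
msg.add_attachment(b"%PDF-1.4 demo", maintype="application",
                   subtype="pdf", filename="invoice.pdf")

attachments = list(pdf_attachments(msg))
requests_to_send = [build_extraction_request(data, "YOUR_API_KEY")
                    for _, data in attachments]
# Sending would be: urllib.request.urlopen(request) for each prepared request.
```

The less code it takes to get from an inbox to an extraction result, the sooner John stops downloading files by hand.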
Never miss the error
Last but not least is the visual communication of potential errors. Let’s face it: if John needs to check an invoice, he needs to check it. Checking the correctness of data takes time: time to locate the data on the document, time to check whether the extracted value is correct, and time to correct errors. Knowing the problem with false positives, we can agree that it is better to have a blank field than a wrong value. John will spot it more easily.
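The “blank rather than wrong” rule could be applied as a small post-processing step. The sketch below assumes that every extracted field comes with a confidence score, which is our assumption and not something every engine provides; the threshold and the sample values are hypothetical.

```python
def blank_uncertain_fields(fields, min_confidence):
    """Replace low-confidence values with blanks and list them for review.

    fields: {field_name: (value, confidence)}
    Returns ({field_name: value or ""}, [field names John should look at]).
    """
    output, needs_review = {}, []
    for name, (value, confidence) in fields.items():
        if value is None or confidence < min_confidence:
            output[name] = ""          # a blank is easier to spot than a wrong value
            needs_review.append(name)  # the UI can highlight these fields
        else:
            output[name] = value
    return output, needs_review


# Hypothetical extraction result with per-field confidence scores.
invoice_fields = {
    "invoice_number": ("101-1234-17", 0.55),
    "total_amount": ("250.00", 0.99),
}

cleaned, review = blank_uncertain_fields(invoice_fields, min_confidence=0.9)
print(cleaned, review)
```

John then only has to fill in the highlighted blanks instead of comparing every value against the document.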
Start your automation journey today
It is important to know your work process before implementing an automation solution. It can save you a lot of time. Nevertheless, don’t forget that it is not all about the numbers. Context is very important. That is where typless shines the most. It is indeed a Business orientated AI.