AWS Rekognition Text detection limited to 50 words
Feb 2022 - this post has permanently moved to my new site, https://architectfwd.com, and can be found here: https://architectfwd.com/architecture/cloud/amazon-web-services-aws/rekognition/2022/01/23/aws-rekognition-text-detection-limited/ Please bookmark that site for all of my future content.
Approach
I created a Flask API and used boto3.
rek = boto3.client('rekognition', region_name="us-east-1")
After that I passed the image bytes directly to a detect_text call; not too tough.
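As a rough sketch of that call, assuming the client is passed in rather than created globally: the helper name detect_text_from_bytes and the file name sample.png are my own placeholders, while detect_text, the Image={'Bytes': ...} argument and the TextDetections response key are part of the boto3 Rekognition client.

```python
def detect_text_from_bytes(client, image_bytes):
    """Send raw image bytes to Rekognition and return the list of detections.

    `client` is anything exposing the boto3 Rekognition `detect_text` method,
    e.g. boto3.client('rekognition', region_name="us-east-1").
    """
    # Rekognition accepts the image inline as bytes (or as an S3 object reference).
    response = client.detect_text(Image={"Bytes": image_bytes})
    # Each entry is a dict with keys such as DetectedText, Type, Id and Confidence.
    return response["TextDetections"]


# Hypothetical usage (needs boto3, AWS credentials and a local file "sample.png"):
#   import boto3
#   rek = boto3.client('rekognition', region_name="us-east-1")
#   with open("sample.png", "rb") as f:
#       detections = detect_text_from_bytes(rek, f.read())
```

Passing the client in keeps the helper easy to call from a Flask view and easy to exercise with a stub, without touching AWS.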
Detect text result
I uploaded an image with a small number of words and was pleased with the result. However, when uploading an image containing a paragraph, I found that only a subset of the words was returned.
The limit is 50 words - "DetectText can detect up to 50 words in an image."[0]
Text result response
The response splits up items by Type, either "LINE" or "WORD", and each word carries a ParentId pointing at the line it belongs to, so I filtered just the lines like this:
if label['Type'] == "LINE":  # keep only line-level detections
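To make that filter concrete, here is a small sketch that assumes the detections come from response['TextDetections']. The function names and the sample data are made up for illustration; the sample is not real Rekognition output. It also counts the WORD items, since a count sitting right at the documented cap is a hint that the text was cut off.

```python
def lines_only(detections):
    """Return the text of every LINE-level detection, in response order."""
    return [d["DetectedText"] for d in detections if d["Type"] == "LINE"]


def word_count(detections):
    """Count WORD-level detections; a count at the documented cap hints at truncation."""
    return sum(1 for d in detections if d["Type"] == "WORD")


# Made-up sample shaped like response['TextDetections'] (not real API output).
sample = [
    {"DetectedText": "Hello world", "Type": "LINE", "Id": 0},
    {"DetectedText": "Hello", "Type": "WORD", "Id": 1, "ParentId": 0},
    {"DetectedText": "world", "Type": "WORD", "Id": 2, "ParentId": 0},
]

print(lines_only(sample))  # ['Hello world']
print(word_count(sample))  # 2
```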
It works and the result is great, but for a larger number of words I'm thinking of just running the image through Tesseract OCR.
[0] - https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html (Last paragraph)
Cheers
Quintes