AWS Lambda: OCR and Text Translation in the Cloud

In today’s article, I’ll show you how to combine two powerful AWS services – Rekognition and Translate – in a single Lambda function that automatically extracts text from images stored in S3 and then translates that text into a language of your choice. In this project, we’ll convert an image containing text into a text file, and then create a translated version of that file.

This process is extremely useful when you work with images that contain text in different languages, because the source language is detected automatically and the extracted text is translated into the language you specify. I will use sample files from S3, some of which may be difficult to read – let’s see how AWS Rekognition handles them.

AWS Rekognition – OCR with Translate in Lambda

How does it work?

First, we will use AWS Rekognition to read the text from the image, and then Amazon Translate will translate it to the selected language – in my case, Polish. Finally, both text files (original and translated) will be saved in S3.
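
To make the first step more concrete, here is a minimal sketch of how lines of text can be pulled out of a Rekognition detect_text response. The sample response is hand-written for illustration (it is not real output), and the extract_lines helper and its min_confidence threshold are my own assumptions rather than part of the project code. A threshold like this may help with hard-to-read images, where some detections come back with a low confidence score.

```python
def extract_lines(response, min_confidence=0.0):
    """Return the text of every LINE detection at or above min_confidence."""
    return [
        detection['DetectedText']
        for detection in response.get('TextDetections', [])
        if detection.get('Type') == 'LINE'
        and detection.get('Confidence', 0.0) >= min_confidence
    ]


# Hand-written sample in the shape of a detect_text response (not real output).
sample_response = {
    'TextDetections': [
        {'DetectedText': 'Fresh coffee', 'Type': 'LINE', 'Confidence': 99.1},
        {'DetectedText': 'Fresh', 'Type': 'WORD', 'Confidence': 99.3},
        {'DetectedText': 'coffee', 'Type': 'WORD', 'Confidence': 98.9},
        {'DetectedText': 'evry day', 'Type': 'LINE', 'Confidence': 61.4},
    ]
}

# Without a threshold, every LINE is kept; WORD entries are always skipped.
all_lines = extract_lines(sample_response)

# With a threshold of 80, the low-confidence line is dropped.
confident_lines = extract_lines(sample_response, min_confidence=80.0)
```

Rekognition reports confidence on a 0 to 100 scale, so a suitable threshold depends on how much noise you are willing to pass on to the translation step.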

Requirements

To implement this project, you need:

An AWS account,
A Lambda execution role with Rekognition, S3 and Translate permissions,
Image files saved in S3.
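
The permissions listed above could be expressed as an IAM policy along the following lines. This is a sketch based on my assumptions, not the exact policy from the video: the policy is built as a Python dictionary only so that it can be printed as JSON and pasted into the IAM console, and the bucket name is the one used in the code later in this article. The comprehend:DetectDominantLanguage action is included because Amazon Translate relies on Amazon Comprehend when the source language is set to 'auto'.

```python
import json

# Bucket name used in the Lambda code in this article; replace with your own.
BUCKET_NAME = 'rekognition-bucket-00235'

execution_role_policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            # OCR, translation, and automatic source-language detection.
            'Effect': 'Allow',
            'Action': [
                'rekognition:DetectText',
                'translate:TranslateText',
                'comprehend:DetectDominantLanguage',
            ],
            'Resource': '*',
        },
        {
            # Read the source image and write the two text files.
            'Effect': 'Allow',
            'Action': ['s3:GetObject', 's3:PutObject'],
            'Resource': f'arn:aws:s3:::{BUCKET_NAME}/*',
        },
    ],
}

# Render the policy as JSON, ready to paste into the IAM console.
policy_json = json.dumps(execution_role_policy, indent=2)
print(policy_json)
```

The sketch covers only the three services used here. Permissions for writing logs to CloudWatch are normally added separately, for example by the basic execution role that Lambda creates by default.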

Lambda Function – code

Below you will find the code for the Lambda function that performs all the tasks described. I left the full explanation in the video, so I’m just adding the code here. Be sure to adjust the details in the code, such as the S3 bucket name and the object key, to match your own setup.

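import json
import boto3

def lambda_handler(event, context):
    # Clients for the three services used below
    rekognition_client = boto3.client('rekognition')
    s3_client = boto3.client('s3')
    translate_client = boto3.client('translate')

    # Source image in S3; change these to match your own bucket and file
    bucket_name = 'rekognition-bucket-00235'
    object_key = 'OCR/Black_and_White_Minimalist_Coffee_Presentation.jpg'

    # Run text detection (OCR) directly on the S3 object
    response = rekognition_client.detect_text(
        Image={
            'S3Object': {
                'Bucket': bucket_name,
                'Name': object_key
            }
        }
    )

    # Keep only LINE detections; WORD entries repeat the same text word by word
    detected_text_lines = []
    for text in response['TextDetections']:
        if text['Type'] == 'LINE':
            detected_text_lines.append(text['DetectedText'])

    extracted_text = '\n'.join(detected_text_lines)

    # translate_text rejects empty input, so stop early if nothing was detected
    if not extracted_text:
        return {
            'statusCode': 200,
            'body': json.dumps(f'No text detected in {object_key}')
        }

    # Save the extracted text next to the image, e.g. image_extracted_text.txt
    new_file_key = object_key.rsplit('.', 1)[0] + '_extracted_text.txt'

    s3_client.put_object(
        Bucket=bucket_name,
        Key=new_file_key,
        Body=extracted_text.encode('utf-8'),
        ContentType='text/plain; charset=utf-8'
    )

    # 'auto' lets Translate detect the source language (it uses Amazon Comprehend
    # for this, so the role also needs comprehend:DetectDominantLanguage)
    translate_response = translate_client.translate_text(
        Text=extracted_text,
        SourceLanguageCode='auto',
        TargetLanguageCode='pl'
    )

    # Save the Polish translation as a second text file
    translated_text = translate_response['TranslatedText']
    new_file_key_translated = object_key.rsplit('.', 1)[0] + '_translated_pl.txt'

    s3_client.put_object(
        Bucket=bucket_name,
        Key=new_file_key_translated,
        Body=translated_text.encode('utf-8'),
        ContentType='text/plain; charset=utf-8'
    )

    return {
        'statusCode': 200,
        'body': json.dumps(
            f'Text saved as {new_file_key}, translation saved as {new_file_key_translated}'
        )
    }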
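
The function above works on one hard-coded bucket and file. If you would rather run it automatically whenever an image is uploaded, one option is to read the bucket and key from the S3 event that triggers the Lambda. The sketch below assumes the function is connected to an S3 event notification. The objects_from_s3_event helper and the sample event are hypothetical and only illustrate the shape of the data.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get('Records', []):
        s3_info = record['s3']
        bucket = s3_info['bucket']['name']
        # Object keys arrive URL-encoded in S3 events (a space becomes '+').
        key = unquote_plus(s3_info['object']['key'])
        pairs.append((bucket, key))
    return pairs


# Trimmed-down, hand-written example of an S3 "object created" event.
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'rekognition-bucket-00235'},
                'object': {'key': 'OCR/coffee+menu.jpg'},
            }
        }
    ]
}

targets = objects_from_s3_event(sample_event)
```

Inside lambda_handler you could then loop over these pairs instead of using the fixed bucket_name and object_key. Because the function writes its .txt results to the same bucket, it is worth limiting the trigger to image suffixes such as .jpg, so that the output files do not trigger the function again.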

Summary

Once you have implemented this code, you will be able to automatically extract text from images in S3 and translate it into the language you specify using AWS Rekognition and Amazon Translate. In my video, I explain each step in detail and show how the whole process works, so I encourage you to watch it. You will be able to see how these AWS tools handle even difficult images.
