Printed character recognition API documentation

Interface description

  • An end-to-end text recognition system based on deep neural network models and iFlytek's self-developed, industry-leading optical character recognition technology. It directly converts printed text in images (from sources such as scanners or digital cameras), covering both scanned documents and complex natural scenes, into editable text. It supports 32 languages, including Chinese, English, Hungarian, French, German, and Spanish.

  • Demo: Java, Python, Go, Node.js, C#

  • When you use printed character recognition, please follow these requirements:

Content Description
Transfer method http[s]
Request Address https://me-east-1.aicloudapi.com/v1/ocr
Request Line POST /v1/ocr HTTP/1.1
Interface authentication Signature mechanism; for details, see Authentication Description
Character encoding UTF-8
Response format JSON
Image format jpg, jpeg, png, bmp, webp, tiff
Image size Minimum: 1 B; maximum: 10485760 B (10 MB)

Authentication Description

When using the business interface, the requester needs to sign the request, and the server verifies the validity of the request through the signature.

Authentication Method

Append the authentication parameters to the request address (URL). Note that the values affecting the authentication result are the URL, apiSecret, apiKey, and date. When debugging authentication, use the values given in the example. The specific parameters are as follows:

Authentication parameters:

Parameter Type Required Description Example
host string yes requesting host me-east-1.aicloudapi.com
date string yes current timestamp, RFC1123 format Wed, 07 Dec 2022 08:18:43 GMT
authorization string yes Base64-encoded signature information (the signature is calculated with HMAC-SHA256) Refer to the detailed generation rules below
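The following Python sketch shows one way to append the three authentication parameters to the request address. It assumes the authorization value has already been computed (see the generation rules below) and that standard query-string encoding via urllib.parse.urlencode is accepted by the server; build_request_url and rfc1123_now are hypothetical helper names.

```python
from email.utils import formatdate
from urllib.parse import urlencode

HOST = "me-east-1.aicloudapi.com"
PATH = "/v1/ocr"


def rfc1123_now() -> str:
    # formatdate(usegmt=True) yields a string such as "Wed, 07 Dec 2022 08:18:43 GMT"
    return formatdate(usegmt=True)


def build_request_url(authorization: str, date: str, host: str = HOST, path: str = PATH) -> str:
    """Append host, date and authorization to the request address as query parameters.

    Hypothetical helper: date and host must be the exact values used for the signature.
    """
    query = urlencode({"host": host, "date": date, "authorization": authorization})
    return f"https://{host}{path}?{query}"


if __name__ == "__main__":
    print(build_request_url("<authorization>", rfc1123_now()))
```

Generating the date once and reusing the same string for both the signature and the URL avoids a mismatch between the two.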

• Generating the authorization parameter:

1) Obtain the interface keys APIKey and APISecret.
After creating an account on the iFLYTEK open platform, get them from the console page; both are 32-character strings.
2) Before Base64 encoding, the authorization parameter (authorization_origin) has the following format:

api_key="$api_key",algorithm="hmac-sha256",headers="host date request-line",signature="$signature"

Here, api_key is the APIKey obtained from the console, algorithm is the signing algorithm (only hmac-sha256 is supported), and headers lists the parameters that participate in the signature (see the note below).
signature is the string obtained by signing those parameters with the algorithm and Base64-encoding the result. See below for details.

3) The signature origin field (signature_origin) is built as follows:

It is formed by concatenating the three parameters host, date, and request-line in the following format (\n is a newline character; note the space after each ':'):

host: $host\ndate: $date\n$request-line

For example, if:

Requested url = "https://me-east-1.aicloudapi.com/v1/ocr"
date = "Wed, 07 Dec 2022 08:18:43 GMT"

then the signature origin field (signature_origin) is:

host: me-east-1.aicloudapi.com
date: Wed, 07 Dec 2022 08:18:43 GMT
POST /v1/ocr HTTP/1.1
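A minimal Python sketch of step 3, assuming the host is taken from the URL (no explicit port) and the request line always uses HTTP/1.1 as shown above; build_signature_origin is a hypothetical name. With the example URL and date it produces the three-line string shown above.

```python
from urllib.parse import urlparse


def build_signature_origin(url: str, date: str, method: str = "POST") -> str:
    """Build signature_origin in the form "host: $host\\ndate: $date\\n$request-line"."""
    parsed = urlparse(url)
    # request-line: method, path and protocol version, e.g. "POST /v1/ocr HTTP/1.1"
    request_line = f"{method} {parsed.path} HTTP/1.1"
    return f"host: {parsed.netloc}\ndate: {date}\n{request_line}"


origin = build_signature_origin(
    "https://me-east-1.aicloudapi.com/v1/ocr", "Wed, 07 Dec 2022 08:18:43 GMT"
)
print(origin)
```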

4) Sign signature_origin with the HMAC-SHA256 algorithm, using apiSecret as the key, to obtain the signature digest signature_sha.

signature_sha=hmac-sha256(signature_origin,$apiSecret)

Here, apiSecret is the APISecret obtained from the console.

5) Base64-encode signature_sha to obtain the final signature.

signature=base64(signature_sha)

For example, if:

APISecret = "apisecretXXXXXXXXXXXXXXXXXXXXXXX"
date = "Wed, 07 Dec 2022 08:18:43 GMT"

then the signature is:

signature="J0D7cz4s+6lQpzNtT03BiZN1QEIhqZrSBKvk6W6nK5s="
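Steps 4 and 5 amount to an HMAC-SHA256 digest keyed with apiSecret, followed by Base64 encoding. The sketch below uses Python's standard hmac, hashlib, and base64 modules; compute_signature is a hypothetical name. A SHA-256 digest is 32 bytes, so the Base64 signature is 44 characters, which matches the length check in the error table below.

```python
import base64
import hashlib
import hmac


def compute_signature(signature_origin: str, api_secret: str) -> str:
    """signature = base64(hmac-sha256(signature_origin, apiSecret))."""
    # apiSecret is the HMAC key; signature_origin is the signed message
    signature_sha = hmac.new(
        api_secret.encode("utf-8"), signature_origin.encode("utf-8"), hashlib.sha256
    ).digest()
    return base64.b64encode(signature_sha).decode("ascii")


origin = (
    "host: me-east-1.aicloudapi.com\n"
    "date: Wed, 07 Dec 2022 08:18:43 GMT\n"
    "POST /v1/ocr HTTP/1.1"
)
print(compute_signature(origin, "apisecretXXXXXXXXXXXXXXXXXXXXXXX"))
```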

6) Concatenate api_key, algorithm, headers, and signature in the format shown in step 2 to obtain authorization_origin, for example:

api_key="apikeyXXXXXXXXXXXXXXXXXXXXXXXXXX", algorithm="hmac-sha256", headers="host date request-line", signature="J0D7cz4s+6lQpzNtT03BiZN1QEIhqZrSBKvk6W6nK5s="

7) Finally, Base64-encode authorization_origin to obtain the final authorization parameter.

authorization = base64(authorization_origin)
Example result:
authorization=YXBpX2tleT0iYXBpa2V5WFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFgiLCBhbGdvcml0aG09ImhtYWMtc2hhMjU2IiwgaGVhZGVycz0iaG9zdCBkYXRlIHJlcXVlc3QtbGluZSIsIHNpZ25hdHVyZT0iSjBEN2N6NHMrNmxRcHpOdFQwM0JpWk4xUUVJaHFaclNCS3ZrNlc2bks1cz0i
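Steps 6 and 7 can be sketched in Python as follows, assuming the field layout of the step 6 example (a comma and a space between fields); build_authorization is a hypothetical name. Base64-decoding the result should give back authorization_origin, which is a quick way to check the output.

```python
import base64


def build_authorization(api_key: str, signature: str) -> str:
    """authorization = base64(authorization_origin)."""
    # Field layout follows the step 6 example (comma plus space between fields)
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    return base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")


print(build_authorization("apikeyXXXXXXXXXXXXXXXXXXXXXXXXXX",
                          "J0D7cz4s+6lQpzNtT03BiZN1QEIhqZrSBKvk6W6nK5s="))
```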

Authentication Result

If authentication fails, the server returns an HTTP status code that depends on the error type, along with an error description. The details are as follows:

HTTP Code Description Error description Solution
401 Missing authorization parameter {"message":"Unauthorized"} Check whether there is an authorization parameter.
401 Failed to resolve the signature parameters {"message":"HMAC signature cannot be verified"} Check whether the signature parameters are correct, especially whether the copied api_key is correct.
401 Failed to verify the signature {"message":"HMAC signature does not match"} Signature verification failed. Possible causes:
1. Check whether api_key and api_secret are correct.
2. Check whether host, date, and request-line used to calculate the signature are concatenated as the protocol requires.
3. Check whether the Base64 signature has the normal length (normally 44 bytes).
403 Clock offset check failed {"message":"HMAC signature cannot be verified, a valid date or x-date header is required for HMAC Authentication"} Check whether the requesting machine's clock is accurate. This error is reported if the difference from the server time exceeds 5 minutes.
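Because a date that differs from the server time by more than 5 minutes triggers the 403 error, a client may want to sanity-check its own date value before sending, for example to catch a date string that was generated once and then reused. This is a hypothetical local check, not part of the API, and it compares only against the local clock, so it cannot detect a local clock that is itself wrong.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_SKEW = timedelta(minutes=5)  # tolerance stated for the 403 error


def date_within_skew(date_value: str, now: Optional[datetime] = None) -> bool:
    """Return True if an RFC1123 date is within MAX_SKEW of `now` (default: current UTC time)."""
    sent = parsedate_to_datetime(date_value)  # timezone-aware for "... GMT"
    if now is None:
        now = datetime.now(timezone.utc)
    return abs(now - sent) <= MAX_SKEW


reference = datetime(2022, 12, 7, 8, 20, 0, tzinfo=timezone.utc)
print(date_within_skew("Wed, 07 Dec 2022 08:18:43 GMT", now=reference))  # True
```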

Request Parameters

When calling the business interface, configure the following parameters in the HTTP request body. The request data is a JSON string.

Example:

{
	"header": {
		"app_id": "your appid",
		"status": 3
	},
	"parameter": {
		"ocr": {
			"language": "language=de",
			"ocr_output_text": {
				"encoding": "utf8",
				"compress": "raw",
				"format": "json"
			}
		}
	},
	"payload": {
		"image": {
			"encoding": "jpg",
			"image": "iVBORw0KGg······",
			"status": 3
		}
	}
}
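A Python sketch that builds the request body shown above. The language default is copied verbatim from the example; check the language parameter list below for other values. build_request_body is a hypothetical helper, and the image bytes are placeholders rather than a real image.

```python
import base64
import json


def build_request_body(app_id: str, image_bytes: bytes,
                       language: str = "language=de", image_encoding: str = "jpg") -> dict:
    """Build the JSON body for a one-time (status 3) recognition request."""
    return {
        "header": {"app_id": app_id, "status": 3},
        "parameter": {
            "ocr": {
                "language": language,  # default copied verbatim from the example above
                "ocr_output_text": {"encoding": "utf8", "compress": "raw", "format": "json"},
            }
        },
        "payload": {
            "image": {
                "encoding": image_encoding,
                "image": base64.b64encode(image_bytes).decode("ascii"),  # Base64 is required
                "status": 3,
            }
        },
    }


body = build_request_body("your appid", b"\xff\xd8\xff placeholder image bytes")
print(json.dumps(body)[:60])
```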

Request parameter description:

Parameter Type Required Description
header object yes Used to upload platform parameters
header.app_id string yes appid information applied in iFLYTEK open platform
header.status int yes Request status, value: 3 (one-time transmission)
parameter object yes Used to upload service feature parameters
parameter.ocr object yes Service alias
parameter.ocr.language string yes Language; see the language parameter list below
parameter.ocr.ocr_output_text object yes Expected output format, describing constraints on the returned result such as its encoding. Different data types have different constraint dimensions. This object corresponds to the ocr_output_text block in the response.
parameter.ocr.ocr_output_text.encoding string no Text encoding, optional values: utf8 (default), gb2312
parameter.ocr.ocr_output_text.compress string no Text compression format, optional values: raw (default), gzip
parameter.ocr.ocr_output_text.format string no Text format, optional values: plain, json (default), xml
payload object yes Used to upload request data
payload.image object yes Input data
payload.image.encoding string no Image encoding, optional values: jpg (default), jpeg, png, bmp, webp, tiff
payload.image.image string yes Image data, base64 encoding required, minimum size: 1B, maximum size: 10485760 B
payload.image.status int no Data status, optional value: 3 (one-time transfer)
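To send the request, the JSON body is posted to the signed URL produced by the authentication steps. The sketch below only prepares the request with Python's urllib.request and does not send it; the Content-Type header value is an assumption (the interface table only states that the body is a JSON string), and prepare_request is a hypothetical name.

```python
import json
import urllib.request


def prepare_request(signed_url: str, body: dict) -> urllib.request.Request:
    """Wrap a signed URL and a JSON body in a POST request without sending it."""
    data = json.dumps(body).encode("utf-8")  # the interface uses UTF-8
    return urllib.request.Request(
        signed_url,
        data=data,
        headers={"Content-Type": "application/json"},  # assumed header value
        method="POST",
    )


req = prepare_request(
    "https://me-east-1.aicloudapi.com/v1/ocr?host=...&date=...&authorization=...",
    {"header": {"app_id": "your appid", "status": 3}},
)
# Sending would look like: urllib.request.urlopen(req, timeout=30).read()
```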

Return Result

Return parameter example:

{
	"header": {
		"code": 0,
		"message": "success",
		"sid": "ocr000e583f@hu1847a3af5cd05c2882"
	},
	"payload": {
		"ocr_output_text": {
			"compress": "raw",
			"encoding": "utf8",
			"format": "json",
			"seq": "0",
			"status": "3",
			"text": "ewogICAiY2F......"
		}
	}
}

Returned parameter description:

Parameter Type Description
header object Parameters used to describe platform characteristics
header.code int Return code. 0 means the session call succeeded; this does not necessarily mean the service call succeeded, which should be judged from the text field
header.message string Description of the return code
header.sid string Unique ID of this session
payload object Data segment, used to carry the data of the response
payload.ocr_output_text object Response data block
payload.ocr_output_text.compress string Text compression format
payload.ocr_output_text.encoding string Text encoding
payload.ocr_output_text.format string Text Format
payload.ocr_output_text.text string Returned text data; it must be Base64-decoded
payload.ocr_output_text.status string Status Code
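A Python sketch of decoding the text field according to its compress, encoding, and format values. It assumes compress="gzip" means a standard gzip stream and that format="json" yields the structure described below; decode_ocr_text is a hypothetical name.

```python
import base64
import gzip
import json


def decode_ocr_text(ocr_output_text: dict):
    """Decode payload.ocr_output_text.text using its compress, encoding and format fields."""
    raw = base64.b64decode(ocr_output_text["text"])
    if ocr_output_text.get("compress") == "gzip":
        raw = gzip.decompress(raw)  # assumes a standard gzip stream
    # "utf8" and "gb2312" are both accepted as Python codec names
    text = raw.decode(ocr_output_text.get("encoding", "utf8"))
    if ocr_output_text.get("format", "json") == "json":
        return json.loads(text)
    return text  # plain or xml: returned as a string


sample = {
    "compress": "raw",
    "encoding": "utf8",
    "format": "json",
    "text": base64.b64encode(b'{"pages": []}').decode("ascii"),
}
print(decode_ocr_text(sample))  # {'pages': []}
```

Checking header.code == 0 before decoding keeps session errors separate from recognition results.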

After Base64 decoding, the payload.ocr_output_text.text field contains the following information; pay special attention to it:

Parameter Type Description
version string Engine version number
category string Category
pages array Page Collection
pages.height int The height of the page, in pixels
pages.width int The width of the page, in pixels
pages.exception int Exception information, 0 (normal), -1 (exception)
pages.angle float Rotation Angle, Range [0,360], Clockwise
pages.lines array Text line, if not detected, the field does not exist
pages.tables array Form, if not detected, the field does not exist
pages.checkboxes array Check boxes, if not detected, the field does not exist
pages.seals array Seals, if not detected, the field does not exist
pages.fingerprints array Fingerprint area, if not detected, this field does not exist
pages.graphs array Illustration, if not detected, the field does not exist
pages.headers array Header, if not detected, the field does not exist
pages.footers array Footer, if not detected, this field does not exist
pages.blocks array Paragraphs. This field does not exist if no text line is detected or if paragraph chunking is not enabled. For structured resumes and contract documents, output follows the blocks structure by default
pages.page_numbers array Page number. If not detected, the field does not exist
pages.expressions array Formula, if not detected, the field does not exist
pages.barcodes array Barcodes, if not detected, the field does not exist
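Because optional arrays such as pages.lines are omitted entirely when nothing is detected, client code should read them with a default rather than assume they exist. A minimal Python sketch that collects line text from a decoded result; extract_lines is hypothetical, and skipping pages whose exception is -1 is a design choice, not a documented requirement.

```python
def extract_lines(result: dict) -> list:
    """Collect the recognized text of every line, page by page."""
    texts = []
    for page in result.get("pages", []):
        if page.get("exception", 0) == -1:
            continue  # skip pages flagged as exceptions (a design choice)
        # pages.lines is absent, not empty, when no text line is detected
        for line in page.get("lines", []):
            texts.append(line.get("content", ""))
    return texts


sample = {
    "pages": [
        {"height": 100, "width": 200, "exception": 0,
         "lines": [{"id": 0, "content": "Hello"}, {"id": 1, "content": "World"}]},
        {"height": 100, "width": 200, "exception": 0},
    ]
}
print(extract_lines(sample))  # ['Hello', 'World']
```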

pages.lines field

Parameter Type Description
id int Text line number, an integer whose value range is greater than or equal to 0
coord array Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y
angle float Text line angle, value range [0-360] degrees
type string Text line data type (handwriting, print)
exception int Exception information (0: normal, -1: exception return)
content string Recognition Result
words array Word-level recognition results
words.content string Recognition Result
words.coord array Location coordinates, at least 4 points
words.coord.x int Axis X
words.coord.y int Axis Y
word_units.content string Recognition Result
word_units.coord array Location coordinates, at least 4 points
word_units.coord.x int Axis X
word_units.coord.y int Axis Y
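The coord fields are point lists of at least four points. Assuming each point is an object with x and y keys, as the field names suggest, an axis-aligned bounding box can be derived as follows; bounding_box is a hypothetical helper.

```python
def bounding_box(coord: list) -> tuple:
    """Return (min_x, min_y, max_x, max_y) around a coord point list."""
    if len(coord) < 4:
        raise ValueError("coord is expected to contain at least 4 points")
    xs = [point["x"] for point in coord]
    ys = [point["y"] for point in coord]
    return min(xs), min(ys), max(xs), max(ys)


quad = [{"x": 10, "y": 20}, {"x": 110, "y": 22}, {"x": 108, "y": 60}, {"x": 9, "y": 58}]
print(bounding_box(quad))  # (9, 20, 110, 60)
```

The same helper applies to words.coord and to the coord fields of the other structures below.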

pages.tables field

Parameter Type Description
id int Table number, if the ID is the same, it means that they belong to the same table
coord array Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y
cols int Number of columns in the table
rows int Number of rows in the table
height_set array The set of table cell heights, in pixels.
width_set array The set of table cell widths, in pixels.
cells array The set of table cells
cells.coord array Position coordinates, at least four points
cells.coord.x int Axis X
cells.coord.y int Axis Y
cells.col int Column number of the cell
cells.row int Row number of the cell
cells.colspan int Number of columns spanned by the cell
cells.rowspan int Number of rows spanned by the cell
cells.elements array The collection of features inserted into the cell
cells.elements.id int The number of the inserted element in the cell
cells.elements.type string Type of the element inserted in the cell (table, graph, checkbox, seal, fingerprint, block)

pages.checkboxes field

Parameter Type Description
id int Check box number
coord object Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y
type string Check box state: tick, cross, or blank

pages.seals field

Parameter Type Description
id int Seal number
coord array Target area location information, at least 4 points
coord.x int Axis X
coord.y int Axis Y
elements array Collection of inserted features in seal
elements.id int Number of the inserted element in the seal
elements.type string The type of element inserted in the seal, table, graph, checkbox, seal, fingerprint, block

pages.fingerprints field

Parameter Type Description
id int Fingerprint number
coord object Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y

pages.graphs field

Parameter Type Description
id int Illustration number
coord array Location information, at least 4 points
coord.x int Axis X
coord.y int Axis Y
elements array A collection of features inserted into an illustration
elements.id float Default: 1
elements.type string The type of feature to insert into the illustration. Optional value: block.

pages.headers field

Parameter Type Description
id int Header Number
coord array Target area location information, at least 4 points
coord.x int Axis X
coord.y int Axis Y
elements array Set of features inserted in the header
elements.id int Number of the element inserted in the header
elements.type string The type of element inserted in the header, table, graph, checkbox, seal, fingerprint, block.

pages.footers field

Parameter Type Description
id int Footer Number
coord array Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y
elements array Set of elements to be inserted into the footer. Value range: min: 10 ~ Max: 100
elements.id int Number of the inserted element
elements.type string The type of element inserted in the footer, table, graph, checkbox, seal, fingerprint, block

pages.blocks field

Parameter Type Description
id int Paragraph number. In multi-column layouts, text block regions of the same paragraph that span columns share the same number.
coord array Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y
line_ids array IDs of the text lines in the paragraph, referencing the id field in pages.lines
line_ids.level int Level (currently only appears in resume and document structuring): the nesting level of the current block in the resume. An integer greater than or equal to 1.
line_ids.parent_id int Parent node (currently only appears in resume and document structuring): the parent node of the current block. An integer greater than or equal to -1.
line_ids.type string Category of the paragraph block (currently only appears in resume and document structuring): head or line
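Assuming line_ids is a plain list of integer IDs matching pages.lines[].id (the table also lists sub-fields under line_ids, so verify against a real response), paragraph text can be rebuilt as below. Joining lines with a space is an assumption that suits alphabetic languages; for Chinese or Japanese text an empty separator may fit better. paragraphs is a hypothetical name.

```python
def paragraphs(page: dict, sep: str = " ") -> list:
    """Rebuild paragraph strings from pages.blocks[].line_ids and pages.lines."""
    content_by_id = {line["id"]: line.get("content", "") for line in page.get("lines", [])}
    result = []
    for block in page.get("blocks", []):  # absent when chunking is not enabled
        ids = block.get("line_ids", [])
        result.append(sep.join(content_by_id[i] for i in ids if i in content_by_id))
    return result


page = {
    "lines": [{"id": 0, "content": "Hello"}, {"id": 1, "content": "world"},
              {"id": 2, "content": "Bye"}],
    "blocks": [{"id": 0, "line_ids": [0, 1]}, {"id": 1, "line_ids": [2]}],
}
print(paragraphs(page))  # ['Hello world', 'Bye']
```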

pages.page_numbers field

Parameter Type Description
id int Page Number
coord array Target area location information, at least 4 points
coord.x int Axis X
coord.y int Axis Y
elements array Set of features inserted in the page number
elements.id int Number of the element inserted in the page number
elements.type string The type of feature inserted in the page number

pages.expressions field

Parameter Type Description
id int Formula Number
coord array Target area location information, at least 4 points
coord.x int Axis X
coord.y int Axis Y

pages.barcodes field

Parameter Type Description
id int Bar code number
coord array Location coordinates, at least 4 points
coord.x int Axis X
coord.y int Axis Y
type string Type: barcode or qrcode
content string Default: 1

Language parameter list:

Language Parameter Language Parameter Language Parameter
Chinese language=ch_en English language=ch_en Hungarian language=hu
German language=de French language=fr Japanese language=ja
Korean language=ko Spanish language=es Arabic language=ar
Portuguese language=pt Hindi language=hi Indonesian language=id
Italian language=it Malaysian language=ms Russian language=ru
Thai language=th Turkish language=tr Vietnamese language=vi
Bulgarian language=bg Czech language=cs Dutch language=af
Greek language=el Polish language=pl Romanian language=ro
Swedish language=sv Tamil language=ta Bengali language=bn
Persian language=fa Urdu language=ur Danish language=da
Finnish language=fi Norwegian language=nb

Frequently Asked Questions

What is the main function of printed character recognition?

Answer: It converts printed text in images into computer-editable text.

What application platforms are supported for printed character recognition?

Answer: Currently, the Web API platform is supported.

Are there any requirements for images in printed character recognition?

Answer: The supported image formats are JPG, JPEG, PNG, BMP, WEBP, and TIFF, and the image file must not exceed 4 MB after Base64 encoding.
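Padded Base64 output is 4 bytes for every 3 input bytes, rounded up, so the 4 MB limit can be checked before encoding. The Python sketch reads 4 MB as 4 * 1024 * 1024 bytes and checks the format by file extension only; both are assumptions, and image_acceptable is a hypothetical name. The interface table lists a larger maximum (10485760 B), so this check is the stricter of the two.

```python
MAX_ENCODED_BYTES = 4 * 1024 * 1024  # "4MB" read as 4 MiB (assumption)
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "bmp", "webp", "tiff"}


def base64_length(raw_size: int) -> int:
    """Length of padded Base64 output: 4 characters per 3-byte group, rounded up."""
    return (raw_size + 2) // 3 * 4


def image_acceptable(filename: str, raw_size: int) -> bool:
    """Check the extension and the Base64-encoded size before uploading."""
    extension = filename.rsplit(".", 1)[-1].lower()
    encoded = base64_length(raw_size)
    return extension in ALLOWED_EXTENSIONS and 0 < encoded <= MAX_ENCODED_BYTES


print(image_acceptable("scan.PNG", 3_000_000))  # True: 4,000,000 encoded bytes
print(image_acceptable("scan.png", 3_200_000))  # False: 4,266,668 encoded bytes
```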