API Documentation

Our web API methods are based on RESTful web services. If this technology is still unfamiliar to you, the following tutorials may be helpful: http://www.restapitutorial.com/ covers the basics, and http://rest.elkstein.org/2008/02/rest-examples-in-different-languages.html shows examples in different programming languages. Besides the analysis API we also offer the File-Upload-Analysis service. The demo website can give you a more intuitive impression of the analysis results and the technologies involved. If you have any questions or difficulties using our web services, please feel free to contact us.

Facebook Text Ratio API


Description: this REST API method detects the text ratio of an image according to Facebook’s text policy: ads and sponsored stories in the Facebook News Feed may not include images consisting of more than 20% text. Text in product shots does not count towards the 20% limit for an image.

URL (required): http://our-website-url/api/1.0/fb-tr/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your image file

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the analysis result can be read from the POST parameter 'Result' (a minimal receiver sketch is shown below, after the support information). If no callback URL is provided, the analysis result will be returned directly in the response of the API call.

- uc (optional): True or False, default is True. When uc=False, the classifier-based text verification step will be deactivated, which will increase the recall but lower the precision of the result.

- dct (optional): text detection confidence threshold. Each detected text block must reach this confidence value to be kept in the result. This parameter only takes effect if uc=True. Allowed value: a float in the range [0.0, 1.0].

- multid (optional): True or False, default is False. If multid=True, text detection will be performed on multiple image channels, which will find more text content at the cost of longer processing time.

Support information
Supported image type: JPEG, PNG, TIFF
Maximum image file size: 10485760 bytes (10MB)
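If a callback URL is supplied, the result is delivered asynchronously via an HTTP POST to that URL. A minimal receiver sketch in Python is shown below (Flask is assumed here purely for illustration; the route path and port are arbitrary):

# Minimal callback receiver sketch: reads the analysis result from the
# POST parameter 'Result', as described in the parameter list above.
from flask import Flask, request

app = Flask(__name__)

@app.route("/callback", methods=["POST"])   # hypothetical route registered as the callback URL
def receive_result():
    result = request.form.get("Result")     # percentage string for fb-tr, JSON/XML for the OCR APIs
    print("Analysis result received:", result)
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)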

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST https://our-website-url/api/1.0/fb-tr/ \
-F file=@"/path/to/your/image/test.png" \
-F multid=true
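The same request can also be issued programmatically. Below is a rough Python sketch using the requests library (an assumption; any HTTP client works), which compares the returned percentage against the 20% limit described above. It assumes the percentage is returned as a plain number in the response body, as in the example result.

import requests

API_KEY = "[your api key here]"
URL = "https://our-website-url/api/1.0/fb-tr/"

with open("/path/to/your/image/test.png", "rb") as f:
    resp = requests.post(
        URL,
        headers={"X-api-key": API_KEY},
        files={"file": f},            # multipart/form-data upload
        data={"multid": "true"},      # optional parameter, as in the curl example
    )

resp.raise_for_status()
text_ratio = float(resp.text)         # assumed: plain percentage number, e.g. 50
print("Text ratio: %.1f%% (Facebook limit: 20%%)" % text_ratio)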

Example result:

A successful result is a percentage number, e.g. 50 indicates that 50% of the image is overlaid with text.

Demonstration:

Please check the SemaMediaData text grid tool.

Artificial Text OCR API


Description: this REST API method is developed for recognizing text content in images. It differs from conventional print OCR engines by applying a set of sophisticated preprocessing steps such as text localisation, background reparation, etc. It is suited for processing overlay text, e.g. subtitles and caption text, as well as some scene text occurring within video frames and images.

URL (required): http://our-website-url/api/1.0/ocr/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your image file

- lang (required): [language code], the supported language codes for this method include: en: English
de: German
spa: Spanish
fr: French
ita: Italian
rus: Russian
zh: Chinese Simplified
zh_t: Chinese Traditional

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the analysis result can be read from the POST parameter 'Result'. If no callback URL is provided, the analysis result will be returned as a JSON- or XML-based response of the API call.

- sp (optional): True or False, default is False. When sp=True, a spell correction process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- mh (optional): True or False, default is False. When mh=True, a multi-hypothesis analysis will be performed. This will increase the accuracy but also the execution time. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- uc (optional): True or False, default is True. When uc=False, the classifier-based text verification step will be deactivated, which will increase the recall but lower the precision of the result.

- outform (optional): sets the result format to 'xml' or 'json', default is 'json'. The result contains the detected text lines with their location information.

- df (optional): True or False, default is False. When df=True, a dictionary-based word-filtering process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- itype (optional): specify an image type so that a suitable configuration is used. The available types are: 'overlay' for overlay text images; 'digital' for born-digital images; 'hd' for overlay text images with HD or larger resolution; 'other' for a more general configuration. By default (without type specification) an adaptive algorithm will select the configuration dynamically.

- noempty (optional): True or False, default is True. noempty=True means that only detected non-empty text objects will be written to the result list. To obtain the complete list of detected objects, set this option to False.

Support information
Supported image type: JPEG, PNG, TIFF
Maximum image file size: 10485760 bytes (10MB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/ocr/ \
-F file=@"/path/to/your/image/test.png" \
-F lang=en \
-F outform=json

Example result:

JSON-based result string read from POST['Result'] parameter:
                                
{"frames":
[{"framename":"test_4.png",
"results":[{"x":2181,"y":954,"width":270,"height":42,
"text":"example text."},
{"x":300,"y":464,"width":270,"height":69,
"text":"example text 2."}
]
}]
}
XML-based result string:
                                
<TextDetectionResults>
<TextObject><FrameName>test_3.png</FrameName>
<X>2179</X>
<Y>28</Y>
<Width>284</Width>
<Height>157</Height>
<Text>example text</Text>
</TextObject>
</TextDetectionResults>
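A short sketch for consuming the JSON result (whether read from the callback parameter 'Result' or from the direct response) could look as follows; the field names are taken from the example above:

import json

result_string = '{"frames": [{"framename": "test_4.png", "results": [' \
                '{"x": 2181, "y": 954, "width": 270, "height": 42, "text": "example text."}, ' \
                '{"x": 300, "y": 464, "width": 270, "height": 69, "text": "example text 2."}]}]}'

data = json.loads(result_string)
for frame in data["frames"]:
    print("Frame:", frame["framename"])
    for obj in frame["results"]:
        # each entry carries the bounding box and the recognized text line
        print("  (%d, %d, %dx%d): %s"
              % (obj["x"], obj["y"], obj["width"], obj["height"], obj["text"]))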

Demonstration:

Please check the image OCR demo.

Scene Image OCR API


Description: this REST API method is developed for OCR in natural scene images.

URL (required): http://our-website-url/api/1.0/scene-ocr/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your image file

- lang (required): [language code], the supported language codes for this method include: en: English
- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the analysis result can be read from the POST parameter 'Result'. If no callback URL is provided, the analysis result will be returned as a JSON- or XML-based response of the API call.

- uc (optional): True or False, default is True. When uc=False, the classifier-based text verification step will be deactivated, which will increase the recall but lower the precision of the result.

- outform (optional): sets the result format to 'xml' or 'json', default is 'json'. The result contains the detected text lines with their location information.

- noempty (optional): True or False, default is False. noempty=True means that only detected non-empty text objects will be written to the result list. To obtain the complete list of detected objects, set this option to False.

- scalef (optional): image scale factor. If set, the input image will be scaled by multiplying it by this factor. Allowed value: a float in the range (0, 5].

- wct (optional): word confidence threshold. Each recognized word is filtered by its confidence value: if the confidence is lower than wct, the word will not be added to the final result. Allowed value: an integer in the range [0, 100], e.g. 50 means 50% reliability.

- dct (optional): text detection confidence threshold. Each detected text block must reach this confidence value to be kept in the result. This parameter only takes effect if uc=True. Allowed value: a float in the range [0.0, 1.0].

- multid (optional): True or False, default is False. If multid=True, text detection will be performed on multiple image channels, which will find more text content at the cost of longer processing time.

- detonly (optional): True or False, default is False. If detonly=True, only text localisation will be performed; the character recognition step will not be executed.

Support information
Supported image type: JPEG, PNG, TIFF
Maximum image file size: 10485760 bytes (10MB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/scene-ocr/ \
-F file=@"/path/to/your/image/test.png" \
-F lang=en \
-F outform=json
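The scene-specific tuning parameters can be combined in a single request. A rough Python counterpart of the call above is shown below (the requests library is assumed and the parameter values are purely illustrative):

import requests

API_KEY = "[your api key here]"
URL = "http://our-website-url/api/1.0/scene-ocr/"

with open("/path/to/your/image/test.png", "rb") as f:
    resp = requests.post(
        URL,
        headers={"X-api-key": API_KEY},
        files={"file": f},
        data={
            "lang": "en",
            "outform": "json",
            "scalef": "2.0",   # illustrative: upscale the input image, allowed range (0, 5]
            "wct": "50",       # illustrative: drop recognized words below 50% confidence
            "dct": "0.5",      # illustrative: drop text blocks below this detection confidence (uc=True)
        },
    )

resp.raise_for_status()
print(resp.text)               # same result structure as the Artificial Text OCR example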

Example result:

Please refer to the example result of the Artificial Text OCR API.

Demonstration:

Please check the OCR in the wild demo.

Document Image OCR API


Description: this REST API method is suitable for OCR on document images, e.g. personal ID cards.

URL (required): http://our-website-url/api/1.0/document-ocr/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your image file

- lang (required): [language code], the supported language codes for this method include: en: English
de: German
spa: Spanish
fr: French
ita: Italian
rus: Russian
zh: Chinese Simplified
zh_t: Chinese Traditional

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the analysis result can be read from the POST parameter 'Result'. If no callback URL is provided, the analysis result will be returned as a JSON- or XML-based response of the API call.

- outform (optional): sets the result format to 'xml' or 'json', default is 'json'. The result contains the detected text lines with their location information.

- noempty (optional): True or False, default is False. noempty=True means that only detected non-empty text objects will be written to the result list. To obtain the complete list of detected objects, set this option to False.

- scalef (optional): image scale factor. If set, the input image will be scaled by multiplying it by this factor. Allowed value: a float in the range (0, 5].

- wct (optional): word confidence threshold. Each recognized word is filtered by its confidence value: if the confidence is lower than wct, the word will not be added to the final result. Allowed value: an integer in the range [0, 100], e.g. 50 means 50% reliability.

Support information
Supported image type: JPEG, PNG, TIFF
Maximum image file size: 10485760 bytes (10MB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/document-ocr/ \
-F file=@"/path/to/your/image/test.png" \
-F lang=en \
-F outform=json

Example result:

Please refer to the example result of the Artificial Text OCR API.

Video Segmentation API


Description: video shot boundary detection is used to separate a video stream into a set of individual scenes by automatically detecting camera transitions. Based on the result, the user can obtain a fast overview of the video content by browsing the key-frames extracted from each video scene. Furthermore, with the corresponding time information the user can navigate directly to the expected video content.

URL (required): http://our-website-url/api/1.0/video-sbd/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your video file

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the client can obtain a download link for the analysis result by reading the POST parameter 'Download_Link'. If no callback URL is provided, the result can also be downloaded from the task manager website.

- width (optional): set the width of the preview frames in px (not smaller than 50px).

- height (optional): set the height of the preview frames in px (not smaller than 50px).

- nocut (optional): True or False, default is False. nocut=True means that the submitted video is raw video material without any post-production and therefore contains no hard or soft camera transitions. In this case a dedicated method will be used for the video segmentation task.

Support information
Supported video format: MP4, FLV, AVI, MOV, WMV
Maximum video file size: 3221225472 bytes (3GB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/video-sbd/ \
-F file=@"/path/to/your/test_video.mp4" \
-F callback=http://your-callback-url

Result structure:

The analysis result is a zipped file ready for download, with the following structure:
                            
Example.zip/
│  
└── segmentation/
     ├── segments.csv
     └── images/1.jpg 2.jpg ...
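For the video APIs the callback delivers a download link rather than the result itself. A minimal sketch for receiving the link, fetching the archive and reading segments.csv is shown below (Flask and the requests library are assumed purely for illustration; the internal path follows the structure shown above, and the column layout of segments.csv is not specified here, so the rows are printed as-is):

import csv
import io
import zipfile

import requests
from flask import Flask, request

app = Flask(__name__)

@app.route("/callback", methods=["POST"])     # hypothetical route registered as the callback URL
def receive_download_link():
    link = request.form.get("Download_Link")  # download link for the zipped analysis result
    archive = zipfile.ZipFile(io.BytesIO(requests.get(link).content))
    with archive.open("segmentation/segments.csv") as f:
        for row in csv.reader(io.TextIOWrapper(f, encoding="utf-8")):
            print(row)                         # one detected segment per row
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)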
                            
                        

Result demonstration:

Please check the video segmentation demo.

Video OCR API


Description: video OCR is an analysis cascade which includes video segmentation (hard cuts), video text detection/recognition, and named entity recognition on the video text (NER is a free add-on feature). The analysis results of this method enable automatic video retrieval and indexing as well as content-based video search in video archives. A detailed example can be found on our demo website.

URL (required): http://our-website-url/api/1.0/video-ocr/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your video file

- lang (required): [language code], the supported language codes for this method include: en: English
de: German
spa: Spanish
fr: French
ita: Italian
rus: Russian
zh: Chinese Simplified
zh_t: Chinese Traditional

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the client can obtain a download link for the analysis result by reading the POST parameter 'Download_Link'. If no callback URL is provided, the result can also be downloaded from the task manager website.

- sp (optional): True or False, default is False. When sp=True, a spell correction process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- mh (optional): True or False, default is False. When mh=True, a multi-hypothesis analysis will be performed. This will increase the accuracy but also the execution time. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- uc (optional): True or False, default is True. When uc=False, the classifier-based text verification step will be deactivated, which will increase the recall but lower the precision of the result.

- df (optional): True or False, default is False. When df=True, a dictionary-based word-filtering process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- ner (optional): True or False, default is False. When ner=True, named entity recognition will be performed on the OCR results. Note that this is a free add-on feature. The NER-supported languages are English, German, Chinese Simplified.

- itype (optional): specify an image type so that a suitable configuration is used. The available types are: 'overlay' for overlay text images; 'digital' for born-digital images; 'hd' for overlay text images with HD or larger resolution; 'other' for a more general configuration. By default (without type specification) an adaptive algorithm will select the configuration dynamically.

- nocut (optional): True or False, default is False. nocut=True means that the submitted video is raw video material without any post-production and therefore contains no hard or soft camera transitions. In this case a dedicated method will be used for the video segmentation task.

Support information
Supported video format: MP4, FLV, AVI, MOV, WMV
Maximum video file size: 3221225472 bytes (3GB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/video-ocr/ \
-F file=@"/path/to/your/test_video.mp4" \
-F lang=en \
-F callback=http://your-callback-url

Result structure:

The analysis result is a zipped file ready for download, with the following structure:
                            
Example.zip/
├── segmentation/
│    ├── segments.csv
│    └── images/1.jpg 2.jpg ...
├── ocr/
│    └── recognition/recognition.xml
└── ner/
     └── ner.xml
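Assuming recognition.xml follows the same TextObject schema as the XML example shown for the Artificial Text OCR API (an assumption, since the exact schema is not reproduced here), the OCR part of the downloaded archive could be read like this:

import xml.etree.ElementTree as ET
import zipfile

# 'result.zip' stands for the archive fetched from 'Download_Link'
with zipfile.ZipFile("result.zip") as archive:
    with archive.open("ocr/recognition/recognition.xml") as f:
        root = ET.parse(f).getroot()

# assumed schema: <TextDetectionResults> containing <TextObject> elements,
# as in the Artificial Text OCR example result
for obj in root.iter("TextObject"):
    print("%s: %s" % (obj.findtext("FrameName"), obj.findtext("Text")))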
                            
                        

Result demonstration:

Please check the video OCR demo.

Video Key-frame Extraction API


Description: the video key-frame extraction method also consists of a cascade of analysis processes, including video key-frame extraction based on a user-defined number of key-frames, video text recognition, and named entity detection on the recognized video text (NER is a free add-on feature). The biggest difference from the video OCR API is that it enables a rapid video summarization, regardless of the video length, by using a predefined number of key-frames.

URL (required): http://our-website-url/api/1.0/kf-extraction/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your video file

- lang (required): [language code], the supported language codes for this method include: en: English
de: German
spa: Spanish
fr: French
ita: Italian
rus: Russian
zh: Chinese Simplified
zh_t: Chinese Traditional

- keyframes (optional): defines the expected number of key-frames; if not provided, the default value of 30 will be used.

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the client can obtain a download link for the analysis result by reading the POST parameter 'Download_Link'. If no callback URL is provided, the result can also be downloaded from the task manager website.

- sp (optional): True or False, default is False. When sp=True, a spell correction process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- mh (optional): True or False, default is False. When mh=True, a multi-hypothesis analysis will be performed. This will increase the accuracy but also the execution time. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- uc (optional): True or False, default is True. When uc=False, the classifier-based text verification step will be deactivated, which will increase the recall but lower the precision of the result.

- df (optional): True or False, default is False. When df=True, a dictionary-based word-filtering process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- ner (optional): True or False, default is False. When ner=True, named entity recognition will be performed on the OCR results. Note that this is a free add-on feature. The NER-supported languages are English, German, Chinese Simplified.

- itype (optional): specify an image type so that a suitable configuration is used. The available types are: 'overlay' for overlay text images; 'digital' for born-digital images; 'hd' for overlay text images with HD or larger resolution; 'other' for a more general configuration. By default (without type specification) an adaptive algorithm will select the configuration dynamically.

Support information
Supported video format: MP4, FLV, AVI, MOV, WMV
Maximum video file size: 3221225472 bytes (3GB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/kf-extraction/ \
-F file=@"/path/to/your/test_video.mp4" \
-F lang=en \
-F keyframes=20 \
-F callback=http://your-callback-url

Result structure:

The analysis result is a zipped file ready for download, with the following structure:
                            
Example.zip/
├── keyframe-extraction/
│    ├── keyframes.csv
│    └── images/1.jpg 2.jpg ...
├── ocr/
│    └── recognition/recognition.xml
└── ner/
     └── ner.xml
                            
                        

Result demonstration:

Please check the video OCR demo.

Lecture Video Analysis API


Description: the lecture video analysis API method has been developed for analyzing lecture slide recordings, which are produced by capturing the slides displayed on the computer screen. The slide video analysis consists of slide transition detection, unique slide extraction, text recognition, lecture outline extraction from the OCR text, and named entity detection on the recognized video text (NER is a free add-on feature). The analysis results of this method enable automatic lecture video browsing and content search in video archives. A detailed example can be found on our demo website.

URL (required): http://our-website-url/api/1.0/lecture-video-ocr/

X-api-key (required): you will obtain a unique key after registration. This key is required to authenticate each API call.

Request type (required): POST or PUT.

Parameters for the analysis engine:
- file (required): absolute path to your video file

- lang (required): [language code], the supported language codes for this method include: en: English
de: German
spa: Spanish
fr: French
ita: Italian
rus: Russian
zh: Chinese Simplified
zh_t: Chinese Traditional

- callback (optional): a URL provided by the client. After the analysis, a POST request will be sent to this URL and the client can obtain a download link for the analysis result by reading the POST parameter 'Download_Link'. If no callback URL is provided, the result can also be downloaded from the task manager website.

- df (optional): True or False, default is False. When df=True, a dictionary-based word-filtering process will be performed. Note that the supported languages are English, German, Spanish, French, Russian, Italian.

- ner (optional): True or False, default is False. When ner=True, named entity recognition will be performed on the OCR results. Note that this is a free add-on feature. The NER-supported languages are English, German, Chinese Simplified.

Support information
Supported video format: MP4, FLV, AVI, MOV, WMV
Maximum video file size: 3221225472 bytes (3GB)

Example call:

curl -v -H "X-api-key:[your api key here]" \
-H "Content-Type: multipart/form-data" \
-X POST http://our-website-url/api/1.0/lecture-video-ocr/ \
-F file=@"/path/to/your/test_lecture_video.mp4" \
-F lang=en \
-F callback=http://your-callback-url \
-F df=True

Result structure:

The analysis result is a zipped file ready for download, with the following structure:
                            
Example.zip/
├── thumbnails.json
├── thumbnails/
│    └── 1.jpg 2.jpg ...
├── thumbnails_small/
│    └── 1.jpg 2.jpg ...
├── segments.csv
├── outline.json
├── recognition
│    └── recognition.xml
└── ner/
     └── ner.xml
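The structures of thumbnails.json and outline.json are not reproduced in this documentation, so the sketch below (the zip file name is illustrative) only lists the archive contents and loads the extracted lecture outline generically:

import io
import json
import zipfile

# 'lecture_result.zip' stands for the archive fetched from 'Download_Link'
with zipfile.ZipFile("lecture_result.zip") as archive:
    print(archive.namelist())      # thumbnails, segments.csv, outline.json, recognition.xml, ...
    with archive.open("outline.json") as f:
        outline = json.load(io.TextIOWrapper(f, encoding="utf-8"))  # schema not specified here
        print(json.dumps(outline, indent=2)[:500])                  # preview of the extracted outline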
                            
                        

Result demonstration:

Please check the lecture video OCR demo.

Metadata and Responses

Status and Exception:

The 'Status' and 'Exception' parameters are provided in the API response object, e.g.
{"Status": "Failed", "Exception": "Invalid language code."}
Status: Pending or Failed. There will be no final status notification; instead, the analysis result will be delivered directly to the client.
Exception: describes the detailed exception information if the request failed.
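A short sketch for checking these fields on the client side (assuming a JSON response body as in the example above):

import json

def check_response(body):
    # raise if the API reported a failure; otherwise return the status (e.g. 'Pending')
    info = json.loads(body)
    if info.get("Status") == "Failed":
        raise RuntimeError("API request failed: %s" % info.get("Exception"))
    return info.get("Status")

print(check_response('{"Status": "Pending"}'))
try:
    check_response('{"Status": "Failed", "Exception": "Invalid language code."}')
except RuntimeError as err:
    print(err)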

HTTP Response Codes:

Please refer to the Wikipedia article 'List of HTTP status codes'.