Language Detection: HTML API

AlchemyAPI provides easy-to-use facilities for classifying any posted (uploaded) HTML page by language. Post (upload) any HTML content directly to our service for analysis.

Posted content is normalized / cleaned (removing ads, navigation links, and other unimportant content), and the primary language of the contained text is identified.

These API calls can be used to process posted (uploaded) web pages and other HTML content. If you are processing content hosted on a publicly accessible website, consider using our URL processing calls instead.

API Call: HTMLGetLanguage

Description: The HTMLGetLanguage call detects the language used within a posted HTML document. AlchemyAPI extracts text from the posted HTML document structure (ignoring navigation links, advertisements, and other undesirable content) and performs language detection on it.

Endpoint: http://access.alchemyapi.com/calls/html/HTMLGetLanguage

Parameters:

http argument    parameter description

apikey
    Your private API key.
    (required parameter)

html
    HTML document content (must be URI-argument encoded).
    (required parameter)

url
    HTML document URL.
    (optional parameter, for response tracking purposes; must be URI-argument encoded)

outputMode
    Desired API output format.
    Possible values:
        xml (default)
        json
        rdf
    (optional parameter)

jsonp
    Desired JSONP callback.
    (optional parameter; requires "outputMode" to be set to json)

sourceText
    Where to obtain the text that will be processed by this API call.
    AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.
    Possible values:
        cleaned_or_raw    cleaning enabled, fallback to raw when cleaning produces no text (default)
        cleaned           operate on 'cleaned' web page text (web page cleaning enabled)
        raw               operate on raw web page text (web page cleaning disabled)
        cquery            operate on the results of a visual constraints query
                          Note: The 'cquery' http argument must also be set to a valid visual constraints query.
        xpath             operate on the results of an XPath query
                          Note: The 'xpath' http argument must also be set to a valid XPath query.
    (optional parameter)

cquery
    A visual constraints query to apply to the web page.
    Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
    (optional parameter, used when sourceText is set to 'cquery'; must be URI-argument encoded)

xpath
    An XPath query to apply to the web page.
    XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
    (optional parameter, used when sourceText is set to 'xpath'; must be URI-argument encoded)
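
As a rough illustration of how the parameters above fit together, the following Python sketch builds (but does not send) a form-encoded POST request using only the standard library. The function name build_language_request and its keyword arguments are hypothetical names chosen for this example; the endpoint, the argument names, and the Content-Type header come from this page. It assumes that urlencode's percent-encoding is what "URI-argument encoded" refers to.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://access.alchemyapi.com/calls/html/HTMLGetLanguage"


def build_language_request(api_key, html, url=None, output_mode="json",
                           source_text=None, cquery=None, xpath=None):
    """Build (without sending) a POST request for HTMLGetLanguage.

    Hypothetical helper: only the http argument names are from the docs.
    """
    # sourceText modes 'cquery' and 'xpath' need their matching query argument.
    if source_text == "cquery" and cquery is None:
        raise ValueError("sourceText='cquery' requires a cquery argument")
    if source_text == "xpath" and xpath is None:
        raise ValueError("sourceText='xpath' requires an xpath argument")

    params = {"apikey": api_key, "html": html, "outputMode": output_mode}
    optional = {"url": url, "sourceText": source_text,
                "cquery": cquery, "xpath": xpath}
    # Leave out optional arguments that were not supplied.
    params.update({k: v for k, v in optional.items() if v is not None})

    # urlencode percent-encodes every value for a form-encoded body.
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(ENDPOINT, data=body, headers=headers, method="POST")


request = build_language_request("YOUR_API_KEY", "<p>Bonjour le monde</p>")
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left to the caller, so the sketch can be tried without network access.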

Response Format (XML):

<results>
    <status>REQUEST_STATUS</status>
    <url>DOCUMENT_URL</url>
    <language>DETECTED_LANGUAGE</language>
    <iso-639-1>ISO_639_1_CODE</iso-639-1>
    <iso-639-2>ISO_639_2_CODE</iso-639-2>
    <iso-639-3>ISO_639_3_CODE</iso-639-3>
    <ethnologue>ETHNOLOGUE_URL</ethnologue>
    <native-speakers>NUM_NATIVE_SPEAKERS</native-speakers>
    <wikipedia>WIKIPEDIA_URL</wikipedia>
</results>
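
One way to read the XML shape above is with Python's standard xml.etree.ElementTree module. This is a minimal sketch: parse_language_xml is a hypothetical helper name, and the sample values are illustrative placeholders, not real service output.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; values are made up for this sketch.
SAMPLE_XML = """<results>
    <status>OK</status>
    <url>http://www.example.com/</url>
    <language>english</language>
    <iso-639-1>en</iso-639-1>
    <iso-639-2>eng</iso-639-2>
    <iso-639-3>eng</iso-639-3>
</results>"""


def parse_language_xml(text):
    """Return a dict mapping each child element name to its text."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "").strip() for child in root}


result = parse_language_xml(SAMPLE_XML)
print(result["language"], result["iso-639-1"])
```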

Response Format (JSON):

{
    "status": "REQUEST_STATUS",
    "url": "DOCUMENT_URL",
    "language": "DETECTED_LANGUAGE",
    "iso-639-1": "ISO_639_1_CODE",
    "iso-639-2": "ISO_639_2_CODE",
    "iso-639-3": "ISO_639_3_CODE",
    "ethnologue": "ETHNOLOGUE_URL",
    "native-speakers": "NUM_NATIVE_SPEAKERS",
    "wikipedia": "WIKIPEDIA_URL"
}
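
The hyphenated JSON keys above cannot be used directly as Python attribute names, so a client may want to map them onto a small record type. The sketch below shows one possible mapping; DetectedLanguage and parse_language_json are hypothetical names, the sample values are illustrative, and "native-speakers" is kept as text because the documented shape shows it as a quoted string.

```python
import json
from dataclasses import dataclass

# Illustrative sample only; values are made up for this sketch.
SAMPLE_JSON = """{
    "status": "OK",
    "url": "http://www.example.com/",
    "language": "english",
    "iso-639-1": "en",
    "iso-639-2": "eng",
    "iso-639-3": "eng"
}"""


@dataclass
class DetectedLanguage:
    language: str
    iso_639_1: str
    iso_639_2: str
    iso_639_3: str
    native_speakers: str  # left as text, not converted to a number


def parse_language_json(text):
    """Map the documented JSON keys onto DetectedLanguage attributes."""
    data = json.loads(text)
    return DetectedLanguage(
        language=data.get("language", ""),
        iso_639_1=data.get("iso-639-1", ""),
        iso_639_2=data.get("iso-639-2", ""),
        iso_639_3=data.get("iso-639-3", ""),
        native_speakers=data.get("native-speakers", ""),
    )


detected = parse_language_json(SAMPLE_JSON)
print(detected.language, detected.iso_639_1)
```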

Response Format (RDF):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
                 xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
    <rdf:Description rdf:ID="DOCUMENT_HASH">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
        <aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
        <aapi:URL>DOCUMENT_URL</aapi:URL>
        <aapi:Language>DETECTED_LANGUAGE</aapi:Language>
        <aapi:ISO-639-1>ISO_639_1_CODE</aapi:ISO-639-1>
        <aapi:ISO-639-2>ISO_639_2_CODE</aapi:ISO-639-2>
        <aapi:ISO-639-3>ISO_639_3_CODE</aapi:ISO-639-3>
        <aapi:Ethnologue>ETHNOLOGUE_URL</aapi:Ethnologue>
        <aapi:NativeSpeakers>NUM_NATIVE_SPEAKERS</aapi:NativeSpeakers>
        <aapi:Wikipedia>WIKIPEDIA_URL</aapi:Wikipedia>
    </rdf:Description>
</rdf:RDF>
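
The RDF form is namespaced XML, so it can be read with the same standard-library parser as long as the two namespace URIs shown above are supplied. The following is a sketch under that assumption; parse_language_rdf is a hypothetical helper, and the sample document and its values are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "aapi": "http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#",
}

# Illustrative sample only; the ID and values are made up for this sketch.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
         xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
    <rdf:Description rdf:ID="doc1">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
        <aapi:ResultStatus>OK</aapi:ResultStatus>
        <aapi:Language>english</aapi:Language>
        <aapi:ISO-639-1>en</aapi:ISO-639-1>
    </rdf:Description>
</rdf:RDF>"""


def parse_language_rdf(text):
    """Return the aapi-namespaced properties keyed by their local names."""
    root = ET.fromstring(text)
    description = root.find("rdf:Description", NS)
    if description is None:
        raise ValueError("no rdf:Description element found")
    prefix = "{" + NS["aapi"] + "}"
    # ElementTree expands tags to "{namespace}local"; strip the aapi part.
    return {
        child.tag[len(prefix):]: (child.text or "").strip()
        for child in description
        if child.tag.startswith(prefix)
    }


properties = parse_language_rdf(SAMPLE_RDF)
print(properties["Language"], properties["ISO-639-1"])
```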

Response Fields:

field name    field description

status
    Success / failure status indicating whether the request was processed.
    Possible values:
        OK
        ERROR

url
    HTTP URL that information was requested for.

language
    Detected language for the specified HTTP URL. More than 90 languages can be detected.

iso-639-1
    ISO-639-1 code for the detected language.

iso-639-2
    ISO-639-2 code for the detected language.

iso-639-3
    ISO-639-3 code for the detected language.

ethnologue
    Link to the Ethnologue entry containing information on the detected language.

native-speakers
    Number of people who natively speak the detected language.
    Language statistics courtesy of Wikipedia.

wikipedia
    Link to the Wikipedia page for the detected language.

statusInfo
    Failure status information (sent only if "status" == "ERROR").
    Possible values:
        invalid-api-key
        page-is-not-html
        content-exceeds-size-limit
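
A client will usually want to check "status" before reading any other field. The sketch below shows one way to turn an ERROR response into a Python exception that carries the "statusInfo" value. LanguageDetectionError, require_ok, and the hint texts are this example's own inventions; only the field names and the three statusInfo values come from the table above.

```python
import json

# Hint wording is this sketch's own, not text returned by the service.
STATUS_HINTS = {
    "invalid-api-key": "check the apikey argument",
    "page-is-not-html": "the posted content was not accepted as HTML",
    "content-exceeds-size-limit": "the posted document is too large",
}


class LanguageDetectionError(Exception):
    """Hypothetical exception for a response whose status is not OK."""

    def __init__(self, status_info):
        hint = STATUS_HINTS.get(status_info, "no hint available")
        super().__init__(f"{status_info}: {hint}")
        self.status_info = status_info


def require_ok(response_text):
    """Parse a JSON response and raise unless its status is OK."""
    data = json.loads(response_text)
    if data.get("status") != "OK":
        raise LanguageDetectionError(data.get("statusInfo", "unknown"))
    return data


try:
    require_ok('{"status": "ERROR", "statusInfo": "invalid-api-key"}')
except LanguageDetectionError as err:
    print(err)
```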

API Notes:

  1. Calls to HTMLGetLanguage should be made using HTTP POST.
  2. HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
  3. Posted HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
  4. A minimum of 15 characters of text must exist within the posted HTML document to perform language detection.
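
The size and minimum-text limits in the notes above can be approximated locally before posting. The sketch below is only a rough pre-check and rests on several assumptions: that a kilobyte means 1024 bytes, that the limit applies to the UTF-8 bytes of the document before form encoding, and that stripping tags with Python's html.parser is close enough to the service's own text extraction (it is not the same; for instance, script and style contents are counted here). The name preflight_problems and the "too-little-text" label are hypothetical; the latter is not a documented statusInfo value.

```python
from html.parser import HTMLParser

MAX_BYTES = 600 * 1024  # assumes 1 kilobyte = 1024 bytes; the notes do not say
MIN_TEXT_CHARS = 15


class _TextCollector(HTMLParser):
    """Collect the character data found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def preflight_problems(html):
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    if len(html.encode("utf-8")) > MAX_BYTES:
        problems.append("content-exceeds-size-limit")

    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse runs of whitespace so padding does not count as text.
    text = " ".join("".join(collector.chunks).split())
    if len(text) < MIN_TEXT_CHARS:
        problems.append("too-little-text")  # hypothetical label
    return problems


print(preflight_problems("<p>Hi</p>"))
```

A document that passes this check can still be rejected by the service, since the real extraction and size accounting may differ from these assumptions.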

