AlchemyAPI can categorize any web page: post (upload) the content directly to our service for analysis.
Posted content is normalized and cleaned (ads, navigation links, and other unimportant content are removed), and the remaining text is classified by topic.
These API calls can be used to process posted (uploaded) web pages and other HTML content. If the content is hosted on a publicly accessible website, consider using our URL processing calls instead.
Description: The HTMLGetCategory call categorizes a posted HTML document. AlchemyAPI extracts the text and other important content from the posted HTML document structure and then categorizes the document.
Endpoint: http://access.alchemyapi.com/calls/html/HTMLGetCategory
| http argument | parameter description |
|---|---|
| apikey | Your private API key. (required parameter) |
| html | HTML document content (must be URL-encoded). (required parameter) |
| url | URL of the HTML document (must be URL-encoded). (optional parameter) |
| outputMode | Desired API output format. Possible values: `xml` (default), `json`, `rdf`, `rel-tag`, `rel-tag-raw`. (optional parameter) |
| jsonp | Desired JSONP callback. (optional parameter; requires `outputMode` to be set to `json`) |
| sourceText | Where to obtain the text that this API call processes. AlchemyAPI supports several text extraction modes: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads and navigation links), visual constraint queries, and XPath queries. Possible values include `cquery` and `xpath` (see the `cquery` and `xpath` parameters). (optional parameter) |
| cquery | A visual constraints query to apply to the web page. Constraint queries let API operations target a specific area of a web page, such as a story title or product description. (optional parameter, used when `sourceText` is set to `cquery`; must be URL-encoded) |
| xpath | An XPath query to apply to the web page. XPath queries let API operations target a specific area of a web page, such as a story title or product description. (optional parameter, used when `sourceText` is set to `xpath`; must be URL-encoded) |
| baseUrl | Base HTTP URL for rel-tag output. (optional parameter, used with the `rel-tag` or `rel-tag-raw` outputMode; must be URL-encoded) |
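As a minimal sketch of calling this endpoint, the Python below (standard library only) form-encodes the parameters from the table and POSTs them to the endpoint. Sending them as an `application/x-www-form-urlencoded` POST body is an assumption based on the description of posted content and the encoding requirements. The function names `build_request_body` and `html_get_category`, and the client-side dependency checks (for example, rejecting `jsonp` without `outputMode=json`), are hypothetical helpers derived from the table, not part of AlchemyAPI itself.

```python
import urllib.parse
import urllib.request

# Endpoint from the documentation above.
ENDPOINT = "http://access.alchemyapi.com/calls/html/HTMLGetCategory"

# outputMode values listed in the parameter table.
OUTPUT_MODES = {"xml", "json", "rdf", "rel-tag", "rel-tag-raw"}


def build_request_body(apikey, html, url=None, output_mode=None, jsonp=None,
                       source_text=None, cquery=None, xpath=None, base_url=None):
    """Return a form-encoded POST body for HTMLGetCategory (hypothetical helper).

    The checks below mirror the parameter dependencies in the table; they are
    client-side conveniences, not a complete copy of the server's rules.
    """
    if not apikey or not html:
        raise ValueError("apikey and html are required")
    if output_mode is not None and output_mode not in OUTPUT_MODES:
        raise ValueError(f"unknown outputMode: {output_mode}")
    if jsonp is not None and output_mode != "json":
        raise ValueError("jsonp requires outputMode=json")
    # Assumption: a cquery/xpath value is needed when sourceText selects it.
    if source_text == "cquery" and not cquery:
        raise ValueError("sourceText=cquery requires a cquery value")
    if source_text == "xpath" and not xpath:
        raise ValueError("sourceText=xpath requires an xpath value")
    if base_url is not None and output_mode not in ("rel-tag", "rel-tag-raw"):
        raise ValueError("baseUrl is only used with rel-tag output modes")

    params = {"apikey": apikey, "html": html}
    optional = {"url": url, "outputMode": output_mode, "jsonp": jsonp,
                "sourceText": source_text, "cquery": cquery, "xpath": xpath,
                "baseUrl": base_url}
    # Only send optional parameters that were actually supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    # urlencode percent-encodes every value, which covers the
    # "must be URL-encoded" requirement for html, url, cquery, xpath, baseUrl.
    return urllib.parse.urlencode(params).encode("ascii")


def html_get_category(apikey, html, timeout=30, **kwargs):
    """POST the document and return the raw response text (network call)."""
    body = build_request_body(apikey, html, **kwargs)
    req = urllib.request.Request(
        ENDPOINT, data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `html_get_category(my_key, page_html, output_mode="json")` would return the JSON response as text, which still needs to be parsed. Keeping the body builder separate from the network call makes the encoding and validation easy to test offline.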
XML output (`outputMode=xml`, the default):

```xml
<results>
    <status>REQUEST_STATUS</status>
    <url>DOCUMENT_URL</url>
    <category>DETECTED_CATEGORY</category>
    <score>CATEGORY_SCORE</score>
</results>
```
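The XML response above has a flat `<results>` element, so Python's `xml.etree.ElementTree` can read it directly. This is a sketch: the `CategoryResult` dataclass and `parse_xml_response` function are hypothetical names, and fields are treated as optional on the assumption that an `ERROR` response may omit `category` and `score`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class CategoryResult:
    """Hypothetical container for the documented response fields."""
    status: Optional[str]
    url: Optional[str]
    category: Optional[str]
    score: Optional[float]


def parse_xml_response(text):
    """Parse the default XML output into a CategoryResult."""
    root = ET.fromstring(text)  # the root element is <results>
    score = root.findtext("score")
    return CategoryResult(
        status=root.findtext("status"),
        url=root.findtext("url"),
        category=root.findtext("category"),
        # Scores arrive as text; convert to float when present and non-empty.
        score=float(score) if score else None,
    )
```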
JSON output (`outputMode=json`):

```json
{
    "status": "REQUEST_STATUS",
    "url": "DOCUMENT_URL",
    "category": "DETECTED_CATEGORY",
    "score": "CATEGORY_SCORE"
}
```
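In the JSON sample above the score is quoted, so a client should not assume it is already a number. The sketch below (hypothetical function names) normalizes it to a float and, for the `jsonp` parameter, strips a `callback(...)` wrapper first. The exact JSONP wrapping is an assumption based on the usual convention; a trailing semicolon is tolerated in case the service adds one.

```python
import json


def strip_jsonp(text, callback):
    """Remove a `callback( ... )` wrapper, tolerating a trailing semicolon."""
    body = text.strip().rstrip(";").strip()
    prefix = callback + "("
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("response is not wrapped in the expected callback")
    return body[len(prefix):-1]


def parse_json_response(text):
    """Return (status, category, score) from the JSON output."""
    data = json.loads(text)
    score = data.get("score")
    # Accept the score as either a quoted string or a number.
    return (data["status"], data.get("category"),
            float(score) if score is not None else None)
```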
RDF output (`outputMode=rdf`):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
    xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
    <rdf:Description rdf:ID="DOCUMENT_HASH">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
        <aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
        <aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
        <aapi:URL>DOCUMENT_URL</aapi:URL>
        <aapi:DocCateg>DETECTED_CATEGORY</aapi:DocCateg>
        <aapi:CategScore>CATEGORY_SCORE</aapi:CategScore>
    </rdf:Description>
</rdf:RDF>
```
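Because the RDF output above uses the `rdf` and `aapi` namespaces, lookups must be namespace-qualified. A minimal sketch with `xml.etree.ElementTree`, reusing the namespace URIs from the sample; `parse_rdf_response` is a hypothetical name, and it reads only the first `rdf:Description`, assuming one per response as in the sample.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the RDF sample above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "aapi": "http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#",
}


def parse_rdf_response(text):
    """Extract status, language, URL, category and score from the RDF output."""
    root = ET.fromstring(text)
    desc = root.find("rdf:Description", namespaces=NS)
    if desc is None:
        raise ValueError("no rdf:Description element in response")

    def field(name):
        # Prefixed paths are resolved through the NS mapping.
        return desc.findtext(f"aapi:{name}", namespaces=NS)

    score = field("CategScore")
    return {
        "status": field("ResultStatus"),
        "language": field("Language"),
        "url": field("URL"),
        "category": field("DocCateg"),
        "score": float(score) if score else None,
    }
```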
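rel-tag output (`outputMode=rel-tag`), where `REQUESTED_BASE_URL` is the value of the `baseUrl` parameter:

```xml
<results>
    <status>REQUEST_STATUS</status>
    <url>DOCUMENT_URL</url>
    <score>CATEGORY_SCORE</score>
    <microformats>
        <a href="REQUESTED_BASE_URL/DETECTED_CATEGORY" rel="tag">DETECTED_CATEGORY</a>
    </microformats>
</results>
```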
rel-tag-raw output (`outputMode=rel-tag-raw`):

```html
<a href="REQUESTED_BASE_URL/DETECTED_CATEGORY" rel="tag">DETECTED_CATEGORY</a>
```
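Both rel-tag samples above carry the category in an `<a rel="tag">` element, either wrapped in `<results><microformats>` or on its own. The sketch below (hypothetical `parse_rel_tag`) handles both shapes with `xml.etree.ElementTree`, under the assumption that the service returns well-formed XML (for instance, that any `&` in the base URL is escaped).

```python
import xml.etree.ElementTree as ET


def parse_rel_tag(text):
    """Return (href, category) from rel-tag or rel-tag-raw output."""
    root = ET.fromstring(text.strip())
    # rel-tag-raw: the anchor is the root; rel-tag: search inside <results>.
    anchor = root if root.tag == "a" else root.find(".//a[@rel='tag']")
    if anchor is None:
        raise ValueError('no rel="tag" anchor found')
    return anchor.get("href"), anchor.text
```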
| field name | field description |
|---|---|
| status | Success/failure status indicating whether the request was processed. Possible values: `OK`, `ERROR`. |
| url | The HTTP URL specified in the API request. |
| category | The detected category. |
| score | Confidence score for the detected category, from 0.0 to 1.0 (higher is better). |
| statusInfo | Failure status information (sent only if `status` is `ERROR`). Possible values: `invalid-api-key`, `page-is-not-html`. |
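Since `statusInfo` is sent only when `status` is `ERROR`, a client can turn error responses into exceptions right after parsing. A minimal sketch, assuming the response has already been parsed into a dict of the fields in the table above; `AlchemyAPIError` and `check_status` are hypothetical names, and the set of known `statusInfo` values comes only from this table.

```python
# statusInfo values documented in the field table above.
KNOWN_STATUS_INFO = {"invalid-api-key", "page-is-not-html"}


class AlchemyAPIError(Exception):
    """Hypothetical exception raised when a response reports status ERROR."""

    def __init__(self, status_info):
        super().__init__(status_info or "unknown error")
        self.status_info = status_info
        self.is_documented = status_info in KNOWN_STATUS_INFO


def check_status(fields):
    """Return `fields` if status is OK; otherwise raise.

    `fields` is a dict of response fields, e.g. from a parsed JSON response.
    """
    status = fields.get("status")
    if status == "OK":
        return fields
    if status == "ERROR":
        raise AlchemyAPIError(fields.get("statusInfo"))
    raise ValueError(f"unexpected status value: {status!r}")
```

Unexpected `status` values raise `ValueError` rather than `AlchemyAPIError`, so a malformed response is not mistaken for a documented service error.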