Keyword / Term Extraction: Web API
AlchemyAPI provides easy-to-use facilities for extracting topic keywords from your publicly-accessible web-based content. These URL processing calls automatically fetch the desired Internet webpage, normalize / clean it (removing ads, navigation links, and other unimportant content), and extract topic keywords.
These API calls may be utilized to process hosted webpages, blogs, and other publicly-accessible Internet content. If you are processing content that is not hosted on a public webserver, use our HTML API calls instead.
API Call: URLGetRankedKeywords
Description: The URLGetRankedKeywords call extracts a relevancy-ranked list of topic keywords from a given web page. AlchemyAPI will download the requested URL, extract text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), and perform keyword extraction.
Endpoint: http://access.alchemyapi.com/calls/url/URLGetRankedKeywords
Parameters:
| http argument | parameter description |
|---|---|
| url | http url (must be uri-argument encoded) (required parameter) |
| apikey | your private api key (required parameter) |
| maxRetrieve | maximum number of keywords to extract (default: 50) (optional parameter) |
| keywordExtractMode | keyword extraction mode (normal or strict). Possible values: normal - normal keyword extraction mode (default); strict - strict keyword extraction mode (returns more "well-formed" keywords), refines results at the expense of returning fewer keywords. (optional parameter) |
| sentiment | whether to enable keyword-level sentiment analysis. Possible values: 1 - enabled; 0 - disabled (default). (optional parameter - Note that enabling this option will incur usage of one (1) additional AlchemyAPI transaction) |
| outputMode | desired API output format. Possible values: xml (default), json, rdf, rel-tag, rel-tag-raw. (optional parameter) |
| jsonp | desired JSONP callback (optional parameter, requires "outputMode" to be set to json) |
| showSourceText | whether to include the original 'source text' the keywords were extracted from within the API response. Possible values: 1 - enabled; 0 - disabled (default). (optional parameter) |
| sourceText | where to obtain the text that will be processed by this API call. AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries. Possible values: (optional parameter) |
| cquery | a visual constraints query to apply to the web page. Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description. (optional parameter, used when sourceText is set to 'cquery'. must be uri-argument encoded) |
| xpath | an XPath query to apply to the web page. XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description. (optional parameter, used when sourceText is set to 'xpath'. must be uri-argument encoded) |
| baseUrl | rel-tag output base http url (must be uri-argument encoded) (optional parameter, used with rel-tag or rel-tag-raw outputMode.) |
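Because several arguments must be URI-argument encoded, it can help to build the request URL programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_keywords_url`, the example page URL, and the `YOUR_API_KEY` placeholder are assumptions for illustration; the endpoint and argument names come from this page.

```python
from urllib.parse import urlencode

ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedKeywords"

def build_keywords_url(page_url, apikey, **options):
    """Return a GET URL with every argument URI-encoded (hypothetical helper)."""
    params = {"url": page_url, "apikey": apikey}
    # Only send options that were explicitly set, so documented defaults apply otherwise.
    params.update({k: v for k, v in options.items() if v is not None})
    return ENDPOINT + "?" + urlencode(params)

request_url = build_keywords_url(
    "http://www.example.com/news?id=1&lang=en",  # placeholder page URL
    "YOUR_API_KEY",                              # placeholder key
    outputMode="json",
    maxRetrieve=10,
    sentiment=1,
)
print(request_url)
```

`urlencode` percent-encodes reserved characters such as `:`, `/`, `?`, and `&` inside the `url` value, so the target page's own query string is not confused with the API arguments.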
Response Format (XML):
<results>
<status>REQUEST_STATUS</status>
<url>REQUESTED_URL</url>
<language>DOCUMENT_LANGUAGE</language>
<text>DOCUMENT_TEXT</text>
<keywords>
<keyword>
<text>DETECTED_KEYWORD</text>
<relevance>DETECTED_RELEVANCE</relevance>
<sentiment>
<type>SENTIMENT_LABEL</type>
<score>SENTIMENT_SCORE</score>
</sentiment>
</keyword>
</keywords>
</results>
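As a rough illustration of consuming the XML format above, the sketch below reads the keyword list with Python's standard `xml.etree.ElementTree`. The sample document and its values are made up to match the template, and `parse_keywords_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative response in the documented shape; the values are invented.
SAMPLE_XML = """<results>
  <status>OK</status>
  <url>http://www.example.com/</url>
  <language>english</language>
  <keywords>
    <keyword><text>solar power</text><relevance>0.97</relevance></keyword>
    <keyword><text>energy policy</text><relevance>0.61</relevance></keyword>
  </keywords>
</results>"""

def parse_keywords_xml(xml_text):
    """Return a list of (keyword, relevance) pairs from an XML response."""
    root = ET.fromstring(xml_text)
    return [
        (kw.findtext("text"), float(kw.findtext("relevance")))
        for kw in root.iter("keyword")
    ]

for keyword, relevance in parse_keywords_xml(SAMPLE_XML):
    print(keyword, relevance)
```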
Response Format (JSON):
{
"status": "REQUEST_STATUS",
"url": "REQUESTED_URL",
"language": "DOCUMENT_LANGUAGE",
"text": "DOCUMENT_TEXT",
"keywords": [
{
"text": "DETECTED_KEYWORD",
"relevance": "DETECTED_RELEVANCE",
"sentiment": {
"type": "SENTIMENT_LABEL",
"score": "SENTIMENT_SCORE"
}
}
]
}
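The JSON format can be handled in much the same way. The sketch below assumes, based on the quoted placeholders in the template, that relevance and sentiment scores arrive as strings, and that the sentiment object is absent when sentiment analysis is not enabled. The sample payload is invented and `parse_keywords_json` is a hypothetical helper.

```python
import json

# Illustrative payload in the documented shape; the values are invented.
SAMPLE_JSON = """{
  "status": "OK",
  "url": "http://www.example.com/",
  "language": "english",
  "keywords": [
    {"text": "solar power", "relevance": "0.97",
     "sentiment": {"type": "positive", "score": "0.42"}},
    {"text": "energy policy", "relevance": "0.61"}
  ]
}"""

def parse_keywords_json(body):
    """Return (keyword, relevance, sentiment_type, sentiment_score) tuples."""
    data = json.loads(body)
    rows = []
    for kw in data.get("keywords", []):
        # Assumed: the sentiment object is only present when sentiment=1 was requested.
        sentiment = kw.get("sentiment", {})
        score = sentiment.get("score")
        rows.append((
            kw["text"],
            float(kw["relevance"]),  # float() accepts both strings and numbers
            sentiment.get("type"),
            float(score) if score is not None else None,
        ))
    return rows

for row in parse_keywords_json(SAMPLE_JSON):
    print(row)
```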
Response Format (RDF):
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
<rdf:Description rdf:ID="DOCUMENT_HASH">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
<aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
<aapi:URL>DOCUMENT_URL</aapi:URL>
<aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
<aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
</rdf:Description>
<rdf:Description rdf:ID="DOCUMENT_HASH-KEYWORD_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#KeywordOccurrences"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:Relevance>DETECTED_RELEVANCE</aapi:Relevance>
<aapi:Name>DETECTED_KEYWORD</aapi:Name>
<aapi:Sentiment>
<rdf:Description rdf:about="#DOCUMENT_HASH-KEYWORD_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Sentiment"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:SentimentType>SENTIMENT_LABEL</aapi:SentimentType>
<aapi:SentimentScore>SENTIMENT_SCORE</aapi:SentimentScore>
</rdf:Description>
</aapi:Sentiment>
</rdf:Description>
</rdf:RDF>
Response Format (REL-TAG Microformat [XML-embedded] ):
<results>
<status>REQUEST_STATUS</status>
<url>REQUESTED_URL</url>
<language>DOCUMENT_LANGUAGE</language>
<text>DOCUMENT_TEXT</text>
<microformats>
<a href="REQUESTED_BASE_URL/DETECTED_KEYWORD" rel="tag">DETECTED_KEYWORD</a>
<a href="REQUESTED_BASE_URL/DETECTED_KEYWORD" rel="tag">DETECTED_KEYWORD</a>
</microformats>
</results>
Response Format (REL-TAG Microformat [raw] ):
<a href="REQUESTED_BASE_URL/DETECTED_KEYWORD" rel="tag">DETECTED_KEYWORD</a>
<a href="REQUESTED_BASE_URL/DETECTED_KEYWORD" rel="tag">DETECTED_KEYWORD</a>
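The raw rel-tag output is a sequence of anchor elements, so a general HTML parser can recover the keywords. This is a sketch using Python's standard `html.parser`; the `RelTagParser` class, the base URL, and the sample keywords are assumptions for illustration, and how the service encodes multi-word keywords inside `href` is not covered here.

```python
from html.parser import HTMLParser

class RelTagParser(HTMLParser):
    """Collect (href, keyword) pairs from rel="tag" anchors (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []     # collected (href, keyword) pairs
        self._href = None  # href of the rel="tag" anchor currently open, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "tag":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.tags.append((self._href, "".join(self._text).strip()))
            self._href = None

# Illustrative output; base URL and keywords are invented.
SAMPLE = (
    '<a href="http://www.example.com/tags/solar" rel="tag">solar</a>\n'
    '<a href="http://www.example.com/tags/energy" rel="tag">energy</a>'
)

parser = RelTagParser()
parser.feed(SAMPLE)
parser.close()
print(parser.tags)
```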
Response Fields:
| field name | field description |
|---|---|
| status | success / failure status indicating whether the request was processed. Possible values: OK, ERROR |
| language | the detected language that the source text was written in. |
| url | http url information was requested for. |
| relevance | relevance score for a detected keyword. Possible values: 0.0 - 1.0 (1.0 = most relevant) |
| text | the detected keyword text. |
| sentiment | sentiment for the detected keyword (sent only if keyword-level sentiment analysis is enabled) |
| statusInfo | failure status information (sent only if "status" == "ERROR"). Possible values: invalid-api-key, cannot-retrieve, page-is-not-html |
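One way to act on the status and statusInfo fields is to check them before reading any keywords. The sketch below works on an already-decoded response dictionary; the `AlchemyError` class, the `check_status` helper, and the hint wording are assumptions for illustration, while the statusInfo values are the ones listed in the table.

```python
class AlchemyError(Exception):
    """Raised when a response reports status ERROR (hypothetical helper type)."""

# statusInfo values from the response-field table, with paraphrased hints.
HINTS = {
    "invalid-api-key": "check the apikey argument",
    "cannot-retrieve": "the requested URL could not be retrieved",
    "page-is-not-html": "the requested URL did not return an HTML page",
}

def check_status(response):
    """Return the response unchanged if status is OK, otherwise raise AlchemyError."""
    if response.get("status") == "OK":
        return response
    info = response.get("statusInfo", "unknown-error")
    raise AlchemyError(f"{info}: {HINTS.get(info, 'no further detail available')}")

try:
    check_status({"status": "ERROR", "statusInfo": "invalid-api-key"})
except AlchemyError as err:
    print("request failed:", err)
```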
Example Call:
XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...
API Notes:
- 1. Calls to URLGetRankedKeywords can be made using HTTP GET or POST.
- 2. HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
- 3. URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
- 4. Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
- 5. Language detection is performed on the retrieved document before attempting keyword extraction. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
- 6. Documents containing fewer than 15 characters of text are assumed to be English-language content.
- 7. Keyword extraction is currently supported for all languages listed on the language support page. Submissions in unsupported languages will be rejected and an error response returned.
- 8. Enabling keyword-level sentiment analysis results in one additional transaction utilized against your daily API limit.
- 9. Sentiment analysis is currently available for English-language content only. Support for other languages is in development.
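To illustrate the notes on HTTP POST, the sketch below prepares a POST request with the form-encoded Content-Type header using Python's standard `urllib`. It only constructs the request and does not send it; `build_post_request`, the example page URL, and the `YOUR_API_KEY` placeholder are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedKeywords"

def build_post_request(page_url, apikey, **options):
    """Prepare (but do not send) a form-encoded POST request (hypothetical helper)."""
    fields = {"url": page_url, "apikey": apikey, **options}
    body = urlencode(fields).encode("ascii")  # percent-encoded, so ASCII-safe
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

req = build_post_request("http://www.example.com/", "YOUR_API_KEY", outputMode="json")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

The request could then be sent with `urllib.request.urlopen(req, timeout=30)`. The 30-second value is an arbitrary choice, picked only to exceed the 10-second retrieval window described in the notes.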
