Web API: Entity Extraction
AlchemyAPI provides easy-to-use facilities for extracting the semantic richness from your web-based content. These URL processing calls automatically fetch the desired Internet web page, normalize / clean it (removing ads, navigation links, and other unimportant content), detect the primary document language, and extract named entities, topics, and other content.
These API calls can process hosted web pages, blogs, and other publicly accessible Internet content. If your content is not hosted on a public web server, use our HTML and Text API calls instead.
API Call: URLGetRankedNamedEntities
Description: The URLGetRankedNamedEntities call extracts a grouped, relevancy-ranked list of named entities (people, companies, organizations, etc.) from a given web page. AlchemyAPI downloads the requested URL, extracts text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), and performs entity extraction.
Endpoint: http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities
Parameters:
| http argument |
parameter description |
| url |
http url (must be uri-argument encoded)
(required parameter)
|
| apikey |
your private api key
(required parameter)
|
| outputMode |
desired API output format
Possible values:
xml (default)
json
rdf
rel-tag
rel-tag-raw
(optional parameter)
|
| jsonp |
desired JSONP callback
(optional parameter, requires "outputMode" to be set to json)
|
| disambiguate |
whether to disambiguate detected entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter)
|
| linkedData |
whether to include Linked Data content links with disambiguated entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter. disambiguation must be enabled to utilize the linkedData feature.)
|
| coreference |
whether to resolve he/she/etc coreferences into detected entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter)
|
| quotations |
whether to enable quotations extraction.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter)
|
| sentiment |
whether to enable entity-level sentiment analysis.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter - Note that enabling this option will incur usage of one (1) additional AlchemyAPI transaction)
|
| sourceText |
where to obtain the text that will be processed by this API call.
AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.
Possible values:
|
cleaned_or_raw
|
cleaning enabled, fallback to raw when cleaning produces no text (default)
|
|
cleaned
|
operate on 'cleaned' web page text (web page cleaning enabled)
|
|
raw
|
operate on raw web page text (web page cleaning disabled)
|
|
cquery
|
operate on the results of a visual constraints query
Note: The 'cquery' http argument must also be set to a valid visual constraints query.
|
|
xpath
|
operate on the results of an XPath query
Note: The 'xpath' http argument must also be set to a valid XPath query.
|
(optional parameter)
|
| showSourceText |
whether to include the original 'source text' the entities were extracted from within the API response.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter)
|
| cquery |
a visual constraints query to apply to the web page.
Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
(optional parameter, used when sourceText is set to 'cquery'. must be uri-argument encoded)
|
| xpath |
an XPath query to apply to the web page.
XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
(optional parameter, used when sourceText is set to 'xpath'. must be uri-argument encoded)
|
| maxRetrieve |
maximum number of named entities to extract (default: 50)
(optional parameter)
|
| baseUrl |
rel-tag output base http url
(optional parameter, used with rel-tag or rel-tag-raw outputMode. must be uri-argument encoded) |
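As a rough illustration of how the parameters above combine into a GET request, the following sketch builds a request URL with Python's standard library. The page URL and the API key are placeholders, and the helper name build_ranked_entities_url is hypothetical rather than part of any AlchemyAPI SDK. The call is only constructed, not sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities"


def build_ranked_entities_url(page_url, api_key, **options):
    """Return a GET URL for URLGetRankedNamedEntities (hypothetical helper).

    urlencode() percent-encodes every value, which is one way to satisfy the
    "must be uri-argument encoded" requirement on url, cquery, xpath and baseUrl.
    """
    params = {"url": page_url, "apikey": api_key}
    params.update(options)
    return ENDPOINT + "?" + urlencode(params)


example = build_ranked_entities_url(
    "http://www.example.com/news/story.html?id=1",  # placeholder page
    "YOUR_API_KEY",                                  # placeholder key
    outputMode="json",
    sentiment=1,      # costs one additional transaction, per the table above
    maxRetrieve=10,
)
print(example)
```

Optional parameters that are left out fall back to the defaults listed in the table, so only the non-default choices need to be passed.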
Response Format (XML):
<results>
<status>REQUEST_STATUS</status>
<language>DOCUMENT_LANGUAGE</language>
<url>REQUESTED_URL</url>
<text>DOCUMENT_TEXT</text>
<entities>
<entity>
<type>DETECTED_TYPE</type>
<relevance>DETECTED_RELEVANCE</relevance>
<count>DETECTED_COUNT</count>
<text>DETECTED_ENTITY</text>
<disambiguated>
<name>DISAMBIGUATED_ENTITY</name>
<subType>ENTITY_SUBTYPE</subType>
<website>WEBSITE</website>
<geo>LATITUDE LONGITUDE</geo>
<dbpedia>LINKED_DATA_DBPEDIA</dbpedia>
<yago>LINKED_DATA_YAGO</yago>
<opencyc>LINKED_DATA_OPENCYC</opencyc>
<umbel>LINKED_DATA_UMBEL</umbel>
<freebase>LINKED_DATA_FREEBASE</freebase>
<ciaFactbook>LINKED_DATA_FACTBOOK</ciaFactbook>
<census>LINKED_DATA_CENSUS</census>
<geonames>LINKED_DATA_GEONAMES</geonames>
<musicBrainz>LINKED_DATA_MUSICBRAINZ</musicBrainz>
<crunchbase>CRUNCHBASE_WEB_LINK</crunchbase>
<semanticCrunchbase>LINKED_DATA_CRUNCHBASE</semanticCrunchbase>
</disambiguated>
<quotations>
<quotation>ENTITY_QUOTATION</quotation>
</quotations>
<sentiment>
<type>SENTIMENT_LABEL</type>
<score>SENTIMENT_SCORE</score>
<mixed>SENTIMENT_MIXED</mixed>
</sentiment>
</entity>
</entities>
</results>
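The XML layout above can be read with any XML parser. The sketch below uses Python's xml.etree.ElementTree on a made-up sample document; the entity names, the language value, and the helper name parse_ranked_entities are illustrative assumptions, not real API output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that follows the documented layout; all values are invented.
SAMPLE_XML = """<results>
  <status>OK</status>
  <language>english</language>
  <url>http://www.example.com/</url>
  <entities>
    <entity>
      <type>Company</type>
      <relevance>0.75</relevance>
      <count>3</count>
      <text>Example Corp</text>
      <disambiguated>
        <name>Example Corporation</name>
      </disambiguated>
    </entity>
    <entity>
      <type>Person</type>
      <relevance>0.40</relevance>
      <count>1</count>
      <text>Jane Doe</text>
    </entity>
  </entities>
</results>"""


def parse_ranked_entities(xml_text):
    """Return (status, entities) from a URLGetRankedNamedEntities XML response."""
    root = ET.fromstring(xml_text)
    entities = []
    for node in root.findall("./entities/entity"):
        entities.append({
            "type": node.findtext("type"),
            "relevance": float(node.findtext("relevance")),
            "count": int(node.findtext("count")),
            "text": node.findtext("text"),
            # <disambiguated> is optional, so this may be None.
            "name": node.findtext("disambiguated/name"),
        })
    return root.findtext("status"), entities


status, entities = parse_ranked_entities(SAMPLE_XML)
for entity in entities:
    print(entity["text"], entity["relevance"], entity["name"])
```

Optional blocks such as disambiguated, quotations, and sentiment should be treated as possibly absent, since they are sent only when the corresponding feature is enabled and produced a result.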
Response Format (JSON):
{
"status": "REQUEST_STATUS",
"language": "DOCUMENT_LANGUAGE",
"url": "REQUESTED_URL",
"text": "DOCUMENT_TEXT",
"entities": [
{
"type": "DETECTED_TYPE",
"relevance": "DETECTED_RELEVANCE",
"count": "DETECTED_COUNT",
"text": "DETECTED_ENTITY",
"disambiguated": {
"name": "DISAMBIGUATED_ENTITY",
"subType": "ENTITY_SUBTYPE",
"website": "WEBSITE",
"geo": "LATITUDE LONGITUDE",
"dbpedia": "LINKED_DATA_DBPEDIA",
"yago": "LINKED_DATA_YAGO",
"opencyc": "LINKED_DATA_OPENCYC",
"umbel": "LINKED_DATA_UMBEL",
"freebase": "LINKED_DATA_FREEBASE",
"ciaFactbook": "LINKED_DATA_FACTBOOK",
"census": "LINKED_DATA_CENSUS",
"geonames": "LINKED_DATA_GEONAMES",
"musicBrainz": "LINKED_DATA_MUSICBRAINZ",
"crunchbase": "CRUNCHBASE_WEB_LINK",
"semanticCrunchbase": "LINKED_DATA_CRUNCHBASE"
},
"quotations": [
{
"quotation": "ENTITY_QUOTATION"
}
],
"sentiment": {
"type": "SENTIMENT_LABEL",
"score": "SENTIMENT_SCORE",
"mixed": "SENTIMENT_MIXED"
}
}
]
}
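A minimal sketch of consuming the JSON form follows. It assumes that each element of the "entities" array is a plain object and that numeric fields arrive as strings, as the quoted placeholders above suggest; the sample values and the helper name top_entities are invented for illustration.

```python
import json

# Hand-written sample; values are invented and numeric fields are kept as strings.
SAMPLE_JSON = """{
  "status": "OK",
  "language": "english",
  "url": "http://www.example.com/",
  "entities": [
    {"type": "Person", "relevance": "0.40", "count": "1", "text": "Jane Doe"},
    {"type": "Company", "relevance": "0.75", "count": "3", "text": "Example Corp",
     "sentiment": {"type": "positive", "score": "0.21"}}
  ]
}"""


def top_entities(json_text, limit=5):
    """Return up to `limit` (text, relevance) pairs, most relevant first."""
    doc = json.loads(json_text)
    entities = doc.get("entities", [])
    # Convert relevance to float before sorting so "0.9" and "0.10" compare numerically.
    ranked = sorted(entities, key=lambda e: float(e["relevance"]), reverse=True)
    return [(e["text"], float(e["relevance"])) for e in ranked[:limit]]


print(top_entities(SAMPLE_JSON))
```

Sorting explicitly is a defensive choice: the call is described as relevancy-ranked, but the sketch does not rely on the array order.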
Response Format (RDF):
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
<rdf:Description rdf:ID="DOCUMENT_HASH">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
<aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
<aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
<aapi:URL>DOCUMENT_URL</aapi:URL>
<aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
</rdf:Description>
<rdf:Description rdf:ID="DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#EntityOccurrences"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:EntityType>DETECTED_TYPE</aapi:EntityType>
<aapi:Relevance>DETECTED_RELEVANCE</aapi:Relevance>
<aapi:NumOccurs>DETECTED_COUNT</aapi:NumOccurs>
<aapi:Name>DETECTED_ENTITY</aapi:Name>
<aapi:Disambiguation>
<rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:ResolvedName>DISAMBIGUATED_ENTITY</aapi:ResolvedName>
<aapi:SubType>ENTITY_SUBTYPE</aapi:SubType>
<aapi:URL>WEBSITE</aapi:URL>
<aapi:Geo>LATITUDE LONGITUDE</aapi:Geo>
<owl:sameAs rdf:resource="LINKED_DATA_DBPEDIA"/>
<owl:sameAs rdf:resource="LINKED_DATA_YAGO"/>
<owl:sameAs rdf:resource="LINKED_DATA_OPENCYC"/>
<owl:sameAs rdf:resource="LINKED_DATA_UMBEL"/>
<owl:sameAs rdf:resource="LINKED_DATA_FREEBASE"/>
<owl:sameAs rdf:resource="LINKED_DATA_FACTBOOK"/>
<owl:sameAs rdf:resource="LINKED_DATA_CENSUS"/>
<owl:sameAs rdf:resource="LINKED_DATA_GEONAMES"/>
<owl:sameAs rdf:resource="LINKED_DATA_MUSICBRAINZ"/>
<owl:sameAs rdf:resource="LINKED_DATA_CRUNCHBASE"/>
</rdf:Description>
</aapi:Disambiguation>
<aapi:Quotations>
<rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Quotations"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:Quotation>ENTITY_QUOTATION</aapi:Quotation>
</rdf:Description>
</aapi:Quotations>
<aapi:Sentiment>
<rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Sentiment"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:SentimentType>SENTIMENT_LABEL</aapi:SentimentType>
<aapi:SentimentScore>SENTIMENT_SCORE</aapi:SentimentScore>
<aapi:SentimentMixed>SENTIMENT_MIXED</aapi:SentimentMixed>
</rdf:Description>
</aapi:Sentiment>
</rdf:Description>
</rdf:RDF>
Response Format (REL-TAG Microformat [XML-embedded] ):
<results>
<status>REQUEST_STATUS</status>
<language>DOCUMENT_LANGUAGE</language>
<url>REQUESTED_URL</url>
<text>DOCUMENT_TEXT</text>
<microformats>
<a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
<a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
</microformats>
</results>
Response Format (REL-TAG Microformat [raw] ):
<a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
<a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
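The raw rel-tag output is a sequence of anchor elements, so it can be read with an HTML parser. The sketch below uses Python's html.parser on a hand-written sample; the base URL, the entity names, and the way the entity is encoded inside href are assumptions made for the example.

```python
from html.parser import HTMLParser

# Hand-written sample in the rel-tag-raw shape; URLs and names are invented.
SAMPLE_REL_TAGS = (
    '<a href="http://www.example.com/tags/Jane%20Doe" rel="tag">Jane Doe</a>\n'
    '<a href="http://www.example.com/tags/Example%20Corp" rel="tag">Example Corp</a>'
)


class RelTagParser(HTMLParser):
    """Collect (href, text) pairs for every <a rel="tag"> element."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "tag":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.tags.append((self._href, "".join(self._text)))
            self._href = None


parser = RelTagParser()
parser.feed(SAMPLE_REL_TAGS)
print(parser.tags)
```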
Response Fields:
| field name |
field description |
| status |
success / failure status indicating whether the request was processed.
Possible values:
OK
ERROR
|
| language |
the detected language that the source text was written in. |
| url |
http url information was requested for. |
| type |
the detected entity type.
|
| relevance |
relevance score for a detected entity.
Possible values: (0.0 - 1.0) [1.0 = most relevant] |
| count |
number of times an entity was seen within the source web page. |
| text |
the detected entity text. |
| disambiguated |
disambiguation information for the detected entity (sent only if disambiguation occurred)
| disambiguation field |
field description |
| name |
the disambiguated entity name. |
| subType |
the disambiguated entity subType
SubTypes expose additional ontological mappings for a detected entity, such as identification of a Person as a Politician or Athlete. |
| website |
the disambiguated entity website. |
| geo |
the disambiguated entity geographic coordinates, given as: latitude longitude |
| dbpedia |
sameAs link to DBpedia for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| yago |
sameAs link to YAGO for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| opencyc |
sameAs link to OpenCyc for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| umbel |
sameAs link to UMBEL for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| freebase |
sameAs link to Freebase for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| ciaFactbook |
sameAs link to the CIA World Factbook for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| census |
sameAs link to the US Census for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| geonames |
sameAs link to Geonames for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| musicBrainz |
sameAs link to MusicBrainz for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| crunchbase |
website link to CrunchBase for the disambiguated entity.
Note: Provided only for entities that exist in CrunchBase. |
| semanticCrunchbase |
sameAs link to Semantic CrunchBase for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
|
| quotations |
extracted quotations for the detected entity (sent only if quotations extraction is enabled)
| field |
field description |
| quotation |
quotation extracted for a particular named entity. |
|
| sentiment |
sentiment for the detected entity (sent only if entity-level sentiment analysis is enabled)
| field |
field description |
| type |
sentiment polarity: "positive", "negative", or "neutral" |
| score |
sentiment strength (0.0 == neutral) |
| mixed |
whether sentiment is mixed (both positive and negative) (1 == mixed) |
|
| statusInfo |
failure status information (sent only if "status" == "ERROR").
Possible values:
invalid-api-key
cannot-retrieve
page-is-not-html
content-exceeds-size-limit
|
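Because a failed request still returns a well-formed document with "status" set to "ERROR", callers may want to check the status before touching any other field. The sketch below shows one way to do that for an already-decoded JSON response; the exception class and the helper name ensure_ok are hypothetical.

```python
class AlchemyAPIError(Exception):
    """Raised when a response reports status ERROR (hypothetical helper class)."""

    def __init__(self, status_info):
        super().__init__("AlchemyAPI request failed: %s" % status_info)
        self.status_info = status_info


def ensure_ok(doc):
    """Return the decoded response unchanged, or raise if it reports an error."""
    if doc.get("status") == "OK":
        return doc
    # statusInfo is sent only on errors; fall back to a generic label if absent.
    raise AlchemyAPIError(doc.get("statusInfo", "unknown"))


ok_doc = ensure_ok({"status": "OK", "entities": []})

try:
    ensure_ok({"status": "ERROR", "statusInfo": "cannot-retrieve"})
except AlchemyAPIError as err:
    print(err.status_info)
```

Keeping the statusInfo value on the exception lets calling code decide separately how to treat, for example, an invalid key versus a page that could not be retrieved.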
Example Calls:
XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...
API Notes:
- Calls to URLGetRankedNamedEntities can be made using HTTP GET or POST.
- HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
- URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
- Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
- Language detection is performed on the retrieved document before attempting named entity extraction. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
- Documents containing fewer than 15 characters of text are assumed to be English-language content.
- Disambiguation of detected entities is enabled by default. Disambiguation information will be included for each entity that is successfully resolved.
- Entity extraction is currently supported for all languages listed on the language support page. Submissions in any other language are rejected with an error response.
- Enabling entity-level sentiment analysis results in one additional transaction utilized against your daily API limit. Entity-level sentiment analysis is currently provided for both English and German-language content.
- Disambiguation and quotations extraction are currently available for English-language content only. Support for other languages is in development.
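The notes on POST calls can be sketched as follows with Python's urllib. The request is only constructed here, not sent; the page URL and API key are placeholders and the helper name build_post_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities"


def build_post_request(page_url, api_key, **options):
    """Build (but do not send) a form-encoded POST request for the endpoint."""
    params = {"url": page_url, "apikey": api_key}
    params.update(options)
    body = urlencode(params).encode("ascii")  # percent-encoded form body
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_post_request(
    "http://www.example.com/",  # placeholder page
    "YOUR_API_KEY",             # placeholder key
    outputMode="json",
    sentiment=1,
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30)
# The 30-second client timeout is an arbitrary choice that leaves room for the
# service's own 10-second page retrieval limit.
```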
API Call: URLGetNamedEntities
Description: The URLGetNamedEntities call extracts named entities (people, companies, organizations, etc.) from a given web page. AlchemyAPI downloads the requested URL, extracts text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), and performs entity extraction.
Endpoint: http://access.alchemyapi.com/calls/url/URLGetNamedEntities
Parameters:
| http argument |
parameter description |
| url |
http url (must be uri-argument encoded)
(required parameter)
|
| apikey |
your private api key
(required parameter)
|
| outputMode |
desired API output format
Possible values:
xml (default)
json
rdf
(optional parameter)
|
| disambiguate |
whether to disambiguate detected entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter)
|
| linkedData |
whether to include Linked Data content links with disambiguated entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter. disambiguation must be enabled to utilize the linkedData feature.)
|
| coreference |
whether to resolve he/she/etc coreferences into detected entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter)
|
| quotations |
whether to enable quotations extraction.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter)
|
| sourceText |
where to obtain the text that will be processed by this API call.
AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.
Possible values:
|
cleaned_or_raw
|
cleaning enabled, fallback to raw when cleaning produces no text (default)
|
|
cleaned
|
operate on 'cleaned' web page text (web page cleaning enabled)
|
|
raw
|
operate on raw web page text (web page cleaning disabled)
|
|
cquery
|
operate on the results of a visual constraints query
Note: The 'cquery' http argument must also be set to a valid visual constraints query.
|
|
xpath
|
operate on the results of an XPath query
Note: The 'xpath' http argument must also be set to a valid XPath query.
|
(optional parameter)
|
| showSourceText |
whether to include the original 'source text' the entities were extracted from within the API response.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter)
|
| cquery |
a visual constraints query to apply to the web page.
Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
(optional parameter, used when sourceText is set to 'cquery'. must be uri-argument encoded)
|
| xpath |
an XPath query to apply to the web page.
XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
(optional parameter, used when sourceText is set to 'xpath'. must be uri-argument encoded)
|
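Several of the parameters above depend on one another: sourceText set to 'cquery' or 'xpath' needs the matching query argument, and linkedData needs disambiguation to be enabled. The sketch below checks those rules client-side before building a GET URL. The helper name, the sample XPath expression, and the choice to validate locally are all assumptions made for illustration.

```python
from urllib.parse import urlencode

NAMED_ENTITIES_ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetNamedEntities"


def build_named_entities_url(page_url, api_key, **options):
    """Validate dependent options, then return a GET URL (hypothetical helper)."""
    source = options.get("sourceText")
    if source in ("cquery", "xpath") and not options.get(source):
        raise ValueError("sourceText=%s requires the '%s' argument" % (source, source))
    # disambiguate defaults to 1 (enabled) when it is not passed.
    if str(options.get("linkedData")) == "1" and str(options.get("disambiguate", 1)) == "0":
        raise ValueError("linkedData requires disambiguation to be enabled")
    params = {"url": page_url, "apikey": api_key}
    params.update(options)
    # urlencode() percent-encodes the XPath expression along with the page URL.
    return NAMED_ENTITIES_ENDPOINT + "?" + urlencode(params)


xpath_url = build_named_entities_url(
    "http://www.example.com/article.html",  # placeholder page
    "YOUR_API_KEY",                          # placeholder key
    sourceText="xpath",
    xpath='//h1[@class="title"]',            # illustrative XPath query
)
print(xpath_url)
```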
Response Format (XML):
<results>
<status>REQUEST_STATUS</status>
<language>DOCUMENT_LANGUAGE</language>
<url>REQUESTED_URL</url>
<text>DOCUMENT_TEXT</text>
<entities>
<entity>
<type>DETECTED_TYPE</type>
<start>START_POS</start>
<end>END_POS</end>
<text>DETECTED_ENTITY</text>
<disambiguated>
<name>DISAMBIGUATED_ENTITY</name>
<subType>ENTITY_SUBTYPE</subType>
<website>WEBSITE</website>
<geo>LATITUDE LONGITUDE</geo>
<dbpedia>LINKED_DATA_DBPEDIA</dbpedia>
<yago>LINKED_DATA_YAGO</yago>
<opencyc>LINKED_DATA_OPENCYC</opencyc>
<umbel>LINKED_DATA_UMBEL</umbel>
<freebase>LINKED_DATA_FREEBASE</freebase>
<ciaFactbook>LINKED_DATA_FACTBOOK</ciaFactbook>
<census>LINKED_DATA_CENSUS</census>
<geonames>LINKED_DATA_GEONAMES</geonames>
<musicBrainz>LINKED_DATA_MUSICBRAINZ</musicBrainz>
<crunchbase>CRUNCHBASE_WEB_LINK</crunchbase>
<semanticCrunchbase>LINKED_DATA_CRUNCHBASE</semanticCrunchbase>
</disambiguated>
<quotations>
<quotation>ENTITY_QUOTATION</quotation>
</quotations>
</entity>
</entities>
</results>
Response Format (JSON):
{
"status": "REQUEST_STATUS",
"language": "DOCUMENT_LANGUAGE",
"url": "REQUESTED_URL",
"text": "DOCUMENT_TEXT",
"entities": [
{
"type": "DETECTED_TYPE",
"start": "START_POS",
"end": "END_POS",
"text": "DETECTED_ENTITY",
"disambiguated": {
"name": "DISAMBIGUATED_ENTITY",
"subType": "ENTITY_SUBTYPE",
"website": "WEBSITE",
"geo": "LATITUDE LONGITUDE",
"dbpedia": "LINKED_DATA_DBPEDIA",
"yago": "LINKED_DATA_YAGO",
"opencyc": "LINKED_DATA_OPENCYC",
"umbel": "LINKED_DATA_UMBEL",
"freebase": "LINKED_DATA_FREEBASE",
"ciaFactbook": "LINKED_DATA_FACTBOOK",
"census": "LINKED_DATA_CENSUS",
"geonames": "LINKED_DATA_GEONAMES",
"musicBrainz": "LINKED_DATA_MUSICBRAINZ",
"crunchbase": "CRUNCHBASE_WEB_LINK",
"semanticCrunchbase": "LINKED_DATA_CRUNCHBASE"
},
"quotations": [
{
"quotation": "ENTITY_QUOTATION"
}
]
}
]
}
Response Format (RDF):
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
<rdf:Description rdf:ID="DOCUMENT_HASH">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
<aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
<aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
<aapi:URL>DOCUMENT_URL</aapi:URL>
<aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
</rdf:Description>
<rdf:Description rdf:ID="DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#EntityOccurrence"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:EntityType>DETECTED_TYPE</aapi:EntityType>
<aapi:TextStartPos>START_POS</aapi:TextStartPos>
<aapi:TextEndPos>END_POS</aapi:TextEndPos>
<aapi:Name>DETECTED_ENTITY</aapi:Name>
<aapi:Disambiguation>
<rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:ResolvedName>DISAMBIGUATED_ENTITY</aapi:ResolvedName>
<aapi:SubType>ENTITY_SUBTYPE</aapi:SubType>
<aapi:URL>WEBSITE</aapi:URL>
<aapi:Geo>LATITUDE LONGITUDE</aapi:Geo>
<owl:sameAs rdf:resource="LINKED_DATA_DBPEDIA"/>
<owl:sameAs rdf:resource="LINKED_DATA_YAGO"/>
<owl:sameAs rdf:resource="LINKED_DATA_OPENCYC"/>
<owl:sameAs rdf:resource="LINKED_DATA_UMBEL"/>
<owl:sameAs rdf:resource="LINKED_DATA_FREEBASE"/>
<owl:sameAs rdf:resource="LINKED_DATA_FACTBOOK"/>
<owl:sameAs rdf:resource="LINKED_DATA_CENSUS"/>
<owl:sameAs rdf:resource="LINKED_DATA_GEONAMES"/>
<owl:sameAs rdf:resource="LINKED_DATA_MUSICBRAINZ"/>
<owl:sameAs rdf:resource="LINKED_DATA_CRUNCHBASE"/>
</rdf:Description>
</aapi:Disambiguation>
<aapi:Quotations>
<rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Quotations"/>
<aapi:Doc>DOCUMENT_HASH</aapi:Doc>
<aapi:Quotation>ENTITY_QUOTATION</aapi:Quotation>
</rdf:Description>
</aapi:Quotations>
</rdf:Description>
</rdf:RDF>
Response Fields:
| field name |
field description |
| status |
success / failure status indicating whether the request was processed.
Possible values:
OK
ERROR
|
| url |
http url information was requested for. |
| language |
the detected language that the source text was written in. |
| type |
the detected entity type.
|
| start |
start offset (in bytes) of this entity in the text stream.
|
| end |
end offset (in bytes) of this entity in the text stream.
|
| text |
the detected entity text. |
| disambiguated |
disambiguation information for the detected entity (sent only if disambiguation occurred)
| disambiguation field |
field description |
| name |
the disambiguated entity name. |
| subType |
the disambiguated entity subType
SubTypes expose additional ontological mappings for a detected entity, such as identification of a Person as a Politician or Athlete. |
| website |
the disambiguated entity website. |
| geo |
the disambiguated entity geographic coordinates, given as: latitude longitude |
| dbpedia |
sameAs link to DBpedia for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| yago |
sameAs link to YAGO for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| opencyc |
sameAs link to OpenCyc for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| umbel |
sameAs link to UMBEL for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| freebase |
sameAs link to Freebase for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| ciaFactbook |
sameAs link to the CIA World Factbook for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| census |
sameAs link to the US Census for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| geonames |
sameAs link to Geonames for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| musicBrainz |
sameAs link to MusicBrainz for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
| crunchbase |
website link to CrunchBase for the disambiguated entity.
Note: Provided only for entities that exist in CrunchBase. |
| semanticCrunchbase |
sameAs link to Semantic CrunchBase for the disambiguated entity.
Note: Provided only for entities that exist in this linked data-set. |
|
| quotations |
extracted quotations for the detected entity (sent only if quotations extraction is enabled)
| field |
field description |
| quotation |
quotation extracted for a particular named entity. |
|
| statusInfo |
failure status information (sent only if "status" == "ERROR").
Possible values:
invalid-api-key
cannot-retrieve
page-is-not-html
content-exceeds-size-limit
|
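Since start and end are described as byte offsets, slicing a decoded string by character index can give the wrong span as soon as the text contains multi-byte characters. The sketch below slices the encoded bytes instead. It assumes the text stream is UTF-8, that end is exclusive, and that the offsets refer to the text returned when showSourceText is enabled; none of these details are stated above, so they should be checked against real responses.

```python
def entity_span(document_text, start, end, encoding="utf-8"):
    """Return the substring covered by byte offsets [start, end).

    Assumes UTF-8 text and an exclusive end offset (both unverified assumptions).
    Offsets may be passed as strings, as they appear in the JSON layout.
    """
    raw = document_text.encode(encoding)
    return raw[int(start):int(end)].decode(encoding)


# Invented sample text; "ë" occupies two bytes in UTF-8.
text = "Zoë Smith joined Example Corp."

print(entity_span(text, "0", "10"))   # byte-based slice
print(entity_span(text, 18, 30))      # byte-based slice
print(text[18:30])                    # character-based slice, shifted by one
```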
Example Calls:
XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...
API Notes:
- Calls to URLGetNamedEntities can be made using HTTP GET or POST.
- HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
- URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
- Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
- Language detection is performed on the retrieved document before attempting named entity extraction. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
- Documents containing fewer than 15 characters of text are assumed to be English-language content.
- Disambiguation of detected entities is enabled by default. Disambiguation information will be included for each entity that is successfully resolved.
- Entity extraction is currently supported for all languages listed on the language support page. Submissions in any other language are rejected with an error response.
- Disambiguation and quotations extraction are currently available for English-language content only. Support for other languages is in development.
API Call: URLGetAnnotatedNamedEntityText
Description: The URLGetAnnotatedNamedEntityText call annotates web page text with detected named entities (people, companies, organizations, etc.). AlchemyAPI downloads the requested URL, extracts text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), performs entity extraction, and annotates the original web page text according to the provided variable substitution template.
Endpoint: http://access.alchemyapi.com/calls/url/URLGetAnnotatedNamedEntityText
Parameters:
| http argument |
parameter description |
| url |
http url (must be uri-argument encoded)
(required parameter)
|
| apikey |
your private api key
(required parameter)
|
| template |
the annotation template to apply to the text being marked up. The template is applied to each detected entity in the source text, and variable substitution integrates entity information into the annotated text.
Supported substitution variables:
| variable | variable description |
| $ENTITY | the detected named entity text |
| $ENCODED_ENTITY | the detected named entity text (uri-argument encoded) |
| $TRIMMED_ENTITY | the detected named entity text (punctuation-trimmed) |
| $ENCODED_TRIMMED_ENTITY | the detected named entity text (punctuation-trimmed, uri-argument encoded) |
| $RESOLVED_ENTITY | the resolved, disambiguated named entity |
| $ENCODED_RESOLVED_ENTITY | the resolved, disambiguated named entity (uri-argument encoded) |
| $TYPE | the detected named entity type |
(optional parameter)
|
| outputMode |
desired API output format
Possible values:
xml (default)
json
rdf
(optional parameter)
|
| disambiguate |
whether to disambiguate detected entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter)
|
| coreference |
whether to resolve he/she/etc coreferences into detected entities.
Possible values:
1 - enabled (default)
0 - disabled
(optional parameter)
|
| quotations |
whether to enable quotations extraction.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter)
|
| sourceText |
where to obtain the text that will be processed by this API call.
AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.
Possible values:
|
cleaned_or_raw
|
cleaning enabled, fallback to raw when cleaning produces no text (default)
|
|
cleaned
|
operate on 'cleaned' web page text (web page cleaning enabled)
|
|
raw
|
operate on raw web page text (web page cleaning disabled)
|
|
cquery
|
operate on the results of a visual constraints query
Note: The 'cquery' http argument must also be set to a valid visual constraints query.
|
|
xpath
|
operate on the results of an XPath query
Note: The 'xpath' http argument must also be set to a valid XPath query.
|
(optional parameter)
|
| showSourceText |
whether to include the original 'source text' the entities were extracted from within the API response.
Possible values:
1 - enabled
0 - disabled (default)
(optional parameter)
|
| cquery |
a visual constraints query to apply to the web page.
Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
(optional parameter, used when sourceText is set to 'cquery'. must be uri-argument encoded)
|
| xpath |
an XPath query to apply to the web page.
XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.
(optional parameter, used when sourceText is set to 'xpath'. must be uri-argument encoded)
|
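To make the template parameter more concrete, the sketch below defines an example template that wraps each entity in a link, previews locally what the substitution is expected to produce for one entity, and builds the corresponding request URL. The local preview is only an approximation for illustration: the service performs the real substitution, and its exact encoding rules may differ. The tag URL, the helper names, and the sample entity are all invented.

```python
from urllib.parse import quote, urlencode

ANNOTATE_ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetAnnotatedNamedEntityText"

# Illustrative template: link each entity to a hypothetical tag page.
TEMPLATE = '<a href="http://www.example.com/tags/$ENCODED_ENTITY" class="$TYPE">$ENTITY</a>'


def preview_annotation(template, entity_text, entity_type):
    """Approximate locally what the template yields for one detected entity."""
    values = {
        "$ENCODED_ENTITY": quote(entity_text, safe=""),
        "$ENTITY": entity_text,
        "$TYPE": entity_type,
    }
    out = template
    # Replace longer variable names first so shorter ones cannot clobber them.
    for name in sorted(values, key=len, reverse=True):
        out = out.replace(name, values[name])
    return out


def build_annotate_url(page_url, api_key, template, **options):
    """Return a GET URL for URLGetAnnotatedNamedEntityText (hypothetical helper)."""
    params = {"url": page_url, "apikey": api_key, "template": template}
    params.update(options)
    return ANNOTATE_ENDPOINT + "?" + urlencode(params)


preview = preview_annotation(TEMPLATE, "Jane Doe", "Person")
annotate_url = build_annotate_url("http://www.example.com/", "YOUR_API_KEY", TEMPLATE)
print(preview)
print(annotate_url)
```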
Response Format (XML):
<results>
<status>REQUEST_STATUS</status>
<url>REQUESTED_URL</url>
<language>DETECTED_LANGUAGE</language>
<text>DOCUMENT_TEXT</text>
<annotatedText>ANNOTATED_TEXT</annotatedText>
</results>
Response Format (JSON):
{
"status": "REQUEST_STATUS",
"url": "REQUESTED_URL",
"language": "DETECTED_LANGUAGE",
"text": "DOCUMENT_TEXT",
"annotatedText": "ANNOTATED_TEXT"
}
Response Format (RDF):
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
<rdf:Description rdf:ID="DOCUMENT_HASH">
<rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
<aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
<aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
<aapi:URL>DOCUMENT_URL</aapi:URL>
<aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
<aapi:AnnotatedDocText>ANNOTATED_TEXT</aapi:AnnotatedDocText>
</rdf:Description>
</rdf:RDF>
Response Fields:
| field name |
field description |
| status |
success / failure status indicating whether the request was processed.
Possible values:
OK
ERROR
|
| url |
http url information was requested for. |
| language |
the detected language that the source text was written in. |
| annotatedText |
the source text annotated with all identified named entities. Text annotation is controlled by the provided template parameter. |
| statusInfo |
failure status information (sent only if "status" == "ERROR").
Possible values:
invalid-api-key
cannot-retrieve
page-is-not-html
content-exceeds-size-limit
|
Example Calls:
XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...
API Notes:
- Calls to URLGetAnnotatedNamedEntityText can be made using HTTP GET or POST.
- HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
- URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
- Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
- Language detection is performed on the retrieved document before attempting text annotation. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
- Documents containing fewer than 15 characters of text are assumed to be English-language content.
- Disambiguation of detected entities is enabled by default. Disambiguation information will be made available for variable substitution for each entity that is successfully resolved.
- Entity extraction is currently supported for all languages listed on the language support page. Submissions in any other language are rejected with an error response.
- Disambiguation and quotations extraction are currently available for English-language content only. Support for other languages is in development.