Web API: Entity Extraction

AlchemyAPI provides easy-to-use facilities for extracting semantic information from your web-based content. These URL processing calls automatically fetch the requested web page, normalize and clean it (removing ads, navigation links, and other unimportant content), detect the primary document language, and extract named entities, topics, and other content. Use these calls to process hosted web pages, blogs, and other publicly accessible Internet content. If you are processing content that is not hosted on a public web server, use our HTML and Text API calls instead.

URLGetRankedNamedEntities: Extract a grouped, relevancy-ranked list of named entities from a web page.

API Call: URLGetRankedNamedEntities

Description: The URLGetRankedNamedEntities call extracts a grouped, relevancy-ranked list of named entities (people, companies, organizations, etc.) from a given web page. AlchemyAPI downloads the requested URL, extracts text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), and performs entity extraction.

Endpoint: http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities

Parameters:

http argument parameter description
url HTTP URL of the web page to process (must be uri-argument encoded)

(required parameter)
apikey your private API key

(required parameter)
outputMode desired API output format

Possible values:
xml (default)
json
rdf
rel-tag
rel-tag-raw

(optional parameter)
jsonp desired JSONP callback

(optional parameter, requires "outputMode" to be set to json)
disambiguate whether to disambiguate detected entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter)
linkedData whether to include Linked Data content links with disambiguated entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter; disambiguation must be enabled to use the linkedData feature)
coreference whether to resolve coreferences (he, she, etc.) into detected entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter)
quotations whether to enable quotations extraction.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter)
sentiment whether to enable entity-level sentiment analysis.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter; enabling this option uses one additional AlchemyAPI transaction)
sourceText where to obtain the text that will be processed by this API call.

AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.

Possible values:
cleaned_or_raw cleaning enabled; falls back to raw text when cleaning produces no text (default)
cleaned operate on 'cleaned' web page text (web page cleaning enabled)
raw operate on raw web page text (web page cleaning disabled)
cquery operate on the results of a visual constraints query

Note: The 'cquery' http argument must also be set to a valid visual constraints query.
xpath operate on the results of an XPath query

Note: The 'xpath' http argument must also be set to a valid XPath query.

(optional parameter)
showSourceText whether to include the original 'source text' the entities were extracted from within the API response.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter)
cquery a visual constraints query to apply to the web page.

Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.

(optional parameter, used when sourceText is set to 'cquery'; must be uri-argument encoded)
xpath an XPath query to apply to the web page.

XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.

(optional parameter, used when sourceText is set to 'xpath'; must be uri-argument encoded)
maxRetrieve maximum number of named entities to extract (default: 50)

(optional parameter)
baseUrl rel-tag output base http url

(optional parameter, used with the rel-tag or rel-tag-raw outputMode; must be uri-argument encoded)
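As a rough illustration, the parameters above can be assembled into a GET request URL with Python's standard library. This is a minimal sketch: the helper name `build_ranked_entities_url` and the sample page URL are hypothetical, while the endpoint and argument names come from this page. `urlencode()` percent-encodes every value, which covers the "must be uri-argument encoded" requirement for `url`, `cquery`, `xpath`, and `baseUrl`.

```python
from urllib.parse import urlencode

# Endpoint documented above for URLGetRankedNamedEntities.
ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities"


def build_ranked_entities_url(api_key: str, page_url: str, **options: str) -> str:
    """Build a GET URL (hypothetical helper, not part of any AlchemyAPI SDK).

    apikey and url are required; any optional argument listed above
    (outputMode, sentiment, maxRetrieve, ...) can be passed as a keyword.
    urlencode() percent-encodes every value.
    """
    params = {"apikey": api_key, "url": page_url}
    params.update(options)
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_ranked_entities_url(
        "YOUR_API_KEY",
        "http://www.example.com/news/story.html",
        outputMode="json",
        sentiment="1",     # uses one additional transaction
        maxRetrieve="10",
    ))
```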

Response Format (XML):

<results>
    <status>REQUEST_STATUS</status>
    <language>DOCUMENT_LANGUAGE</language>
    <url>REQUESTED_URL</url>
    <text>DOCUMENT_TEXT</text>
    <entities>
        <entity>
            <type>DETECTED_TYPE</type>
            <relevance>DETECTED_RELEVANCE</relevance>
            <count>DETECTED_COUNT</count>
            <text>DETECTED_ENTITY</text>
            <disambiguated>
                <name>DISAMBIGUATED_ENTITY</name>
                <subType>ENTITY_SUBTYPE</subType>
                <website>WEBSITE</website>
                <geo>LATITUDE LONGITUDE</geo>
                <dbpedia>LINKED_DATA_DBPEDIA</dbpedia>
                <yago>LINKED_DATA_YAGO</yago>
                <opencyc>LINKED_DATA_OPENCYC</opencyc>
                <umbel>LINKED_DATA_UMBEL</umbel>
                <freebase>LINKED_DATA_FREEBASE</freebase>
                <ciaFactbook>LINKED_DATA_FACTBOOK</ciaFactbook>
                <census>LINKED_DATA_CENSUS</census>
                <geonames>LINKED_DATA_GEONAMES</geonames>
                <musicBrainz>LINKED_DATA_MUSICBRAINZ</musicBrainz>
                <crunchbase>CRUNCHBASE_WEB_LINK</crunchbase>
                <semanticCrunchbase>LINKED_DATA_CRUNCHBASE</semanticCrunchbase>
            </disambiguated>
            <quotations>
                <quotation>ENTITY_QUOTATION</quotation>
            </quotations>
            <sentiment>
                <type>SENTIMENT_LABEL</type>
                <score>SENTIMENT_SCORE</score>
                <mixed>SENTIMENT_MIXED</mixed>
            </sentiment>
        </entity>
    </entities>
</results>
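For XML output, the standard-library ElementTree parser is enough to read the ranked entities. In this minimal sketch the sample response and the function name `parse_entities_xml` are hypothetical; the element names follow the XML format above. Numeric fields arrive as element text, so they are converted explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed response in the XML format above.
SAMPLE_XML = """<results>
    <status>OK</status>
    <url>http://www.example.com/</url>
    <entities>
        <entity>
            <type>Company</type>
            <relevance>0.92</relevance>
            <count>4</count>
            <text>Acme Corp</text>
        </entity>
        <entity>
            <type>Person</type>
            <relevance>0.41</relevance>
            <count>1</count>
            <text>Jane Doe</text>
        </entity>
    </entities>
</results>"""


def parse_entities_xml(xml_text: str) -> list[tuple[str, str, float, int]]:
    """Return (text, type, relevance, count) for every <entity> element."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        raise ValueError(root.findtext("statusInfo", default="unknown error"))
    entities = []
    for entity in root.iter("entity"):
        entities.append((
            entity.findtext("text"),
            entity.findtext("type"),
            float(entity.findtext("relevance", default="0")),
            int(entity.findtext("count", default="0")),
        ))
    return entities


if __name__ == "__main__":
    for row in parse_entities_xml(SAMPLE_XML):
        print(row)
```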

Response Format (JSON):

{
    "status": "REQUEST_STATUS",
    "language": "DOCUMENT_LANGUAGE",
    "url": "REQUESTED_URL",
    "text": "DOCUMENT_TEXT",
    "entities": [
        {
            "type": "DETECTED_TYPE",
            "relevance": "DETECTED_RELEVANCE",
            "count": "DETECTED_COUNT",
            "text": "DETECTED_ENTITY",
            "disambiguated": {
                "name": "DISAMBIGUATED_ENTITY",
                "subType": "ENTITY_SUBTYPE",
                "website": "WEBSITE",
                "geo": "LATITUDE LONGITUDE",
                "dbpedia": "LINKED_DATA_DBPEDIA",
                "yago": "LINKED_DATA_YAGO",
                "opencyc": "LINKED_DATA_OPENCYC",
                "umbel": "LINKED_DATA_UMBEL",
                "freebase": "LINKED_DATA_FREEBASE",
                "ciaFactbook": "LINKED_DATA_FACTBOOK",
                "census": "LINKED_DATA_CENSUS",
                "geonames": "LINKED_DATA_GEONAMES",
                "musicBrainz": "LINKED_DATA_MUSICBRAINZ",
                "crunchbase": "CRUNCHBASE_WEB_LINK",
                "semanticCrunchbase": "LINKED_DATA_CRUNCHBASE"
            },
            "quotations": [
                {
                    "quotation": "ENTITY_QUOTATION"
                }
            ],
            "sentiment": {
                "type": "SENTIMENT_LABEL",
                "score": "SENTIMENT_SCORE",
                "mixed": "SENTIMENT_MIXED"
            }
        }
    ]
}
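A similar minimal sketch for JSON output, using the `entities` array layout shown above. The sample data, the `top_entities` name, and the 0.5 relevance cut-off are hypothetical. Because the format above shows numeric fields as quoted strings, the code converts them with `float()`, which works whether they arrive as strings or numbers.

```python
import json

# Hypothetical response following the JSON format above.
SAMPLE_JSON = """{
    "status": "OK",
    "url": "http://www.example.com/",
    "entities": [
        {"type": "Person", "relevance": "0.41", "count": "1", "text": "Jane Doe"},
        {"type": "Company", "relevance": "0.92", "count": "4", "text": "Acme Corp",
         "sentiment": {"type": "positive", "score": "0.35"}}
    ]
}"""


def top_entities(json_text: str, min_relevance: float = 0.5) -> list[dict]:
    """Return entities at or above min_relevance, most relevant first."""
    data = json.loads(json_text)
    if data.get("status") != "OK":
        raise ValueError(data.get("statusInfo", "unknown error"))
    # float() accepts both "0.92" and 0.92.
    kept = [e for e in data.get("entities", [])
            if float(e["relevance"]) >= min_relevance]
    return sorted(kept, key=lambda e: float(e["relevance"]), reverse=True)


if __name__ == "__main__":
    for entity in top_entities(SAMPLE_JSON, min_relevance=0.0):
        # "sentiment" is present only when entity-level sentiment was requested.
        label = entity.get("sentiment", {}).get("type", "n/a")
        print(entity["text"], entity["type"], entity["relevance"], label)
```

Since `disambiguated`, `quotations`, and `sentiment` are sent only when the corresponding feature is enabled (and, for disambiguation, only when it succeeds), reading them with `dict.get()` rather than indexing avoids `KeyError`s.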

Response Format (RDF):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
                 xmlns:owl="http://www.w3.org/2002/07/owl#"
                 xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
    <rdf:Description rdf:ID="DOCUMENT_HASH">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
        <aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
        <aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
        <aapi:URL>DOCUMENT_URL</aapi:URL>
        <aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
    </rdf:Description>
    <rdf:Description rdf:ID="DOCUMENT_HASH-ENTITY_NUM">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#EntityOccurrences"/>
        <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
        <aapi:EntityType>DETECTED_TYPE</aapi:EntityType>
        <aapi:Relevance>DETECTED_RELEVANCE</aapi:Relevance>
        <aapi:NumOccurs>DETECTED_COUNT</aapi:NumOccurs>
        <aapi:Name>DETECTED_ENTITY</aapi:Name>
        <aapi:Disambiguation>
            <rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
                <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation"/>
                <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
                <aapi:ResolvedName>DISAMBIGUATED_ENTITY</aapi:ResolvedName>
                <aapi:SubType>ENTITY_SUBTYPE</aapi:SubType>
                <aapi:URL>WEBSITE</aapi:URL>
                <aapi:Geo>LATITUDE LONGITUDE</aapi:Geo>
                <owl:sameAs rdf:resource="LINKED_DATA_DBPEDIA"/>
                <owl:sameAs rdf:resource="LINKED_DATA_YAGO"/>
                <owl:sameAs rdf:resource="LINKED_DATA_OPENCYC"/>
                <owl:sameAs rdf:resource="LINKED_DATA_UMBEL"/>
                <owl:sameAs rdf:resource="LINKED_DATA_FREEBASE"/>
                <owl:sameAs rdf:resource="LINKED_DATA_FACTBOOK"/>
                <owl:sameAs rdf:resource="LINKED_DATA_CENSUS"/>
                <owl:sameAs rdf:resource="LINKED_DATA_GEONAMES"/>
                <owl:sameAs rdf:resource="LINKED_DATA_MUSICBRAINZ"/>
                <owl:sameAs rdf:resource="LINKED_DATA_CRUNCHBASE"/>
            </rdf:Description>
        </aapi:Disambiguation>
        <aapi:Quotations>
            <rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
                <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Quotations"/>
                <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
                <aapi:Quotation>ENTITY_QUOTATION</aapi:Quotation>
            </rdf:Description>
        </aapi:Quotations>
        <aapi:Sentiment>
            <rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
                <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Sentiment"/>
                <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
                <aapi:SentimentType>SENTIMENT_LABEL</aapi:SentimentType>
                <aapi:SentimentScore>SENTIMENT_SCORE</aapi:SentimentScore>
                <aapi:SentimentMixed>SENTIMENT_MIXED</aapi:SentimentMixed>
            </rdf:Description>
        </aapi:Sentiment>
    </rdf:Description>
</rdf:RDF>
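When only a few fields are needed, RDF output can also be read as plain XML (a dedicated RDF library such as rdflib is the more robust choice for full graph processing). This sketch assumes the namespace URIs from the template above; the trimmed sample document and the `rdf_entities` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the RDF template above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "aapi": "http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#",
}

# Hypothetical, trimmed response in the RDF format above.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#">
    <rdf:Description rdf:ID="d1">
        <aapi:ResultStatus>OK</aapi:ResultStatus>
    </rdf:Description>
    <rdf:Description rdf:ID="d1-0">
        <aapi:EntityType>Company</aapi:EntityType>
        <aapi:Relevance>0.92</aapi:Relevance>
        <aapi:NumOccurs>4</aapi:NumOccurs>
        <aapi:Name>Acme Corp</aapi:Name>
    </rdf:Description>
</rdf:RDF>"""


def rdf_entities(rdf_text: str) -> list[tuple[str, str, float]]:
    """Return (name, type, relevance) for each top-level entity description."""
    root = ET.fromstring(rdf_text)
    found = []
    # This path matches only direct children of rdf:RDF, so the nested
    # Disambiguation/Quotations/Sentiment descriptions are not visited.
    for desc in root.findall("rdf:Description", NS):
        name = desc.findtext("aapi:Name", namespaces=NS)
        if name is None:
            continue  # e.g. the DocInfo description
        entity_type = desc.findtext("aapi:EntityType", namespaces=NS)
        relevance = float(desc.findtext("aapi:Relevance", default="0", namespaces=NS))
        found.append((name, entity_type, relevance))
    return found


if __name__ == "__main__":
    print(rdf_entities(SAMPLE_RDF))
```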

Response Format (REL-TAG Microformat [XML-embedded] ):

<results>
    <status>REQUEST_STATUS</status>
    <language>DOCUMENT_LANGUAGE</language>
    <url>REQUESTED_URL</url>
    <text>DOCUMENT_TEXT</text>
    <microformats>
        <a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
        <a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
    </microformats>
</results>

Response Format (REL-TAG Microformat [raw] ):

<a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
<a href="REQUESTED_BASE_URL/DETECTED_ENTITY" rel="tag">DETECTED_ENTITY</a>
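The rel-tag formats are plain `<a rel="tag">` links, so Python's built-in `html.parser` can recover the entity names and their tag URLs. In this sketch the class, function, and sample base URL are hypothetical; how AlchemyAPI encodes the entity into the link is not specified here, so the sample uses single-word entities.

```python
from html.parser import HTMLParser


class RelTagParser(HTMLParser):
    """Collect (href, text) pairs from <a ... rel="tag"> links."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[tuple[str, str]] = []
        self._current = None  # (href, text parts) of the open rel="tag" link

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        if tag == "a" and "tag" in rel_values:
            self._current = (attributes.get("href") or "", [])

    def handle_data(self, data):
        # Text outside rel="tag" links (e.g. <status>OK</status>) is ignored.
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, parts = self._current
            self.tags.append((href, "".join(parts)))
            self._current = None


def extract_rel_tags(markup: str) -> list[tuple[str, str]]:
    parser = RelTagParser()
    parser.feed(markup)
    parser.close()
    return parser.tags


# Hypothetical rel-tag-raw output with baseUrl=http://www.example.com/tags
SAMPLE_RAW = (
    '<a href="http://www.example.com/tags/Paris" rel="tag">Paris</a>\n'
    '<a href="http://www.example.com/tags/Google" rel="tag">Google</a>'
)

if __name__ == "__main__":
    print(extract_rel_tags(SAMPLE_RAW))
```

Because text outside the links is ignored, the same parser can be fed the whole XML-embedded rel-tag response as well as the raw form.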

Response Fields:

field name field description
status success / failure status indicating whether the request was processed.

Possible values:
OK
ERROR
language the detected language that the source text was written in.
url the HTTP URL that information was requested for.
type the detected entity type.

Possible values: see the list of supported entity types.
relevance relevance score for a detected entity.

Possible values: 0.0 to 1.0 (1.0 = most relevant)
count number of times an entity was seen within the source web page.
text the detected entity text.
disambiguated disambiguation information for the detected entity (sent only if disambiguation occurred)
disambiguation field field description
name the disambiguated entity name.
subType the disambiguated entity subType.

SubTypes expose additional ontological mappings for a detected entity, such as identification of a Person as a Politician or Athlete.
website the disambiguated entity website.
geo the disambiguated entity's geographic coordinates, formatted as "LATITUDE LONGITUDE".
dbpedia sameAs link to DBpedia for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
yago sameAs link to YAGO for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
opencyc sameAs link to OpenCyc for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
umbel sameAs link to UMBEL for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
freebase sameAs link to Freebase for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
ciaFactbook sameAs link to the CIA World Factbook for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
census sameAs link to the US Census for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
geonames sameAs link to Geonames for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
musicBrainz sameAs link to MusicBrainz for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
crunchbase website link to CrunchBase for the disambiguated entity.

Note: Provided only for entities that exist in CrunchBase.
semanticCrunchbase sameAs link to Semantic CrunchBase for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
quotations extracted quotations for the detected entity (sent only if quotations extraction is enabled)
field field description
quotation quotation extracted for a particular named entity.
sentiment sentiment for the detected entity (sent only if entity-level sentiment analysis is enabled)
field field description
type sentiment polarity: "positive", "negative", or "neutral"
score sentiment strength (0.0 == neutral)
mixed whether sentiment is mixed (both positive and negative) (1 == mixed)
statusInfo failure status information (sent only if "status" == "ERROR").

Possible values:
invalid-api-key
cannot-retrieve
page-is-not-html
content-exceeds-size-limit
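Every response carries `status`, and failed responses add `statusInfo`. A small helper can turn that into something a client can act on. The function and exception names in this sketch are hypothetical, and treating only `cannot-retrieve` as worth retrying is an assumption based on the 10-second retrieval limit described in the API notes (a slow page may respond in time later), not documented AlchemyAPI behavior.

```python
from typing import Optional

# Assumption: only a fetch timeout is plausibly transient; a bad key,
# a non-HTML page, or an oversized page will fail the same way again.
RETRYABLE_CODES = {"cannot-retrieve"}


class AlchemyAPIError(Exception):
    """Hypothetical exception type carrying the statusInfo code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code
        self.retryable = code in RETRYABLE_CODES


def status_error(response: dict) -> Optional[str]:
    """Return the statusInfo code of a failed response, or None on success."""
    if response.get("status") == "OK":
        return None
    # "unknown-error" is a placeholder used when statusInfo is missing.
    return response.get("statusInfo", "unknown-error")


def raise_for_status(response: dict) -> dict:
    """Return the response unchanged, or raise AlchemyAPIError on failure."""
    code = status_error(response)
    if code is not None:
        raise AlchemyAPIError(code)
    return response
```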

Example Calls:

XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...

API Notes:

  1. Calls to URLGetRankedNamedEntities can be made using HTTP GET or POST.
  2. HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
  3. URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
  4. Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
  5. Language detection is performed on the retrieved document before attempting named entity extraction. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
  6. Documents containing fewer than 15 characters of text are assumed to be English-language content.
  7. Disambiguation of detected entities is enabled by default. Disambiguation information will be included for each entity that is successfully resolved.
  8. Entity extraction is currently supported for all languages listed on the language support page. Submissions in other languages will be rejected and an error response returned.
  9. Enabling entity-level sentiment analysis uses one additional transaction against your daily API limit. Entity-level sentiment analysis is currently available for English and German content.
  10. Disambiguation and quotations extraction are currently available for English-language content only. Support for other languages is in development.
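The notes above allow HTTP POST with a form-encoded body. The following sketch builds, but does not send, such a request with `urllib.request`; the helper name is hypothetical. The 10-second limit applies to AlchemyAPI fetching the target page, so any client-side timeout passed to `urlopen()` would need to allow for that plus extraction time.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities"


def build_post_request(api_key: str, page_url: str, **options: str) -> Request:
    """Build (but do not send) a form-encoded POST request.

    Hypothetical helper: the body carries the same arguments a GET query
    string would, encoded as application/x-www-form-urlencoded.
    """
    params = {"apikey": api_key, "url": page_url, **options}
    body = urlencode(params).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_post_request("YOUR_API_KEY", "http://www.example.com/", outputMode="json")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req, timeout=30).read()
```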

API Call: URLGetNamedEntities

Description: The URLGetNamedEntities call extracts named entities (people, companies, organizations, etc.) from a given web page. AlchemyAPI downloads the requested URL, extracts text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), and performs entity extraction.

Endpoint: http://access.alchemyapi.com/calls/url/URLGetNamedEntities

Parameters:

http argument parameter description
url HTTP URL of the web page to process (must be uri-argument encoded)

(required parameter)
apikey your private API key

(required parameter)
outputMode desired API output format

Possible values:
xml (default)
json
rdf

(optional parameter)
disambiguate whether to disambiguate detected entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter)
linkedData whether to include Linked Data content links with disambiguated entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter; disambiguation must be enabled to use the linkedData feature)
coreference whether to resolve coreferences (he, she, etc.) into detected entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter)
quotations whether to enable quotations extraction.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter)
sourceText where to obtain the text that will be processed by this API call.

AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.

Possible values:
cleaned_or_raw cleaning enabled; falls back to raw text when cleaning produces no text (default)
cleaned operate on 'cleaned' web page text (web page cleaning enabled)
raw operate on raw web page text (web page cleaning disabled)
cquery operate on the results of a visual constraints query

Note: The 'cquery' http argument must also be set to a valid visual constraints query.
xpath operate on the results of an XPath query

Note: The 'xpath' http argument must also be set to a valid XPath query.

(optional parameter)
showSourceText whether to include the original 'source text' the entities were extracted from within the API response.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter)
cquery a visual constraints query to apply to the web page.

Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.

(optional parameter, used when sourceText is set to 'cquery'; must be uri-argument encoded)
xpath an XPath query to apply to the web page.

XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.

(optional parameter, used when sourceText is set to 'xpath'; must be uri-argument encoded)
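The `sourceText` modes `cquery` and `xpath` only work when the matching query argument is also supplied. A client-side check like the following sketch can catch that mistake before a request is sent; the function name and messages are hypothetical, and the server may apply further validation of its own.

```python
from collections.abc import Mapping

# sourceText values listed above; cleaned_or_raw is the default.
SOURCE_TEXT_MODES = {"cleaned_or_raw", "cleaned", "raw", "cquery", "xpath"}


def source_text_problems(params: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the sourceText arguments (empty if none)."""
    problems = []
    mode = params.get("sourceText", "cleaned_or_raw")
    if mode not in SOURCE_TEXT_MODES:
        problems.append(f"unknown sourceText value: {mode!r}")
    # The cquery and xpath modes each need their companion argument.
    if mode in ("cquery", "xpath") and not params.get(mode):
        problems.append(f"sourceText={mode} requires a non-empty {mode!r} argument")
    return problems


if __name__ == "__main__":
    print(source_text_problems({"sourceText": "xpath"}))
```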

Response Format (XML):

<results>
    <status>REQUEST_STATUS</status>
    <language>DOCUMENT_LANGUAGE</language>
    <url>REQUESTED_URL</url>
    <text>DOCUMENT_TEXT</text>
    <entities>
        <entity>
            <type>DETECTED_TYPE</type>
            <start>START_POS</start>
            <end>END_POS</end>
            <text>DETECTED_ENTITY</text>
            <disambiguated>
                <name>DISAMBIGUATED_ENTITY</name>
                <subType>ENTITY_SUBTYPE</subType>
                <website>WEBSITE</website>
                <geo>LATITUDE LONGITUDE</geo>
                <dbpedia>LINKED_DATA_DBPEDIA</dbpedia>
                <yago>LINKED_DATA_YAGO</yago>
                <opencyc>LINKED_DATA_OPENCYC</opencyc>
                <umbel>LINKED_DATA_UMBEL</umbel>
                <freebase>LINKED_DATA_FREEBASE</freebase>
                <ciaFactbook>LINKED_DATA_FACTBOOK</ciaFactbook>
                <census>LINKED_DATA_CENSUS</census>
                <geonames>LINKED_DATA_GEONAMES</geonames>
                <musicBrainz>LINKED_DATA_MUSICBRAINZ</musicBrainz>
                <crunchbase>CRUNCHBASE_WEB_LINK</crunchbase>
                <semanticCrunchbase>LINKED_DATA_CRUNCHBASE</semanticCrunchbase>
            </disambiguated>
            <quotations>
                <quotation>ENTITY_QUOTATION</quotation>
            </quotations>
        </entity>
    </entities>
</results>

Response Format (JSON):

{
    "status": "REQUEST_STATUS",
    "language": "DOCUMENT_LANGUAGE",
    "url": "REQUESTED_URL",
    "text": "DOCUMENT_TEXT",
    "entities": [
        {
            "type": "DETECTED_TYPE",
            "start": "START_POS",
            "end": "END_POS",
            "text": "DETECTED_ENTITY",
            "disambiguated": {
                "name": "DISAMBIGUATED_ENTITY",
                "subType": "ENTITY_SUBTYPE",
                "website": "WEBSITE",
                "geo": "LATITUDE LONGITUDE",
                "dbpedia": "LINKED_DATA_DBPEDIA",
                "yago": "LINKED_DATA_YAGO",
                "opencyc": "LINKED_DATA_OPENCYC",
                "umbel": "LINKED_DATA_UMBEL",
                "freebase": "LINKED_DATA_FREEBASE",
                "ciaFactbook": "LINKED_DATA_FACTBOOK",
                "census": "LINKED_DATA_CENSUS",
                "geonames": "LINKED_DATA_GEONAMES",
                "musicBrainz": "LINKED_DATA_MUSICBRAINZ",
                "crunchbase": "CRUNCHBASE_WEB_LINK",
                "semanticCrunchbase": "LINKED_DATA_CRUNCHBASE"
            },
            "quotations": [
                {
                    "quotation": "ENTITY_QUOTATION"
                }
            ]
        }
    ]
}

Response Format (RDF):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
                 xmlns:owl="http://www.w3.org/2002/07/owl#"
                 xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
    <rdf:Description rdf:ID="DOCUMENT_HASH">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
        <aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
        <aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
        <aapi:URL>DOCUMENT_URL</aapi:URL>
        <aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
    </rdf:Description>
    <rdf:Description rdf:ID="DOCUMENT_HASH-ENTITY_NUM">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#EntityOccurrence"/>
        <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
        <aapi:EntityType>DETECTED_TYPE</aapi:EntityType>
        <aapi:TextStartPos>START_POS</aapi:TextStartPos>
        <aapi:TextEndPos>END_POS</aapi:TextEndPos>
        <aapi:Name>DETECTED_ENTITY</aapi:Name>
        <aapi:Disambiguation>
            <rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
                <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation"/>
                <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
                <aapi:ResolvedName>DISAMBIGUATED_ENTITY</aapi:ResolvedName>
                <aapi:SubType>ENTITY_SUBTYPE</aapi:SubType>
                <aapi:URL>WEBSITE</aapi:URL>
                <aapi:Geo>LATITUDE LONGITUDE</aapi:Geo>
                <owl:sameAs rdf:resource="LINKED_DATA_DBPEDIA"/>
                <owl:sameAs rdf:resource="LINKED_DATA_YAGO"/>
                <owl:sameAs rdf:resource="LINKED_DATA_OPENCYC"/>
                <owl:sameAs rdf:resource="LINKED_DATA_UMBEL"/>
                <owl:sameAs rdf:resource="LINKED_DATA_FREEBASE"/>
                <owl:sameAs rdf:resource="LINKED_DATA_FACTBOOK"/>
                <owl:sameAs rdf:resource="LINKED_DATA_CENSUS"/>
                <owl:sameAs rdf:resource="LINKED_DATA_GEONAMES"/>
                <owl:sameAs rdf:resource="LINKED_DATA_MUSICBRAINZ"/>
                <owl:sameAs rdf:resource="LINKED_DATA_CRUNCHBASE"/>
            </rdf:Description>
        </aapi:Disambiguation>
        <aapi:Quotations>
            <rdf:Description rdf:about="#DOCUMENT_HASH-ENTITY_NUM">
                <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Quotations"/>
                <aapi:Doc>DOCUMENT_HASH</aapi:Doc>
                <aapi:Quotation>ENTITY_QUOTATION</aapi:Quotation>
            </rdf:Description>
        </aapi:Quotations>
    </rdf:Description>
</rdf:RDF>

Response Fields:

field name field description
status success / failure status indicating whether the request was processed.

Possible values:
OK
ERROR
url the HTTP URL that information was requested for.
language the detected language that the source text was written in.
type the detected entity type.

Possible values: see the list of supported entity types.
start start offset (in bytes) of this entity in the text stream.

end end offset (in bytes) of this entity in the text stream.

text the detected entity text.
disambiguated disambiguation information for the detected entity (sent only if disambiguation occurred)
disambiguation field field description
name the disambiguated entity name.
subType the disambiguated entity subType.

SubTypes expose additional ontological mappings for a detected entity, such as identification of a Person as a Politician or Athlete.
website the disambiguated entity website.
geo the disambiguated entity's geographic coordinates, formatted as "LATITUDE LONGITUDE".
dbpedia sameAs link to DBpedia for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
yago sameAs link to YAGO for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
opencyc sameAs link to OpenCyc for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
umbel sameAs link to UMBEL for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
freebase sameAs link to Freebase for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
ciaFactbook sameAs link to the CIA World Factbook for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
census sameAs link to the US Census for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
geonames sameAs link to Geonames for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
musicBrainz sameAs link to MusicBrainz for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
crunchbase website link to CrunchBase for the disambiguated entity.

Note: Provided only for entities that exist in CrunchBase.
semanticCrunchbase sameAs link to Semantic CrunchBase for the disambiguated entity.

Note: Provided only for entities that exist in this linked data-set.
quotations extracted quotations for the detected entity (sent only if quotations extraction is enabled)
field field description
quotation quotation extracted for a particular named entity.
statusInfo failure status information (sent only if "status" == "ERROR").

Possible values:
invalid-api-key
cannot-retrieve
page-is-not-html
content-exceeds-size-limit
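The `start` and `end` fields are documented as byte offsets, which differ from Python string indices whenever the text contains multi-byte characters. This sketch assumes the text is UTF-8 encoded and that `end` is exclusive; neither is stated on this page, so the `end_inclusive` switch is there to adjust once the actual convention is confirmed. The function name is hypothetical.

```python
def entity_span(text: str, start: int, end: int, end_inclusive: bool = False) -> str:
    """Return the substring covered by byte offsets start..end.

    Assumptions (not stated on this page): offsets index the UTF-8
    encoding of the text, and end is exclusive unless end_inclusive is True.
    """
    data = text.encode("utf-8")
    stop = end + 1 if end_inclusive else end
    return data[start:stop].decode("utf-8")


if __name__ == "__main__":
    text = "Café owner Jane Doe"
    print(entity_span(text, 12, 20))  # Jane Doe
    print(text[12:20])                # ane Doe (character indices shift after "é")
```

If the offsets do not fall on character boundaries, `decode()` raises `UnicodeDecodeError`, which is a useful signal that one of the assumptions above does not hold.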

Example Calls:

XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...

API Notes:

  1. Calls to URLGetNamedEntities can be made using HTTP GET or POST.
  2. HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
  3. URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
  4. Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
  5. Language detection is performed on the retrieved document before attempting named entity extraction. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
  6. Documents containing fewer than 15 characters of text are assumed to be English-language content.
  7. Disambiguation of detected entities is enabled by default. Disambiguation information will be included for each entity that is successfully resolved.
  8. Entity extraction is currently supported for all languages listed on the language support page. Submissions in other languages will be rejected and an error response returned.
  9. Disambiguation and quotations extraction are currently available for English-language content only. Support for other languages is in development.


API Call: URLGetAnnotatedNamedEntityText

Description: The URLGetAnnotatedNamedEntityText call annotates web page text with detected named entities (people, companies, organizations, etc.). AlchemyAPI downloads the requested URL, extracts text from the HTML document structure (ignoring navigation links, advertisements, and other undesirable content), performs entity extraction, and annotates the original web page text according to the provided variable substitution template.

Endpoint: http://access.alchemyapi.com/calls/url/URLGetAnnotatedNamedEntityText

Parameters:

http argument parameter description
url HTTP URL of the web page to process (must be uri-argument encoded)

(required parameter)
apikey your private API key

(required parameter)
template the annotation template to apply to the text being marked up. The template is applied to each detected entity in the source text, and variable substitution is used to integrate entity information into the annotated text.

Supported substitution variables:
variable variable description
$ENTITY the detected named entity text
$ENCODED_ENTITY the detected named entity text (uri-argument encoded)
$TRIMMED_ENTITY the detected named entity text (punctuation-trimmed)
$ENCODED_TRIMMED_ENTITY the detected named entity text (punctuation-trimmed, uri-argument encoded)
$RESOLVED_ENTITY the resolved, disambiguated named entity
$ENCODED_RESOLVED_ENTITY the resolved, disambiguated named entity (uri-argument encoded)
$TYPE the detected named entity type
(optional parameter)
outputMode desired API output format

Possible values:
xml (default)
json
rdf

(optional parameter)
disambiguate whether to disambiguate detected entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter)
coreference whether to resolve coreferences (he, she, etc.) into detected entities.

Possible values:
1 - enabled (default)
0 - disabled

(optional parameter)
quotations whether to enable quotations extraction.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter)
sourceText where to obtain the text that will be processed by this API call.

AlchemyAPI supports multiple modes of text extraction: web page cleaning (removes ads, navigation links, etc.), raw text extraction (processes all web page text, including ads / nav links), visual constraint queries, and XPath queries.

Possible values:
cleaned_or_raw cleaning enabled; falls back to raw text when cleaning produces no text (default)
cleaned operate on 'cleaned' web page text (web page cleaning enabled)
raw operate on raw web page text (web page cleaning disabled)
cquery operate on the results of a visual constraints query

Note: The 'cquery' http argument must also be set to a valid visual constraints query.
xpath operate on the results of an XPath query

Note: The 'xpath' http argument must also be set to a valid XPath query.

(optional parameter)
showSourceText whether to include the original 'source text' the entities were extracted from within the API response.

Possible values:
1 - enabled
0 - disabled (default)

(optional parameter)
cquery a visual constraints query to apply to the web page.

Constraint queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.

(optional parameter, used when sourceText is set to 'cquery'; must be uri-argument encoded)
xpath an XPath query to apply to the web page.

XPath queries enable API operations to be performed on a targeted area of a web page, such as a story title or product description.

(optional parameter, used when sourceText is set to 'xpath'; must be uri-argument encoded)
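AlchemyAPI expands the template on the server. To preview locally what a given template might produce for one entity, a sketch along these lines can help; the `render_template` name is hypothetical, and the exact trimming and encoding rules are assumptions, noted in the code. The template itself still has to be sent uri-argument encoded, for example via `urlencode()`.

```python
import re
import string
from urllib.parse import quote_plus

# Substitution variables listed above for the template parameter.
_VARIABLE = re.compile(
    r"\$(ENCODED_TRIMMED_ENTITY|ENCODED_RESOLVED_ENTITY|ENCODED_ENTITY"
    r"|TRIMMED_ENTITY|RESOLVED_ENTITY|ENTITY|TYPE)"
)


def render_template(template: str, entity: str, entity_type: str,
                    resolved: str = "") -> str:
    """Locally preview one template expansion (hypothetical helper).

    Assumptions: "punctuation-trimmed" means stripping ASCII punctuation
    from both ends, and "uri-argument encoded" means quote_plus().
    """
    trimmed = entity.strip(string.punctuation)
    values = {
        "ENTITY": entity,
        "ENCODED_ENTITY": quote_plus(entity),
        "TRIMMED_ENTITY": trimmed,
        "ENCODED_TRIMMED_ENTITY": quote_plus(trimmed),
        "RESOLVED_ENTITY": resolved,
        "ENCODED_RESOLVED_ENTITY": quote_plus(resolved),
        "TYPE": entity_type,
    }
    # One regex pass: substituted text is never rescanned, so an entity
    # that itself contains "$TYPE" is left as-is.
    return _VARIABLE.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    tpl = '<a href="/search?q=$ENCODED_TRIMMED_ENTITY" class="$TYPE">$ENTITY</a>'
    print(render_template(tpl, "Acme Corp.", "Company"))
```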

Response Format (XML):

<results>
    <status>REQUEST_STATUS</status>
    <url>REQUESTED_URL</url>
    <language>DETECTED_LANGUAGE</language>
    <text>DOCUMENT_TEXT</text>
    <annotatedText>ANNOTATED_TEXT</annotatedText>
</results>

Response Format (JSON):

{
    "status": "REQUEST_STATUS",
    "url": "REQUESTED_URL",
    "language": "DETECTED_LANGUAGE",
    "text": "DOCUMENT_TEXT",
    "annotatedText": "ANNOTATED_TEXT"
}

Response Format (RDF):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:aapi="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#"
                 xml:base="http://rdf.alchemyapi.com/rdf/v1/r/response.rdf">
    <rdf:Description rdf:ID="DOCUMENT_HASH">
        <rdf:type rdf:resource="http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#DocInfo"/>
        <aapi:ResultStatus>REQUEST_STATUS</aapi:ResultStatus>
        <aapi:Language>DOCUMENT_LANGUAGE</aapi:Language>
        <aapi:URL>DOCUMENT_URL</aapi:URL>
        <aapi:DocText>DOCUMENT_TEXT</aapi:DocText>
        <aapi:AnnotatedDocText>ANNOTATED_TEXT</aapi:AnnotatedDocText>
    </rdf:Description>
</rdf:RDF>

Response Fields:

field name field description
status success / failure status indicating whether the request was processed.

Possible values:
OK
ERROR
url the HTTP URL that information was requested for.
language the detected language that the source text was written in.
annotatedText the source text annotated with all identified named entities. Text annotation is controlled by the provided template parameter.
statusInfo failure status information (sent only if "status" == "ERROR").

Possible values:
invalid-api-key
cannot-retrieve
page-is-not-html
content-exceeds-size-limit

Example Calls:

XML: http://access.alchemyapi.com/calls/...
RDF: http://access.alchemyapi.com/calls/...

API Notes:

  1. Calls to URLGetAnnotatedNamedEntityText can be made using HTTP GET or POST.
  2. HTTP POST calls should include the Content-Type header: application/x-www-form-urlencoded
  3. URL retrieval is attempted for a maximum of 10 seconds. Requests taking longer than this will result in a "cannot-retrieve" error response.
  4. Requested HTML documents can be a maximum of 600 kilobytes. Larger documents will result in a "content-exceeds-size-limit" error response.
  5. Language detection is performed on the retrieved document before attempting text annotation. A minimum of 15 characters of text must exist within the requested HTTP document to perform language detection.
  6. Documents containing fewer than 15 characters of text are assumed to be English-language content.
  7. Disambiguation of detected entities is enabled by default. Disambiguation information will be made available for variable substitution for each entity that is successfully resolved.
  8. Entity extraction is currently supported for all languages listed on the language support page. Submissions in other languages will be rejected and an error response returned.
  9. Disambiguation and quotations extraction are currently available for English-language content only. Support for other languages is in development.