Named entities are references to real-world things such as people, places and organizations. AlchemyAPI's named entity extraction can identify people, companies, organizations, cities, geographic features and other typed entities in your HTML, text or web-based content.
Entity extraction can add a wealth of semantic knowledge to your content to help you quickly understand the subject of the text. It is one of the most common starting points for using natural language processing techniques to enrich your content.
AlchemyAPI's named entity extraction is based on sophisticated statistical algorithms and natural language processing technology. It is unique in the industry with its combination of multilingual support, linked data, context-sensitive entity disambiguation, comprehensive type support and quotations extraction.
Example entity extraction from a TechCrunch article.
One year ago, several hours before cities across the United States started their annual fireworks displays, a different type of fireworks were set off at the European Center for Nuclear Research (CERN) in Switzerland. At 9:00 a.m., physicists announced to the world that they had found something they had been searching for for nearly 50 years: the elusive Higgs boson. Today, on the anniversary of its discovery, are we any closer to figuring out what that particle's true identity is? The Higgs boson is popularly referred to as "the God particle," perhaps because of its role in giving other particles their mass. However, it's not the boson itself that gives mass. Back in 1964, Peter Higgs proposed a theory that described a universal field (similar to an electric or a magnetic field) that particles interacted with.
|Entity||Relevance||Sentiment||Type||Linked Data|
|Peter Higgs||0.98893||positive||Person||DBpedia | Yago|
|European Center for Nuclear Research||0.69407||neutral||Organization||Website | Lat:46.23,Lon:6.06 | DBpedia | Yago | OpenCyc | GeoNames|
|United States||0.461032||neutral||Country||Website | DBpedia | Yago | OpenCyc | CIA Factbook|
|Switzerland||0.445847||neutral||Country||Website | Lat:46.83,Lon:8.33 | DBpedia | Yago | OpenCyc | CIA Factbook|
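The rows of the example table above can be loaded into simple records for sorting or filtering. The following is a minimal sketch, not AlchemyAPI client code: the field names (text, relevance, sentiment, entity_type, linked_data) and the EntityRow and parse_row helpers are assumptions made for illustration, based on what each column appears to contain.

```python
from dataclasses import dataclass
from typing import List

# Rows copied from the example table above.
ROWS = [
    "|Peter Higgs||0.98893||positive||Person||DBpedia | Yago|",
    "|European Center for Nuclear Research||0.69407||neutral||Organization||"
    "Website | Lat:46.23,Lon:6.06 | DBpedia | Yago | OpenCyc | GeoNames|",
    "|United States||0.461032||neutral||Country||"
    "Website | DBpedia | Yago | OpenCyc | CIA Factbook|",
    "|Switzerland||0.445847||neutral||Country||"
    "Website | Lat:46.83,Lon:8.33 | DBpedia | Yago | OpenCyc | CIA Factbook|",
]


@dataclass
class EntityRow:
    """One table row; the field names are assumed, not taken from the API."""
    text: str
    relevance: float
    sentiment: str
    entity_type: str
    linked_data: List[str]


def parse_row(line: str) -> EntityRow:
    # Cells are separated by a double pipe; the last cell separates
    # its linked data sources with " | ".
    cells = line.strip().strip("|").split("||")
    text, relevance, sentiment, entity_type, links = cells
    return EntityRow(
        text=text,
        relevance=float(relevance),
        sentiment=sentiment,
        entity_type=entity_type,
        linked_data=[item.strip() for item in links.split(" | ")],
    )


entities = [parse_row(row) for row in ROWS]

# Highest-scoring entities first.
ranked = sorted(entities, key=lambda e: e.relevance, reverse=True)

# Keep only the entities typed as countries.
countries = [e.text for e in entities if e.entity_type == "Country"]

for entity in ranked:
    print(entity.text, entity.relevance, entity.sentiment, entity.entity_type)
```

Keeping the numeric column as a float makes it straightforward to apply a cutoff, for example keeping only entities that score above 0.5.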
AlchemyAPI can extract entity-level sentiment, labeling each entity as positive, negative or neutral, as in the example above. Sentiment analysis can help identify content that refers to an entity in a positive or negative manner.
The entity extraction API can resolve coreferences (e.g. he, she, the company) to the detected entities they refer to. AlchemyAPI's technology understands pronouns and the specific entities they link to.
An entity type describes what an entity is, such as a person, a city or a company. Many entities also have sub-types that describe them further and help with disambiguation. For example, an entity can be a person, but it can also be a musical artist or an author. AlchemyAPI can identify hundreds of entity types and sub-types; the full list is located here: Supported Entity Types
Content can be ambiguous because human language is not exact. Is it Michael Jackson the pop star? Or is it Michael Jackson the writer? When a potentially ambiguous entity is found, the surrounding text is examined for contextual cues. Complex statistics and big data are used to determine which entity is likely correct. AlchemyAPI identifies the correct Michael Jackson by looking for cues on his career, where he's located, notable achievements, etc.
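To make the idea of contextual cues concrete, here is a deliberately simple sketch that scores each candidate by how many of its cue words appear in the surrounding text. It is a toy illustration only and not AlchemyAPI's algorithm, which the text describes as statistical; the candidate labels, the cue word lists and the disambiguate function are all hypothetical.

```python
import re

# Hypothetical cue words for each candidate sense of the ambiguous name.
CANDIDATES = {
    "Michael Jackson (pop star)": {
        "singer", "album", "pop", "music", "thriller", "song",
    },
    "Michael Jackson (writer)": {
        "writer", "author", "beer", "whisky", "book", "journalist",
    },
}


def disambiguate(context, candidates=CANDIDATES):
    """Return (best_candidate_or_None, scores) using cue-word overlap."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {name: len(cues & words) for name, cues in candidates.items()}
    best = max(scores, key=scores.get)
    # With no matching cue at all, decline to pick a candidate.
    return (best if scores[best] > 0 else None), scores


choice, scores = disambiguate(
    "Michael Jackson released the album Thriller and became a pop icon."
)
print(choice, scores)
```

A real system would weight cues by how informative they are rather than counting them equally, but the overlap count is enough to show why surrounding text settles the ambiguity.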
The associated linked data for each disambiguated entity is included in the response. Use linked data to access additional semantic information and further enhance your content. Learn more about AlchemyAPI's linked data support.
AlchemyAPI can identify quotations and link them to a specific entity. For instance, if the text contains the sentence: Bob said, "this is a quotation," then Bob would be identified as an entity and the quote would be linked to him. Note: quotation extraction is currently available for English and French only.
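The Bob example can be mimicked with a small pattern match. This is only a rough sketch of linking a quote to its speaker, assuming the fixed phrasing `<Name> said, "<quote>"`; it is not how AlchemyAPI extracts quotations, and QUOTE_PATTERN and extract_quotes are hypothetical names.

```python
import re

# Matches a capitalized name, the word "said", an optional comma,
# and then a double-quoted passage.
QUOTE_PATTERN = re.compile(
    r'\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) said,? "([^"]+)"'
)


def extract_quotes(text):
    """Return a list of {"speaker": ..., "quote": ...} dictionaries."""
    return [
        {
            "speaker": match.group(1),
            # Drop trailing punctuation that sits inside the quote marks.
            "quote": match.group(2).rstrip(",. "),
        }
        for match in QUOTE_PATTERN.finditer(text)
    ]


sample = 'Bob said, "this is a quotation," and then sat down.'
for item in extract_quotes(sample):
    print(item["speaker"], "->", item["quote"])
```

The pattern misses other reporting verbs, quotes that come before the speaker, and pronoun speakers, which is where coreference resolution would be needed.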
The entity extraction API can return data in JSON, XML or RDF format. The response formats are designed to be flexible to meet the needs of your application.
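As an illustration of consuming the JSON format, the sketch below parses a hand-written sample payload. The payload shape (a "status" field, an "entities" list, and the "text", "type", "relevance" and nested "sentiment" keys) is an assumption made for this example and should be checked against the actual API documentation; the values are taken from the example table above, and parse_entities is a hypothetical helper.

```python
import json

# Hand-written sample; the field names and nesting are assumed.
SAMPLE_RESPONSE = """
{
  "status": "OK",
  "entities": [
    {"text": "Peter Higgs", "type": "Person", "relevance": "0.98893",
     "sentiment": {"type": "positive"}},
    {"text": "Switzerland", "type": "Country", "relevance": "0.445847",
     "sentiment": {"type": "neutral"}}
  ]
}
"""


def parse_entities(raw):
    """Turn a JSON response string into a list of plain dictionaries."""
    payload = json.loads(raw)
    if payload.get("status") != "OK":
        info = payload.get("statusInfo", "unknown error")
        raise ValueError("request failed: " + info)
    return [
        {
            "text": item["text"],
            "type": item["type"],
            "relevance": float(item["relevance"]),
            # Sentiment may be absent; None marks that case.
            "sentiment": item.get("sentiment", {}).get("type"),
        }
        for item in payload.get("entities", [])
    ]


for entity in parse_entities(SAMPLE_RESPONSE):
    print(entity)
```

Checking the status field before reading the entity list keeps error responses from being mistaken for documents that simply contain no entities.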
AlchemyAPI supports named entity extraction for content written in eight languages: English, French, German, Italian, Portuguese, Russian, Spanish and Swedish. These are the native languages of more than 1.3 billion people.
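An application might check language coverage before submitting content. The sketch below simply encodes the lists stated in this document (eight languages for entity extraction, English and French for quotation extraction); the constant names, the feature labels and the supports function are hypothetical and not part of any AlchemyAPI client.

```python
# Language lists as stated in this document.
ENTITY_LANGUAGES = frozenset({
    "english", "french", "german", "italian",
    "portuguese", "russian", "spanish", "swedish",
})
QUOTATION_LANGUAGES = frozenset({"english", "french"})

# Hypothetical feature labels mapped to their supported languages.
FEATURES = {
    "entities": ENTITY_LANGUAGES,
    "quotations": QUOTATION_LANGUAGES,
}


def supports(language, feature="entities"):
    """Return True if the named feature covers the given language."""
    if feature not in FEATURES:
        raise ValueError("unknown feature: " + feature)
    return language.strip().lower() in FEATURES[feature]


print(supports("Swedish"))
print(supports("Swedish", feature="quotations"))
```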