Text Extraction

AlchemyAPI provides easy-to-use mechanisms to extract page text and title information from any web page.

An HTML page cleaning facility is provided. It normalizes and cleans HTML content by removing ads, navigation links, and other unimportant content, so that only the important article text is extracted.
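
To make the idea of page cleaning concrete, here is a deliberately simplified sketch using only the Python standard library. It is not AlchemyAPI's algorithm, which is not described here. It only illustrates the general approach of discarding markup that is unlikely to hold article text. The list of skipped tags and the names `ArticleTextParser` and `extract_title_and_text` are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Tags whose contents this toy example treats as non-article content.
# This list is an assumption for illustration only.
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer"}


class ArticleTextParser(HTMLParser):
    """Collects the page title and the visible text outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside any skipped element
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk or self.skip_depth:
            return
        if self.in_title:
            self.title_parts.append(chunk)
        else:
            self.text_parts.append(chunk)


def extract_title_and_text(html):
    """Return (title, text) for an HTML string, dropping skipped regions."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title_parts), " ".join(parser.text_parts)
```

A production cleaner would also need to handle malformed markup, boilerplate that is not wrapped in semantic tags, and character encodings, none of which this sketch attempts.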

Text Extraction Diagram

API endpoints are provided for performing text/title extraction on Internet-accessible URLs and posted HTML files.
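
As a rough sketch of what calling a URL-based endpoint might look like, the helper below builds a request URL without sending it. The base address, the call names (`URLGetText`, `URLGetTitle`), and the query parameters (`apikey`, `url`, `outputMode`) are assumptions that do not appear on this page. Check them against the API reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed base address; not stated in this document.
BASE_URL = "http://access.alchemyapi.com/calls"


def build_url_request(call_name, page_url, api_key, output_mode="json"):
    """Build a GET request URL for a URL-based extraction call.

    call_name, the parameter names, and output_mode values are
    hypothetical here and should be verified against the API reference.
    """
    query = urlencode({
        "apikey": api_key,
        "url": page_url,          # the Internet-accessible page to process
        "outputMode": output_mode,
    })
    return f"{BASE_URL}/url/{call_name}?{query}"


# Example: request URLs for text and title extraction on the same page.
text_request = build_url_request("URLGetText", "http://example.com/article", "YOUR_API_KEY")
title_request = build_url_request("URLGetTitle", "http://example.com/article", "YOUR_API_KEY")
```

Building the URL separately from sending it keeps the sketch testable offline. Posting an HTML file would need a different call and an HTTP POST body, which is not shown here.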

Extracted metadata may be returned in XML, JSON, or RDF format. For more information, see the Text Extraction API response formats.

Text Extraction Demo