Content Scraping

Content Scraping API

AlchemyAPI provides a sophisticated content scraping engine capable of mining structured data (product attributes, descriptions, pricing, etc.) from any web page. Our Constraint Query technology uses visual, constraint-driven document analysis, so content extraction does not depend on the underlying tag structure and remains robust to HTML layout changes. Transform any web page into structured data.

AlchemyAPI lets you mine web pages using simple natural-language queries. No knowledge of HTML, the DOM, or other web technologies is required.


Step Beyond XPath

Document query technologies such as XPath present significant roadblocks to application developers extracting structured data from web pages. XPath queries are tied to the underlying structure of a web page, so they break when the document changes, for example when the HTML layout is redesigned or page elements are repositioned. AlchemyAPI's visual constraint engine overcomes these roadblocks by letting web pages be queried through their visual, presentation-level characteristics. The AlchemyAPI document query engine is highly robust to HTML document changes and provides advanced heuristic data extraction capabilities that XPath cannot offer.
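
The brittleness of structure-bound queries can be illustrated with a small toy example using Python's standard-library ElementTree. The two page versions and the helpers `xpath_price` and `label_anchored_price` are hypothetical. The label-anchored heuristic is a deliberately simple stand-in chosen for illustration. It is not AlchemyAPI's visual constraint engine, which works from presentation-level characteristics rather than text labels.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical product page, version 1 (well-formed so ElementTree can parse it).
PAGE_V1 = """<html><body>
  <div><h1>Widget</h1></div>
  <div><span>Price:</span><span>$19.99</span></div>
</body></html>"""

# Hypothetical version 2: same content after a redesign that adds a banner
# div and wraps the price block in an extra container.
PAGE_V2 = """<html><body>
  <div>Free shipping this week!</div>
  <div><h1>Widget</h1></div>
  <div><div><span>Price:</span><span>$19.99</span></div></div>
</body></html>"""


def xpath_price(html: str) -> Optional[str]:
    """Structure-bound query: the 2nd <span> inside the 2nd <div> under <body>."""
    root = ET.fromstring(html)
    node = root.find("./body/div[2]/span[2]")
    return node.text if node is not None else None


def label_anchored_price(html: str, label: str = "Price:") -> Optional[str]:
    """Toy layout-independent heuristic: return the text of the first non-empty
    element that follows, in document order, an element whose text equals `label`."""
    root = ET.fromstring(html)
    elements = list(root.iter())
    for i, el in enumerate(elements):
        if (el.text or "").strip() == label:
            for nxt in elements[i + 1:]:
                if (nxt.text or "").strip():
                    return nxt.text.strip()
    return None


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, xpath_price(page), label_anchored_price(page))
```

The positional path finds the price in version 1 but returns `None` after the redesign, while the label-anchored lookup finds `$19.99` in both. Note that ElementTree implements only a small XPath subset and requires well-formed markup; real-world HTML usually needs a more forgiving parser.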

API Information

API endpoints are provided for performing content extraction both on Internet-accessible URLs and on HTML documents posted directly to the API.
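
The difference between the two kinds of endpoint can be sketched as request construction: a URL is passed as a query parameter for the service to fetch, while raw HTML travels in a POST body. Everything service-specific here is a hypothetical placeholder, not taken from this page: the host `api.example.com`, the `/url` and `/html` paths, and the parameter names `apikey`, `url`, `html`, `query`, and `outputMode`. The helpers only build requests and do not send them.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# Hypothetical placeholders; the real service's host, paths, and parameter
# names are assumptions, not documented values.
BASE_URL = "https://api.example.com/scrape"
API_KEY = "YOUR_API_KEY"


def url_extraction_request(page_url: str, query: str,
                           output_mode: str = "json") -> Tuple[str, str, Optional[bytes]]:
    """Build a GET request asking the service to fetch and mine a public URL."""
    params = {"apikey": API_KEY, "url": page_url,
              "query": query, "outputMode": output_mode}
    return "GET", f"{BASE_URL}/url?{urlencode(params)}", None


def html_extraction_request(html: str, query: str,
                            output_mode: str = "json") -> Tuple[str, str, bytes]:
    """Build a POST request that uploads raw HTML as a form-encoded body."""
    body = urlencode({"apikey": API_KEY, "html": html,
                      "query": query, "outputMode": output_mode})
    return "POST", f"{BASE_URL}/html", body.encode("utf-8")


method, url, body = url_extraction_request("http://shop.example.com/item/1", "price")
print(method, url)
```

To actually send one of these, the tuple could be passed to `urllib.request.Request(url, data=body, method=method)`; that step is omitted here because the endpoint is only a placeholder.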

Extracted metadata can be returned in XML, JSON, or RDF format. More information on API response formats is available in the response format documentation.
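
Because the same extraction result can arrive in different formats, client code often normalizes it into one in-memory structure. The sketch below assumes a hypothetical response shape (a list of results, each with a `name` and a `value`) that is not documented on this page, and it handles only JSON and XML; RDF would typically call for a dedicated RDF library.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical sample responses carrying the same extracted field.
JSON_BODY = '{"status": "OK", "results": [{"name": "price", "value": "$19.99"}]}'
XML_BODY = ("<results><status>OK</status>"
            "<result><name>price</name><value>$19.99</value></result>"
            "</results>")


def parse_response(body: str, output_mode: str) -> Dict[str, str]:
    """Normalize a response in the assumed shape into {field_name: value}."""
    if output_mode == "json":
        data = json.loads(body)
        return {r["name"]: r["value"] for r in data["results"]}
    if output_mode == "xml":
        root = ET.fromstring(body)
        # iter("result") walks all <result> elements below the root.
        return {r.findtext("name", default=""): r.findtext("value", default="")
                for r in root.iter("result")}
    raise ValueError(f"unsupported output mode: {output_mode!r}")


print(parse_response(JSON_BODY, "json"))
print(parse_response(XML_BODY, "xml"))
```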