AlchemyCmd

Command line text analysis tool for Linux/Unix

AlchemyCmd? What's that?

AlchemyCmd is a command-line tool for performing natural language processing and text analysis on Linux/Unix systems.

What Does It Do?

AlchemyCmd enables you to perform named entity extraction, concept tagging, keyword extraction, language detection, and structured content extraction directly from the command line. This tool can process HTML or text content on your local filesystem, crawl Internet-hosted web pages, and process content from standard input (stdin). This makes it easy to construct shell scripts and UNIX command pipes that leverage natural language processing.

Download AlchemyCmd:

Download the latest version of AlchemyCmd here: AlchemyCmd-1.2.tar.gz

AlchemyCmd utilizes the AlchemyAPI C/C++ SDK

Questions, comments, or feature requests? Contact us

OK, How Do I Use It?

AlchemyCmd provides a wide variety of NLP and content retrieval options. You may see the available command-line options by issuing the following command:


alchemycmd --help

To perform concept tagging on an Internet-hosted web page, you may issue the following command:

alchemycmd --mode concept -S web -U "http://www.cnn.com/2010/US/07/13/steinbrenner.obit/index.html?hpt=T1"

The above command returns concept tagging results in a simplified comma-separated values (CSV) format:


New York Yankees,0.956409
Major League Baseball,0.62449
George Steinbrenner,0.548511
Billy Martin,0.473525
Yankee Stadium,0.457922
Reggie Jackson,0.423929
Derek Jeter,0.421396
YES Network,0.413723

Output in XML format may also be retrieved using the "--output-mode xml" command-line option.
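
Because the simple output is plain CSV with the score in the last field, it can be post-processed with standard UNIX tools. The following is a minimal sketch and not part of AlchemyCmd: filter_by_score is a hypothetical helper name, and the 0.5 threshold is an arbitrary choice. It runs on sample lines copied from the output above, so it works without AlchemyCmd installed.

```shell
# Sample lines copied from the concept-tagging output shown above.
sample_output='New York Yankees,0.956409
Major League Baseball,0.62449
George Steinbrenner,0.548511
Billy Martin,0.473525'

# Hypothetical helper (not an AlchemyCmd feature): keep only rows whose
# score is at least the threshold passed as the first argument.
# Assumption: the score is always the last comma-separated field.
filter_by_score() {
  awk -F',' -v min="$1" '($NF + 0) >= (min + 0) { print }'
}

high_confidence=$(printf '%s\n' "$sample_output" | filter_by_score 0.5)
printf '%s\n' "$high_confidence"
```

With a working installation, the same filter could presumably be appended to the concept tagging command shown earlier, in place of the sample data.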

How Do I Construct Command Pipes?

AlchemyCmd makes it easy to construct UNIX command pipes for complex data manipulation through its support for direct file-system access, Internet URL retrieval, and processing of content from standard input (stdin). Multiple AlchemyCmd operations can even be chained together to perform complex site-wide semantic crawling operations.

In the following command, we illustrate a multi-step UNIX command pipe:

alchemycmd -S web -M cquery -AC "1st 10 links after 'latest news'" -U http://www.cnn.com/ |
sed 's/.*,http/http/' | xargs -n 1 alchemycmd -S web -M entity -APU -U

This series of commands performs the following operations:

  • Retrieves http://www.cnn.com/
  • Identifies the first 10 article links after the "Latest News" heading
  • Crawls each of the 10 article links
  • Performs named entity extraction on all 10 CNN articles

The result of this UNIX command pipe is the following output:


http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2,William Garner,Person,0.735041
http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2,Ohio,StateOrCounty,0.474604
http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2,Addie Mack,Person,0.465454
http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2,CNN,Company,0.412148
http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2,Cincinnati,City,0.381326
http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2,smoke inhalation,HealthCondition,0.347447

...
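
Output of this shape can be fed into further pipe stages. As a rough sketch, the snippet below counts how many entities of each type were extracted. It assumes, based only on the sample lines above, that the entity type is always the second-to-last comma-separated field. count_by_type is a hypothetical helper name and not an AlchemyCmd feature.

```shell
# Sample rows copied from the entity-extraction output shown above.
url='http://www.cnn.com/2010/CRIME/07/13/ohio.execution/index.html?hpt=T2'
sample_output="$url,William Garner,Person,0.735041
$url,Ohio,StateOrCounty,0.474604
$url,Addie Mack,Person,0.465454
$url,CNN,Company,0.412148
$url,Cincinnati,City,0.381326
$url,smoke inhalation,HealthCondition,0.347447"

# Hypothetical helper: print "count,type" rows, most frequent type first.
# Assumption: the entity type is the second-to-last field of each row.
count_by_type() {
  awk -F',' '{ counts[$(NF - 1)]++ }
             END { for (t in counts) print counts[t] "," t }' |
    LC_ALL=C sort -t',' -k1,1nr -k2,2
}

type_counts=$(printf '%s\n' "$sample_output" | count_by_type)
printf '%s\n' "$type_counts"
```

Counting fields from the end of the line rather than the start keeps the helper working when a URL or entity name itself contains a comma.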

How Do I Process Content From STDIN?

AlchemyCmd can process text or HTML content directly from standard input (stdin), making it easy to use natural language processing in Unix/Linux pipelines and shell scripts.

In the following command, we illustrate performing keyword extraction on the first 100 lines of a text file, using standard input:


head -n 100 /some/file.txt | alchemycmd -S text -M keyword -R stdin
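
In a shell script, the same pattern can be wrapped in a small function. This is only a sketch: keywords_from_head is a hypothetical name, the default of 100 lines mirrors the example above, and the function uses only the flags shown in that example (-S text -M keyword -R stdin). It assumes alchemycmd is on the PATH when the function is called.

```shell
# Hypothetical wrapper: run keyword extraction on the first N lines of a file.
#   $1 = path to a text file
#   $2 = number of lines to read (defaults to 100)
keywords_from_head() {
  file=$1
  lines=${2:-100}
  # Fail early with a message instead of sending empty input to alchemycmd.
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  head -n "$lines" "$file" | alchemycmd -S text -M keyword -R stdin
}
```

A call such as keywords_from_head /some/file.txt 50 would then send the first 50 lines of the file to AlchemyCmd through stdin.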

What Command-Line Options Are Available?

AlchemyCmd supports a variety of command-line options for tuning natural language processing operations:

Usage: alchemycmd [OPTION]...
Examples:
  alchemycmd --mode concept -S web --source-url http://some.url.com/
  alchemycmd --mode entity -S html --source-file /tmp/some-file.html
  alchemycmd --mode keyword -S text --source-file /tmp/some-file.txt
  alchemycmd --mode category -S web --source-url http://some.url.com/
  alchemycmd --mode language -S html --source-file /tmp/some-file.html
  alchemycmd --mode cquery -S html -F /tmp/some-file.html -AC '1st 3 links'

Required:
  -M, --mode TYPE          Select the NLP operation type:
                                  concept  (concept tagging)
                                   entity  (named entity extraction)
                                  keyword  (keyword extraction)
                                 category  (text categorization)
                                 language  (language detection)
                                   cquery  (constraint query)
  -S, --source-type TYPE       Select the source content type:
                                      web  (process a WWW resource)
                                     html  (process a local HTML file)
                                     text  (process a local text file)
  -F, --source-file PATH       Process a text or HTML file
                                   (if --source-type = 'text' or 'html')
  -U, --source-url URL         Specify a source web URL
                                   (if --source-type = 'web')
Miscellaneous:
  -H, --help                   Print command usage
  -V, --verbose                Enable verbose mode
  -K, --key-location PATH      Specify a custom API key location
  -O, --output-mode MODE       Output formatting mode:
                                   simple  (simple comma-delimited output)
                                      xml  (raw XML output)
  -R, --read-from FROM         Read content from: file, stdin

HTML Targeting Options (for --source-type = 'html' or 'web'):
  -AT, --api-source-text MODE  Set an HTML page targeting mode:
                                  cleaned  (use HTML content cleaning)
                                      raw  (process entire HTML content)
                                    xpath  (use an XPath query)
                                   cquery  (use an HTML constraint query)
                           cleaned_or_raw  (use HTML content cleaning,
                                            falling back to 'raw' mode)
  -AC, --api-cquery QUERY      Set a constraint query
                                   (see http://www.alchemyapi.com/api/scrape/)
  -AX, --api-xpath XPATH       Set an XPath expression

Other API Options:
  -AM, --api-max-retrieve VAL  Specify the max # of results to retrieve
  -APU, --api-print-url        Enable printing of URLs in simple responses