


Q3 Feature Round-Up

Introducing New AlchemyAPI Features and Functionality


As awareness grows of the challenges that come with ever-larger “piles” of data, we’re constantly working on our services to make it easier for businesses to consume all of their emails, Tweets, documents and other sources of information in real time. Our goal is to help you understand and act on the wants and needs of your customers without spending all of your time and money doing so.

With that, we’re happy to announce the release of several new features and enhancements to the APIs you use today. These are available to API Key holders now.

Below, you’ll find a brief description of each new feature. For additional information, as well as directions on how to implement them, please refer to our Documentation.

1. Spanish Language Sentiment Analysis

With over 405 million native speakers worldwide, Spanish was a clear next language for sentiment analysis. At both the document and user-targeted levels, you can now determine the attitude, opinion or feeling expressed in Spanish text toward something such as a person, organization, product or location.

Up next: keyword- and entity-targeted Spanish sentiment.

2. Structured Entity Support

Named entity calls now support "structured entities" such as email addresses, Twitter handles, hashtags, IP addresses, dollar amounts and quantities (weights, measurements, etc.).

Previously, our entity system focused on finding the people, companies and places that were being talked about naturally without breaking down certain types of content such as hashtags or known quantities. However, we know that there is a lot of information that can be gathered from this structured data that often resides in unstructured text. With this update, you can obtain additional information that is interesting for your business.
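To make the distinction concrete, here is a minimal Python sketch of separating structured entities from natural-language ones. The response shape and the type names are assumptions for illustration, not AlchemyAPI's documented schema.

```python
# Separate "structured" entities (hashtags, handles, email addresses, etc.)
# from natural-language entities in an extraction result. The response shape
# and type names below are illustrative assumptions, not a documented schema.

STRUCTURED_TYPES = {"Hashtag", "TwitterHandle", "EmailAddress", "IPAddress", "Quantity"}

def split_entities(response):
    """Partition extracted entities into structured and natural-language groups."""
    structured, natural = [], []
    for entity in response.get("entities", []):
        (structured if entity.get("type") in STRUCTURED_TYPES else natural).append(entity)
    return structured, natural

sample = {"entities": [
    {"type": "Person", "text": "Tim Cook"},
    {"type": "Hashtag", "text": "#WWDC"},
    {"type": "EmailAddress", "text": "press@apple.com"},
]}

structured, natural = split_entities(sample)
```

In a real integration, the filtered structured entities could then feed downstream steps such as hashtag decomposition or quantity normalization.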

Up next: support for other structured entities such as phone numbers, specific quantities, addresses and more.

3. Type Hierarchies via the AlchemyAPI Knowledge Graph

The AlchemyAPI knowledge graph is our database that provides detailed information on how everything in the world is related. We employ the surrounding content to identify the pathways for ambiguous terms like “apple.” The knowledge graph helps us understand if that term in a certain text refers to a fruit, tree, flavor of wine, company, Gwyneth Paltrow’s daughter, etc.

We have never exposed these results… until now. With type hierarchies, every keyword essentially has its own taxonomy. When using our Image Tagging, Entity Extraction, Keyword Extraction or Concept Tagging APIs, we will expose the hierarchy most relevant to your content, up to 5 levels deep, so that you can explore the parent and child terms that enhance your results.

For example, many advertisers, like AdTheorent, read web pages, profile them and try to determine the central topic of each page. They use these categorized pages to decide where to place an ad so that it reaches the intended audience. With type hierarchies, advertisers can now get more context to better target their ads.

In the past when an advertiser processed an article about Apple (the company), the service would have returned “apple.” Now, we also return terms such as “brand” or “software developer.” This helps our advertiser to determine that they should place an ad for Apple laptops on this page as opposed to an ad for apple juice.
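As a sketch of how a consumer might walk such a hierarchy, assume the API exposes it as a slash-delimited path from the most general parent down to the term itself. Both the path format and the example paths are assumptions for illustration.

```python
# Hypothetical hierarchy paths for two senses of "apple"; the slash-delimited
# format is assumed for illustration only.
COMPANY_SENSE = "/organizations/companies/brand/software developer/Apple"
FRUIT_SENSE = "/products/food/fruit/apple"

def hierarchy_levels(path, max_depth=5):
    """Split a hierarchy path into at most `max_depth` levels, root first."""
    levels = [level for level in path.strip("/").split("/") if level]
    return levels[:max_depth]

company = hierarchy_levels(COMPANY_SENSE)
fruit = hierarchy_levels(FRUIT_SENSE)
```

An advertiser could then match the parent terms ("brand", "software developer") rather than the bare keyword to decide which ad to place.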

4. Link Extraction in Cleaned Web Page Text

When link extraction is turned on, you can now get a sense of how linked content is related to the original article. This is a great indicator of content that you or your readers may be interested in, and it can give you additional ideas for pages to track and analyze in the future.

5. Twitter Hashtag and Username Decomposition

Often, the sentiment or emotion associated with a Tweet has a lot to do with a hashtag. The simple example below demonstrates how a hashtag can change the meaning and overall sentiment of a phrase. We have also enhanced our language modeling strategies to break apart hashtags so they can be more accurately classified by our Sentiment Analysis and Taxonomy APIs, giving you a better idea of the emotions behind content.

“Studying for a test” returns a negative sentiment score.

“Studying for a test #goingtorockit” returns a positive sentiment score: adding #goingtorockit changes the sentiment to positive.
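Under the hood, hashtag decomposition amounts to word segmentation. Below is a toy dictionary-based sketch using greedy longest-prefix matching; AlchemyAPI's actual language models are far more sophisticated, and the tiny vocabulary is invented for illustration.

```python
# Toy hashtag segmentation: greedy longest-prefix match against a vocabulary.
# A tiny hand-picked word set stands in for a real language model.
VOCAB = {"going", "to", "rock", "it", "studying", "for", "a", "test"}

def decompose(hashtag):
    """Split '#goingtorockit' into ['going', 'to', 'rock', 'it'], or None."""
    text = hashtag.lstrip("#").lower()
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest prefix first
            if text[i:j] in VOCAB:
                words.append(text[i:j])
                i = j
                break
        else:
            return None                     # no dictionary word fits here
    return words
```

Once the hashtag is split into ordinary words, it can be scored by the same sentiment models as the rest of the Tweet.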

6. New Authors Extraction Endpoint

We are deprecating the existing Author Extraction endpoint and replacing it with a new Authors Extraction endpoint. Instead of returning just the primary author of a given article or text, we will now return all authors listed.

Please note: If you are currently using the Author Extraction API and want to receive results containing multiple authors, you need to update your endpoint. See the documentation for details.
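If you need to support both endpoints during the migration, a small normalization shim can help. Both response shapes below are assumptions for illustration; check the documentation for the real schemas.

```python
# Normalize author results across the old single-author and new multi-author
# endpoints. Both payload shapes are assumed for illustration only.

def extract_authors(response):
    """Always return a list of author names, whichever endpoint produced it."""
    if "authors" in response:                       # assumed new-style payload
        return list(response["authors"].get("names", []))
    if "author" in response:                        # assumed old-style payload
        return [response["author"]]
    return []
```

With a shim like this, downstream code can treat every article as having a list of authors, even while some responses still come from the deprecated endpoint.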

7. Face Detection and Recognition API

We recently announced our new Face Detection and Recognition API, the latest addition to the AlchemyVision product family. When provided an image file or URL (demo), the Face Detection and Recognition API returns the position, age, gender, and, in the case of celebrities, the identity of the people in the photo. Organizations across a variety of industries, such as social media monitoring and advertising, can take advantage of face detection to analyze their unstructured image data. This API provides the ability for applications to glean demographic data from images, which can be useful when analyzing a person’s social media habits or for analyzing which images have the highest return on investment in advertising campaigns.

Will Ferrell and Chad Smith

Were you able to tell actor Will Ferrell and Red Hot Chili Peppers drummer Chad Smith apart? AlchemyVision can!

8. New Text Extraction Mode: Cleaned+Xpath

For our intermediate to advanced users, we’ve created an xpath mode. AlchemyAPI has its own cleaned-text extraction, which strategically determines the most relevant content on the page (stripping all chrome in the form of headers, footers, ads, etc.). However, some users prefer to have more control over what is or is not included in our results.

The new xpath mode allows you to take advantage of the known organization or layout of a web page. For example, on a typical news article you can make a very educated guess as to where the comments live on the page (the bottom). Typically, our clean text avoids comments, but if you know that you want to gather reader sentiment from those comments, just point us to the xpath. “Cleaned” will give you the text from the article, and an xpath query will supply the comments for analysis.
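As a sketch, the request for comment sentiment might carry parameters like the following. The `sourceText` and `xpath` parameter names follow the pattern described above, but treat the exact names and values as assumptions and confirm them against the documentation.

```python
# Build query parameters that restrict sentiment analysis to the page nodes
# matched by an XPath query. Parameter names are assumptions for illustration.
from urllib.parse import urlencode

def comment_sentiment_params(api_key, page_url, comments_xpath):
    """Return an encoded query string targeting only the XPath matches."""
    return urlencode({
        "apikey": api_key,
        "url": page_url,
        "outputMode": "json",
        "sourceText": "xpath",      # analyze only the nodes the XPath selects
        "xpath": comments_xpath,
    })

params = comment_sentiment_params("demo-key", "http://example.com/article",
                                  "//div[@id='comments']")
```

A second call without the `xpath` parameters would return the cleaned article text, so the two analyses can be compared side by side.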

9. Enhancements

  • Image Tagging API - With this release, we’ve expanded the results of our Image Tagging API so that you can view and select from additional tags to ensure you have access to the information you want and need.
  • Image Link Extraction API - We've enhanced our Image Link Extraction API to make it more accurate and faster than ever before.
  • Publication Date API - We've enhanced our Publication Date API to make it more accurate and faster than ever before. We have also added a confidence notification to the results to help you further understand your results.

Begin Using These Features Today:

Get Your Free API Key


Part 2 - How Fractl Applies Science to Social Sharing

Part 2 - The Art and Science of Fractl's Publisher Engagement Study

By Sonya Hansen, Marketing Director


In my first post in this two-part series, I shared an interesting study performed by Sam Deford, Creative Strategist, and his team at Fractl that reveals the relationship between content sharing and readers’ emotional reactions.

Fractl wanted to learn which publishers have become most successful at promoting their articles on social media, and how, in order to better understand the viral potential of content placements. Through their research, Sam and his team discovered patterns showing the types of content that were doing best on certain publications, but needed hard data to confirm their suspicions and guide content strategy.

Using Buzzsumo to retrieve one million shared articles, Fractl ran them through Alchemy’s Sentiment Analysis API. The result of their research became the Publisher Engagement Analysis study, which provides an in-depth look at the content people share most and where they share it.

Of course, the reasoning and impetus for such a study is engaging in itself. But, what is an idea without technique to go along with it? Here is the rest of Sam Deford’s interview where he shares his "recipe" for performing this analysis.

What was your process for incorporating AlchemyAPI? What programming language did you use?

We retrieved one million URLs, article titles and the relevant share stats on each URL from Buzzsumo. I then used Mathematica to fetch a positive/negative sentiment score from Alchemy’s API. Using Mathematica was primarily a force of habit, since I had become so familiar with the program during my physics and math studies in college. I have since written the JavaScript included in the pastebin embedded below to fetch results from AlchemyAPI in Google Docs spreadsheets.

What were your feelings about the results? Were they accurate?

At first, I did have some skepticism about the validity of the numerical sentiment score being returned by AlchemyAPI. So I did what I encourage anyone else to do: I fed the API approximately 20 articles and compared the sentiment scores returned with my gut feeling about the positive/negative nature of each article.

I was very surprised at the accuracy. I was also interested in how the API worked, so I acquainted myself with much of the documentation that Alchemy offers, which convinced me that the score I was getting was as accurate as possible.
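Sam's spot check is easy to reproduce: score a handful of hand-labeled articles and measure how often the sign of the API's score agrees with your gut label. The scores and labels below are invented for illustration.

```python
# Spot-check sketch: compare API sentiment scores against hand labels for a
# small sample and report the fraction of sign agreements.

def sign_agreement(scores, labels):
    """Fraction of items where the score's sign matches the hand label."""
    hits = sum((score > 0) == (label == "positive")
               for score, label in zip(scores, labels))
    return hits / len(scores)

api_scores = [0.52, -0.41, 0.18, -0.07]          # made-up API scores
gut_labels = ["positive", "negative", "negative", "negative"]
agreement = sign_agreement(api_scores, gut_labels)
```

Even twenty or so labeled examples, as Sam used, can reveal whether the scores roughly track human judgment before you commit to analyzing a million articles.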

What are your tips for others who are just getting started with analyzing their unstructured data? Pitfalls to avoid? Suggestions on first steps?

I would point out that the greater the amount of text being analyzed, the more accurate the result. Secondly, I would encourage those who are curious to just get started right now. You don’t need to know how to code at all.

I have written a script that can easily be used in Google Sheets and will have you running some sentiment analysis right away. Here are the steps you’ll need to follow to get this set up:

  1. Get an API key from Alchemy.
  2. Create a new sheet in Google Drive.
  3. In this new sheet, go to Tools > Script Editor > Blank Project.
  4. Delete “function myFunction() { }” and paste the contents of the following pastebin:
  5. Paste in your AlchemyAPI key between the quotes in the very first line of the script.
  6. Save your script and name it, close script editor.
  7. In the spreadsheet, you can now use the following functions:

    This function returns an overall positive/negative sentiment score of the text between the quotes. Omit line breaks to ensure all text is analyzed.

    This function returns an overall positive/negative score of the contents of the URL between the quotes.

    This function returns a positive/negative score of the text in between the quotes as it relates to the targeted word/phrase in the second argument. Omit line breaks to ensure all text is analyzed.

    This function returns a positive/negative score of the contents of the URL between the quotes as it relates to the targeted word/phrase in the second argument.

To learn more about targeted vs. overall sentiment, please refer to the following documentation.

What could we do to make your AlchemyAPI experience even better?

I would really love to see sarcasm detection as a service offered, but I am skeptical that technology will ever be capable of such a thing. In lieu of that, I would like to see AlchemyAPI create tools that identify the specific dominant emotions evoked by a piece of content.

Thanks to Sam and the team at Fractl for sharing their approach with us. Do you have a story to tell? An interesting use of technology that could be shared with and emulated by your fellow Alchemists? Let us know on Twitter @AlchemyAPI or email us.

Read Fractl's Publisher Engagement Analysis Study:

Read now

Learn About AlchemyAPI's Approach to Sentiment Analysis:

Get the Whitepaper


How Fractl Applies Science to Social Sharing

Part 1 - The Art and Science of Fractl's Publisher Engagement Study

By Sonya Hansen, Marketing Director


Content informs. Content convinces. But really effective content also elicits an emotional response that makes you want to share it. So, when I read about a new report from Fractl on one of my favorite blogs, I was intrigued by what they found through researching the relationship between content sharing and sentiment. That they used AlchemyAPI’s Sentiment Analysis API as part of the process was icing on the cake. And I do like cake.

So, I contacted Sam Deford and the team at Fractl to find out more about their work. Graciously, Sam agreed to share his thoughts on the report, their impressions of AlchemyAPI and what you can do to start gauging the visceral impact of the content you so painstakingly craft. He provided so many great insights that I broke it into two parts. This first part will focus on the story and art behind Fractl’s study. The second post will provide Sam’s recipe for implementing this yourself. You’ll want to read that one, I promise!

First, here’s some background. Sam Deford is a Creative Strategist. After graduating from the University of Colorado – Boulder (also my alma mater – Go Buffs!) with a bachelor’s degree in physics and a minor in mathematics, Sam worked at an Internet security software company. He then moved to South Florida to begin work at Fractl. In his first few months, Sam promoted content to the Web’s top publishers, which gave him a solid understanding of what makes the Internet tick. Now, Sam employs his analytical skills, creativity and a range of technical abilities to scour the nooks of the Internet that are ripe for exclusive research and create content that audiences will latch onto and share.

Tell us about your annual report. What gave you the idea? Who is your target audience or reader?

Fractl performs exclusive research and creates interesting content for Internet audiences to enjoy and share. The role of the online publisher is crucial to the success of our campaigns because they are the springboard through which our content reaches audiences across the Web. While publishers provide this important role for us, we provide them with new and relevant stories. The relationships truly are mutually beneficial.

Over time, we started to pick up on patterns showing what types of content were doing best on which publications; but without the hard data to confirm our suspicions, we were left with mere snapshots of insight based primarily on speculation.

Our research was aimed at putting an end to the speculation. The arrival of Buzzsumo, and Buzzsumo’s support by extending us the use of their API, allowed us to retrieve the top one million shared articles from the last six months, from 200 of the Web’s top publishers. We then ran these articles through Alchemy’s Sentiment Analysis API, which gave us numerous insights that we could not have found otherwise.

I believe the resulting analysis of that data set is of interest to many verticals. It was originally driven by our own curiosity but, as we began looking into the data, it became obvious that the online publishing world, online advertisers, content marketers and the very Internet audiences this research looks into could find value in the information.

How did you incorporate AlchemyAPI into your annual report? What features did you use? What information were you trying to gather?

In this report, we incorporated Alchemy’s Sentiment Analysis REST API to get an overall score of the emotional sentiment of the one million most-shared articles. Overall, the top performing content skewed toward the negative end of the sentiment spectrum, but this was not true for all publishers and social networks.

The following chart shows the distribution in emotional sentiment for the top 500 articles on each social network.

Sentiment for top 500 articles on each social network

This chart shows the distribution in emotional sentiment for the top 500 articles on 20 of the Web’s top publishers.

Sentiment for top 500 articles on 20 of web top publishers

What surprised you most about your project?

What surprised me most about our findings was how dependent online publishers are on Facebook sharing. The articles we looked at had 2.6 billion shares in total, and Facebook shares made up 2,188,777,030, or 81%, of those shares. I was pleased to see that certain publishers like Mashable have a pretty well-rounded “sharing landscape” (as can be seen in the dashboard linked above). Overall, I’ve been very intrigued by variations in the nature of the top-shared content on each publication and the overall sharing landscape on each publication.

How did you find out about AlchemyAPI?

I first found out about AlchemyAPI through a colleague who was using it in a client project to determine a sentiment score for a group of Reddit comments. A few months later, when I was working on this research, it occurred to me that AlchemyAPI could be used to provide an additional layer of insight in our study.

Where do you see unstructured data analysis heading in the future? Are there additional use cases you would consider or recommend to others?

Humans communicate and understand things much differently than computers do. Using technology to gain insight from the garbled nonsense of human language is becoming increasingly possible with services such as those offered by AlchemyAPI. In the future, I expect such services to do as good a job as an actual human being and to be capable of the same range of emotional insight as humans. That is, I see tools like those AlchemyAPI develops being able to determine where language falls within a full spectrum of human emotions.

In addition, I expect to see tools that are capable of analyzing images, video, and spoken word and performing a similar emotional scoring of them.

Any additional comments?

I'd like to encourage readers to check out the full Publisher Engagement Analysis study.

Stay tuned for Part 2 of this post coming next week. We'll discuss the technical aspects of this project with Sam and provide insight into how Fractl completed their study.

Learn more about Sentiment Analysis:

Read now


What You Can Learn from Analyzing Communications Data

Tweets, and Chats, and Phone Calls... Oh My! Unlocking the Story Behind All That Data.

By Sonya Hansen, Marketing Director

What’s the total volume of communication you had in 2013? That includes all the emails, Facebook messages, SMS, phone calls and physical mail (yes, letters still exist) you received or sent. You probably didn’t track that data, but Nicholas Felton tracked his. According to his 2013 Annual Report, he exchanged thoughts with someone about 95,000 times last year.


Nicholas Felton, Information Designer

Felton spends much of his time thinking about data, charts and our daily routines. He is the author of many Personal Annual Reports that weave numerous measurements into a tapestry of graphs, maps and statistics reflecting the year’s activities.

An Information Designer, Felton creates infographics that are as much works of art as they are communication devices. He was one of the lead designers of Facebook’s timeline and the co-founder of Daytum. His most recent product is Reporter, an iPhone app designed to record and visualize subtle aspects of our lives. His work is part of the permanent collection at MoMA. He has also been profiled by the Wall Street Journal, Wired and the New York Times and recognized as one of the 50 most influential designers in America by Fast Company.

What’s also interesting is that Felton used AlchemyAPI’s services to help process the data he collected over the year. When I learned that he was a fellow Alchemist, I jumped at the chance to find out more about his experience with us. Adding to his volume of conversations for 2014, he was kind enough to respond in this online interview.

Tell us about your annual report. What gave you the idea?

At the end of 2004, I designed a graphic titled “Best of 04” that attempted to encapsulate the year. This graphic included my favorite aspects of the year including several quantified details like the number of postcards sent and air miles traveled.

The following year, I created the first Annual Report out of information drawn from my memory, calendar, photos and data. I segmented these activities from the year into sections like travel, photography, music and books and I thought the result would be interesting primarily to friends and family. Incredibly, the report was as popular among people who had never met me as with those who knew me intimately. The audience for the project now includes people with interests in design, data visualization, storytelling, anthropology and beyond.

How did you incorporate AlchemyAPI into your annual report? What features did you use? What information were you trying to gather?

With the 2013 Annual Report, I attempted to collect and analyze as much of my communication data as possible. This led me to gather nearly 95,000 records including the contents of my SMS, email, Facebook messages, conversations, phone calls and physical mail. I naively thought that I would be able to analyze the contents of these records by hand, to extract items of interest, but quickly realized that I needed to automate the process.

When I discovered AlchemyAPI, I was thrilled to see how the Entity Extraction features intersected with the questions I had for my data. I wanted to discover locations discussed, people mentioned and media that appeared in my communications. I used AlchemyAPI to extract entities from all of my entries and was then able to categorize the content of my communication and filter it by location, participant, medium and time of year.

The results of a year's worth of interaction

A visualization of a year's worth of Felton's communication data.

What was your process for incorporating AlchemyAPI? What programming language did you use?

I rely on MySQL to store the data for my project. I fetched entries from my database, sent them to AlchemyAPI and then parsed and saved the results to a new database.
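The fetch → analyze → store loop Felton describes can be sketched as follows. The `analyze` step is a stand-in for a real AlchemyAPI entity-extraction call, injected here so the pipeline logic stays independent of any particular client library or database driver.

```python
# Fetch -> analyze -> store pipeline sketch. `analyze` stands in for a real
# entity-extraction API call; `save` stands in for a database insert.

def process_records(records, analyze, save):
    """Run each stored record through the analyzer and save every entity found."""
    for record in records:
        for entity in analyze(record["text"]):
            save({"record_id": record["id"], "entity": entity})

# Demo with stand-ins for the database rows and the API call:
saved = []
rows = [{"id": 1, "text": "Lunch with Nicholas in Brooklyn"}]
process_records(rows, lambda text: ["Nicholas", "Brooklyn"], saved.append)
```

In a real run, the saved rows would land in a second MySQL table keyed by record ID, making the results filterable by location, participant, medium and time of year.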

How did you find out about AlchemyAPI? What was your experience like? What were your feelings about the results?

AlchemyAPI was recommended to me by Golan Levin at Carnegie Mellon. Golan both recommended the service and provided sample code for processing that allowed me to integrate the Entity Extraction API into my project. Having a tool like AlchemyAPI to parse my communication records was transformative. I wound up with nearly eight million words of content, so classifying them manually would have been an impossible endeavor.

I was both thrilled and frustrated by the results. AlchemyAPI made a Herculean task manageable and automatically categorized my data in ways that I would not have considered. My frustrations arose when I felt that the API was leaping to conclusions and making assumptions; I think it would be more accurate for the software to limit these leaps. For instance, “Uncanny Valley” was returned as a recognized location, when it is a theoretical place. AlchemyAPI would also recognize typos but not provide disambiguation, so “headacheyvkately” was being returned as a health condition. It’s fantastic that these entities are being recognized, but it seems that some human intervention is needed to return unblemished results.

What surprised you most about your project?

I approached the project with the assumption that the content aspects of my communication would be the most descriptive and sensitive portion of the data set. What I was surprised to discover was that the meta-data of my communication was in many ways more descriptive and ultimately much more trustworthy than the payload. This metadata was able to accurately describe my location, social ties and sleep patterns, while the content was filled with the ambiguities that arise from implied context in communication. This vagueness makes it so difficult to know which “Nicholas” is being discussed when only a first name is given or if “Ira Glass” is being mentioned in the context of his radio show or television appearances.

What are your tips for others who are just getting started with analyzing their unstructured data? Pitfalls to avoid? Suggestions on first steps?

There’s no replacement for experience. I would recommend diving in and getting started with the API to test your results. I think that working with a corpus you know intimately (like a favorite book) or something you’ve explored previously will be helpful. By working with data I lived, the surprising results stood out brightly for me and helped me to understand the API output more clearly.

Where do you see unstructured data analysis heading in the future? Are there additional use cases you would consider or recommend to others?

I see text analysis becoming far more nuanced and able to separate hunches from conclusions. I can imagine a version of AlchemyAPI that would ask for validation of important assumptions. In my dataset this would allow for “Nicholas,” “Nicholas Felton,” “Nick” and “Nick Felton” to be merged in a context-sensitive manner.

Any additional information you'd like to add?

I am really excited by the prospects for text analysis of this sort in both my work and that of other data artists. AlchemyAPI uncovered many of the stories locked in my data and I look forward to introducing my students and others to its potential.

Exchanging messages with Nicholas has not only made me think about my personal communications, but it also has me thinking about how this kind of analysis can be used for businesses of all types: voice-of-the-customer analysis, support emails or even internal corporate communications. The possibilities are endless.

Get a free API Key and test our text and image analysis services on your own data.

Download now


Waggener Edstrom Teams Up PR Professionals and App Developers

How Application Developers Can Enable Agile, More Impactful Campaigns


At Waggener Edstrom, a global integrated communications agency, there are strong bonds between PR professionals, developers and analytics teams. Application developers are brought directly into the PR dialogue, where their skills and imagination are called on to create new tools and platforms for measuring campaign results on a daily or even moment-by-moment basis. These tools and platforms uncover access to data that can be used to direct and modify communications campaigns to make the greatest business impact.

Waggener Edstrom’s story tells how WE Infinity, a data mining and analytics platform, tracks online news coverage, millions of Tweets and other website content in real time, giving account managers the opportunity to measure campaign performance and help their clients modify strategies and tactics as a campaign matures. It’s an agile approach to unstructured data analysis that has helped Waggener Edstrom spur growth in an impressive roster of global enterprises.

“With the rapid pace of today’s communications environment combined with a deluge of data, our clients need to have accurate intelligence that helps them make decisions they can trust. With WE Infinity, we are helping our clients answer two of the most fundamental communications questions: Did I make an impact, and how do I improve my impact going forward,” said Karla Wachter, Waggener Edstrom Communications senior vice president of Insight & Analytics.

WE Infinity started with an internal hackathon, the desire for real-time analysis, and our free API keys. Application developers at Waggener Edstrom used five services, from keyword extraction and named-entity extraction to concept tagging and author extraction, to transform an approach that once was entirely manual – visiting numerous web sites and listening to multiple social channels on a daily basis – to a more than 80% automated process.

“We wanted to crunch near real-time data, and we needed a way to analyze it to determine if it was relevant to our clients,” said David Kohn, Waggener Edstrom vice president of software development. “Our developers quickly built a proof-of-concept that included AlchemyAPI’s services using the free API keys available on their website. That was given to a larger team of developers with the challenge to build their own tools using that platform.”

The data produced about brands, products and issues is only growing in today’s fast-paced environment. But data is just data until it is transformed into useful intelligence that helps communications professionals understand their audience, who influences them and where, which messages resonate and which ones don’t. With solutions like WE Infinity, Waggener Edstrom provides not only near real-time insight into the performance of communications activities, but also insight into what can be done to improve performance going forward.

Read Waggener Edstrom's success story to learn:

  • How application developers transformed a once manual data analysis process to be more than 80% automated.
  • How five AlchemyAPI services integrate directly into the WE Infinity platform to enable near real-time analysis.
  • How they create a bigger bang for every campaign dollar with data-driven decisions and agile campaign management.

Read now


Answers to Your Top 7 Questions from the Google Webcast

Answers to Your Top 7 Questions From the Google and AlchemyAPI Webcast


During a recent webcast, Eric Schmidt, Product Manager for Google's Cloud Dataflow, and Richard Leavitt, CMO of AlchemyAPI, partnered to dig into the analytical pipeline capabilities Google recently unveiled under its Cloud Dataflow services. Eric shared how he analyzed the sentiment of millions of tweets streaming in real time from the World Cup in order to track fan response to events happening on the field. Attendees learned how he calculated average sentiment per team and per match, correlated sentiment with keywords of turning points in the games, and tied it all together with a timeline visualization that lets you track how global fans feel throughout the match.

As always, we received excellent questions for our presenters and couldn't quite get to all of them. Read on to get the answers to your top seven questions on everything from the use of third party services to training algorithms to using similar approaches on text messages.

1. There’s an idea of plugging a third party service into something as big and fast as what Google is handling. How could you get a service like AlchemyAPI operating at the volume and speed you were processing?

Eric: Taking a step back, the pipeline or graph that I built when deployed on Dataflow is deployed across virtual machines to execute each step. One of my steps was to go and call out to Alchemy’s APIs and the translation service. We had to do a little bit of tuning to see how much latency there was with those downstream services. Our experience was that the latency was consistent when we were making calls out to the service. Then we could work backwards and say, “If we want to do 1,000 per second and we have X messages that don’t ultimately make it through then we can plan to only score so many messages and plan the amount of workers we need to call out.” We actually have a primitive in our SDK that helps you throttle the amount of parallelization that you do for an external service. You wouldn’t just stampede over a service like AlchemyAPI.
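The throttling Eric describes can be approximated with a counting semaphore that caps how many requests are in flight at once. This is an illustrative sketch, not Dataflow's actual SDK primitive; the body of `call_service` is where a real network request would go.

```python
# Cap the number of concurrent calls to an external service with a semaphore,
# so parallel workers don't "stampede" over it.
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4
_gate = threading.Semaphore(MAX_IN_FLIGHT)
_lock = threading.Lock()
in_flight = peak = 0

def call_service(message):
    global in_flight, peak
    with _gate:                      # blocks once MAX_IN_FLIGHT calls are active
        with _lock:
            in_flight += 1
            peak = max(peak, in_flight)
        result = message             # ... perform the real request here ...
        with _lock:
            in_flight -= 1
    return result

# Sixteen workers contend for only four slots, bounding the downstream load.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(call_service, range(100)))
```

Measuring the downstream service's latency first, as Eric did, tells you what cap and worker count will sustain your target throughput.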

Richard: AlchemyAPI does support multiple concurrent connections, up into the thousands, so opening many threads or connections to process at whatever speed you need is something AlchemyAPI is robust enough to handle.

2. There’s the ability in AlchemyAPI to target your sentiment at high, medium or low levels. When extracting sentiment, were you using general sentiments of the Tweets or were you targeting specific entities or teams?

Eric: We just used the general sentiment of Tweets. If I had to do it again, I don’t know that we would make a different decision. We felt pretty strongly that we could build a better targeting mechanism for soccer because we were already building up a rich set of inputs to help us understand team composition. We were building rosters and other team information for a different part of the project. So, we felt confident that we could understand the target of the Tweet with the data that we already had.

We built our own targeting algorithm for two reasons. One, resolving targets on soccer tweets is hard unless you have a very specific training model, e.g., player names, nicknames, country names, supporter phrases, etc. We weren’t getting the right level of targeting. And two, given the terseness of the Tweets, we felt we could do this inline more efficiently versus calling out. Now, if I were processing an entire paragraph or blog, I’d use Alchemy’s API to do the targeting.

Richard: Tweets are notoriously short so it can be a challenge to gather sentiment with traditional language tools. Alchemy supports targeting sentiment at multiple levels of a document. You can get the overall sentiment of the complete document, or target sentiment to the specific phrases you care about (a team name, player, etc.), the named entities in the document or all the way down to each keyword and phrase. Before you target sentiment with your own phrases, you should see if our entities and keywords are already recognizing your data types. We do a pretty good job at recognizing these, even for brand new, first-time-seen entities and keywords! You can learn more here.
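As a concrete illustration of targeting sentiment at a specific phrase, here is a minimal Python sketch. The endpoint path and parameter names are assumptions modeled on AlchemyAPI's REST conventions, and the response is a canned sample, not live output; consult the Documentation for the exact call.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint modeled on AlchemyAPI's REST pattern; verify the
# exact path and parameters against the official Documentation.
BASE = "http://access.alchemyapi.com/calls/text/TextGetTargetedSentiment"

def build_request(api_key, text, target):
    """Build a targeted-sentiment request URL for a given phrase (e.g., a team name)."""
    params = {"apikey": api_key, "text": text, "target": target, "outputMode": "json"}
    return BASE + "?" + urlencode(params)

url = build_request("demo-key", "Germany played brilliantly tonight!", "Germany")

# Parsing a response shaped like AlchemyAPI's JSON output (sample, not live):
sample = '{"status": "OK", "docSentiment": {"type": "positive", "score": "0.61"}}'
doc = json.loads(sample)
sentiment = doc["docSentiment"]["type"]      # "positive"
score = float(doc["docSentiment"]["score"])  # 0.61
```

The same pattern applies at the document, entity, and keyword levels; only the endpoint and the granularity of the returned sentiment objects change.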

3. There is an idea that a lot of training goes on when using cloud services. Did you do any training? Or, were you using the “off-the-shelf” AlchemyAPI sentiment analysis feature?

Eric: Time and quality were big factors for us. We tried to do our own training with both our Prediction API and some other open-source APIs. We just weren’t getting the results that we wanted. Now, it was clear that our accuracy improved as we trained with Google’s Prediction API. But as Product Manager, I took a step back and I wondered if somebody else had built a better model. That was how I came across AlchemyAPI. I was doing testing on the same data that we were training on. I was getting better accuracy with AlchemyAPI and I didn’t have to invest time to build the training models.

4. Can the APIs you used be on data streams other than Twitter? Or, are they only for Twitter?

Eric: Yes, you can process any type of stream source, such as text, binary, relational datasets, etc. Dataflow’s internal data structure for containing data is called a PCollection, where the type can be any type you create: PCollection&lt;String&gt;, PCollection&lt;Tweet&gt;, etc. Right now we support basic input primitives like delimited files, row sources from BigQuery and text sources from Cloud Pub/Sub. In this case, I was ingesting JSON into Pub/Sub and then reading from Pub/Sub and converting the JSON to a Tweet object.
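The JSON-to-Tweet conversion step Eric describes can be sketched in a few lines of Python. The field names here are illustrative, not the full Twitter payload, and the real pipeline step ran inside Dataflow rather than standalone.

```python
import json
from dataclasses import dataclass

@dataclass
class Tweet:
    user: str
    text: str
    lang: str

def parse_tweet(raw):
    """Convert one raw JSON message (as read off Pub/Sub) into a Tweet object.
    Field names are illustrative; a real Twitter payload has many more."""
    d = json.loads(raw)
    return Tweet(user=d["user"], text=d["text"], lang=d.get("lang", "en"))

msg = '{"user": "fan123", "text": "GOAL!!!", "lang": "en"}'
tweet = parse_tweet(msg)
```

Downstream steps can then operate on a typed collection of Tweet objects instead of raw strings, which is the role a PCollection&lt;Tweet&gt; plays in the Dataflow pipeline.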

Richard: A powerful feature of AlchemyAPI is that we take on the heavy lifting of acquiring, crawling and cleaning text and images directly from web pages or posted HTML. In addition to text and image extraction APIs, you can scrape content from complex pages, extract authors, detect languages, etc.

5. Eric, did you train on Tweets from prior soccer matches? Can AlchemyAPI’s sentiment analysis API be tuned for this particular domain? Can you discuss customization?

Eric: We did not train Alchemy’s API. We did several test passes on sets of games to verify accuracy. Alchemy does provide customization options. I will let Richard speak to that. I did do training on Google’s Prediction API, but as mentioned in the talk I was not able to get enough training done in the time budgeted to get the accuracy I wanted, which led us to use AlchemyAPI.

Richard: AlchemyAPI does have specific customization opportunities. Given that we are trying to democratize this technology, our philosophy is to have a general understanding that can be applied to many different problems of extracting meaning from text. If we are going to return keywords or categories – as in, smartphones or football – we try to be general enough to apply to a broad audience. But customization is extremely important, and there are domain-specific ideas that we all need to take advantage of.

Contact us if you are working with a specific lexicon or group of terms and need to customize the solution. We are constantly looking at how we can make our systems as flexible as possible and still offer customization to those in niche industries. In particular, our Taxonomy API is able to categorize text into over 1,000 categories going up to five levels deep, such as sports (e.g., kayaking, baseball or cricket) or finance (e.g., mergers and acquisitions, credit cards or vehicle financing). We have seen a lot of success with custom categorization in that area, which allows you to pick your own categories even if they aren’t normally available.

6. Please talk about the Twitter drill down. Are there opportunities to apply other NLP functions like keyword analysis or entity extraction?

Eric: We did consider this. If I were going to continue to evolve this, I would definitely do additional entity analysis. We would get higher-fidelity, higher-quality data if we used other NLP functions. There is definitely an opportunity to gather additional insights. For this particular example, we did not use links. We extracted them and focused primarily on the text of each Tweet.

We considered doing image analysis using AlchemyAPI’s computer vision services. We thought about doing that, as well as influencer analysis. We were tracking certain Twitter handles based on those who are material influencers in the community. They have a more realistic or truthful view of what is happening in a particular match. While we stopped with just text analysis, you could definitely add another sub-workflow to do things like image analysis, influencer analysis or link analysis.

We did build some training models using Google’s Prediction API. We realized that we would have to spend two weeks or more to get the models that would provide a reasonable level of consistency. It would have been fun to customize training models with AlchemyAPI. If I were to extend this, I would invest in it a bit more. Our soccer experts and data scientists helped us use our internal data. You have to decide whether the training work is worth it.

7. So, just how long did this project take?

Eric: I did most of the development work in four weeks. That includes narrowing down scope, the customer process with AlchemyAPI, Twitter, etc. I licensed Twitter data. We spent quite a few cycles going through contract processing.

It took me about a week to get the parsing and basic pipeline built and all the pieces working. Then, I spent another two weeks focused on honing the algorithms and building the visualizations. After that first week, I was able to prove that the pipeline and overall model worked, which was great because then I could really start focusing on how to answer the questions in a more meaningful way, versus just saying, “I can do sentiment analysis based on translation and third-party APIs.”

Richard: For those of us who have been in the industry for a while, that’s what is so astounding. This wasn’t a team of developers or a long-winded project. Machine learning and analyzing unstructured data don’t have to be hard, and you can still get a lot of meaningful results.

AlchemyAPI was designed for speed of results. With 1,000 free calls/day, and offered as a web service with built-in text extraction and cleaning for web pages, you really can get started immediately. Plus, stay tuned for the launch of AlchemyData, where we will offer a simple query interface to a treasure trove of public news and blogs - no text or image processing required (we’ll already have done it for you).

Watch the recording to hear more of what Eric and Richard shared during the webinar:

Access the recording

Read the full Q&A to learn more about Eric's approach:

Get the Q&A


AlchemyVision Face Detection and Recognition API Released Today


Detects Faces, Recognizes Celebrity Name, Age Range, Gender, and Face Position

By Audrey Klammer, Marketing Director

Today we're announcing the public availability of our Face Detection and Recognition API, the latest addition to the AlchemyVision product family. When provided an image file or URL, the AlchemyVision Face Detection and Recognition API returns the position, age, gender, and, in the case of celebrities, the identity of the people in the photo. Organizations across a variety of industries, such as social media monitoring and advertising, can take advantage of face detection to analyze their unstructured image data. This API provides the ability for applications to glean demographic data from images, which can be useful when analyzing a person’s social media habits or for analyzing which images have the highest return on investment in advertising campaigns.

In addition to the general face detection capabilities, an impressive feature of the AlchemyVision Face Detection and Recognition API is its familiarity with well-known entities. For example, providing the API with an image of a famous politician or Hollywood celebrity allows a user to retrieve identity information, along with several other pieces of metadata: age and gender, type hierarchy information pulled from our knowledge graph, and a slew of linked data (e.g., personal websites, DBpedia links). The face recognition system is capable of identifying 60,000 different celebrities.
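A minimal sketch of consuming such a response in Python follows. The JSON field names (`imageFaces`, `positionX`, `identity`, and so on) are assumptions modeled on the capabilities described above, and the response is a canned sample rather than live API output; check the Face Detection Documentation for the exact schema.

```python
import json

# Sample response shaped like the fields described above (position, age range,
# gender, celebrity identity); exact field names may differ from the real API.
sample = json.loads("""
{
  "status": "OK",
  "imageFaces": [
    {
      "positionX": "120", "positionY": "64",
      "width": "92", "height": "92",
      "gender": {"gender": "MALE", "score": "0.98"},
      "age": {"ageRange": "45-54", "score": "0.55"},
      "identity": {"name": "Barack Obama", "score": "0.97"}
    }
  ]
}
""")

for face in sample["imageFaces"]:
    # Non-celebrity faces would lack an identity block, so default gracefully.
    name = face.get("identity", {}).get("name", "unknown")
    box = (int(face["positionX"]), int(face["positionY"]),
           int(face["width"]), int(face["height"]))
```

An application could use the bounding box to crop or annotate each face and the demographic fields to aggregate audience statistics across an image library.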


Derrick Harris from GigaOM announced our release in a story today, "AlchemyAPI now recognizes famous faces (and can learn yours, too)." He "was impressed when Turner showed me it could distinguish actor Will Ferrell from Red Hot Chili Peppers drummer Chad Smith," and Turner also showed the system identifying politicians Barack Obama, Harry Reid and Nancy Pelosi.

We’ll be following up in the next few weeks with additional information on useful applications of this API and how you can extract business value from visual content.

Try our Face Detection and Recognition Demo for yourself:

Try our Demo

Try out our Face Detection and Recognition API; learn how in our Face Detection Documentation section.

Try our API


Confidence Scores

Tips for Understanding Scores when Tagging Images and Documents

By Devin Harper, AI Research Scientist

The world is constantly uploading photos to the internet. By the same token, news articles and blog posts are written and posted to the web around the clock. Clearly, there is an ever-growing need to have automated systems tag these documents and images for us. Having a way to provide keywords for text and photos without human involvement can be a massive game-changer for someone interested in aggregating relevant topics in news outlets or categorizing very large libraries of pictures.

Here at AlchemyAPI, we have solutions to these problems! Users of AlchemyLanguage and AlchemyVision have already seen the power of quickly and reliably extracting keywords out of paragraphs and/or pixels. But this raises the question: what do we mean by “reliable”?

For demonstration purposes, let’s consider the image below. What are some good tags (or keywords) for this picture? Don’t overthink this one...

iPhone input to AlchemyVision

If you said “iPhone” and “Apple,” most people would agree with you. As it turns out, AlchemyVision would also agree with you! See the output of our image tagging API for this photo:

AlchemyVision output

JSON output from AlchemyVision’s Image Tagging API. Nailed it.

AlchemyVision takes simple tagging a step further by associating a “confidence score” between 0 and 1 (1 being the most confident) with each tag.

Note how different these two scores are. The confidence of “iphone” is significantly higher than that of “apple.” Why? If we know that this is an iPhone, don’t we also know that it is an Apple product? Shouldn’t we be equally confident in those terms as appropriate tags for this image?

These scores do not necessarily demonstrate the correctness of a particular tag; rather, they indicate how appropriate the tag is for a given image. In the example above, “iphone” and “apple” are both correct tags for the picture. However, it turns out that “iphone” is actually a better fit, which is why we see such a large difference between the scores.

While there is no silver bullet for selecting the number of terms or tags to associate with your images, you can turn the knobs yourself and figure out the score thresholds appropriate for your image tagging purposes. As a general rule of thumb, a score of 0.9 or higher with AlchemyVision signifies a tag that is pretty spot-on.
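Turning that knob amounts to a one-line filter. The tag scores below are illustrative, not actual AlchemyVision output:

```python
def filter_tags(tags, threshold=0.9):
    """Keep only (tag, confidence) pairs that clear the chosen threshold."""
    return [(tag, score) for tag, score in tags if score >= threshold]

# Illustrative scores, not real API output.
tags = [("iphone", 0.96), ("apple", 0.42), ("smartphone", 0.91)]
confident = filter_tags(tags)  # [("iphone", 0.96), ("smartphone", 0.91)]
```

Lowering the threshold trades precision for recall: you keep more tags per image, but more of them will be loose fits like “apple” above.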

Also, for text analysis features like entity extraction and concept tagging, relevance scores are calculated for each entity or keyword in a document. The relevance score depicts the significance of each unique term, in a similar vein to the confidence scores returned in image tagging. The higher the relevance score, the more important that term is to the central meaning of the document.

As tags can be subjective, we recommend familiarizing yourself with the outputs of our analysis engines by test driving the AlchemyLanguage and AlchemyVision demos. If you’d like to run more in-depth testing with some of your images and documents, download a free API Key and get started today.


How Google Does It

Big and Fast Semantic Analytics Pipelines - How Does Google Do It?

By Sonya Hansen, Marketing Director

AlchemyAPI & Google

“Easy to code,” “simple to use” and “Internet-scale” are probably not terms you associate with developing applications that analyze Tweets, chats, emails and other unstructured data coursing through analytics pipelines. But, perhaps we can cause you to reconsider with an example.

Leaning on a decade of data analytics and cloud scalability, Google recently unveiled Cloud Dataflow, an SDK and managed data processing service that can analyze real-time streaming data flows as well as batch data sets. Dataflow gives app developers the ability to execute semantic analysis of social posts and news using just a few lines of code and API calls.

Part of the reason we’re so excited about Dataflow is because we had a front-row seat when the Google Cloud engineering team paired it with our Sentiment Analysis API during the World Cup. For this project, Dataflow grabbed tweets, converted them into objects, translated them to English, and then used our API to score the positive and negative connotations found in World Cup fans’ tweets. It was fun to see how social media activity directly correlates with real-time events on the field.

Another reason we’re talking about Dataflow is that we are hosting a web session on Thursday, September 18 with Eric Schmidt, a Solutions Architect on the Google Cloud engineering team focused on big data scenarios, who led the Dataflow project. Our CMO, Richard Leavitt, will join Eric to discuss:

  • How you can manage the size and scale of analyzing Twitter streams in real-time
  • How you might correlate Twitter sentiment to real-world events
  • How to extend the data you get from Twitter to extract better decision signals

Recordings of the first two web events in our Deep Learning Series, titled What is Deep Learning AI & How Should You Use It Today? and Artificial Intelligence APIs: Intro for Those Building Smarter Applications, are available today.

What compelled Google to commit Schmidt and his team to create a replacement for the 10-year-old MapReduce was the need to cope with exabyte-scale data while making it very easy to write pipelines, apply analytics and use the same code for batch and streaming analytics.

While you probably aren’t dealing with data at Google-scale, you probably are working to answer the same types of questions: What do people really think about my company and products? How can we get actionable information from data in real-time, rather than just store it for later? What connections can I make from my raw data to the results to confirm the trends that we see?

I invite you to join us for How Google Does It: Big and Fast Semantic Analytics Pipelines. Of course, if you register but cannot attend, we will follow up with access to the presentation, recording and Q&A.

How Google Does It: Big and Fast Semantic Analytics Pipelines

Date/Time: Thursday, September 18, 2014 at 12pm ET/9am PT
Perfect for: Software and technology leaders who want to perform semantic analysis on real-time social media and news streams at Internet scale.
Presenters: Eric Schmidt, Solutions Architect at Google and Richard Leavitt, CMO at AlchemyAPI

Register now


Q&A with AlchemyVision Research Scientist Devin Harper

AI Research Scientist, Devin Harper, on AlchemyVision: Lessons Learned

By Audrey Klammer, Marketing Director


Elliot Turner, CEO, with Devin Harper and Nicholas Lineback, leading AlchemyVision researchers.

Tell us about the AlchemyVision project. What did the team set out to accomplish?

The goal of the project was to build a large-scale system that could reliably add valuable structured data to unstructured content, in this case image data. This would balance the scope of our company, whose primary focus at that point was adding structured data to unstructured text.

At the outset, the team knew that, eventually, the vision project should be able to extract accurate knowledge from the potentially billions of images that make up the world's growing corpus of data.

What were some of the AlchemyVision project outcomes?

Our first metric of success was to match the results of a seminal 2012 paper from Hinton’s group. The team reached that goal rather quickly. From that point on, it was a new frontier, as no other groups had moved past that point. After we started to see the release of competing products, we made surpassing those products’ results our goal. We’re proud to say that the AlchemyVision system can accurately tag a wider range of images than these competing systems.

We’ve had positive feedback from several customers, such as Simply Measured, whose CPO and Co-Founder, Aviel Ginzburg, is excited to use AlchemyVision in its product.

"As shoppers become increasingly comfortable making their purchases online, major brands are driving a large amount of sales through eCommerce offerings," said Aviel Ginzburg, CPO and Co-Founder of Simply Measured. "It's important that we provide brands a way to track which campaigns resonate and drive action online. The ability to track and measure everything in a campaign, including the images used, gives brands a competitive advantage when targeting customers and driving sales. With AlchemyVision, we have been able to accurately tag and classify a good portion of images at very high rates with minimal human effort."

AlchemyVision now has several customers, including CamFind, who has seen response times as fast as a second when querying our API. They use our service to provide first-response tagging results for their image recognition app.

What was one of the major lessons you learned during the AlchemyVision project?

KISS, or “keep it simple, stupid,” was a consistent theme throughout AlchemyVision's development. The simplest idea usually ended up being the best. Some approaches seemed too basic to work, but in the end they were the optimal solutions to our problems. In the course of a brainstorming session, the team would come up with an idea so simple it appeared to trivialize the problem, and we thought there had to be a more elegant solution. But when we tried the alternatives, most often we’d end up back at the first idea, refining it to make it more robust.

Learn more about AlchemyVision with this webinar:

Topic: Artificial Intelligence APIs: An Intro for Developers Who Must Build Smarter Applications
Perfect for: Developers, programmers, engineers & hackers getting started with AI
Presenter: Devin Harper, AI Research Scientist

Watch now



Subscribe to Blog