
The Truth About Natural Language Processing: Myth #1

The Whole Truth & Nothing But the Truth: The First of 4 Myths Debunked


Customer feedback, competitor information, legal filings, press releases and other data intended for human consumption contain valuable information for an organization. But it is nearly impossible to employ enough people to read, comprehend and share all of it economically, quickly or accurately.

Perhaps that’s why Gartner estimates that more than 85% of Fortune 500 companies will fail to effectively utilize unstructured data to their competitive advantage by 2015. And if the world’s largest organizations struggle with this problem, we’re sure that everyone else does too. But the good news is that there’s a solution – natural language processing.

Natural language processing (NLP) is a field of computer science in which algorithms digest raw text and extract critical knowledge for businesses to use for competitive advantage. In plain English, it’s a really smart way to make sense of all of the email, chat, social and other text inundating your business. A wide range of industries (from advertising and social media monitoring to public relations and sales intelligence) already experience its benefits, and more explore it every day.

Gone are the days when natural language processing and machine learning were reserved for computer scientists and big-budget organizations. With the rise of easy-to-use cloud services, it’s now economical and practical.

As with any game-changing technology, misconceptions abound. In this series, we’ll expose and destroy four of the most pervasive myths surrounding NLP.

Myth #1: I have to teach and train complex algorithms

NLP Data Cloud

For NLP to work, algorithms are developed that recognize the who, what, when, and where in unstructured content. Traditionally, these systems have been trained through painstaking human annotation. This method is extremely tedious and often inefficient. A misconception exists that this manual, human labeling is the sole way to teach and train algorithms. The truth is that this arduous task is eliminated with unsupervised learning. Instead, advanced systems are automatically trained based on models that become increasingly knowledgeable with more use.

“The data that was once intended exclusively for human consumption can now be understood by machines that have been taught to process that information,” shares Aaron Chavez, Chief Scientist at AlchemyAPI.

It can be incredibly exciting to watch a computer mimic human behavior and return results from data that would take a human weeks or months to read. Leading companies realize myths for what they are and choose cloud NLP solutions that help better target consumers and predict their behavior.

Stay tuned for our second post in the series where we'll debunk the myth: “I can use open source systems for my business.”

For a comprehensive look at the four myths, download the NLP Myths Ebook

Get NLP Myths Ebook Now


The Next Generation of Sales Intelligence

Spiderbook Redefines CRM, Creates 10x More Accurate Customer Relationship Predictor

Spiderbook Success Story

Top-performing salespeople spend a lot of time gathering information to get to know their prospects and their prospects’ businesses. They carry out background research - on LinkedIn, Twitter, community forums, company websites, news articles and so on - to understand the company, the department and the people they hope to build a relationship with. Many use CRM (customer relationship management) tools to handle the routine tasks associated with the sales process.

Unfortunately, while CRM solutions are good for tracking the progress of a sale, they are inept when it comes to actually helping close the deal. Even if a sales rep can adequately manage all of their tasks, there is still too much content for one person to digest and use. But what if they had a system that automatically processed all of the deal-closing business intelligence and served it up in an easy-to-use interface?

Spiderbook, a start-up headquartered in San Francisco, was founded by Aman Naimat and Alan Fletcher to solve those problems. If the adoption rate for their service is any indication, all signs point to a rousing success.

Aman and his team of three fellow NLP developers built SpiderGraph, which uses AlchemyAPI’s Keyword Extraction, Entity Extraction and Language Detection REST APIs to forge business intelligence from sources ranging from public-facing content like press releases, websites, blogs, PR and digital marketing materials to private business profiles accessed through partnerships with data services providers.
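To make the plumbing concrete, here is a minimal sketch of calling an entity extraction REST endpoint the way an application like SpiderGraph would. The endpoint path and parameter names follow AlchemyAPI’s documented conventions of the era (`TextGetRankedNamedEntities`, `apikey`, `outputMode`), but treat them as illustrative rather than authoritative, and the key and input text are placeholders:

```javascript
// Base URL for AlchemyAPI-style named entity extraction (illustrative).
const API_BASE = 'http://access.alchemyapi.com/calls/text/TextGetRankedNamedEntities';

// Build the request URL for a given API key and input text.
function buildEntityRequest(apiKey, text) {
  const params = new URLSearchParams({
    apikey: apiKey,
    text: text,
    outputMode: 'json', // the service defaulted to XML; request JSON instead
  });
  return `${API_BASE}?${params}`;
}

// Perform the call and return the ranked entities (requires a runtime with fetch).
async function extractEntities(apiKey, text) {
  const res = await fetch(buildEntityRequest(apiKey, text));
  const body = await res.json();
  return body.entities; // e.g. [{ type: 'Company', text: 'Spiderbook', ... }]
}
```

Keyword Extraction and Language Detection follow the same pattern with different endpoint paths, which is what makes it practical to chain several analyses over the same document.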

“We go beyond traditional CRM by using natural language processing and named entity recognition to understand businesses,” Aman explains. “We are curious to know how they partner, details on acquisitions, the products they sell, branding, SEC listings and even the types of resources that they look for in job posts."

Spiderbook’s success story describes how the team is seeking to change the way salespeople “connect the dots among companies, people, partners, products and documents.”

Read Spiderbook's success story to learn:

  • How Spiderbook can predict mutually beneficial customer relationships with 10x more accuracy.
  • Why Spiderbook chose AlchemyAPI after comparing all of the available NLP and NER services.
  • How NLP gives Spiderbook the ability to regularly analyze over 750 terabytes of data and pass relevant knowledge to users.

Read now


Deep Learning Gets Spooky

Halloween at "AlchemummyAPI"

By Mackenzie Sedlak, Marketing Intern

Members of the Alchemy team in their Halloween costumes.

What were a cow, Charlie Chaplin, and an old-time bartender doing at AlchemyAPI? They were gathered to celebrate Halloween, of course! We said “boo” to all the critics that claim that this holiday is “just for kids,” and got our fill of sugary fun. Decked out in costumes, our team proves that scientists, engineers, sales reps and marketers can enjoy Halloween with the best of ‘em.

With our Q3 release and Face Recognition API launch behind us, we figured it was time for a little celebration. Before we knew it, our “Boos and Brews” Halloween bash was born. What costumes would we see? The secrecy was killing us!

Alchemy's Puppy-Crocodile on Halloween

On October 31, everyone gathered around the company ping pong table dressed from head to toe in creative costumes. Mario, Walter White, Bruce Willis and the Crocodile Hunter (with a cute puppy croc in tow) all made appearances for our costume contest. Only after a hilarious catwalk tiebreaker were prizes awarded for the scariest costume (Josh impersonating another Alchemist, Devin), the funniest costume (Jake as William Wallace from Braveheart) and the best overall costume (Nicholas as Walter from The Big Lebowski).

While our team remains focused on creating the best platform possible, we can always find some time to let loose and have a little fun. With the holiday season upon us, let the festivities begin!

AlchemyAPI's Team dressed in their best costumes

AlchemyAPI's Team of Ghouls and Goblins on Halloween


Interactive Text Mining

How One UX Designer Built Crawling for Context: An Interactive Text Mining Application

By Sonya Hansen, Marketing Director

A lot of buzzwords pass through everyday business conversations. You may even call “buzzword” a buzzword. From “synergy” to “move the needle” to “bleeding edge,” the shelf life of most buzzwords is fairly short. But they exist for a reason. Even after jargon fizzles out and new terms take over, the meaning endures. A present-day example is “actionable data,” one that I hear often from our users as well as my colleagues. It came about around the same time that “Big Data” became pervasive.

What makes data actionable? Our CEO, Elliot Turner, would say that it’s the algorithms developers use to translate data and build smarter applications. Signal Noise, an information design agency in London that specializes in allowing “people to make sense of an increasingly data-driven world,” has a similar position. Recently, they challenged people to create “static, motion and interactive visualizations which explain how algorithms work and reveal the hidden systems and processes that make the modern world tick.”

Crawling for Context

Visuals Reflect the Emotional Context of Songs

Signal Noise’s campaign caught the eye of Matt Turnbull, a UX/UI designer from London. In addition to being part of the team that designed the Nokia S 40 Full Touch Phone Interface, he has used his design skills and self-proclaimed hobbyist-level programming capabilities to develop a generative music visualizer. His visualizer creates an identical representation each time it “hears” a given song, with the colors reflecting the emotional context, as you can see in the example to the right.

Matt shared how he learned about text mining and AlchemyAPI. “Initially, Signal Noise’s introduction to their exhibition had me intrigued: ‘Almost every part of our lives, from medicine to music, is now shaped, informed or controlled in some way by algorithms. They have become one of the most powerful forces shaping the 21st century, but remain invisible and impenetrable to all but a few.’ After a bit of research, I discovered text mining algorithms. They’re the unsung heroes of the Internet. What particularly fascinated me is discovering how these algorithms actually read, an act that’s typically very human.” A contact at Signal Noise recommended Alchemy's REST APIs, and Matt was soon equipped with an API Key and the algorithms he needed to build his app.

The Many Faces of Natural Language Processing

Matt’s interactive application goes beyond showing how algorithms read by applying natural language processing to reflect sentiment and meaning while also returning named entities (people, places, etc.). His data visualization application employs these services…

...to analyze some well-known speeches from Martin Luther King, Jr., John F. Kennedy and Travis Bickle (a character from the movie Taxi Driver). In one example, Matt feeds Dr. King’s I Have a Dream speech into his application, which results in:

Martin Luther King Jr.'s I Have a Dream speech

This shows a visualization of a fragment of Martin Luther King Jr.'s I Have a Dream speech.

The application also generates a sentiment graph, which shows both positive and negative context:

A visualization of the sentiment found in Dr. King's speech fragment

This chart displays the sentiment results of this portion of Dr. King's I Have a Dream speech.

JFK’s speech announcing the U.S.’s commitment to send astronauts to the moon reveals even more positive sentiment, shown both in the representation of his face and the associated graph:

Visualization of JFK's speech.

This displays the portion of JFK's speech that was analyzed.

Visualization of JFK's speech.

A visual representation of the output from JFK's speech.

He also chose to analyze a very emotionally charged chunk of dialogue from the movie Taxi Driver, the famous scene in which Bickle says, “You talkin’ to me? You talkin’ to me? Then who the hell else?” That analysis gives us:

Visualization of the Taxi Driver character, Travis Bickle's memorable speech.

The above chart displays Travis Bickle's memorable speech in the movie Taxi Driver.

Visualization of the sentiment found in Taxi Driver character Travis Bickle's memorable speech.

The above chart displays the sentiment of Travis Bickle's speech.

A UX Developer's Experience

“Once I had the AlchemyAPI Language SDK in the right place, connecting to it was blissfully easy,” Matt remarks. I asked him what surprised him most about his application. “The penny dropped when I sent Alchemy’s API a news article and it returned all of the people and places within the article before I’d finished the first sentence! In a world where roughly 80 percent of information is stored as text, the speed and effectiveness of Alchemy Language could be put to some amazing uses.”

One of those uses was to fascinate attendees at the 2014 London Design Festival. You see, it was there that Matt’s app, called Crawling for Context, was included among other projects showcasing how algorithms worked. Of course, he was curious to find out how people would react. “A lot of people stood there seemingly dumbfounded,” he remembers. “In hindsight, I could have done more to explain exactly what was going on! From those that did get it, the feedback was great. Thanks to Alchemy’s quick response time, the final exhibition piece allowed people to tweet to a hashtag and have their tweets analyzed on screen live and in front of them. When people realized this, the play factor became great. Tweets ranged from news article excerpts to ‘happy birthday so-and-so’ and included ‘David Hasselhoff’ and ‘get me a beer’.”

2014 London Design Festival

Turnbull's app in action at the 2014 London Design Festival.

See a video of Matt's application in action. To learn more about Matt’s work, go to mattturnbulldesign.com. Also, Matt thanks Signal Noise and the London Design Festival for their support.

A big thanks to Matt for letting us tell his story and for showing that there's a human quality to the technology of algorithms. In fact, we'll discuss that idea in future blogs featuring Elliot Turner. Stay tuned!

Watch the on-demand web session, Artificial Intelligence APIs, to learn tools for building apps and visualizing results.

Download webinar now


Data Visualization Made Easy

Tweet Sentiment Visualization Using Maltego

By Paul Richards, Developer at Paterva

Recently AlchemyAPI, one of the primary resources that Paterva uses to analyze sentiment, asked us to share how we use AlchemyAPI within our tool and why we are so excited about it. As the developer of these transforms (taking one piece of information to another with a small piece of code), I will briefly describe our use and how we got to where we are now.

For those unfamiliar with Maltego, it is a data visualization and mining tool that allows you to quickly and easily mine for data as well as see the correlation between different pieces of information to visually gain intelligence. In Maltego, pieces of information, known as ‘entities’, are used to mine for additional pieces of information that link to the original.

A practical example of how Maltego is used to find links between groups of information is below. Imagine you want to find a common link between three popular brands such as Nike, Puma and Adidas. You could start with Twitter Affiliation entities of each of these brands and run a transform that returns Twitter users that have tweeted about one of the brands. The resulting graph, shown below, makes it easy to identify Twitter users that have tweeted about one or more of these brands (the nodes located in between two or more clusters). As you can quickly tell from the graph, there is one user in the middle of the graph who tweeted to all three brands making them a possible person of interest.

Analysis of Twitter Activity on 3 Brands

This chart displays all Twitter users who have tweeted about the three brands.

Sentiment analysis is the use of natural language processing (NLP) to extract the attitude/opinion of a writer towards a specific topic. With the overwhelming amount of data being posted on the Internet every day with no way for a human to read it all, sentiment analysis is valuable for extracting and aggregating opinions from many sources on a specific topic.

There are many sentiment analysis APIs out there to choose from and it was difficult to decide which one would work best within Maltego. After much experimentation, I found that Alchemy’s sentiment analysis API was one of the most accurate out of all the APIs tested.

Combining Alchemy’s sentiment analysis API with Maltego’s visualization capabilities gives an analyst a powerful tool for graphically depicting opinions on specific topics. The transform that we built takes a Tweet as its input and returns either a positive, neutral or negative entity. In this way, a large amount of Tweets can be quickly and accurately categorized according to their sentiment. There is a wide range of potential uses for this transform ranging from brand reputation monitoring, market research, customer reactions to product launches and stock market monitoring to gauging opinions towards political parties, governments or countries.

Visualization of Positive and Negative Sentiment

This shows a visualization of positive and negative sentiment.
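The core step of the transform described above - turning each tweet's sentiment score into a positive, neutral or negative entity - can be sketched like this. The score range and the neutral threshold are assumptions for illustration, not Paterva's actual values:

```javascript
// Map a sentiment score (assumed roughly -1..1, as many sentiment APIs
// return) to a positive / neutral / negative bucket. The threshold is an
// illustrative assumption.
function bucketForScore(score, threshold = 0.1) {
  if (score > threshold) return 'positive';
  if (score < -threshold) return 'negative';
  return 'neutral';
}

// Group a batch of analyzed tweets so each bucket can become a Maltego entity.
function categorizeTweets(tweets) {
  const buckets = { positive: [], neutral: [], negative: [] };
  for (const { text, score } of tweets) {
    buckets[bucketForScore(score)].push(text);
  }
  return buckets;
}
```

In Maltego itself, each bucket would then be emitted as an entity linked back to the originating tweets, which is what produces the clustered graph shown above.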

Maltego also allows you to build machines which automate the process of running multiple transforms – essentially allowing you to create a macro of tasks that are commonly run sequentially. This allows continuous monitoring of a topic by running a group of transforms at a set time interval and automatically updating your graph every time it runs. We built a new machine named Twitter Analyser to use with the new sentiment analysis transform. This machine takes a specific phrase in as its input and searches Twitter for Tweets with this phrase. From these Tweets, hashtags, links, sentiment and uncommon words are extracted as children of the originals. Maltego has multiple ways of visually representing the data. In this case, I used the ‘bubble view’, which sizes the entities according to the number of incoming tweets. This makes it much easier to see commonalities across Tweets.

Visualization of Twitter Analysis of a Phrase

The Maltego graph above shows an example of using Twitter Analyser on the phrase ‘YesScotland’.

This graph allows you to easily identify groups of Tweets with the same sentiment, common URLs, hashtags and interesting words. It automatically updates the graph every five minutes by getting new Tweets posted by users.

This is just one example of the many cases in which sentiment analysis is being used to monitor social networks. The vast amount of information being posted on the internet every hour makes sentiment analysis a vital tool for monitoring public opinion on specific topics.

As always, enjoy responsibly!

Read more about sentiment analysis in Sentiment Analysis with AlchemyAPI: A Hybrid Approach

Get the free white paper


Q3 Feature Round-Up

Introducing New AlchemyAPI Features and Functionality

Q3 AlchemyAPI Product Updates

As the world's awareness of the challenges that come with growing “piles” of data increases, we’re constantly working on our services to make it easier for businesses to consume all of their emails, Tweets, documents and other sources of information in real-time. Our goal is to help you understand and act on the wants and needs of your customers without spending all of your time and money doing so.

With that, we’re happy to announce the release of several new features and enhancements to the APIs you use today. These are available to API Key holders now.

Below, you’ll find a brief description of each new feature. For additional information, as well as directions on how to implement them, please refer to our Documentation.

1. Spanish Language Sentiment Analysis

With over 405 million native speakers worldwide, Spanish language sentiment analysis was a clear next step for us. At both the document and user-targeted levels, you can now determine the attitude, opinion or feeling expressed in Spanish text toward something, such as a person, organization, product or location.

Up next - Keyword and entity targeted Spanish sentiment.
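A document-level sentiment call for Spanish text looks the same as one for English. This sketch mirrors the AlchemyAPI parameter conventions (`apikey`, `text`, `outputMode`) and the `TextGetTextSentiment` endpoint name; treat both the endpoint and the placeholder key as illustrative:

```javascript
// Build a document-level sentiment request (endpoint name illustrative).
function buildSentimentRequest(apiKey, text) {
  const params = new URLSearchParams({
    apikey: apiKey,
    text: text,
    outputMode: 'json',
  });
  return `http://access.alchemyapi.com/calls/text/TextGetTextSentiment?${params}`;
}

// Spanish input needs no special flag: the service handles the language.
const url = buildSentimentRequest(
  'your-key',
  'El producto es excelente, lo recomiendo.'
);
```

The response carries the document's sentiment label and score just as it would for English input.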

2. Structured Entity Support

Named entity calls now support "structured entities" such as email addresses, Twitter handles, hashtags, IP addresses, dollar amounts and quantities (weights, measurements, etc.).

Previously, our entity system focused on finding the people, companies and places that were being talked about naturally without breaking down certain types of content such as hashtags or known quantities. However, we know that there is a lot of information that can be gathered from this structured data that often resides in unstructured text. With this update, you can obtain additional information that is interesting for your business.

Up next - support for other structured entities such as phone numbers, specific quantities, addresses, and more.
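To show what "structured entities" look like in practice, here is a sketch that filters them out of an entities response. The response fragment and the exact type names are illustrative assumptions, not verbatim API output:

```javascript
// Illustrative response fragment mixing a traditional named entity with
// the new structured entity types (field names are a sketch).
const sampleResponse = {
  entities: [
    { type: 'Company', text: 'AlchemyAPI' },
    { type: 'Hashtag', text: '#NLProc' },
    { type: 'EmailAddress', text: 'support@example.com' },
    { type: 'Quantity', text: '5 lbs' },
  ],
};

// Structured entity types of interest (illustrative names).
const STRUCTURED_TYPES = new Set([
  'Hashtag', 'TwitterHandle', 'EmailAddress', 'IPAddress', 'Quantity',
]);

// Pull out just the structured entities from a response.
function structuredEntities(response) {
  return response.entities.filter((e) => STRUCTURED_TYPES.has(e.type));
}
```

Filtering this way lets an application route hashtags, addresses and quantities to different downstream logic than people and companies.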

3. Type Hierarchies via the AlchemyAPI Knowledge Graph

The AlchemyAPI knowledge graph is our database that provides detailed information on how everything in the world is related. We employ the surrounding content to identify the pathways for ambiguous terms like “apple.” The knowledge graph helps us understand if that term in a certain text refers to a fruit, tree, flavor of wine, company, Gwyneth Paltrow’s daughter, etc.

We have never exposed these results… until now. With type hierarchies, every keyword essentially has its own taxonomy. When using our Image Tagging, Entity Extraction, Keyword Extraction or Concept Tagging APIs, we will expose the hierarchy most relevant to your content, up to 5 levels deep, so that you can explore the parent and child terms that enhance your results.

For example, many advertisers, like AdTheorent, read web pages, profile the pages and try to determine the central topic of each page. They use these categorized pages to determine where to place an ad so that it reaches the intended audience. With type hierarchies, advertisers can now get more context to better target their ads.

In the past when an advertiser processed an article about Apple (the company), the service would have returned “apple.” Now, we also return terms such as “brand” or “software developer.” This helps our advertiser to determine that they should place an ad for Apple laptops on this page as opposed to an ad for apple juice.
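As a sketch of what consuming a type hierarchy looks like: a hierarchy is naturally represented as a slash-separated path from the most general type down to the term itself. The field names and the exact path below are assumptions for illustration; only the "up to 5 levels deep" parent/child idea comes from the description above:

```javascript
// Illustrative keyword result carrying a type-hierarchy path
// (field layout and path contents are assumptions).
const keywordResult = {
  text: 'Apple',
  knowledgeGraph: { typeHierarchy: '/companies/technology companies/Apple' },
};

// Split a hierarchy path into its ancestor terms, most general first.
function ancestors(result) {
  return result.knowledgeGraph.typeHierarchy
    .split('/')
    .filter(Boolean)   // drop the empty leading segment
    .slice(0, -1);     // drop the term itself, keep its parents
}
```

An advertiser's page profiler could match on any ancestor term ("companies", "technology companies") rather than only on the ambiguous surface word "apple".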

4. Link Extraction in Cleaned Web Page Text

When link extraction is turned “on,” you can now get a sense of how linked content is related to the original article. This is a great indicator of content that you or your readers may be interested in and can give you additional ideas for pages you can track and analyze in the future.

5. Twitter Hashtag and Username Decomposition

Often, the sentiment or emotion associated with a Tweet has a lot to do with a hashtag. The simple example below demonstrates how a hashtag can change the meaning and overall sentiment of a phrase. We have enhanced our language modeling strategies to break apart hashtags so they can be more accurately classified by our Sentiment Analysis and Taxonomy APIs, giving you a good idea of the emotions behind content.

“Studying for a test” on its own returns a negative sentiment score.

When you add #goingtorockit, the sentiment changes to positive.
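The decomposition step itself can be illustrated with a toy version: greedy longest-match segmentation of a hashtag against a word list. Alchemy's actual language models are far more sophisticated; this sketch just shows the idea of turning "#goingtorockit" into words a sentiment model can score:

```javascript
// Toy word list for the example (a real system uses a full language model).
const WORDS = new Set(['going', 'to', 'rock', 'it']);

// Greedily segment a hashtag into dictionary words, longest prefix first.
// Returns null when no full segmentation exists.
function segmentHashtag(hashtag, words = WORDS) {
  let rest = hashtag.replace(/^#/, '').toLowerCase();
  const tokens = [];
  while (rest.length > 0) {
    // Try the longest dictionary word that prefixes the remaining text.
    let len = rest.length;
    while (len > 0 && !words.has(rest.slice(0, len))) len--;
    if (len === 0) return null; // no segmentation found
    tokens.push(rest.slice(0, len));
    rest = rest.slice(len);
  }
  return tokens;
}
```

Once the hashtag is split into "going to rock it", an ordinary sentiment classifier can score it like any other phrase.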

6. New Authors Extraction Endpoint

We are deprecating the existing Author Extraction endpoint and replacing it with a new Authors Extraction endpoint. Instead of returning just the primary author of a given article or text, we will now return all authors listed.

Please note: If you are currently using the Author Extraction API and want to receive results containing multiple authors, you need to update your endpoint. See the documentation for details.

7. Face Detection and Recognition API

We recently announced our new Face Detection and Recognition API, the latest addition to the AlchemyVision product family. When provided an image file or URL (demo), the Face Detection and Recognition API returns the position, age, gender, and, in the case of celebrities, the identity of the people in the photo. Organizations across a variety of industries, such as social media monitoring and advertising, can take advantage of face detection to analyze their unstructured image data. This API provides the ability for applications to glean demographic data from images, which can be useful when analyzing a person’s social media habits or for analyzing which images have the highest return on investment in advertising campaigns.

Will Ferrell and Chad Smith

Were you able to tell actor Will Ferrell and Red Hot Chili Peppers drummer Chad Smith apart? AlchemyVision can!

8. New Text Extraction Mode: Cleaned+Xpath

For our intermediate to advanced users, we’ve created an xpath mode. AlchemyAPI has its own clean-text extraction, in which we strategically determine the most relevant content on the page (stripping all chrome in the form of headers, footers, ads, etc.). However, some users prefer to have more control over what is or is not included in our results.

The new xpath mode allows you to take advantage of the known organization or layout of a web page. For example, on a typical news article, you can make a very educated guess as to where the comments live on the page (the bottom). Typically, our clean text avoids comments, but if you know that you want to gather reader sentiment from those comments, just point us to the xpath. “Cleaned” will give you the text from the article and an xpath query will supply the comments for analysis.
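Pointing an analysis call at a page's comment section might look like the sketch below. The endpoint name, the `sourceText`/`xpath` parameter names and the example xpath are assumptions based on the behavior described above; check the documentation for the exact spelling:

```javascript
// Build a URL-based sentiment request restricted to the nodes an xpath
// selects (endpoint and parameter names illustrative).
function buildXpathSentimentRequest(apiKey, pageUrl, xpath) {
  const params = new URLSearchParams({
    apikey: apiKey,
    url: pageUrl,
    sourceText: 'xpath', // analyze only the content the xpath matches
    xpath: xpath,
    outputMode: 'json',
  });
  return `http://access.alchemyapi.com/calls/url/URLGetTextSentiment?${params}`;
}

// e.g. gather reader sentiment from a news article's comment block
const commentsUrl = buildXpathSentimentRequest(
  'your-key',
  'http://example.com/news/article.html',
  "//div[@id='comments']"
);
```

Running the same page twice, once in cleaned mode and once with an xpath, gives you the article's sentiment and the readers' sentiment as two separate results.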

9. Enhancements

  • Image Tagging API - With this release, we’ve expanded the results of our Image Tagging API so that you can view and select from additional tags to ensure you have access to the information you want and need.
  • Image Link Extraction API - We've enhanced our Image Link Extraction API to make it more accurate and faster than ever before.
  • Publication Date API - We've enhanced our Publication Date API to make it more accurate and faster than ever before. We have also added a confidence notification to the results to help you further understand your results.

Begin Using These Features Today:

Get Your Free API Key


Part 2 - How Fractl Applies Science to Social Sharing

Part 2 - The Art and Science of Fractl's Publisher Engagement Study

By Sonya Hansen, Marketing Director

Fractl Analyzes Content Sharing

In my first post in this two-part series, I shared an interesting study performed by Sam Deford, Creative Strategist, and his team at Fractl that reveals the relationship between content sharing and readers’ emotional reactions.

Fractl wanted to learn which publishers have become most successful at promoting their articles on social media - and how - in order to better understand the viral potential of content placements. Through their research, Sam and his team discovered patterns showing the types of content that were doing best on certain publications, but needed hard data to confirm their suspicions and guide content strategy.

Using Buzzsumo to retrieve one million shared articles, Fractl ran them through Alchemy’s Sentiment Analysis API. The result of their research became the Publisher Engagement Analysis study, which provides an in-depth look at the content people share most and where they share it.

Of course, the reasoning and impetus for such a study is engaging in itself. But, what is an idea without technique to go along with it? Here is the rest of Sam Deford’s interview where he shares his "recipe" for performing this analysis.

What was your process for incorporating AlchemyAPI? What programming language did you use?

We retrieved one million URLs, article titles and the relevant share stats on each URL from Buzzsumo. I then used Mathematica to fetch a positive/negative sentiment score from Alchemy’s API. Using Mathematica was primarily a force of habit, since I had become so familiar with the program during my physics and math studies in college. I have since written the JavaScript included in the pastebin embedded below to fetch results from AlchemyAPI in Google Docs spreadsheets.

What were your feelings about the results? Were they accurate?

At first, I did have some skepticism about the validity of the numerical sentiment score being returned by AlchemyAPI. So I did what I encourage anyone else to do: I fed the API approximately 20 articles and compared the sentiment scores returned with my gut feeling about the positive/negative nature of each article.

I was very surprised at the accuracy. I was also interested in how the API worked, so I acquainted myself with much of the documentation that Alchemy offers, which convinced me that the score I was getting was as accurate as possible.

What are your tips for others who are just getting started with analyzing their unstructured data? Pitfalls to avoid? Suggestions on first steps?

I would point out that the greater the amount of text being analyzed, the more accurate the result. Secondly, I would encourage those who are curious to just get started right now. You don’t need to know how to code at all.

I have written a script that can easily be used in Google Sheets and have you running sentiment analysis right away. Here are the steps you’ll need to follow to get this set up:

  1. Get an API key from Alchemy.
  2. Create a new sheet in Google Drive.
  3. In this new sheet, go to Tools > Script Editor > Blank Project.
  4. Delete “function myFunction() { }” and paste the contents of the following pastebin:
  5. Paste in your AlchemyAPI key between the quotes in the very first line of the script.
  6. Save and name your script, then close the script editor.
  7. In the spreadsheet, you can now use the following functions:

    =alchemy_text(“text”)
    This function returns an overall positive/negative sentiment score of the text between the quotes. Omit line breaks to ensure all text is analyzed.

    =alchemy_url(“url”)
    This function returns an overall positive/negative score of the contents of the URL between the quotes.

    =alchemy_targeted_text(“text”,”target”)
    This function returns a positive/negative score of the text in between the quotes as it relates to the targeted word/phrase in the second argument. Omit line breaks to ensure all text is analyzed.

    =alchemy_targeted_url(“url”,”target”)
    This function returns a positive/negative score of the contents of the URL between the quotes as it relates to the targeted word/phrase in the second argument.

To learn more about targeted vs. overall sentiment, please refer to the documentation.

What could we do to make your AlchemyAPI experience even better?

I would really love to see sarcasm detection offered as a service, but I am skeptical that technology will ever be capable of such a thing. In lieu of that, I would like to see AlchemyAPI create tools that identify the specific dominant emotions evoked by a piece of content.

Thanks to Sam and the team at Fractl for sharing their approach with us. Do you have a story to tell? An interesting use of technology that could be shared with and emulated by your fellow Alchemists? Let us know on Twitter @AlchemyAPI or email us.

Read Fractl's Publisher Engagement Analysis Study:

Read now

Learn About AlchemyAPI's Approach to Sentiment Analysis:

Get the Whitepaper


How Fractl Applies Science to Social Sharing

Part 1 - The Art and Science of Fractl's Publisher Engagement Study

By Sonya Hansen, Marketing Director

Fractl Analyzes Content Sharing

Content informs. Content convinces. But really effective content also elicits an emotional response that makes you want to share it. So, when I read about a new report from Fractl on one of my favorite blogs, I was intrigued by what they found through researching the relationship between content sharing and sentiment. That they used AlchemyAPI’s Sentiment Analysis API as part of the process was icing on the cake. And I do like cake.

So, I contacted Sam Deford and the team at Fractl to find out more about their work. Graciously, Sam agreed to share his thoughts on the report, their impressions of AlchemyAPI and what you can do to start gauging the visceral impact of the content you so painstakingly craft. He provided so many great insights that I broke it into two parts. This first part will focus on the story and art behind Fractl’s study. The second post will provide Sam’s recipe for implementing this yourself. You’ll want to read that one, I promise!

First, here’s some background. Sam Deford is a Creative Strategist. After graduating from the University of Colorado – Boulder (also my alma mater – Go Buffs!) with a bachelor’s degree in physics and a minor in mathematics, Sam worked at an Internet security software company. He then moved to South Florida to begin work at Fractl. In his first few months, Sam promoted content to the Web’s top publishers, which gave him a solid understanding of what makes the Internet tick. Now, Sam employs his analytical skills, creativity and a range of technical abilities to scour the nooks of the Internet that are ripe for exclusive research and create content that audiences will latch onto and share.

Tell us about your annual report. What gave you the idea? Who is your target audience or reader?

Fractl performs exclusive research and creates interesting content for Internet audiences to enjoy and share. The role of the online publisher is crucial to the success of our campaigns because they are the springboard through which our content reaches audiences across the Web. While publishers provide this important role for us, we provide them with new and relevant stories. The relationships truly are mutually beneficial.

Over time, we started to pick up on patterns showing what types of content were doing best on which publications; but without the hard data to confirm our suspicions, we were left with mere snapshots of insight based primarily on speculation.

Our research was aimed at putting an end to the speculation. The arrival of Buzzsumo, and their generosity in extending us the use of their API, allowed us to retrieve the top one million shared articles from the last six months, from 200 of the Web’s top publishers. We then ran these articles through Alchemy’s Sentiment Analysis API, which gave us numerous insights that we could not have found otherwise.

I believe the resulting analysis of that data set is of interest to many verticals. It was originally driven by our own curiosity but, as we began looking into the data, it became obvious that the online publishing world, online advertisers, content marketers and the very Internet audiences this research looks into could find value in the information.

How did you incorporate AlchemyAPI into your annual report? What features did you use? What information were you trying to gather?

In this report, we incorporated Alchemy’s Sentiment Analysis REST API to get an overall score of the emotional sentiment of the one million most-shared articles. Overall, the top performing content skewed toward the negative end of the sentiment spectrum, but this was not true for all publishers and social networks.
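The scoring step Sam describes can be sketched roughly as follows. Note that this is an illustrative assumption, not Fractl's actual code: the response shape (a `docSentiment` object carrying a score from -1.0 to 1.0) and the bucket thresholds are stand-ins for whatever the study actually used.

```python
# Bucket a sentiment score onto the negative-to-positive spectrum
# shown in the charts below. The -1.0..1.0 score range, the
# "docSentiment" response key and the thresholds are illustrative
# assumptions, not documented behavior.
def bucket(score):
    """Map a -1.0..1.0 sentiment score to a coarse label."""
    if score <= -0.2:
        return "negative"
    if score >= 0.2:
        return "positive"
    return "neutral"

# An article whose (hypothetical) API response looked like this
# would land in the "negative" bucket:
response = {"docSentiment": {"score": "-0.41"}}
label = bucket(float(response["docSentiment"]["score"]))
```

Running one million articles through a call like this is then just a loop over the BuzzSumo result set, with the returned labels aggregated per publisher and per social network.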

The following chart shows the distribution of emotional sentiment for the top 500 articles on each social network.

Sentiment for top 500 articles on each social network

This chart shows the distribution of emotional sentiment for the top 500 articles on 20 of the Web’s top publishers.

Sentiment for top 500 articles on 20 of the Web’s top publishers

What surprised you most about your project?

What surprised me most about our findings was how dependent online publishers are on Facebook sharing. The articles we looked at had 2.6 billion shares in total, and Facebook shares made up 2,188,777,030, or roughly 84% of those shares. I was pleased to see that certain publishers like Mashable have a fairly well-rounded “sharing landscape” (as can be seen in the dashboard linked above). Overall, I’ve been very intrigued by the variations in the nature of the top-shared content on each publication and the overall sharing landscape on each publication.

How did you find out about AlchemyAPI?

I first found out about AlchemyAPI through a colleague who was using it in a client project to determine a sentiment score for a group of Reddit comments. A few months later, when I was working on this research, it occurred to me that AlchemyAPI could be used to provide an additional layer of insight in our study.

Where do you see unstructured data analysis heading in the future? Are there additional use cases you would consider or recommend to others?

Humans communicate and understand things very differently from computers. Using technology to gain insight from the garbled nonsense of human language is becoming increasingly possible with services such as those offered by AlchemyAPI. In the future, I expect such services to do as good a job as an actual human being and to be capable of the same range of emotional insight as humans. That is, I see tools like those AlchemyAPI develops being able to determine where language falls within the full spectrum of human emotions.

In addition, I expect to see tools that are capable of analyzing images, video, and spoken word and performing a similar emotional scoring of them.

Any additional comments?

I'd like to encourage readers to check out the full Publisher Engagement Analysis study.

Stay tuned for Part 2 of this post coming next week. We'll discuss the technical aspects of this project with Sam and provide insight into how Fractl completed their study.

Learn more about Sentiment Analysis:

Read now


What You Can Learn from Analyzing Communications Data

Tweets, and Chats, and Phone Calls... Oh My! Unlocking the Story Behind All That Data.

By Sonya Hansen, Marketing Director

What’s the total volume of communication you had in 2013? That includes all the emails, Facebook messages, SMS, phone calls and physical mail (yes, letters still exist) you received or sent. You probably didn’t track that data, but Nicholas Felton tracked his. According to his 2013 Annual Report, he exchanged thoughts with someone about 95,000 times last year.

Nicholas Felton analyzes his personal data

Nicholas Felton, Information Designer

Felton spends much of his time thinking about data, charts and our daily routines. He is the author of many Personal Annual Reports that weave numerous measurements into a tapestry of graphs, maps and statistics reflecting the year’s activities.

As an Information Designer, Felton’s infographics are as much works of art as they are communication devices. He was one of the lead designers of Facebook’s timeline and the co-founder of Daytum.com. His most recent product is Reporter, an iPhone app designed to record and visualize subtle aspects of our lives. His work is part of the permanent collection at MoMA. He has also been profiled by the Wall Street Journal, Wired and the New York Times and recognized as one of the 50 most influential designers in America by Fast Company.

What’s also interesting is that Felton used AlchemyAPI’s services to help process the data he collected over the year. When I learned that he was a fellow Alchemist, I jumped at the chance to find out more about his experience with us. Adding to his volume of conversations for 2014, he was kind enough to respond in this online interview.

Tell us about your annual report. What gave you the idea?

At the end of 2004, I designed a graphic titled “Best of 04” that attempted to encapsulate the year. This graphic included my favorite aspects of the year including several quantified details like the number of postcards sent and air miles traveled.

The following year, I created the first Annual Report out of information drawn from my memory, calendar, photos and last.fm data. I segmented these activities from the year into sections like travel, photography, music and books and I thought the result would be interesting primarily to friends and family. Incredibly, the report was as popular among people who had never met me as with those who knew me intimately. The audience for the project now includes people with interests in design, data visualization, storytelling, anthropology and beyond.

How did you incorporate AlchemyAPI into your annual report? What features did you use? What information were you trying to gather?

With the 2013 Annual Report, I attempted to collect and analyze as much of my communication data as possible. This led me to gather nearly 95,000 records, including the contents of my SMS, email, Facebook messages, conversations, phone calls and physical mail. I naively thought that I would be able to analyze the contents of these records by hand to extract items of interest, but quickly realized that I needed to automate the process.

When I discovered AlchemyAPI, I was thrilled to see how the Entity Extraction features intersected with the questions I had for my data. I wanted to discover locations discussed, people mentioned and media that appeared in my communications. I used AlchemyAPI to extract entities from all of my entries and was then able to categorize the content of my communication and filter it by location, participant, medium and time of year.
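A minimal sketch of that grouping step, assuming the extraction call returns a list of type/text records (the response shape and the sample entities below are invented for illustration):

```python
from collections import defaultdict

def group_by_type(entities):
    """Group extracted entity mentions by their entity type."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["type"]].append(ent["text"])
    return dict(grouped)

# Invented sample records in the assumed {"type", "text"} shape:
sample = [
    {"type": "Person", "text": "Sam"},
    {"type": "City", "text": "Boulder"},
    {"type": "Person", "text": "Nicholas"},
]
by_type = group_by_type(sample)
# by_type["Person"] now holds every person mentioned, by_type["City"]
# every city, and so on, ready for filtering.
```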

The results of a year's worth of interaction

A visualization of a year's worth of Felton's communication data.

What was your process for incorporating AlchemyAPI? What programming language did you use?

I rely on MySQL to store the data for my project and used Processing.org to fetch entries from my database, send them to AlchemyAPI and then parse and save the results to a new database.
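Felton's pipeline was MySQL plus Processing; the same fetch, analyze and store loop can be sketched in Python, with an in-memory SQLite database standing in for MySQL and a trivial stand-in for the Entity Extraction call:

```python
import sqlite3

def analyze(text):
    """Stand-in for the entity-extraction API call: naively treats
    capitalized words as entities of unknown type."""
    return [{"type": "Unknown", "text": w}
            for w in text.split() if w[:1].isupper()]

def run_pipeline(conn):
    """Fetch each stored entry, extract entities, save them back."""
    conn.execute("CREATE TABLE IF NOT EXISTS entities "
                 "(entry_id INTEGER, type TEXT, text TEXT)")
    for entry_id, body in list(conn.execute("SELECT id, body FROM entries")):
        for ent in analyze(body):
            conn.execute("INSERT INTO entities VALUES (?, ?, ?)",
                         (entry_id, ent["type"], ent["text"]))
    conn.commit()
```

With the entities stored alongside the original records, filtering by participant, location or time of year becomes an ordinary SQL query.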

How did you find out about AlchemyAPI? What was your experience like? What were your feelings about the results?

AlchemyAPI was recommended to me by Golan Levin at Carnegie Mellon. Golan not only recommended the service but also provided sample code for Processing that allowed me to integrate the Entity Extraction API into my project. Having a tool like AlchemyAPI to parse my communication records was transformative. I wound up with nearly eight million words of content, so classifying them manually would have been an impossible endeavor.

I was both thrilled and frustrated by the results. AlchemyAPI made a Herculean task manageable and automatically categorized my data in ways that I would not have considered. My frustrations arose when I felt that the API was leaping to conclusions and making assumptions; I think it would be more accurate for the software to limit these leaps. For instance, “Uncanny Valley” was returned as a recognized location, when it is a theoretical place. AlchemyAPI would also recognize typos but not provide disambiguation, so “headacheyvkately” was being returned as a health condition. It’s fantastic that these entities are being recognized, but it seems that some human intervention is needed to return unblemished results.

What surprised you most about your project?

I approached the project with the assumption that the content of my communication would be the most descriptive and sensitive portion of the data set. What I was surprised to discover was that the metadata of my communication was in many ways more descriptive and ultimately much more trustworthy than the payload. This metadata was able to accurately describe my location, social ties and sleep patterns, while the content was filled with the ambiguities that arise from implied context in communication. This vagueness makes it difficult to know which “Nicholas” is being discussed when only a first name is given, or whether “Ira Glass” is being mentioned in the context of his radio show or his television appearances.

What are your tips for others who are just getting started with analyzing their unstructured data? Pitfalls to avoid? Suggestions on first steps?

There’s no replacement for experience. I would recommend diving in and getting started with the API to test your results. I think that working with a corpus you know intimately (like a favorite book) or something you’ve explored previously will be helpful. By working with data I lived, the surprising results stood out brightly for me and helped me to understand the API output more clearly.

Where do you see unstructured data analysis heading in the future? Are there additional use cases you would consider or recommend to others?

I see text analysis becoming far more nuanced and able to separate hunches from conclusions. I can imagine a version of AlchemyAPI that would ask for validation of important assumptions. In my dataset this would allow for “Nicholas,” “Nicholas Felton,” “Nick” and “Nick Felton” to be merged in a context-sensitive manner.

Any additional information you'd like to add?

I am really excited by the prospects for text analysis of this sort in both my work and that of other data artists. AlchemyAPI uncovered many of the stories locked in my data and I look forward to introducing my students and others to its potential.

Exchanging messages with Nicholas has not only made me think about my personal communications, but it also has me thinking how this kind of analysis can be used for businesses of all types - voice of the customer analysis, support emails or even internal corporate communications. The possibilities are endless.

Get a free API Key and test our text and image analysis services on your own data.

Download now


Waggener Edstrom Teams Up PR Professionals and App Developers

How Application Developers Can Enable Agile, More Impactful Campaigns

Waggener Edstrom Enables Agile Public Relations

At Waggener Edstrom, a global integrated communications agency, there are strong bonds between PR professionals, developers and analytics teams. Application developers are brought directly into the PR dialogue, where their skills and imagination are called on to create new tools and platforms for measuring campaign results on a daily or even moment-by-moment basis. These tools and platforms unlock data that can be used to direct and modify communications campaigns for the greatest business impact.

Waggener Edstrom’s story tells how WE Infinity, a data mining and analytics platform, tracks online news coverage, millions of Tweets and other website content in real time, giving account managers the opportunity to measure campaign performance and help their clients modify strategies and tactics as a campaign matures. It’s an agile approach to unstructured data analysis that has helped Waggener Edstrom spur growth in an impressive roster of global enterprises.

“With the rapid pace of today’s communications environment combined with a deluge of data, our clients need to have accurate intelligence that helps them make decisions they can trust. With WE Infinity, we are helping our clients answer two of the most fundamental communications questions: Did I make an impact, and how do I improve my impact going forward,” said Karla Wachter, Waggener Edstrom Communications senior vice president of Insight & Analytics.

WE Infinity started with an internal hackathon, the desire for real-time analysis, and our free API keys. Application developers at Waggener Edstrom used five services, from keyword and named entity extraction to concept tagging and author extraction, to transform an approach that was once entirely manual (visiting numerous websites and listening to multiple social channels on a daily basis) into a process that is more than 80% automated.

“We wanted to crunch near real-time data, and we needed a way to analyze it to determine if it was relevant to our clients,” said David Kohn, Waggener Edstrom vice president of software development. “Our developers quickly built a proof-of-concept that included AlchemyAPI’s services using the free API keys available on their website. That was given to a larger team of developers with the challenge to build their own tools using that platform.”
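The relevance check David describes could be as simple as matching extracted keywords and entities against a per-client watchlist. This is a hypothetical sketch of that idea, not WE Infinity's actual logic; the terms and watchlists are invented:

```python
def is_relevant(extracted_terms, watchlist):
    """True if any extracted keyword or entity matches the client's
    watchlist (case-insensitive)."""
    wanted = {term.lower() for term in watchlist}
    return any(term.lower() in wanted for term in extracted_terms)

# Hypothetical terms a keyword/entity extraction pass might return
# for one article:
article_terms = ["cloud computing", "Redmond", "Windows"]
```

Applied across thousands of articles a day, a filter like this is what turns a firehose of coverage into a client-specific stream.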

The data produced about brands, products and issues is only growing in today’s fast-paced environment. But data is just data until it is transformed into useful intelligence that helps communications professionals understand their audience, who influences them and where, which messages resonate and which ones don’t. With solutions like WE Infinity, Waggener Edstrom provides not only near real-time insight into the performance of communications activities, but also guidance on what can be done to improve performance going forward.

Read Waggener Edstrom's success story to learn:

  • How application developers transformed a once manual data analysis process to be more than 80% automated.
  • How five AlchemyAPI services integrate directly into the WE Infinity platform to enable near real-time analysis.
  • How they create a bigger bang for every campaign dollar with data-driven decisions and agile campaign management.

Read now

