The world is constantly uploading photos to the internet. By the same token, news articles and blog posts are written and posted to the web around the clock. Clearly, there is an ever-growing need to have automated systems tag these documents and images for us. Having a way to provide keywords for text and photos without human involvement can be a massive game-changer for someone interested in aggregating relevant topics in news outlets or categorizing very large libraries of pictures.
Here at AlchemyAPI, we have solutions to these problems! Users of AlchemyLanguage and AlchemyVision have already seen the power of quickly and reliably extracting keywords out of paragraphs and/or pixels. But this raises the question: what do we mean by “reliable”?
For demonstration purposes, let’s consider the image below. What are some good tags (or keywords) for this picture? Don’t overthink this one...
If you said “iPhone” and “Apple,” most people would agree with you. As it turns out, AlchemyVision would also agree with you! See the output of our image tagging API for this photo:
AlchemyVision takes simple tagging a step further by associating a “confidence score” between 0 and 1 (1 being the most confident) with each tag.
Note how different these two scores are. The confidence of “iphone” is significantly higher than that of “apple.” Why? If we know that this is an iPhone, don’t we also know that it is an Apple product? Shouldn’t we be equally confident in those terms as appropriate tags for this image?
The thing about these scores is that they do not necessarily demonstrate the correctness of a particular tag, but rather they indicate how appropriate the tag is for a given image. In the example above, “iphone” and “apple” are both correct tags for the picture. However, it turns out that “iphone” is actually a better fit, which is why we see such a large difference between these scores.
While there is no silver bullet for choosing how many tags to associate with your images, you can turn the knobs yourself by setting the score thresholds appropriate for your image tagging purposes. As a general rule of thumb, an AlchemyVision score of 0.9 or higher signifies a tag that is spot-on.
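In practice, thresholding is a one-liner. Here is a minimal Python sketch; the response shape and the `confident_tags` helper are illustrative assumptions, so check the exact JSON field names against the API reference:

```python
# Hypothetical shape of an AlchemyVision tagging response; the field
# names here are illustrative, not the exact API schema.
response = {
    "imageKeywords": [
        {"text": "iphone", "score": 0.97},
        {"text": "apple", "score": 0.64},
        {"text": "electronics", "score": 0.41},
    ]
}

def confident_tags(resp, threshold=0.9):
    """Keep only the tags whose confidence meets the chosen threshold."""
    return [t["text"] for t in resp["imageKeywords"]
            if t["score"] >= threshold]
```

Lowering the threshold trades precision for recall: `confident_tags(response, 0.5)` keeps both “iphone” and “apple,” while the 0.9 default keeps only the spot-on tag.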
Also, for text analysis features like entity extraction and concept tagging, relevance scores are calculated for each entity or keyword in a document. The relevance score depicts the significance of each unique term, in a similar vein to the confidence scores returned in image tagging. The higher the relevance score, the more important that term is to the central meaning of the document.
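Relevance scores lend themselves to the same kind of post-processing on the text side. A minimal sketch, assuming an illustrative entity-result shape (not the exact API schema), that keeps the n most central terms:

```python
# Illustrative entity results with relevance scores (schema assumed).
entities = [
    {"text": "Google", "relevance": 0.92},
    {"text": "Dataflow", "relevance": 0.85},
    {"text": "Thursday", "relevance": 0.31},
]

def top_terms(results, n=2):
    """Rank terms by relevance and keep the n most central ones."""
    ranked = sorted(results, key=lambda e: e["relevance"], reverse=True)
    return [e["text"] for e in ranked[:n]]
```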
As tags can be subjective, we recommend familiarizing yourself with the outputs of our analysis engines by test driving the AlchemyLanguage and AlchemyVision demos. If you’d like to run more in-depth testing with some of your images and documents, download a free API Key and get started today.
“Easy to code,” “simple to use” and “Internet-scale” are probably not terms you associate with developing applications that analyze Tweets, chats, emails and other unstructured data coursing through analytics pipelines. But, perhaps we can cause you to reconsider with an example.
Leaning on a decade of data analytics and cloud scalability, Google recently unveiled Cloud Dataflow, an SDK and managed data processing service that can analyze real-time, streaming data flows and batch sets. Dataflow gives app developers the ability to execute semantic analysis of social posts and news using just a few lines of code and API calls.
Part of the reason we’re so excited about Dataflow is because we had a front-row seat when the Google Cloud engineering team paired it with our Sentiment Analysis API during the World Cup. For this project, Dataflow grabbed tweets, converted them into objects, translated them to English, and then used our API to score the positive and negative connotations found in World Cup fans’ tweets. It was fun to see how social media activity directly correlates with real-time events on the field.
Another reason we’re talking about Dataflow is that we are hosting a web session on Thursday, September 18 with Eric Schmidt, a Solutions Architect on the Google Cloud engineering team who focuses on big data scenarios and led the Dataflow project. Our CMO, Richard Leavitt, will join Eric to discuss:
Recordings of the first two web events in our Deep Learning Series, titled What is Deep Learning AI & How Should You Use It Today? and Artificial Intelligence APIs: Intro for Those Building Smarter Applications, are available today.
What compelled Google to commit Schmidt and his team to create a replacement for the 10-year-old MapReduce was the need to cope with exabyte-scale data while making it easy to write pipelines, apply analytics, and use the same code for batch and streaming analytics.
While you probably aren’t dealing with data at Google-scale, you probably are working to answer the same types of questions: What do people really think about my company and products? How can we get actionable information from data in real-time, rather than just store it for later? What connections can I make from my raw data to the results to confirm the trends that we see?
I invite you to join us for How Google Does It: Big and Fast Semantic Analytics Pipelines. Of course, if you register but cannot attend, we will follow up with access to the presentation, recording and Q&A.
Date/Time: Thursday, September 18, 2014 at 12pm ET/9am PT
Perfect for: Software and technology leaders who want to perform semantic analysis on real-time social media and news streams at Internet scale.
Presenters: Eric Schmidt, Solutions Architect at Google and Richard Leavitt, CMO at AlchemyAPI
Tell us about the AlchemyVision project. What did the team set out to accomplish?
The goal of the project was to build a large-scale system that could reliably add valuable structured data to unstructured image data. This balanced the scope of our company, whose primary focus at that point was adding structured data to unstructured text.
At the outset, the team knew that, eventually, the vision project should be able to extract accurate knowledge from the potentially billions of images that make up the world's growing corpus of data.
What were some of the AlchemyVision project outcomes?
Our first metric of success was to match the results of a seminal 2012 paper from Hinton’s group. The team reached that goal rather quickly. From that point on, it was a new frontier, as no other group had moved past that point. After competing products began to be released, we made surpassing their results our goal. We’re proud to say that the AlchemyVision system can accurately tag a wider range of images than these competing systems.
We’ve had positive feedback from several customers, such as Simply Measured, whose CPO and Co-Founder, Aviel Ginzburg, is excited to use AlchemyVision in its product.
"As shoppers become increasingly comfortable making their purchases online, major brands are driving a large amount of sales through eCommerce offerings," said Aviel Ginzburg, CPO and Co-Founder of Simply Measured. "It's important that we provide brands a way to track which campaigns resonate and drive action online. The ability to track and measure everything in a campaign, including the images used, gives brands a competitive advantage when targeting customers and driving sales. With AlchemyVision, we have been able to accurately tag and classify a good portion of images at very high rates with minimal human effort."
AlchemyVision now has several customers, including CamFind, who has seen response times as fast as one second when querying our API. They use our service to provide the first-response tagging results for their image recognition app.
What was one of the major lessons you learned during the AlchemyVision project?
KISS, or “keep it simple, stupid,” was a consistent theme throughout the development of AlchemyVision. The simplest idea usually ended up being the best. Some approaches seemed too basic to work, but in the end they were the optimal solutions to our problems. In a brainstorming session, the team would come up with an idea so simple it appeared to trivialize the problem, and we assumed there had to be a more elegant solution. But when we tried the alternatives, we most often ended up back at the first idea, refining it to make it more robust.
Topic: Artificial Intelligence APIs: An Intro for Developers Who Must Build Smarter Applications
Perfect for: Developers, programmers, engineers & hackers getting started with AI
Presenter: Devin Harper, AI Research Scientist
Application programming interfaces, or APIs, allow developers to utilize the power of artificial intelligence (AI) to create applications and build features on top of them. AI is being used every day to detect fraud, recommend relevant content and products, power e-commerce platforms, listen to consumer sentiment in social media channels and much more.
But, we think the coolest thing about artificial intelligence APIs is that they show no bias toward company size, industry or job title. Anyone with a little programming experience can use them.
Below are four cutting-edge “Alchemists” that demonstrate what you can do with a creative idea, the right tools and a little perspiration. While there are many more examples worth recognizing, we had to pick just a few!
For even more ideas, join Devin Harper, AI Researcher at AlchemyAPI on Thursday, August 28 for "Artificial Intelligence APIs: An Introduction For Those Who Must Build Smarter Applications." Learn about the cloud APIs available to you today like AT&T Speech, AlchemyVision and Google Translate and get ideas for how you can apply AI to your specific business challenges.
Advertising network AdTheorent matches web page content to reader interest with hyper-relevant ad targeting, which goes far beyond simply categorizing a web page or a tweet. They incorporate AlchemyAPI’s Keyword Extraction and Sentiment Analysis APIs to process more than 2 billion records each day and tie in important factors such as emotions, intent and facts expressed within the content. Their efforts have increased click-through rates (CTRs) on their ads by more than 200% and enable more effective monetization of audiences for their clients.
BrainJuicer, a market research agency, drives new sources of revenue for their clients by providing product recommendations aligned with consumers’ online behavior. To fuel their recommendations, BrainJuicer created digital avatars, or DigiViduals®, to seek out online content and discussions aligned with designated buyer profiles. When combined with AlchemyAPI’s Keyword Extraction, Language Detection and Relation Extraction APIs, it becomes easy to uncover trends, connect multi-channel activity and expose buyer preferences from the data to make high-performing product recommendations.
“We have run DigiViduals® for a couple of years now,” explains Richard Shaw, VP and DigiVisionary. “Our clients are pleased… In pre-market testing, we have noticed that ideas coming from DigiViduals® outperform ideas coming from other approaches like focus groups and brainstorming.”
After extensively exploring open source tools and considering building their own system, CrisisNET chose to partner with AlchemyAPI to accurately and quickly fill their “firehose of crisis data” with images from around the world. The team at CrisisNET uses AlchemyVision to pull in images from thousands of data sources, ranging from individual Facebook posts to UNHCR refugee updates to LERN's ebola case data, to drive their platform that aggregates and disseminates timely, relevant, and accurate information to news organizations reporting on natural disasters and humanitarian conflicts.
Altura Interactive, a Spanish digital marketing agency, uses AlchemyAPI to hone their SEO strategies. They employ the Entity Extraction API to help translators understand the entities they should translate and the ones that should remain untouched. They also use the Keyword Extraction API to map keywords to specific pages.
These services help Altura Interactive enrich their content by providing relevance scores and sentiment analysis for terms. They also use the Language Detection API to segment backlinks (incoming links) by language, analyze them, and reach out to websites in other languages to ask them to point their links to the appropriate pages.
In the world of market research, you can’t avoid the need to understand consumer actions and preferences, and that can take a lot of time. In this case study, BrainJuicer shows how natural language processing (NLP) can cut down the time your team spends parsing information for useful signals.
The team at BrainJuicer spends a lot of time determining what intrigues their clients’ audiences and using that information to develop new ideas for products and campaigns that drive revenue. However, it is difficult to figure out exactly what consumers want. The sheer volume of data regarding their online and social interactions is enough to overwhelm any researcher.
Seeking the ultimate focus group to solve this problem, Richard Shaw, VP and DigiVisionary at BrainJuicer, had an idea. Why not get insight into consumers’ real preferences and interests by creating digital buyers that mimic online behavior and gather information on their own? By going to where consumers are (Twitter conversations, forums, articles, etc.), BrainJuicer would be able to take the guesswork out of campaign strategies.
There was one problem. How would they create these avatars, now known as DigiViduals®? With a tight timeframe and a small budget, Shaw looked for a partner for help. “I tried a few APIs and found AlchemyAPI’s services to be the fastest to implement and easiest to use. And the documentation they provide is extremely user-friendly… Someone like me, who has a great concept but not millions of dollars or a team of developers, can realize their idea,” he states.
DigiViduals® have run for a couple of years and clients are pleased. “It is a great way to bring new ideas to life and it has shortened the time it takes for ideas to go from concept to production and release. In pre-market testing, we have noticed that ideas coming from DigiViduals® outperform ideas coming from other approaches like focus groups and brainstorming,” says Shaw.
Are digital avatars the ultimate focus group? Maybe. But for Shaw and the team at BrainJuicer, this is just the start of helping companies determine how to better serve consumers. Next up, BrainJuicer will enhance DigiVidual® profiles using AlchemyVision to process images posted on sites such as Instagram and Pinterest.
With the rise of young entrepreneurs like Mark Zuckerberg of Facebook, Andrew Mason of Groupon and others, millennials are focused more than ever on developing the next “big thing” and starting their own companies. Free Ventures, a non-profit startup incubator founded by students at UC Berkeley, is now helping budding entrepreneurs fulfill their dreams by acting as a launchpad that provides resources, funding, mentorship, and workspace to build products into companies.
“For a student to build a ‘cool idea’ into a full-fledged startup takes resources, capital, hard work and guidance from a mentor,” explains Cameron Baradar, Co-Founder of Free Ventures. “This is exactly what Free Ventures offers our teams.”
Two Free Ventures teams, Einstein and Iris, use AlchemyAPI to power their startup ideas.
Einstein, halfway into its second year, is a product recommendation platform that uses AlchemyAPI to process consumer reviews and make intelligent purchase suggestions to buyers.
The second team, Iris, uses AlchemyAPI to power their keyword-based aggregation of high quality blog content, intelligently connecting bloggers discussing similar content.
“While not every Free Ventures team will raise a seed round or launch publicly, a community of supporters like Amazon Web Services, AlchemyAPI, and others give our teams the ability to build without concern. Whether they garner a six figure investment or disband after a semester, first and foremost, these teams are here to learn,” shares Baradar.
If you are in the UC Berkeley neighborhood, share your startup prowess by becoming a mentor for a new team. Learn more at freeventures.org or contact the Free Ventures team at free[at]berkeley.edu.
Gartner estimates that a staggering 80% of business data is unstructured, which means it is in hard-to-analyze formats such as emails, tweets, chats, blogs, images and more. Development teams are being overwhelmed with requests to create applications and services that automatically gather and synthesize data so that organizations can make better content and purchasing recommendations, extract keywords for search engine optimization (SEO), collect brand intelligence to develop effective messages, and more.
Many application developers, engineers and their leaders are supporting their image and text analysis efforts with deep learning, a new area of machine learning and one of the ten breakthrough technologies featured in the 2013 MIT Tech Review. At a high level, deep learning uses neural networks to improve capabilities like computer vision and natural language processing in order to solve unstructured data challenges. With deep learning, businesses can efficiently process and make sense of all of the data at their fingertips to drive increased productivity, innovation and profit.
Here are five of our favorite deep learning resources. Take a look and let us know if you have others to add to the list. And for a more interactive approach, join our Chief Scientist, Aaron Chavez on August 14 for the first webinar in our Deep Learning Webinar Series -- “What is Deep Learning AI and How Should You Use It Today?”
Bookmark these resources for future reference:
In our blog on the Digital Disruption at The New York Times, we delved into their leaked innovation report where Times’ staffers identified the need to attract readers by recommending older stories, suggesting related stories, and better packaging and personalizing content to individual readers' interests. The report especially emphasizes the need to reach readers where they are, which is often on their mobile devices.
Well, here’s the Hot News Timemachine, a Chrome browser extension that demonstrates a little of what you can do by accurately tagging your news archives with high-level, human-like concepts. It’s by a couple of Aussies, Kenni Bawden and James Edwards, who developed it for the GovHack2014 Hackerfest.
Here’s what the creators say about it:
"Hot News Timemachine is a fun, new Google Chrome browser extension that shows you that anything new in the news, is really old news, and provides you with a serendipitous and intriguing alternative to today's click-bait fluff."
"When you click on an Aussie web news story, the Hot News Timemachine roars into action! Hot News Timemachine swaps out the boring, current 'news' story that you are reading for a much more interesting, old fashioned one."
Technical Details: Hot News Timemachine was created by Kenni Bawden and James Edwards, for the GovHack2014 Hackerfest. It utilizes AlchemyAPI's Concept Tagging engine to process the extensive, digitized collection of Australian newspapers found on Trove, and provides a link to additional relevant insights gleaned from the Humanities Networked Infrastructure (HuNI) collection.
Give it a try and share your thoughts on how repackaging "old news" can generate new revenues.
By Marissa Kaufmann
Recently at MozCon 2014 - a gathering of digital marketers - Zeph Snapp, CEO and Founder of Altura Interactive, presented his tips for leveraging existing content in other languages. Attendees learned how to go beyond the technical implications of international SEO and saw the benefits of using natural language processing tools to perform tasks such as pinpointing keywords and mapping them to specific pages. We got together with Zeph to learn a bit more about his presentation and how he uses AlchemyAPI to improve international SEO.
1. Tell us about your presentation at MozCon.
My presentation was about localizing content for international audiences. You can download the slides or watch the presentation here. I want to teach digital marketers in the U.S. how to plan for and distribute content for audiences in other languages. There are so many tools out there that can make digital marketers’ and SEO experts’ lives easier. I wanted to share the resources, like AlchemyAPI, that I’ve found so that others could benefit.
2. In your presentation, you discussed how you incorporate AlchemyAPI in your work. Tell us how you use natural language processing as a digital marketer.
At Altura Interactive, we use AlchemyAPI’s Entity Extraction API to help our translators understand the entities they should translate and the ones that should remain untouched. We also use the Keyword Extraction API to map keywords to specific pages. These services help us enrich our content by providing relevance scores and sentiment analysis for terms, among other things. And we use the Language Detection API to divide the backlinks, or incoming links, to a specific page so that we can analyze them and reach out to websites in other languages to ask them to point their links to the appropriate corresponding pages.
3. What are your tips for others who are just getting started?
Before using AlchemyAPI, you and your resident app developer need to read this guide to getting started with AlchemyAPI, and then go to town!
4. How did you find out about AlchemyAPI?
At a conference, a colleague and I were talking about how we could leverage natural language processing and your team came up.
5. Where do you see unstructured data analysis heading in the future?
Some of the most interesting and important work for digital and SEO marketers is analyzing language at a massive scale. It will be important for NLP technology to understand the connotations of specific words in specific contexts. The implications of NLP tools for SEO and building international content are amazing. I’m excited to see solutions, like AlchemyAPI, used more and more by professionals in my field.
For most of us, summer means backyard BBQs, beach vacations, and relaxing by the pool. While our team members at AlchemyAPI have gotten their fill of hamburgers and potato salad, we have also been busy adding new features to our natural language processing (NLP) and computer vision solutions.
With AlchemyAPI’s NLP and computer vision APIs, customers gain access to updates the moment they become available. Read on to learn about the most recent enhancements and visit our Recent Updates timeline for additional information.
1. Combined Call
With AlchemyAPI’s combined call, you can analyze a single piece of content (URL, HTML, or text) with multiple text and image analysis features in a single request.
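Building such a request can be sketched in a few lines of Python. The endpoint and parameter names below follow AlchemyAPI’s REST conventions as best we recall and should be treated as assumptions to verify against the API reference:

```python
from urllib.parse import urlencode

# Assumed combined-call endpoint; verify against the API documentation.
BASE = "http://access.alchemyapi.com/calls/url/URLGetCombinedData"

def build_combined_call(page_url, features, apikey="YOUR_API_KEY"):
    """Build one request URL that runs several analysis features at once."""
    params = {
        "apikey": apikey,
        "url": page_url,
        "extract": ",".join(features),  # comma-separated feature list
        "outputMode": "json",
    }
    return BASE + "?" + urlencode(params)

request_url = build_combined_call(
    "http://example.com/article",
    ["entity", "keyword", "taxonomy"],
)
# The request itself could then be issued with urllib.request.urlopen(request_url).
```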
2. Publication Date API
This newly available API helps you group articles by extracting publication date information from web pages and normalizing the data to give you standardized formats. This API solves the problem of determining a publication date when faced with the following challenges: varied date formats (05/10/2014 or May 10th, 2014 or the 10th of May 2014), placement on the webpage (header, h1, main body), and differentiating between multiple dates on a page.
Publication date extraction combined with other text analysis features enables the generation of tag clouds, sentiment towards specific topics, and more on a temporal basis.
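The normalization step above can be sketched with nothing but the standard library. The format list and the `normalize_date` helper are illustrative assumptions, not the API’s internals:

```python
import re
from datetime import datetime

# A few of the date formats mentioned above; a real system would need many more.
PATTERNS = ["%m/%d/%Y", "%B %d, %Y", "%d of %B %Y"]

def normalize_date(text):
    """Normalize a free-form date string to ISO 8601 (YYYY-MM-DD)."""
    # Strip ordinal suffixes ("10th" -> "10") and a leading "the".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text.strip())
    cleaned = re.sub(r"^the\s+", "", cleaned, flags=re.IGNORECASE)
    for fmt in PATTERNS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unrecognized format
```

All three example formats from the description collapse to the same standardized value.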
3. New Web Page Cleaning
The new web page cleaning system is more accurate, extracts the main text from article pages with greater precision, and recognizes a wider variety of page types and handles them appropriately. Overall, the new system makes text extraction results more meaningful and extracts data from a larger percentage of pages than was previously possible.
4. Constituent Parser
The constituent parser builds a parse tree from the words in a sentence. Parse trees are useful structures to show the relationship between words. For example, in the phrase "new cars and trucks," we know that the word "new" applies to both cars and trucks. AlchemyAPI's technology understands the structure of complex sentences and we are now exposing this powerful process to customers.
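A parse tree for the phrase above can be sketched with a tiny node class. The tree shape and the `nouns_modified_by_adjective` helper are illustrative, not AlchemyAPI’s internal representation:

```python
class Node:
    """A node in a constituent parse tree: either a phrase with
    children or a leaf carrying a word."""
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word

    def leaves(self):
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.leaves()]

# (NP (JJ new) (NP (NNS cars) (CC and) (NNS trucks)))
tree = Node("NP", [
    Node("JJ", word="new"),
    Node("NP", [
        Node("NNS", word="cars"),
        Node("CC", word="and"),
        Node("NNS", word="trucks"),
    ]),
])

def nouns_modified_by_adjective(np):
    """In an NP shaped (JJ ...) (NP ...), the adjective scopes over
    every noun inside its sibling NP."""
    adj = next(c for c in np.children if c.label == "JJ")
    sib = next(c for c in np.children if c.label == "NP")
    nouns = [w for c in sib.children if c.label == "NNS" for w in c.leaves()]
    return adj.word, nouns
```

Walking the tree recovers exactly the relationship described above: “new” applies to both “cars” and “trucks.”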
5. New Taxonomy
In Q2, we rolled out a significant taxonomy update, offering improved accuracy on web page content analysis. Confidence scores have been improved to more accurately convey when the results can be trusted. If you need custom categories, we can help with that, too. Contact us for more information.
6. Hashtag Sentiment
Sentiment hashtag decomposition makes it possible to determine the sentiment of hashtags by splitting them into individual words and commonly used phrases.
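One common way to implement such a decomposition is dynamic programming over a word lexicon, preferring the segmentation with the fewest words. The lexicon and the fewest-words preference below are assumptions for illustration, not the API’s actual method:

```python
def split_hashtag(tag, lexicon):
    """Split a hashtag into lexicon words, preferring fewer, longer words.
    Returns None if no full segmentation exists."""
    text = tag.lstrip("#").lower()
    n = len(text)
    best = [None] * (n + 1)  # best[i] = shortest segmentation of text[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and text[j:i] in lexicon:
                cand = best[j] + [text[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n]
```

Once split, each word (or recognized phrase) can be scored for sentiment individually and the results aggregated back onto the hashtag.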
7. Image Link Extraction
Given any URL, the Image Link Extraction API will scan the designated page to find the most prominent image and directly retrieve the URL for that image. It can then be appropriately classified and tagged. You can use the Image Link Extraction API to aggregate images and understand the context in which they are being served.
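As a rough stand-in for “most prominent,” one can pick the `<img>` with the largest declared area using the standard-library HTML parser. This is a heuristic sketch only; the real API’s ranking is surely more sophisticated:

```python
from html.parser import HTMLParser

class MainImageFinder(HTMLParser):
    """Track the <img> tag with the largest declared width*height
    as a crude proxy for visual prominence."""
    def __init__(self):
        super().__init__()
        self.best = (0, None)  # (area, src)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            area = int(a.get("width", 0)) * int(a.get("height", 0))
        except (TypeError, ValueError):
            return  # ignore images without numeric dimensions
        if area > self.best[0]:
            self.best = (area, a.get("src"))

def main_image(html):
    finder = MainImageFinder()
    finder.feed(html)
    return finder.best[1]
```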
8. Image Tagging
With the Image Tagging API, you can quickly categorize and organize image libraries at a massive scale. By understanding complex visual scenes in their broader context, you can automatically extract knowledge from images and act upon what you learn.