Over the last few months you’ve probably seen a number of announcements about how OpenCalais has been chosen by one organization or another to support its business.
In a number of recent meetings I’ve been asked the (very fair) question: “Why OpenCalais and not one of the other entity extraction services out there?”
Given that the question seems to be coming up more often as the number of extraction services increases, I thought I’d share my best understanding of why the many major players we’ve announced (and an equal number we haven’t) have chosen to go with OpenCalais. And, at the end, I’ll mention a few reasons why others haven’t chosen OpenCalais.
So, in no particular order, why do organizations choose OpenCalais?
OpenCalais is provided by Thomson Reuters – the largest professional information organization in the world.
If you’re interested in kicking around some semantic technologies in your spare time, this doesn’t really matter. If you’re incorporating those technologies deep within your business or, as is true of many users, actually building a new business on top of them, this becomes pretty important. Basically, you need to know that the service is going to be there for you.
Facts & Events
With the increase in structured content assets like Wikipedia / DBpedia, it’s become pretty easy to knock out a basic entity extraction tool. And – while we like entity extraction as much as anyone else – it’s really just the tiniest starting point in what you can and will need to do.
OpenCalais extracts a wide range of facts and events from unstructured content and tells you what’s happening in your content – not just tags for things.
- Facts are things like “John Doe is CEO of XYZ Corporation.”
- Events are things like “XYZ Corporation today announced that it would acquire ACME Corporation.”
OpenCalais is the only service that does this in a production-strength manner.
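To make the difference between plain tags, facts and events a little more concrete, here is a minimal Python sketch built on the two examples above. It is purely illustrative: the class names (`Entity`, `Fact`, `Event`), the field names and the role labels are our own assumptions for this post, not the actual OpenCalais output schema.

```python
from dataclasses import dataclass


# NOTE: all names below are hypothetical, chosen for illustration only.
@dataclass
class Entity:
    kind: str  # e.g. "Person" or "Company"
    name: str


@dataclass
class Fact:
    """A standing relationship between two entities."""
    subject: Entity
    relation: str
    obj: Entity


@dataclass
class Event:
    """Something that happened, with entities playing named roles."""
    kind: str
    participants: dict  # role name -> Entity


def describe_fact(fact: Fact) -> str:
    return f"{fact.subject.name} {fact.relation} {fact.obj.name}"


def describe_event(event: Event) -> str:
    roles = ", ".join(
        f"{role}={entity.name}" for role, entity in event.participants.items()
    )
    return f"{event.kind}({roles})"


# Entity extraction alone would stop here: three tags.
john = Entity("Person", "John Doe")
xyz = Entity("Company", "XYZ Corporation")
acme = Entity("Company", "ACME Corporation")

# Facts and events say how those tags relate to each other.
ceo_fact = Fact(john, "is CEO of", xyz)
acquisition = Event("Acquisition", {"acquirer": xyz, "target": acme})

print(describe_fact(ceo_fact))
print(describe_event(acquisition))
```

The point of the sketch is only that a fact or an event carries structure (who did what to whom) that a flat list of tags cannot.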
OpenCalais stays up. It’s hosted in mirrored data centers thousands of miles apart from each other. It’s monitored 24/7. It basically doesn’t go down – even during system upgrades and maintenance. We stopped adding 9s after we got beyond 99.99% uptime.
We’ve been building the tools underneath OpenCalais for over a decade. They’ve been used by hundreds of organizations and many, many thousands of end users. One of the things we’ve learned is that accuracy matters. While no NLP system is perfect, we’re convinced ours is the best, and we have some ideas in the pipeline to increase accuracy even more.
We basically focus on providing great semantic plumbing. But we know that not everyone wants to be a plumber. We’ve worked to integrate (or motivate others to integrate) OpenCalais with a wide range of tools including Drupal, WordPress, WordPress Multiuser, Oracle, Lucene, ColdFusion, Flash, Firefox, Prolog, Lisp, Django, Java, PHP, Python, Alfresco, Perl, .NET, Ruby, TopBraid and a few others.
From content management systems to language-specific libraries – there are lots of ways to get started quickly.
We’re serious about Linked Data. We’re also worried about the proliferation of incorrect links and the effects of link rot. So, rather than just pointing to Linked Data assets out on the cloud and risking that they’ll go stale, we host our own Linked Data cloud, which is kept up to date with both content contributed by Thomson Reuters and regularly validated links to other sources such as DBpedia and Freebase.
Pure semantic extraction is great – but sometimes you need more. If you’re writing about Porsches and Ferraris you’d probably like to have categorization concepts like “sports cars” and “automobiles” returned to you with your semantic metadata. OpenCalais does this via our ever-improving SocialTags concept tagging capability. It’s good now, and it’s going to get a lot better soon.
OpenCalais is here to provide great semantic plumbing. We’re not trying to sell ads. We’re not trying to provide the prettiest decorations for blogs. We build the plumbing – you architect the solutions.
Now, in a spirit of transparency, here’s why some people don’t choose OpenCalais:
We’re great in English and okay in French and Spanish (we extract entities but neither facts nor events in these two languages). We intend to implement more languages in the future – but for the time being we’re concentrating our efforts on improved functionality and accuracy in English.
OpenCalais isn’t a simple tagging tool. What it returns to the calling application is a reasonably complex RDF construct. It takes a little time to get up to speed on RDF and how to use it in your applications. We think it’s worth it because it’s the most flexible and powerful format we know of.
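If you haven’t worked with RDF before, the core idea is that everything is expressed as subject / predicate / object triples, and that resources refer to each other by URI. The sketch below shows one way an application might fold such triples into per-resource records and follow the references. Treat it as a rough illustration under stated assumptions: the `http://example.org/` namespace, the property names and the hand-written triples are all hypothetical and are not real OpenCalais output, and a real application would normally use an RDF library rather than plain tuples.

```python
from collections import defaultdict

# Hypothetical namespace and vocabulary, for illustration only.
EX = "http://example.org/"

# Hand-written (subject, predicate, object) triples standing in for
# the kind of graph an RDF response describes.
triples = [
    (EX + "person/1", EX + "type", "Person"),
    (EX + "person/1", EX + "name", "John Doe"),
    (EX + "company/1", EX + "type", "Company"),
    (EX + "company/1", EX + "name", "XYZ Corporation"),
    (EX + "fact/1", EX + "type", "Position"),
    (EX + "fact/1", EX + "person", EX + "person/1"),
    (EX + "fact/1", EX + "company", EX + "company/1"),
    (EX + "fact/1", EX + "position", "CEO"),
]


def group_by_subject(triples):
    """Fold triples into one dict of properties per subject URI."""
    resources = defaultdict(dict)
    for subject, predicate, obj in triples:
        short_name = predicate.rsplit("/", 1)[-1]  # ".../name" -> "name"
        resources[subject][short_name] = obj
    return dict(resources)


def resolve(resources, value):
    """If value is the URI of a known resource, return its name;
    otherwise treat it as a literal and return it unchanged."""
    if value in resources:
        return resources[value].get("name", value)
    return value


resources = group_by_subject(triples)
fact = resources[EX + "fact/1"]
summary = (
    f"{resolve(resources, fact['person'])} is {fact['position']} "
    f"of {resolve(resources, fact['company'])}"
)
print(summary)
```

The extra indirection (a fact pointing at a person and a company by URI rather than by name) is what takes getting used to, and it is also what makes the format flexible.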
Performance in Knowledge Domain ‘x’
Where ‘x’ is fashion or square dancing or rugby. OpenCalais is optimized for performance in the general world of business – that’s where we excel.
We have extended OpenCalais to take steps in other areas (such as sports, media, etc.), but if you need deep semantic extraction capabilities related to protein binding, there are better places to look.