
Artificial Intelligence Algorithm behind IBM Watson

AI is everywhere, from robots to search engines that bring back the most relevant results. But there's something extraordinary about IBM's latest machine: it can answer just about any question based on the data available on the internet.

IBM's Watson made headlines when the system played Jeopardy!, showcasing some of the greatest advancements in AI. Prior to that, IBM was famous for Deep Blue, a chess-playing machine that took a brute-force approach, relying on a giant opening book and sheer computational power to search the game tree, rather than the intuitive approach apparently taken by human players.

In computer science, a large number of algorithms deal with making decisions. The approach used here for Jeopardy! is Question Answering (QA), which is part of a broader field called Information Retrieval (IR). IR deals with retrieving relevant information or documents from a store (which can be wikis), while QA specifically deals with returning information in response to natural-language questions (who, where, when, what, etc.).

Where QA focuses on answering with something specific (just a word or phrase), IR commonly returns whole documents (as search engines do). IBM calls its combination of the two approaches DeepQA.
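To make the IR-vs-QA distinction concrete, here is a toy sketch (not Watson's code): an IR function that ranks whole documents by word overlap, and a QA function that extracts a short answer phrase instead. The documents and the naive extraction heuristic are invented for illustration.

```python
def ir_search(query, documents):
    """IR: rank whole documents by how many query words they contain."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(doc.lower().split())), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

def qa_answer(question, documents):
    """QA: return a short phrase instead of a document. Here, naively,
    the first capitalized word in the top-ranked document that does
    not already appear in the question."""
    hits = ir_search(question, documents)
    if not hits:
        return None
    q_words = set(question.lower().split())
    for word in hits[0].split():
        if word[0].isupper() and word.lower().strip(".,") not in q_words:
            return word.strip(".,")
    return None

docs = [
    "Thomas Watson founded IBM and led it for decades.",
    "Deep Blue defeated Garry Kasparov at chess in 1997.",
]
print(ir_search("who founded IBM", docs)[0])  # a whole document
print(qa_answer("Who founded IBM?", docs))    # a short answer: "Thomas"
```

A real QA system obviously needs far more than capitalization heuristics, but the shape of the contrast holds: IR hands back documents for a human to read, QA commits to a specific answer.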

“DeepQA”, Davidian explained, “scales out with and searches vast amounts of unstructured information. Effective execution of this software, corresponding to a less than three second response time to a Jeopardy! question, is not just based on raw execution power. Effective system throughput includes having available data to crunch on. Without an efficient memory sub-system, no amount of compute power will yield effective results. A balanced design is comprised of main memory, several levels of local cache and execution power. IBM’s POWER 750’s scalable design is capable of filling execution pipelines with instructions and data, keeping all the POWER7 processor cores busy. At 3.55 GHz, each of Watson’s POWER7 on-chip bandwidth is 500 Gigabytes per second. The total on-chip bandwidth for Watson’s 360 POWER7 processors is an astounding 180,000 Gigabytes per second!”
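The bandwidth figure in the quote can be checked with simple arithmetic (the numbers come from the quote above, not from independent measurement):

```python
# 90 servers x 4 POWER7 chips per server, each chip with 500 GB/s
# of on-chip bandwidth, per the quoted figures.
chips = 90 * 4
per_chip_gb_s = 500
total_gb_s = chips * per_chip_gb_s
print(chips, "chips,", total_gb_s, "GB/s total")  # 360 chips, 180000 GB/s total
```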

The algorithms used to compute the answers are far more complex than the AI typically found in robots; the system can be considered as complex as a search engine that actually makes decisions. The paper (Ferrucci et al. 2010, AI Magazine) outlines Watson's approach to Question Answering in detail. It's 20 pages long but not incredibly technical, and highly recommended for anyone with a passion for the subject.
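The paper describes a pipeline that generates many candidate answers, scores each one with multiple independent evidence scorers, and merges the scores into a confidence before deciding whether to answer at all. A highly simplified sketch of that control flow (the scorers, weights, and threshold here are invented stand-ins, not Watson's actual components):

```python
def deepqa(question, candidates, scorers, weights, threshold=0.5):
    """Pick the candidate with the highest merged evidence score;
    answer only if confidence clears the threshold (the 'buzz in')."""
    best_answer, best_conf = None, 0.0
    for cand in candidates:
        # Each scorer returns evidence strength in [0, 1].
        evidence = [s(question, cand) for s in scorers]
        # Weighted merge of evidence into a single confidence.
        conf = sum(w * e for w, e in zip(weights, evidence)) / sum(weights)
        if conf > best_conf:
            best_answer, best_conf = cand, conf
    return (best_answer, best_conf) if best_conf >= threshold else (None, best_conf)

# Two toy scorers: literal keyword overlap, and a crude answer-type check.
overlap = lambda q, a: 1.0 if a.lower() in q.lower() else 0.0
type_ok = lambda q, a: 1.0 if q.startswith("Who") == a[0].isupper() else 0.0

ans, conf = deepqa("Who founded IBM? Thomas Watson did.",
                   ["Thomas Watson", "Deep Blue"],
                   [overlap, type_ok], [2.0, 1.0])
print(ans, conf)  # Thomas Watson 1.0
```

The key idea the sketch preserves is that no single scorer decides anything: confidence emerges from combining many weak, independent pieces of evidence, which is also what lets the real system know when not to buzz in.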

The machine is named after Thomas J. Watson, the founder of IBM.

If you want to succeed, double your failure rate – Thomas J. Watson

The paper begins with acknowledgements of the complexity of the problem and the scope:

“we appreciate that this challenge alone does not address all aspects of QA and does not by any means close the book on the QA challenge the way that Deep Blue may have for playing chess. However, note that while the Jeopardy! game requires that answers are delivered in the form of a question …  this transformation is trivial and for purposes of this paper we will just show the answers themselves”

How Watson Is Able to Answer in Jeopardy!

Watson is made up of ninety IBM POWER 750 servers, 16 Terabytes of memory, and 4 Terabytes of clustered storage. Davidian continued, "This is enclosed in ten racks including the servers, networking, shared disk system, and cluster controllers. These ninety POWER 750 servers have four POWER7 processors, each with eight cores. IBM Watson has a total of 2880 POWER7 cores."
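The quoted core count follows directly from the server figures:

```python
# 90 servers x 4 POWER7 chips per server x 8 cores per chip,
# per the figures quoted above.
servers, chips_per_server, cores_per_chip = 90, 4, 8
cores = servers * chips_per_server * cores_per_chip
print(cores)  # 2880
```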

Apart from the original paper, there's one more paper that conceptualizes Large Scale Relation Detection. It describes methods for detecting relations between entities in a large corpus (over a billion words of text). This makes things much clearer, discussing how patterns of words in text such as "appeared together at the premier of" can be used to predict other, more useful relations like "co-star in". To get this to work, a large number of built-in, detailed relations between concepts is needed, which can then be populated using patterns of words in text.
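The pattern-to-relation idea can be sketched in a few lines. This is a toy illustration (not the paper's actual system): hand-written surface patterns between two entity mentions are mapped onto a more useful semantic relation, and the pattern table itself is invented for the example.

```python
import re

# Hand-written surface-pattern -> relation table (invented examples).
PATTERNS = [
    (re.compile(r"(\w[\w ]*) appeared together at the premiere? of .* with (\w[\w ]*)"),
     "co-star-in"),
    (re.compile(r"(\w[\w ]*) founded (\w[\w ]*)"), "founder-of"),
]

def detect_relations(sentence):
    """Return (entity1, relation, entity2) triples matched in a sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        m = pattern.search(sentence)
        if m:
            triples.append((m.group(1).strip(), relation, m.group(2).strip()))
    return triples

print(detect_relations("Thomas Watson founded IBM"))
# [('Thomas Watson', 'founder-of', 'IBM')]
```

At the scale the paper targets (a billion-plus words), the patterns are not hand-written regexes but are learned statistically; the sketch only shows the mapping step from word patterns to relations.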

Given the multitude of AI components interacting to solve all the little tasks that must be accomplished to parse a question and relate it to knowledge gained from previously seen documents, it really is rocket science.

