IBM Watson™ Discovery Service unlocks insights hidden in unstructured data. This node.js application demonstrates how the Discovery API can be used to build queries and perform cognitive analysis using the Watson Discovery News dataset.
- A Bluemix account. If you don't have one, sign up.
- node.js. Alternatively, this project can run using Vagrant: a Vagrantfile is provided that creates a virtual machine configured to run this project.
- Download this project using git clone.
- Download and install the Cloud Foundry CLI tool if you haven't already.
- Connect to Bluemix with the command line tool:

    bx api https://api.eu-gb.bluemix.net
    bx login -u <your user ID>
- Create an instance of the Discovery service (if you have a trial account, replace standard with free):

    bx service create discovery standard my-discovery-service
- Create and retrieve service keys to access your instance of the Discovery service:

    bx service key-create my-discovery-service myKey
    bx service key-show my-discovery-service myKey
- The project needs to be configured to work with your instance of the Watson Discovery service. Rename .env.template to .env, then fill in .env with your service instance information. The .env file will look something like the following:

    DISCOVERY_USERNAME=<username>
    DISCOVERY_PASSWORD=<password>
    DISCOVERY_ENVIRONMENT_ID=
    DISCOVERY_COLLECTION_ID=
    DISCOVERY_CONFIGURATION_ID=
    DISCOVERY_VERSION=2016-11-07
- Use the GET /v1/environments method to get the environment ID of your Discovery service instance:

    curl -X GET -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments?version=2016-11-07"

    {
      "environments" : [ {
        "environment_id" : "<environment_id>",
        "name" : "Watson News Environment",
        "description" : "Watson News cluster environment",
        "created" : "2017-06-22T08:47:35.705Z",
        "updated" : "2017-06-22T08:47:35.705Z",
        "status" : "active",
        "read_only" : true
      } ]
    }

  Notice that an environment named Watson News Environment already exists. This environment contains Watson Discovery News, a public data set that has been pre-enriched with cognitive insights, and is included with the Discovery service by default.
- Use the GET /v1/environments/{environment_id}/collections method to get the collection ID and configuration ID of your Watson News Environment instance:

    curl -X GET -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections?version=2016-11-07"

    {
      "collections" : [ {
        "collection_id" : "<collection_id>",
        "name" : "watson_news",
        "configuration_id" : "<configuration_id>",
        "language" : "en",
        "status" : "active",
        "description" : "Watson News pre-enriched collection of curated news sources",
        "created" : "2017-06-22T08:47:35.705Z",
        "updated" : "2017-06-22T08:47:35.705Z"
      } ]
    }
- Fill in .env with your environment, collection and configuration IDs. The .env file will look something like the following:

    DISCOVERY_USERNAME=<username>
    DISCOVERY_PASSWORD=<password>
    DISCOVERY_ENVIRONMENT_ID=<environment_id>
    DISCOVERY_COLLECTION_ID=<collection_id>
    DISCOVERY_CONFIGURATION_ID=<configuration_id>
    DISCOVERY_VERSION=2016-11-07
For more help, see Getting started with the Discovery API.
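The ID lookup described in the steps above can also be done in code once the two responses have been fetched. The following is a minimal sketch and is not part of this project: findEnvironmentId and findCollectionIds are hypothetical helper names, and the code assumes the response bodies have the shape of the sample output shown above.

```javascript
// Hypothetical helpers (not part of this project) that pick the IDs needed
// for .env out of the JSON bodies returned by the two GET calls above.

// Return the environment_id of the environment with the given name,
// or null if no such environment is listed.
function findEnvironmentId(body, name) {
  const match = (body.environments || []).find(function (env) {
    return env.name === name;
  });
  return match ? match.environment_id : null;
}

// Return the collection and configuration IDs of the named collection,
// or null if it is not listed.
function findCollectionIds(body, name) {
  const match = (body.collections || []).find(function (col) {
    return col.name === name;
  });
  if (!match) {
    return null;
  }
  return {
    collectionId: match.collection_id,
    configurationId: match.configuration_id
  };
}

// Trimmed copies of the sample responses, with the same placeholder values.
const environments = {
  environments: [
    { environment_id: '<environment_id>', name: 'Watson News Environment', read_only: true }
  ]
};
const collections = {
  collections: [
    { collection_id: '<collection_id>', name: 'watson_news', configuration_id: '<configuration_id>' }
  ]
};

console.log(findEnvironmentId(environments, 'Watson News Environment'));
console.log(findCollectionIds(collections, 'watson_news'));
```

Matching on the name rather than taking the first array element keeps the lookup correct if your service instance later contains additional environments or collections.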
Watson Discovery News is a dataset of primarily English language news sources that is updated continuously, with approximately 300,000 new articles and blogs added daily.
This indexed dataset is pre-enriched with the following cognitive insights: Keyword Extraction, Entity Extraction, Concept Tagging, Relation Extraction, Sentiment Analysis, and Taxonomy Classification.
The following additional metadata is also added: crawl date, publication date, URL ranking, host rank, and anchor text. Historical search is available for the past 60 days of news data.
This application demonstrates how the Watson Discovery service can be used to query the Watson Discovery News dataset to find articles or quotes about a person. The sentiment of the retrieved documents is analysed using the cognitive insights with which the News dataset has been pre-enriched. The results are then written to a file.
To use this application, run npm install to install the required node.js packages:

    npm install

Verify the application is working correctly by running ./analysis.sh -h. This should output the following help about the app:

    Usage: analysis.sh [options]

    Cognitive analysis of Watson Discovery News data.

    Options:

      -h, --help             output usage information
      -V, --version          output the version number
      -n, --name [name]      person name.
      -d, --dir [dir]        Directory to output results to.
      -q, --quotes [quotes]  Use Watson Discovery Service to find quotes.
      -p, --personality      Use Watson Personality Insights.
Use the -n flag to pass in a name to search on. Use the -d flag to specify a relative directory to write the results to. The following query will analyse the News dataset for articles about the tennis player Roger Federer:

    ./analysis.sh -n Federer -d results

The analysis of this query is output as comma-separated values to results/Federer.csv.

    "name","hits","hits_negative","hits_positive","hits_neutral"
    "federer",50,14,28,8

The application has analysed the sentiment of each article found about Federer. In total, 50 hits were found: 14 had a negative sentiment, 28 had a positive sentiment and 8 had a neutral sentiment.
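The tally behind a row like the one above can be sketched as follows. This is an illustration rather than the project's actual code: summariseSentiment, toCsv and getLabel are hypothetical names, and the location of the sentiment label on each result (result.docSentiment.type) is an assumption about the response shape that you may need to adjust.

```javascript
// Hypothetical sketch: tally sentiment labels and format one CSV row.
// Assumption: each result exposes its label at result.docSentiment.type
// ('positive', 'negative' or 'neutral'); change getLabel if the shape differs.
function getLabel(result) {
  return result.docSentiment && result.docSentiment.type;
}

// Count the results per sentiment label. Results with a missing or
// unrecognised label still count towards hits but towards no bucket.
function summariseSentiment(name, results) {
  const counts = { negative: 0, positive: 0, neutral: 0 };
  results.forEach(function (result) {
    const label = getLabel(result);
    if (Object.prototype.hasOwnProperty.call(counts, label)) {
      counts[label] += 1;
    }
  });
  return {
    name: name.toLowerCase(),
    hits: results.length,
    hits_negative: counts.negative,
    hits_positive: counts.positive,
    hits_neutral: counts.neutral
  };
}

// Format a summary as a header line plus one data line, quoting strings only.
function toCsv(summary) {
  const columns = ['name', 'hits', 'hits_negative', 'hits_positive', 'hits_neutral'];
  const header = columns.map(function (c) { return '"' + c + '"'; }).join(',');
  const row = columns.map(function (c) {
    const value = summary[c];
    return typeof value === 'string' ? '"' + value + '"' : String(value);
  }).join(',');
  return header + '\n' + row;
}

// Made-up sample data, for illustration only.
const sample = [
  { docSentiment: { type: 'positive' } },
  { docSentiment: { type: 'negative' } },
  { docSentiment: { type: 'positive' } },
  { docSentiment: { type: 'neutral' } }
];
console.log(toCsv(summariseSentiment('Federer', sample)));
```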
Use the -q flag to look for quotes about a particular person. The following query will analyse the News dataset for quotes about the tennis player Roger Federer:

    ./analysis.sh -n Federer -d results -q

The analysis of this query is output as comma-separated values to results/Federer.csv.

    "name","hits","hits_negative","hits_positive","hits_neutral"
    "federer",65,19,0,46

The application has analysed the sentiment of each quote found about Federer. In total, 65 quotes were found: 19 had a negative sentiment, none had a positive sentiment and 46 had a neutral sentiment.
To output the quotes returned from the Watson Discovery service to the console, uncomment console.dir(data); on line 22 of analysis-quotes.js.
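As a rough illustration of how quotes might be pulled out of the returned documents, the sketch below walks the entities of each result and collects the quotations attached to the person searched for. It is not the code in analysis-quotes.js: collectQuotes is a hypothetical name, and the nested shape (entities, then quotations, then quotation and sentiment.type) is an assumption inferred from the field paths used in this project's query.

```javascript
// Hypothetical sketch: collect the quotations attached to a named person.
// Assumed result shape (inferred from the query's field paths, not verified):
//   { entities: [ { text, type, quotations: [ { quotation, sentiment: { type } } ] } ] }
function collectQuotes(results, name) {
  const wanted = name.toLowerCase();
  const quotes = [];
  results.forEach(function (result) {
    (result.entities || []).forEach(function (entity) {
      const text = (entity.text || '').toLowerCase();
      // Skip entities that are not people or do not mention the name.
      if (entity.type !== 'Person' || text.indexOf(wanted) === -1) {
        return;
      }
      (entity.quotations || []).forEach(function (q) {
        quotes.push({
          quotation: q.quotation,
          sentiment: q.sentiment ? q.sentiment.type : undefined
        });
      });
    });
  });
  return quotes;
}

// Made-up sample data, for illustration only.
const results = [
  {
    entities: [
      {
        text: 'Roger Federer',
        type: 'Person',
        quotations: [
          { quotation: 'An invented quote.', sentiment: { type: 'neutral' } },
          { quotation: 'Another invented quote.', sentiment: { type: 'negative' } }
        ]
      },
      { text: 'Wimbledon', type: 'SportingEvent' }
    ]
  }
];
console.dir(collectQuotes(results, 'Federer'));
```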
The IBM Watson™ Discovery service offers powerful content search capabilities using the Discovery Query Language. In this application, a query object is formed in discoveryQuery.js, before using the node.js request library to send an HTTP GET to the specified endpoint:
    // Build the query endpoint from the IDs configured in .env.
    var queryUri = 'https://gateway.watsonplatform.net/discovery/api/v1/environments/'
      + process.env.DISCOVERY_ENVIRONMENT_ID
      + '/collections/' + process.env.DISCOVERY_COLLECTION_ID
      + '/query';
    var queryObject = {
      uri: queryUri,
      method: 'GET',
      // HTTP basic authentication with the service credentials.
      auth: {
        user: process.env.DISCOVERY_USERNAME,
        pass: process.env.DISCOVERY_PASSWORD
      }
    };
Query parameters enable you to search your collection, and customise the output of the data you return. A query string is added to the query object as follows:
    queryObject.qs = {
      version: process.env.DISCOVERY_VERSION,
      query: 'entities.text:(' + name + ')',
      filter: 'entities.type:Person',
      count: 50
    };
Search and structure parameters determine what data is returned:
- filter: A cacheable query that excludes any documents that don't mention the query content. Filter search results are not returned in order of relevance.
- query: A query search returns all documents in your data set with full enrichments and full text in order of relevance. A query also excludes any documents that don't mention the query content.
- count: The number of documents that you want returned in the response.
The Entity Extraction enrichment extracts persons, places, and organizations from the input text. The above query string filters for articles with the entity type Person and then searches for articles with the parameter name in the entity text. The name parameter is passed in at the command line. Fifty results are returned from the Watson Discovery service, as specified by count: 50.
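To make the effect of these parameters concrete, the sketch below builds the same parameter set for an arbitrary name and shows how it ends up on the request URL. buildPersonQuery and toQueryUrl are hypothetical helpers, ENV_ID and COLLECTION_ID are placeholders, and the serialisation uses Node's built-in URL class; the request library performs its own serialisation, so the exact encoding it produces may differ.

```javascript
// Hypothetical helper (not part of this project) that builds the same
// parameters as the qs object above for a given name.
function buildPersonQuery(name, version, count) {
  return {
    version: version,
    query: 'entities.text:(' + name + ')',
    filter: 'entities.type:Person',
    count: count
  };
}

// Append the parameters to a query URL using Node's built-in URL class.
// This only illustrates what travels over the wire.
function toQueryUrl(baseUri, params) {
  const url = new URL(baseUri);
  Object.keys(params).forEach(function (key) {
    url.searchParams.set(key, String(params[key]));
  });
  return url.toString();
}

// ENV_ID and COLLECTION_ID stand in for the IDs stored in .env.
const baseUri = 'https://gateway.watsonplatform.net/discovery/api/v1/environments/' +
  'ENV_ID/collections/COLLECTION_ID/query';
const params = buildPersonQuery('Federer', '2016-11-07', 50);
const queryUrl = toQueryUrl(baseUri, params);

console.log(queryUrl);
```

Characters such as the colon and parentheses in the query are percent-encoded in the printed URL and decoded again by the service.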
To retrieve quotes the query string looks like this:
    queryObject.qs = {
      version: process.env.DISCOVERY_VERSION,
      query: 'entities.text:(' + name + ')',
      // Keep only documents that mention a person and contain quotations with a sentiment.
      filter: 'entities.type:Person,'
        + 'entities.quotations.sentiment.type::(neutral|positive|negative)',
      // Return only the fields needed for the analysis.
      return: 'entities.quotations,'
        + 'entities.text,'
        + 'quotations.quotation,'
        + 'entities.type',
      count: 50
    };
In this case, the query string filters for articles with the entity type Person and quotations with a sentiment. Only a subsection of each result is returned, as specified by the return: parameter.
More details on query strings can be found in the Discovery service documentation.
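The filter and return values used for quotes are both comma-separated lists, so they can be assembled from arrays instead of concatenated by hand. The sketch below shows one way to do that; buildQuotesQuery is a hypothetical helper and is not part of this project, and it produces the same strings as the quotes query shown above.

```javascript
// Hypothetical helper (not part of this project): build the quotes query
// parameters from arrays so that clauses and fields are easy to add or remove.
function buildQuotesQuery(name, version, count) {
  // Filter clauses, joined with commas as in the string shown above.
  const filters = [
    'entities.type:Person',
    'entities.quotations.sentiment.type::(neutral|positive|negative)'
  ];
  // Fields to include in each returned result.
  const fields = [
    'entities.quotations',
    'entities.text',
    'quotations.quotation',
    'entities.type'
  ];
  return {
    version: version,
    query: 'entities.text:(' + name + ')',
    filter: filters.join(','),
    return: fields.join(','),
    count: count
  };
}

const quotesQuery = buildQuotesQuery('Federer', '2016-11-07', 50);
console.log(quotesQuery.filter);
console.log(quotesQuery.return);
```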
The IBM Watson™ Personality Insights service allows applications to derive insights about personality characteristics from social media, enterprise data, or other digital communications. This application can be used to analyse the personality of an individual using IBM Watson™ Personality Insights, based on quotes retrieved from the IBM Watson™ Discovery service.
To use the IBM Watson™ Discovery Service together with the IBM Watson™ Personality Insights service, complete the following steps in addition to the Prerequisites steps stated above:
- Connect to Bluemix with the command line tool:

    bx api https://api.eu-gb.bluemix.net
    bx login -u <your user ID>
- Create the Personality Insights service in Bluemix (if you have a trial account, replace tiered with lite):

    bx service create personality_insights tiered my-personality-insights-service
- Create and retrieve service keys to access your instance of the Personality Insights service:

    bx service key-create my-personality-insights-service myKey
    bx service key-show my-personality-insights-service myKey
- The project needs to be configured to work with your instance of the Watson Personality Insights service. You will have previously renamed .env.template to .env. Fill in .env with your service instance information. The .env file will look something like the following:

    DISCOVERY_USERNAME=<username>
    DISCOVERY_PASSWORD=<password>
    DISCOVERY_ENVIRONMENT_ID=<environment_id>
    DISCOVERY_COLLECTION_ID=<collection_id>
    DISCOVERY_CONFIGURATION_ID=<configuration_id>
    DISCOVERY_VERSION=2016-11-07
    PERSONALITY_URL=https://gateway.watsonplatform.net/personality-insights/api/v3/profile
    PERSONALITY_USERNAME=<personality-insights-service-username>
    PERSONALITY_PASSWORD=<personality-insights-service-password>
    PERSONALITY_VERSION=2016-10-20
For more help, see Getting started with the Personality Insights API.
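With this many settings it is easy to leave one blank. A quick check before running the analysis might look like the sketch below. It is not part of this project: missingSettings is a hypothetical helper, and the sketch assumes that the values from .env are available on process.env by the time it runs.

```javascript
// Settings listed in the .env example above.
const REQUIRED_SETTINGS = [
  'DISCOVERY_USERNAME',
  'DISCOVERY_PASSWORD',
  'DISCOVERY_ENVIRONMENT_ID',
  'DISCOVERY_COLLECTION_ID',
  'DISCOVERY_CONFIGURATION_ID',
  'DISCOVERY_VERSION',
  'PERSONALITY_URL',
  'PERSONALITY_USERNAME',
  'PERSONALITY_PASSWORD',
  'PERSONALITY_VERSION'
];

// Hypothetical helper: return the names of settings that are unset or blank.
function missingSettings(env, required) {
  return required.filter(function (key) {
    return env[key] === undefined || String(env[key]).trim() === '';
  });
}

// Assumes the values from .env have already been loaded into process.env.
const missing = missingSettings(process.env, REQUIRED_SETTINGS);
if (missing.length > 0) {
  console.error('Missing settings in .env: ' + missing.join(', '));
} else {
  console.log('All settings present.');
}
```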
Use the -p flag with the -q flag to analyse the personality of a particular person. The following query will retrieve quotes about the tennis player Roger Federer from the Watson Discovery News dataset, before sending them to your instance of the Personality Insights service:

    ./analysis.sh -n Federer -d results -q -p

The analysis of this query is output as comma-separated values to results/Federer.csv.
"name","openness","emotionalRange","conscientiousness","agreeableness","extraversion"
"federer",0.31142288181635164,0.755908433280148,0.8428408846691722,0.010573124252825084,0.0022307444673070886
The application has analysed the personality expressed in the quotes found about Federer using the Personality Insights service and provided values for the Big Five personality characteristics. The percentile returned for each characteristic reports Federer's normalized score for that characteristic; the Personality Insights service computes the percentile by comparing the author's results with the results from a sample population.
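A row like the one above can be derived from a Personality Insights profile roughly as sketched below. This is an illustration, not the project's code: bigFivePercentiles is a hypothetical name, the sketch assumes that the profile lists the Big Five traits in a personality array whose items carry trait_id and percentile, and the mapping of the emotionalRange column to the big5_neuroticism trait is likewise an assumption about how the application names its columns.

```javascript
// Assumed mapping from the CSV column names to Personality Insights trait IDs.
const TRAIT_IDS = {
  openness: 'big5_openness',
  emotionalRange: 'big5_neuroticism',
  conscientiousness: 'big5_conscientiousness',
  agreeableness: 'big5_agreeableness',
  extraversion: 'big5_extraversion'
};

// Hypothetical helper: pick the Big Five percentiles out of a profile.
// Assumed profile shape: { personality: [ { trait_id, percentile }, ... ] }
function bigFivePercentiles(profile) {
  const byId = {};
  (profile.personality || []).forEach(function (trait) {
    byId[trait.trait_id] = trait.percentile;
  });
  const row = {};
  Object.keys(TRAIT_IDS).forEach(function (column) {
    row[column] = byId[TRAIT_IDS[column]];
  });
  return row;
}

// Made-up percentiles, for illustration only.
const profile = {
  personality: [
    { trait_id: 'big5_openness', percentile: 0.25 },
    { trait_id: 'big5_conscientiousness', percentile: 0.5 },
    { trait_id: 'big5_extraversion', percentile: 0.125 },
    { trait_id: 'big5_agreeableness', percentile: 0.75 },
    { trait_id: 'big5_neuroticism', percentile: 0.875 }
  ]
};
console.log(bigFivePercentiles(profile));
```

A percentile of 0.75 means the score is higher than that of 75% of the sample population, so values close to 0 or 1 indicate the most distinctive characteristics.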
To output the quotes returned from the Watson Discovery service to the console, uncomment console.dir(data); on line 22 of analysis-quotes.js.
A Vagrantfile in this project creates a virtual machine configured to run this project.
- Vagrant
- Instances of Watson Services running on Bluemix.
The project needs to be configured to work with your instances of the Watson Services. Rename .env.template to .env and edit the properties in the file to point at your Watson service instances.

    vagrant up
    vagrant ssh
    cd /vagrant
    npm install
    ./analysis.sh -h