Watson Discovery Service with Personality Insights

IBM Watson™ Discovery Service unlocks insights hidden in unstructured data. This node.js application demonstrates how the Discovery API can be used to build queries and perform cognitive analysis using the Watson Discovery News dataset.

Prerequisites

  1. A Bluemix account. If you don't have one, sign up.

  2. node.js. Alternatively, this project can run using Vagrant: the provided Vagrantfile creates a virtual machine configured to run this project.

Getting started

  1. Download this project using git clone.

  2. Download and install the Cloud Foundry CLI tool if you haven't already.

  3. Connect to Bluemix with the command line tool.

    bx api https://api.eu-gb.bluemix.net
    bx login -u <your user ID>
  4. Create an instance of the Discovery service (if you have a trial account, replace standard with free):

    bx service create discovery standard my-discovery-service
  5. Create and retrieve service keys to access your instance of the Discovery service:

    bx service key-create my-discovery-service myKey
    bx service key-show my-discovery-service myKey
  6. Configure the project to work with your instance of the Watson Discovery service. Rename .env.template to .env and fill in .env with your service instance information. The .env file will look something like the following:

    DISCOVERY_USERNAME=<username>
    DISCOVERY_PASSWORD=<password>
    DISCOVERY_ENVIRONMENT_ID=
    DISCOVERY_COLLECTION_ID=
    DISCOVERY_CONFIGURATION_ID=
    DISCOVERY_VERSION=2016-11-07
    
  7. Use the GET /v1/environments method to get the environment ID of your Discovery service instance.

    curl -X GET -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments?version=2016-11-07"
    {
      "environments" : [ {
        "environment_id" : "<environment_id>",
        "name" : "Watson News Environment",
        "description" : "Watson News cluster environment",
        "created" : "2017-06-22T08:47:35.705Z",
        "updated" : "2017-06-22T08:47:35.705Z",
        "status" : "active",
        "read_only" : true
      } ]
    }

    Notice that an environment already exists named Watson News Environment. This environment contains Watson Discovery News, a public data set that has been pre-enriched with cognitive insights, and is included with the Discovery service by default.

  8. Use the GET /v1/environments/{environment_id}/collections method to get the collection ID and configuration ID of your Watson News Environment instance.

    curl -X GET -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections?version=2016-11-07"
    {
      "collections" : [ {
        "collection_id" : "<collection_id>",
        "name" : "watson_news",
        "configuration_id" : "<configuration_id>",
        "language" : "en",
        "status" : "active",
        "description" : "Watson News pre-enriched collection of curated news sources",
        "created" : "2017-06-22T08:47:35.705Z",
        "updated" : "2017-06-22T08:47:35.705Z"
      } ]
    }
  9. Fill in .env with your environment, collection and configuration IDs. The .env file will look something like the following:

    DISCOVERY_USERNAME=<username>
    DISCOVERY_PASSWORD=<password>
    DISCOVERY_ENVIRONMENT_ID=<environment_id>
    DISCOVERY_COLLECTION_ID=<collection_id>
    DISCOVERY_CONFIGURATION_ID=<configuration_id>
    DISCOVERY_VERSION=2016-11-07
    

For more help, see Getting started with the Discovery API.
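
The ID lookups in steps 7 and 8 can also be scripted. The following is a minimal sketch, not part of this project: it assumes the two responses have the shape shown above, that the Watson News environment is the one marked read_only, and that its first collection is the one you want. The helper names extractDiscoveryIds and toEnvLines are hypothetical.

```javascript
// Sketch only: pull the IDs needed for .env out of the two JSON responses
// shown in steps 7 and 8. Helper names are hypothetical.

// Assumption: the Watson News environment is the one flagged read_only,
// and its first collection is the News collection.
function extractDiscoveryIds(environmentsResponse, collectionsResponse) {
  const environment = (environmentsResponse.environments || []).find(
    (env) => env.read_only === true
  );
  if (!environment) {
    throw new Error('No read-only (Watson News) environment found');
  }
  const collection = (collectionsResponse.collections || [])[0];
  if (!collection) {
    throw new Error('No collection found in the environment');
  }
  return {
    DISCOVERY_ENVIRONMENT_ID: environment.environment_id,
    DISCOVERY_COLLECTION_ID: collection.collection_id,
    DISCOVERY_CONFIGURATION_ID: collection.configuration_id
  };
}

// Format the IDs as KEY=value lines ready to paste into .env.
function toEnvLines(ids) {
  return Object.keys(ids).map((key) => key + '=' + ids[key]).join('\n');
}

// Example with placeholder values in the shape of the responses above.
const ids = extractDiscoveryIds(
  { environments: [{ environment_id: 'env-123', name: 'Watson News Environment', read_only: true }] },
  { collections: [{ collection_id: 'col-456', name: 'watson_news', configuration_id: 'cfg-789' }] }
);
console.log(toEnvLines(ids));
```

In practice the two arguments would be the parsed bodies of the curl responses from steps 7 and 8.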

About Watson Discovery News

Watson Discovery News is a dataset of primarily English language news sources that is updated continuously, with approximately 300,000 new articles and blogs added daily.

This indexed dataset is pre-enriched with the following cognitive insights: Keyword Extraction, Entity Extraction, Concept Tagging, Relation Extraction, Sentiment Analysis, and Taxonomy Classification.

The following additional metadata is also added: crawl date, publication date, URL ranking, host rank, and anchor text. Historical search is available for the past 60 days of news data.

Using the Watson Discovery Service

This application demonstrates how Watson Discovery Service can be used to query the Watson Discovery News dataset to find articles or quotes about a person. The sentiment of each retrieved document is analysed using the cognitive insights with which the News dataset is pre-enriched. The results are then written to a file.

To use this application, run npm install to install the required node.js packages:

npm install

Verify the application is working correctly by running ./analysis.sh -h. This should output the following help about the app:

  Usage: analysis.sh [options]

  Cognitive analysis of Watson Discovery News data.

  Options:

    -h, --help               output usage information
    -V, --version            output the version number
    -n, --name [name]        person name.
    -d, --dir [dir]          Directory to output results to.
    -q, --quotes [quotes]    Use Watson Discovery Service to find quotes.
    -p, --personality        Use Watson Personality Insights.

Use the -n flag to pass in a name to search on. Use the -d flag to specify a relative directory to write the results to. The following query will analyse the News dataset for articles about the tennis player Roger Federer:

./analysis.sh -n Federer -d results

The results of this analysis are written as comma-separated values to results/Federer.csv.

"name","hits","hits_negative","hits_positive","hits_neutral"
"federer",50,14,28,8

The application has analysed the sentiment of each article found about Federer. In total, 50 hits were found: 14 had a negative sentiment, 28 had a positive sentiment, and 8 had a neutral sentiment.
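
The tallying step can be pictured as follows. This is a minimal sketch rather than the project's own code: it assumes each retrieved document has already been reduced to one of the labels positive, negative or neutral, and that the name is lower-cased in the output as in the sample above. The helpers tallySentiment and toCsv are hypothetical.

```javascript
// Sketch only: count sentiment labels and build a CSV row in the format above.
// Assumes each document was already reduced to 'positive', 'negative' or 'neutral'.
function tallySentiment(name, labels) {
  const counts = { negative: 0, positive: 0, neutral: 0 };
  for (const label of labels) {
    // Labels outside the three known values still count as hits but are not tallied.
    if (Object.prototype.hasOwnProperty.call(counts, label)) {
      counts[label] += 1;
    }
  }
  return {
    name: name.toLowerCase(),
    hits: labels.length,
    hits_negative: counts.negative,
    hits_positive: counts.positive,
    hits_neutral: counts.neutral
  };
}

// Render a header line and one data line as comma-separated values.
// Strings are quoted, numbers are written bare.
function toCsv(row) {
  const keys = Object.keys(row);
  const header = keys.map((key) => '"' + key + '"').join(',');
  const values = keys
    .map((key) => (typeof row[key] === 'string' ? '"' + row[key] + '"' : row[key]))
    .join(',');
  return header + '\n' + values;
}

// Made-up labels for four documents.
const row = tallySentiment('Federer', ['positive', 'negative', 'positive', 'neutral']);
console.log(toCsv(row));
```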

Use the -q flag to look for quotes about a particular person. The following query will analyse the News dataset for quotes about the tennis player Roger Federer:

./analysis.sh -n Federer -d results -q

The results of this analysis are written as comma-separated values to results/Federer.csv.

"name","hits","hits_negative","hits_positive","hits_neutral"
"federer",65,19,0,46

The application has analysed the sentiment of each quote found about Federer. In total, 65 quotes were found: 19 had a negative sentiment, none had a positive sentiment, and 46 had a neutral sentiment.
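
To compare runs it can help to express the counts as shares of the total. The sketch below reads a two-line CSV in the format shown above and converts each count to a percentage of all hits. It assumes values contain no embedded commas or escaped quotes; parseResultCsv and sentimentShares are hypothetical helpers, not part of this project.

```javascript
// Sketch only: read the two-line CSV the app writes and compute sentiment shares.
// Assumes no commas or escaped quotes inside values, as in the output above.
function parseResultCsv(text) {
  const [headerLine, valueLine] = text.trim().split('\n');
  const strip = (cell) => cell.trim().replace(/^"|"$/g, '');
  const headers = headerLine.split(',').map(strip);
  const values = valueLine.split(',').map(strip);
  const row = {};
  headers.forEach((header, i) => {
    row[header] = header === 'name' ? values[i] : Number(values[i]);
  });
  return row;
}

// Express each sentiment count as a percentage of all hits, to one decimal place.
function sentimentShares(row) {
  const pct = (n) => (row.hits === 0 ? 0 : Math.round((n / row.hits) * 1000) / 10);
  return {
    negative: pct(row.hits_negative),
    positive: pct(row.hits_positive),
    neutral: pct(row.hits_neutral)
  };
}

// The quotes result shown above.
const csv = '"name","hits","hits_negative","hits_positive","hits_neutral"\n"federer",65,19,0,46';
console.log(sentimentShares(parseResultCsv(csv)));
```

For the quotes result above, 19 of 65 is about 29.2% negative and 46 of 65 is about 70.8% neutral.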

To output the quotes returned from Watson Discovery Service to the console, uncomment console.dir(data); on Line 22 in analysis-quotes.js.

How Watson Discovery Service works

The IBM Watson™ Discovery service offers powerful content search capabilities using the Discovery Query Language. In this application, a query object is built in discoveryQuery.js and then sent as an HTTP GET to the specified endpoint using the node.js request library:

var queryUri = 'https://gateway.watsonplatform.net/discovery/api/v1/environments/'+process.env.DISCOVERY_ENVIRONMENT_ID+'/collections/'+process.env.DISCOVERY_COLLECTION_ID+'/query';
var queryObject = {
  uri: queryUri,
  method: 'GET',
  auth: {
    user: process.env.DISCOVERY_USERNAME,
    pass: process.env.DISCOVERY_PASSWORD
  }
};

Query parameters enable you to search your collection, and customise the output of the data you return. A query string is added to the query object as follows:

queryObject.qs = {
  version: process.env.DISCOVERY_VERSION,
  query: 'entities.text:('+name+')',
  filter: 'entities.type:Person',
  count: 50
};
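
The same request can be expressed as a single URL, which is handy for checking what is actually sent. The following is a minimal sketch that swaps the request library for Node's built-in URL class. It is not the project's code: buildQueryUrl and the REQUIRED_SETTINGS list are hypothetical, and authentication is left out.

```javascript
// Sketch only: build the Discovery query above as a URL using Node's built-in
// URL class instead of the request library. buildQueryUrl is hypothetical.
const REQUIRED_SETTINGS = [
  'DISCOVERY_ENVIRONMENT_ID',
  'DISCOVERY_COLLECTION_ID',
  'DISCOVERY_VERSION'
];

function buildQueryUrl(env, name) {
  // Fail early if .env was not filled in.
  const missing = REQUIRED_SETTINGS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error('Missing settings in .env: ' + missing.join(', '));
  }
  const queryUrl = new URL(
    'https://gateway.watsonplatform.net/discovery/api/v1/environments/' +
      encodeURIComponent(env.DISCOVERY_ENVIRONMENT_ID) +
      '/collections/' +
      encodeURIComponent(env.DISCOVERY_COLLECTION_ID) +
      '/query'
  );
  // searchParams percent-encodes each value, so the name cannot break the URL.
  queryUrl.searchParams.set('version', env.DISCOVERY_VERSION);
  queryUrl.searchParams.set('query', 'entities.text:(' + name + ')');
  queryUrl.searchParams.set('filter', 'entities.type:Person');
  queryUrl.searchParams.set('count', '50');
  return queryUrl;
}

// Placeholder IDs; in the app these come from process.env.
const queryUrl = buildQueryUrl(
  {
    DISCOVERY_ENVIRONMENT_ID: 'env-123',
    DISCOVERY_COLLECTION_ID: 'col-456',
    DISCOVERY_VERSION: '2016-11-07'
  },
  'Federer'
);
console.log(queryUrl.href);
```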

Search and structure parameters determine what data is returned:

  • filter: A cacheable query that excludes any documents that don't mention the query content. Filter search results are not returned in order of relevance.
  • query: A query search returns all documents in your data set with full enrichments and full text in order of relevance. A query also excludes any documents that don't mention the query content.
  • count: The number of documents that you want returned in the response.

Entity Extraction enrichment extracts persons, places, and organizations in the input text. The above query string filters for articles with the entity type Person and then searches for articles with the parameter name in the entity text. The name parameter has been passed in at the command line. Fifty results are returned from the Watson Discovery service, as specified by count: 50.
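
To make the combined effect of filter, query and count concrete, the sketch below applies the same three conditions to a few local sample documents. It only illustrates the selection logic and is not how the service is implemented: it assumes each document carries an entities array of objects with text and type, treats the text match as a case-insensitive substring test, and keeps input order where the service would rank by relevance. The helper selectDocuments and the sample data are hypothetical.

```javascript
// Sketch only: mimic on local sample data what filter, query and count select.
// Assumes each document has an `entities` array of { text, type } objects.
function selectDocuments(docs, name, count) {
  const wanted = name.toLowerCase();
  return docs
    // filter: 'entities.type:Person' keeps documents with at least one Person entity.
    .filter((doc) => (doc.entities || []).some((entity) => entity.type === 'Person'))
    // query: 'entities.text:(name)' keeps documents where some entity text includes the name.
    .filter((doc) =>
      (doc.entities || []).some((entity) => entity.text.toLowerCase().includes(wanted))
    )
    // count: return at most this many documents.
    .slice(0, count);
}

// Made-up sample documents.
const sampleDocs = [
  { id: 'a', entities: [{ text: 'Roger Federer', type: 'Person' }] },
  { id: 'b', entities: [{ text: 'Wimbledon', type: 'SportingEvent' }] },
  { id: 'c', entities: [{ text: 'Rafael Nadal', type: 'Person' }] }
];
const selectedIds = selectDocuments(sampleDocs, 'Federer', 50).map((doc) => doc.id);
console.log(selectedIds);
```

Note that the two conditions are checked independently per document, so the Person entity and the entity whose text matches the name need not be the same entity.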

To retrieve quotes the query string looks like this:

queryObject.qs = {
  version: process.env.DISCOVERY_VERSION,
  query: 'entities.text:('+name+')',
  filter: 'entities.type:Person,'
        + 'entities.quotations.sentiment.type::(neutral|positive|negative)',
  return: 'entities.quotations,'
        + 'entities.text,'
        + 'quotations.quotation,'
        + 'entities.type',
  count: 50
};

In this case the query string filters for articles with the entity type Person and quotations with a sentiment. Only a subsection of each result is returned as specified by return:.
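
Once the response arrives, the quotations have to be pulled out of the nested result. The sketch below shows one way to do that. The response shape is an assumption inferred from the field names in the query string above (results, then entities, then quotations with quotation and sentiment.type), and collectQuotes is a hypothetical helper, so check both against a real response before relying on them.

```javascript
// Sketch only: collect quotations and their sentiment labels from a query response.
// Assumed shape: results[].entities[].quotations[] with `quotation` and `sentiment.type`.
function collectQuotes(response) {
  const quotes = [];
  for (const result of response.results || []) {
    for (const entity of result.entities || []) {
      if (entity.type !== 'Person') continue; // only quotations attached to people
      for (const item of entity.quotations || []) {
        quotes.push({
          entity: entity.text,
          text: item.quotation,
          // Leave the label empty rather than guess when none is present.
          sentiment: item.sentiment ? item.sentiment.type : null
        });
      }
    }
  }
  return quotes;
}

// Made-up response in the assumed shape.
const sampleResponse = {
  results: [
    {
      entities: [
        {
          text: 'Roger Federer',
          type: 'Person',
          quotations: [{ quotation: 'Example quotation text.', sentiment: { type: 'positive' } }]
        },
        { text: 'Wimbledon', type: 'SportingEvent' }
      ]
    }
  ]
};
const quotes = collectQuotes(sampleResponse);
console.log(quotes);
```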

More details on query strings can be found in the Discovery service documentation.

Using Personality Insights

The IBM Watson™ Personality Insights service allows applications to derive insights about personality characteristics from social media, enterprise data, or other digital communications. This application can analyse the personality of an individual with IBM Watson™ Personality Insights, based on quotes retrieved from the IBM Watson™ Discovery service.

To use the IBM Watson™ Discovery Service together with the IBM Watson™ Personality Insights service, complete the following steps in addition to the Prerequisites steps stated above:

  1. Connect to Bluemix with the command line tool.

    bx api https://api.eu-gb.bluemix.net
    bx login -u <your user ID>
  2. Create the Personality Insights service in Bluemix (if you have a trial account, replace tiered with lite):

    bx service create personality_insights tiered my-personality-insights-service
  3. Create and retrieve service keys to access your instance of the Personality Insights service:

    bx service key-create my-personality-insights-service myKey
    bx service key-show my-personality-insights-service myKey
  4. Configure the project to work with your instance of the Watson Personality Insights service. You will have previously renamed .env.template to .env. Fill in .env with your service instance information. The .env file will look something like the following:

    DISCOVERY_USERNAME=<username>
    DISCOVERY_PASSWORD=<password>
    DISCOVERY_ENVIRONMENT_ID=<environment_id>
    DISCOVERY_COLLECTION_ID=<collection_id>
    DISCOVERY_CONFIGURATION_ID=<configuration_id>
    DISCOVERY_VERSION=2016-11-07
    PERSONALITY_URL=https://gateway.watsonplatform.net/personality-insights/api/v3/profile
    PERSONALITY_USERNAME=<personality-insights-service-username>
    PERSONALITY_PASSWORD=<personality-insights-service-password>
    PERSONALITY_VERSION=2016-10-20
    

For more help, see Getting started with the Personality Insights API.
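
The settings above are enough to describe the profile request. The sketch below assembles that request as a plain object without sending it. It assumes the quotes are submitted as plain text with HTTP basic authentication and the version passed as a query parameter; buildProfileRequest is a hypothetical helper, not part of this project.

```javascript
// Sketch only: describe the HTTP request that sends collected quotes to the
// Personality Insights profile endpoint. buildProfileRequest is hypothetical.
function buildProfileRequest(env, quoteTexts) {
  const profileUrl = new URL(env.PERSONALITY_URL);
  profileUrl.searchParams.set('version', env.PERSONALITY_VERSION);
  // HTTP basic authentication: base64 of "username:password".
  const credentials = Buffer.from(
    env.PERSONALITY_USERNAME + ':' + env.PERSONALITY_PASSWORD
  ).toString('base64');
  return {
    url: profileUrl.href,
    method: 'POST',
    headers: {
      Authorization: 'Basic ' + credentials,
      'Content-Type': 'text/plain;charset=utf-8',
      Accept: 'application/json'
    },
    // One quote per line; the text is analysed as a whole.
    body: quoteTexts.join('\n')
  };
}

// Placeholder credentials and made-up quote texts.
const profileRequest = buildProfileRequest(
  {
    PERSONALITY_URL: 'https://gateway.watsonplatform.net/personality-insights/api/v3/profile',
    PERSONALITY_USERNAME: 'user',
    PERSONALITY_PASSWORD: 'pass',
    PERSONALITY_VERSION: '2016-10-20'
  },
  ['First example quote.', 'Second example quote.']
);
console.log(profileRequest.url);
```

The returned object holds everything an HTTP client needs (URL, method, headers and body), so it can be handed to whichever client you prefer.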

Use the -p flag with the -q flag to analyse the personality of a particular person. The following query will retrieve quotes about the tennis player Roger Federer from the Watson Discovery News dataset, before sending them to your instance of the Personality Insights service:

./analysis.sh -n Federer -d results -q -p

The results of this analysis are written as comma-separated values to results/Federer.csv.

"name","openness","emotionalRange","conscientiousness","agreeableness","extraversion"
"federer",0.31142288181635164,0.755908433280148,0.8428408846691722,0.010573124252825084,0.0022307444673070886

The application has analysed the quotes found about Federer using the Personality Insights service and produced values for the Big Five personality characteristics. Each value is a percentile that reports Federer's normalized score for that characteristic; the Personality Insights service computes the percentile by comparing the results for the author of the text with the results from a sample population.
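
The mapping from a profile to the CSV columns above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it assumes the profile lists the Big Five under a personality array whose items carry trait_id and percentile, and that the emotionalRange column corresponds to the trait ID big5_neuroticism. The helper bigFiveRow and the sample percentiles are hypothetical.

```javascript
// Sketch only: map a Personality Insights profile to the CSV columns above.
// Assumes the profile lists the Big Five under `personality`, each item with
// `trait_id` and `percentile`.
const COLUMN_BY_TRAIT = {
  big5_openness: 'openness',
  big5_neuroticism: 'emotionalRange',
  big5_conscientiousness: 'conscientiousness',
  big5_agreeableness: 'agreeableness',
  big5_extraversion: 'extraversion'
};

function bigFiveRow(name, profile) {
  const row = { name: name.toLowerCase() };
  // Start every column empty so a missing trait is visible in the output.
  for (const column of Object.values(COLUMN_BY_TRAIT)) {
    row[column] = null;
  }
  for (const trait of profile.personality || []) {
    const column = COLUMN_BY_TRAIT[trait.trait_id];
    if (column) {
      row[column] = trait.percentile;
    }
  }
  return row;
}

// Made-up profile with two of the five traits present.
const personalityRow = bigFiveRow('Federer', {
  personality: [
    { trait_id: 'big5_openness', percentile: 0.25 },
    { trait_id: 'big5_extraversion', percentile: 0.5 }
  ]
});
console.log(personalityRow);
```

A percentile of 0.25 for openness would mean the text scores higher on openness than 25% of the sample population.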

To output the quotes returned from Watson Discovery Service to the console, uncomment console.dir(data); on Line 22 in analysis-quotes.js.

Using Vagrant to run this application

The Vagrantfile in this project creates a virtual machine configured to run this project.

Prerequisites

  1. Vagrant
  2. Instances of Watson Services running on Bluemix.

The project needs to be configured to work with your instances of the Watson Services. Rename .env.template to .env and edit the properties in the file to point at your Watson service instances.

vagrant up
vagrant ssh
cd /vagrant
npm install
./analysis.sh -h

Contributors

rosielickorish, shawdm
