o19s / relevant-search-book

Code and Examples for Relevant Search

relevant-search elasticsearch solr relevance ipython-notebook

relevant-search-book's Introduction

Relevant Search

Code and Examples for Relevant Search by Doug Turnbull and John Berryman. Published by Manning Publications.

Relevant Search is all about leveraging Solr and Elasticsearch to build more intelligent search applications with intuitive results!

How to run

Install Python

Examples for this book are written in Python 2.7 and use IPython notebook. The first thing you'll need to do is install Python and pip (the Python package installer).

Install Python for your platform here. On Windows we recommend the ActivePython distribution.
Install pip, the Python package installer, by running easy_install pip

Install Elasticsearch

The examples expect Elasticsearch to be running at localhost:9200, so you'll need to install Elasticsearch to work with them. There are two ways to install Elasticsearch:

Recommended: Vagrant

Vagrant is a tool for installing and provisioning virtual machines locally for development purposes. If you've never used Vagrant, you can follow the installation instructions here. OpenSource Connections maintains a basic Elasticsearch Vagrant box here.

To use the vagrant box

Install vagrant

Clone the Elasticsearch Vagrant box from GitHub:

git clone git@github.com:o19s/elasticsearch-vagrant.git

Provision the Vagrant box (this installs Elasticsearch and turns the box on)
```
cd elasticsearch-vagrant
vagrant up --provision
```
Confirm Elasticsearch is running

curl -XGET http://localhost:9200

or visit this URL in your browser.

You should see JSON returned from the Elasticsearch instance. Something like:

   {
     "name" : "Mary Zero",
     "cluster_name" : "elasticsearch",
     "version" : {
       "number" : "2.0.0-rc1",
       "build_hash" : "4757962b01a4d837af282f90df9e1fbdb68b524e",
       "build_timestamp" : "2015-10-01T10:06:08Z",
       "build_snapshot" : false,
       "lucene_version" : "5.2.1"
     },
     "tagline" : "You Know, for Search"
   }
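The same check can be scripted. This is a minimal sketch, assuming Python 3 and only the standard library (the book's notebooks themselves target Python 2.7 and use requests); `check_elasticsearch` and `parse_es_info` are hypothetical helper names, not part of the book's code.

```python
import http.client
import json
import urllib.request


def parse_es_info(body):
    """Extract (cluster_name, version number) from the JSON returned by GET /."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def check_elasticsearch(url="http://localhost:9200", timeout=5.0):
    """Return (cluster_name, version) if Elasticsearch answers at url, else None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_es_info(resp.read().decode("utf-8"))
    except (OSError, http.client.HTTPException, ValueError, KeyError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError/KeyError cover a reply that isn't the expected JSON document.
        return None
```

Calling `check_elasticsearch()` returns a tuple such as the cluster name and version shown in the sample JSON above, or `None` when nothing usable answers on port 9200.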

When you're done working with the examples, turn off the Vagrant box:

vagrant halt

Locally on Your Machine

Follow Elasticsearch's instructions to install Elasticsearch on your machine.

Running The Python Examples

The examples are written in Python 2.7 as IPython notebooks and depend on only a few basic libraries. Apart from Jupyter itself, the only external library needed is the requests HTTP library. Some of the external APIs require API keys (for example TMDB; you can obtain one here).

To run the IPython Notebook Examples

First, ensure you have git, Python 2.7, and pip installed and on your PATH.
Then run the following commands to install the required dependencies:

git clone git@github.com:o19s/relevant-search-book.git
cd relevant-search-book
pip install requests
pip install jupyter
cd ipython/

Launch!

ipython notebook

Play!

Switch to your default browser, where the IPython examples are ready for you to experiment with. Keep in mind that many examples are order dependent, so you can't just jump to an interesting listing and run it: earlier cells, such as the indexing commands that apply particular settings, must run first. Be sure to run the preceding notebook cells too!

Happy Searching!

relevant-search-book's Issues

Error setting up Ipython NB on Mac OSX Yosemite

From Valentin

On my iMac (Yosemite), I went through the explanations on the GitHub wiki, and when starting ipython I got the error "No module named notebook.notebookapp". I googled it, landed on SO at http://stackoverflow.com/questions/31397421/ipython-server-cant-launch-no-module-named-notebook-notebookapp, and ran "pip install jupyter" as advised. I then got "No module named functools32". I could finally start ipython after also running "pip install functools32".

Typo in Chapter 8 notebook

In the Search-As-You-Type, Completion, and Suggestion (Elasticsearch).ipynb notebook, the following two typos (file name and path) need to be corrected:

movies=pickle.load(open("../../movies_list.p","rb"))

should read

movies=pickle.load(open("../movies.p","rb"))

No python notebook for chapter 4

Any plan to upgrade to python 3.x?

I have made a few changes myself as required, but it would be nice to have 3.x supported too.

Thanks,

Typo in Chapter 8 notebook about suggestions

In Search-As-You-Type, Completion, and Suggestion (Elasticsearch).ipynb, in the "Post-Search Suggest" section, suggest_body contains a small typo:

suggest_body = {
    "title_completion": {  # <--- should be named title_suggestion
        "text": "star trec",
        "phrase": {
            "field": "suggestion"}}}

Permission denied (publickey)

Getting permission denied message when cloning relevant-search-book.git. How can this be resolved?
See detailed message below:

$ git clone git@github.com:o19s/relevant-search-book.git
Cloning into 'relevant-search-book'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Error in getting analyzer code running

I tried using the analyzer code from chapter 3 and got the following error. I can't work out what's wrong. The ES version is 6.5. Any ideas on how to solve this will be appreciated. Thanks!

error:
  root_cause:
    - type: "illegal_argument_exception"
      reason: "Failed to parse request body"
  type: "illegal_argument_exception"
  reason: "Failed to parse request body"
  caused_by:
    type: "json_parse_exception"
    reason: "Unrecognized token 'Fire': was expecting ('true', 'false' or 'null')\n
      \ at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@1d20d049;
      \ line: 1, column: 6]"
status: 400
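One likely cause, assumed here since the issue doesn't show the request: the chapter 3 code sends the text to analyze as a raw request body, which earlier Elasticsearch versions tolerated, whereas Elasticsearch 6.x expects a JSON body with an explicit Content-Type: application/json header. Elasticsearch then tries to parse the leading word Fire as JSON, which matches the error above. A minimal sketch of a JSON-bodied _analyze call, using Python 3's standard library; `build_analyze_request`, `analyze`, and the example index name are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard",
                          base_url="http://localhost:9200", index=None):
    """Build a POST _analyze request whose body is JSON, as Elasticsearch 6.x expects."""
    path = "/%s/_analyze" % index if index else "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},  # 6.x rejects bodies without it
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return just the token strings."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    return [t["token"] for t in result["tokens"]]
```

With Elasticsearch running, `analyze("Fire with Fire", analyzer="english")` should return the token list instead of a parse error.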

Missing import statement for requests module

In the code for chapter 3, I would add an import requests statement to the first boilerplate setup snippet. Otherwise the data-indexing snippet won't run, because the requests module is never imported.

TMDB ssl errors

Running the code in Appendix A gives SSL certificate verification errors on my Ubuntu 14.04 machine.

If I run the following "hello world" script:

import requests
import os

# you'll need to have an API key for TMDB
# to run these examples,
# run export TMDB_API_KEY=<YourAPIKey>
tmdb_api_key = os.environ["TMDB_API_KEY"]
tmdb_api = requests.Session()
tmdb_api.params={'api_key': tmdb_api_key}

httpResp = tmdb_api.get('https://api.themoviedb.org/3/movie/top_rated')

I receive the error

(venv)doug@76$~/ws/relevant-search-book/ipython(ma) $ python tmdb_hello_world.py 
/home/doug/workspace/relevant-search-book/ipython/venv/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Traceback (most recent call last):
  File "tmdb_hello_world.py", line 11, in <module>
    httpResp = tmdb_api.get('https://api.themoviedb.org/3/movie/top_rated')
  File "/home/doug/workspace/relevant-search-book/ipython/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 476, in get
    return self.request('GET', url, **kwargs)
  File "/home/doug/workspace/relevant-search-book/ipython/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/doug/workspace/relevant-search-book/ipython/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/doug/workspace/relevant-search-book/ipython/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 431, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 1] _ssl.c:510: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
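The InsecurePlatformWarning at the top of this log is urllib3 reporting that ssl.SSLContext is unavailable, which is the case on Python 2 releases older than 2.7.9 (Ubuntu 14.04 ships 2.7.6). Those interpreters also lack SNI, which can make certificate verification fail against servers that host several certificates. A plausible but unconfirmed remedy is upgrading Python or running pip install requests[security]. The sketch below, which also runs under Python 2.7, only reports which of these features the current interpreter has; `ssl_platform_report` is a hypothetical helper.

```python
import ssl
import sys


def ssl_platform_report():
    """Report the TLS features urllib3 relies on for certificate checks with SNI."""
    return {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "openssl": ssl.OPENSSL_VERSION,
        # Missing on Python 2 before 2.7.9; urllib3 then emits InsecurePlatformWarning
        "has_sslcontext": hasattr(ssl, "SSLContext"),
        # Without SNI, a server hosting several certificates may present the wrong one
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
    }


for key, value in sorted(ssl_platform_report().items()):
    print("%s: %s" % (key, value))
```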

Typo in Listing 6.2.1

In listing 6.2.1, the index names in the comments are misspelled:

# PUT albinoelphant/docs/1
# { "title":"albino", "body": "elephant"}
# PUT albinoelphant/docs/1
# { "title":"elephant", "body": "elephant"}

should be

# PUT albinoelephant/docs/1
              ^
# { "title":"albino", "body": "elephant"}
# PUT albinoelephant/docs/1
              ^
# { "title":"elephant", "body": "elephant"}

Modify readme including a reference to installing jupyter

Please consider updating the README.

Following the README to run the sample, I got the error:

ImportError: No module named notebook.notebookapp

I needed to

pip install jupyter

to be able to run the sample.

HTTP 429, too many requests

When looping through the top-rated movies I get an HTTP 429 error, i.e. I made too many requests in a given time. I was able to work around that by pausing for a few seconds every tenth request.

So basically I added

import time

and then

# tmdb_api (a requests.Session) and numPages are defined earlier in the surrounding code
for page in range(1, numPages + 1):
    if page % 10 == 0:
        time.sleep(3)  # pause for 3 seconds before every tenth request
    httpResp = tmdb_api.get('https://api.themoviedb.org/3/movie/top_rated', params={'page': page})

But I'm not sure that's the best way to do it.
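Rather than pausing on a fixed schedule, one alternative is to retry only when a request actually comes back with 429, waiting longer after each failure and honoring a Retry-After header if the server sends one. Whether TMDB sends that header is an assumption, so this sketch falls back to exponential backoff. `get_with_backoff` is a hypothetical helper; it works with any callable returning an object that has `.status_code` and `.headers`, such as a requests response.

```python
import time


def get_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 response or retries run out.

    fetch: zero-argument callable, e.g. lambda: session.get(url, params=...).
    max_retries must be >= 0; the last response is returned even if it is a 429.
    """
    for attempt in range(max_retries + 1):
        resp = fetch()
        if resp.status_code != 429 or attempt == max_retries:
            return resp
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)            # the server says how long to wait
        else:
            delay = base_delay * (2 ** attempt)   # 1s, 2s, 4s, ...
        sleep(delay)
```

Inside the loop above, the call would become `httpResp = get_with_backoff(lambda: tmdb_api.get(url, params={'page': page}))`. Passing `sleep` as a parameter is only there so the waiting can be stubbed out when testing.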

tie_breaker missing in combined section

In the Chapter 9 notebook, the "tie_breaker": 0.3 parameter is missing from the content sub-query in the Combined section.
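The notebook's actual Combined query isn't reproduced in this issue, so here is only a generic sketch of what tie_breaker does. In a best_fields multi_match (and in dis_max), a document scores as its best-matching field plus tie_breaker times the scores of its other matching fields; with the default of 0, only the best field counts. The field names and helper functions below are hypothetical.

```python
def best_fields_query(user_search, fields, tie_breaker=0.3):
    """multi_match (best_fields) query body; the field names passed in are illustrative."""
    return {
        "multi_match": {
            "query": user_search,
            "fields": fields,
            "type": "best_fields",
            "tie_breaker": tie_breaker,
        }
    }


def dis_max_score(field_scores, tie_breaker):
    """Scoring rule behind best_fields/dis_max: best field plus a share of the rest."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


query = best_fields_query("star trek", ["title", "overview"])
```

For field scores of 2.0, 1.0, and 0.5, a tie_breaker of 0.3 gives 2.0 + 0.3 × 1.5 = 2.45, whereas leaving it out scores the document at just 2.0, so documents matching in several fields lose their edge.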

Error when opening notebook for chapter 7

When I opened the notebook for chapter 7, the following error popped up on screen:

Notebook Validation failed: u'*' is not of type u'integer', u'null':
"*"

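The message suggests a code cell was saved with * (the marker shown while a cell is running) in its run-counter field, where the notebook schema allows only an integer or null. Assuming that is the cause, this sketch nulls out such values. It handles both the nbformat 4 layout (cells, execution_count) and the older nbformat 3 layout (worksheets, prompt_number). The helper names are hypothetical, the code assumes Python 3, and it's wise to back up the .ipynb file before rewriting it.

```python
import json

# Run-counter field names: nbformat 4 uses execution_count, nbformat 3 prompt_number
COUNT_KEYS = ("execution_count", "prompt_number")


def iter_cells(nb):
    """Yield cells from a v4 notebook (top-level cells) or a v3 one (worksheets)."""
    for cell in nb.get("cells", []):
        yield cell
    for worksheet in nb.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            yield cell


def clear_bad_counts(nb):
    """Set any non-integer run counter (e.g. "*") to None in place; return how many."""
    fixed = 0
    for cell in iter_cells(nb):
        for obj in [cell] + list(cell.get("outputs", [])):
            for key in COUNT_KEYS:
                value = obj.get(key)
                if value is not None and not isinstance(value, int):
                    obj[key] = None
                    fixed += 1
    return fixed


def repair_notebook(path):
    """Rewrite the .ipynb file at path with bad run counters cleared."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    fixed = clear_bad_counts(nb)
    if fixed:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(nb, f, indent=1)
    return fixed
```
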
Can't connect to Elasticsearch on port 9200 - get "connection refused"

Hello, I followed the setup instructions in the README, installing Elasticsearch with Vagrant and VirtualBox (all on a MacBook). Going through the setup instructions, I ran the following from the command line:

vagrant up --provision

[I can see the VirtualBox virtual machine start up]

curl -XGET http://localhost:9200

I get:
curl: (7) Failed to connect to localhost port 9200: Connection refused

This used to work for me (as well as going to localhost:9200 in Chrome).

Any tips for troubleshooting this error? Thanks
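One way to narrow this down is sketched below. First, check from the host whether anything accepts TCP connections on the forwarded port. Then, via vagrant ssh, check whether Elasticsearch answers on port 9200 inside the VM itself. If it answers inside the VM but not from the host, a plausible culprit is that Elasticsearch 2.0 and later bind to localhost by default, so traffic arriving through VirtualBox port forwarding is refused unless network.host is configured. Whether that applies to this particular box is an assumption. `port_is_open` is a hypothetical helper using only Python's standard library.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

On the host, `port_is_open("localhost", 9200)` returning False means nothing is accepting connections on that port at all, which matches the curl error above.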

Steps to run ipython notebook or python scripts for setup of TMDB

Hi. I've just received the book and am working through the examples. I'm comfortable with Elasticsearch but have never used Python. I've followed the README for setting up ipython notebook (now Jupyter), but can't figure out how to run the cells. I've also tried running the Python code directly as scripts, but haven't gotten that working yet either. Since time is of the essence, do you have a set of steps to load the TMDB data and verify the setup? I tried to use _bulk directly in Elasticsearch, but the data format isn't compatible. Any help greatly appreciated.
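The _bulk API doesn't accept a plain JSON array of documents. It expects newline-delimited JSON in which each document is preceded by an action line such as {"index": {...}}, and the body must end with a newline. Below is a minimal sketch of that conversion. It assumes the movies are available as a list of dicts with an "id" field (TMDB movie records carry one). The index name tmdb and type movie are hypothetical placeholders, and the type only matters on older Elasticsearch versions.

```python
import json


def bulk_body(docs, index, doc_type=None, id_field="id"):
    """Serialise docs into the newline-delimited body the _bulk API expects."""
    lines = []
    for doc in docs:
        action = {"_index": index, "_id": str(doc[id_field])}
        if doc_type is not None:
            # Older Elasticsearch (e.g. 2.x) needs a type; 7.x deprecates it
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))  # action line
        lines.append(json.dumps(doc))                # the document itself
    return "\n".join(lines) + "\n"                   # body must end with a newline
```

The resulting string can be POSTed to http://localhost:9200/_bulk with a Content-Type of application/x-ndjson.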