
parlai_searchengine's People

Contributors

julesgm


parlai_searchengine's Issues

breaks after some queries

When using this together with parlai interactive, the following error appears after 2-4 conversation turns:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000027318DBB388>: Failed to establish a new connection: [WinError 10049]

the full trace:

Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
OSError: [WinError 10049] La dirección solicitada no es válida en este contexto ("The requested address is not valid in this context")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\http\client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\http\client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\http\client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\http\client.py", line 1036, in _send_output
    self.send(msg)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\http\client.py", line 976, in send
    self.connect()
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connection.py", line 205, in connect
    conn = self._new_conn()
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connection.py", line 187, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x0000027318DBB388>: Failed to establish a new connection: [WinError 10049] La dirección solicitada no es válida en este contexto

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\requests\adapters.py", line 450, in send
    timeout=timeout
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\connectionpool.py", line 786, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000027318DBB388>: Failed to establish a new connection: [WinError 10049] La dirección solicitada no es válida en este contexto'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\Scripts\parlai.exe\__main__.py", line 7, in <module>
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\__main__.py", line 14, in main
    superscript_main()
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\core\script.py", line 325, in superscript_main
    return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\core\script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\scripts\interactive.py", line 118, in run
    return interactive(self.opt)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\scripts\interactive.py", line 93, in interactive
    world.parley()
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\tasks\interactive\worlds.py", line 89, in parley
    acts[1] = agents[1].act()
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\core\torch_agent.py", line 2143, in act
    response = self.batch_act([self.observation])[0]
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\core\torch_agent.py", line 2239, in batch_act
    output = self.eval_step(batch)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\projects\blenderbot2\agents\blenderbot2.py", line 790, in eval_step
    output = super().eval_step(batch)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\rag.py", line 290, in eval_step
    output = super().eval_step(batch)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\core\torch_generator_agent.py", line 876, in eval_step
    batch, self.beam_size, maxlen, prefix_tokens=prefix_tokens
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\rag.py", line 673, in _generate
    gen_outs = self._rag_generate(batch, beam_size, max_ts, prefix_tokens)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\rag.py", line 713, in _rag_generate
    self, batch, beam_size, max_ts, prefix_tokens
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\core\torch_generator_agent.py", line 1094, in _generate
    encoder_states = model.encoder(*self._encoder_input(batch))
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\projects\blenderbot2\agents\modules.py", line 821, in encoder
    segments,
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\projects\blenderbot2\agents\modules.py", line 226, in encoder
    num_memory_decoder_vecs,
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\projects\blenderbot2\agents\modules.py", line 357, in retrieve_and_concat
    search_queries, query_vec, search_indices
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\projects\blenderbot2\agents\modules.py", line 519, in perform_search
    query_vec[search_indices]  # type: ignore
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\retrievers.py", line 411, in retrieve
    docs, scores = self.retrieve_and_score(query)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\retrievers.py", line 1192, in retrieve_and_score
    search_results_batach = self.search_client.retrieve(search_queries, self.n_docs)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\retrieve_api.py", line 132, in retrieve
    return [self._retrieve_single(q, num_ret) for q in queries]
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\retrieve_api.py", line 132, in <listcomp>
    return [self._retrieve_single(q, num_ret) for q in queries]
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\retrieve_api.py", line 111, in _retrieve_single
    search_server_resp = self._query_search_server(search_query, num_ret)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\parlai\agents\rag\retrieve_api.py", line 89, in _query_search_server
    server_response = requests.post(server, data=req)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\requests\api.py", line 117, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\requests\sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\requests\sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\parlaisearch\lib\site-packages\requests\adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000027318DBB388>: Failed to establish a new connection: [WinError 10049] La dirección solicitada no es válida en este contexto'))
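For what it's worth, WinError 10049 usually means the client tried to connect *to* 0.0.0.0, which Windows does not accept as a destination address (0.0.0.0 is only valid as a bind address for servers). One minimal workaround, sketched here as an assumption (the helper name is my own, not part of ParlAI), is to rewrite the host before making client requests:

```python
def client_host(host: str) -> str:
    """Map the wildcard bind address to the loopback address for client use.

    A server listening on 0.0.0.0 accepts connections on every interface,
    but a client on Windows cannot connect *to* 0.0.0.0 and fails with
    WinError 10049, so we substitute 127.0.0.1 instead.
    """
    return "127.0.0.1" if host in ("0.0.0.0", "::") else host
```

In practice, simply passing `--search-server http://127.0.0.1:8080` instead of `http://0.0.0.0:8080` should have the same effect.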

@JulesGM @klshuster

Fixes

I still need lxml and chardet to run the server, so it would be great if you could add them to the requirements.

When I run the query:

curl -X POST "http://0.0.0.0:8090" -d "q=Wandavision&n=5"

it crashes search_server.py. To fix it, I added a guard so that the server does not crash when the title is nonexistent:

if output_dict["title"]:
    output_dict["title"] = output_dict["title"].replace("\n", "").replace("\r", "")
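A slightly more general version of this guard, sketched only as a suggestion for hardening the server (the helper name is hypothetical, not from the repo), would clean any possibly-missing string field:

```python
def clean_field(value) -> str:
    """Return a single-line string, treating None/missing values as empty.

    Scraped pages sometimes have no <title>, so downstream code should not
    assume the field is present before calling .replace() on it.
    """
    if not value:
        return ""
    return value.replace("\n", " ").replace("\r", " ").strip()
```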

Note that this error also crashes blenderbot2; you can reproduce it by typing the following sentence into blenderbot2 while using your search_server.py:

I like the TV show Wandavision

I have made and tested this fix for both of the cases above, and both now work properly. Could you add this fix to your GitHub repo?

PS: Thank you so much for writing this server! It's been a life saver.

How do I call your server from the macOS terminal command line?

When trying to set up blenderbot2 with your search, it does not seem to fetch the queries. Here is what I did in a terminal window on my Mac:

$ git clone https://github.com/RodolfoFigueroa/ParlAI.git ./ParlAI
$ git clone https://github.com/pytorch/fairseq ./fairseq
$ pip install -r ./ParlAI/requirements.txt
$ cd ParlAI/
$ python ./setup.py develop
$ cd ../fairseq
$ pip install --editable ./
$ cd ..
$ git clone https://github.com/JulesGM/ParlAI_SearchEngine
$ pip install -r ./ParlAI_SearchEngine/requirements.txt
$ python ./ParlAI_SearchEngine/search_server.py serve --host 0.0.0.0:8080&
$ parlai interactive --model-file zoo:blenderbot2/blenderbot2_400M/model --search-server http://0.0.0.0:8080

10:39:39 | Current ParlAI commit: ad57e5281de76ed42a378bdd2bcdef2fafc54ab8
Enter [DONE] if you want to end the episode, [EXIT] to quit.
10:39:39 | creating task(s): interactive
Enter Your Message: Who went into space this week?
blenderbot2/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)

[BlenderBot2Fid]: I'm not sure, but I'm pretty sure it was Elon Musk. POTENTIALLY_UNSAFE_
Enter Your Message: who won the superbowl this year
[BlenderBot2Fid]: I don't think Elon Musk has ever been to space, but it would be cool if he did.
Enter Your Message: [BlenderBot2Fid]: That's a good question. I don't know who won the Super Bowl this year, though.
Enter Your Message:

and when I try to curl I get the following error:

$ curl -X GET "http://0.0.0.0:8080?q=baseball"
127.0.0.1 - - [21/Jul/2021 10:32:34] code 501, message Unsupported method ('GET')
127.0.0.1 - - [21/Jul/2021 10:32:34] "GET /?q=baseball HTTP/1.1" 501 -

Error response

Error code: 501

Message: Unsupported method ('GET').

Error code explanation: HTTPStatus.NOT_IMPLEMENTED - Server does not support this operation.
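The 501 is expected: the server only implements POST, so the query has to go in a POST body as form data, not in GET query parameters. A stdlib-only sketch of building the equivalent request (the function name is mine, not from the repo):

```python
import urllib.parse
import urllib.request

def build_search_request(server: str, query: str, n: int) -> urllib.request.Request:
    """Build a POST request with q/n form fields, matching what
    `curl -X POST <server> -d "q=...&n=..."` sends."""
    data = urllib.parse.urlencode({"q": query, "n": n}).encode("ascii")
    return urllib.request.Request(server, data=data, method="POST")
```

Sending it with `urllib.request.urlopen(...)`, or simply using `curl -X POST "http://0.0.0.0:8080" -d "q=baseball&n=5"`, should return results instead of a 501.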

Any help would be appreciated. Thank you.

Failed To Connect To Port

While chatting with blenderbot using the search_server at $HOST, it stopped taking inputs after a while. When I checked with the command

!curl -X POST $HOST -d "q=biggest gpt model&n=1"

it threw the error

curl: (7) Failed to connect to 0.0.0.0 port 1111: Connection refused
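"Connection refused" means nothing is listening on that port any more; most likely the background server process exited (for example, the shell session that launched it was closed, or the server crashed on a query). A quick stdlib check, which assumes nothing about the server itself:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False, restart search_server.py (ideally in a persistent session such as tmux or nohup) before resuming the chat.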

Using ParlAI

Just wondering: is there any way I could make a class to get a response, one that simply calls the other procedures with ParlAI, to make it easier to use? Also, is there a way to increase the number of websites and the amount of text it parses, so that we can expand the knowledge?
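On the wrapper-class question: ParlAI agents follow an observe/act protocol, so a thin convenience wrapper is possible. The sketch below assumes only that the agent object exposes `observe()` and `act()` returning message dicts with a "text" field (as ParlAI agents loaded via `create_agent_from_model_file` do; verify the exact loading call against the ParlAI docs):

```python
class ChatSession:
    """Minimal wrapper around a ParlAI-style agent.

    Duck-typed: anything with observe()/act() that exchanges
    {"text": ...} message dicts will work.
    """

    def __init__(self, agent):
        self.agent = agent

    def reply(self, text: str) -> str:
        # ParlAI's conversational loop: show the agent a message,
        # then ask it to act (generate a response).
        self.agent.observe({"text": text, "episode_done": False})
        return self.agent.act()["text"]
```

As for widening retrieval, the number of retrieved documents is controlled on the RAG side (the `--n-docs` option, if I recall the flag name correctly; double-check with `parlai interactive --help`).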

Adding custom files with information to the search engine

I've tried the search engine using your colab.ipynb and it worked perfectly after several trials. It seems that Colab works better at some hours and fails when resource demand is high.

I am still wondering how I can add a custom directory of text files to your ParlAI_SearchEngine, so that relevant documents can also be found in that directory. This way one could add custom information to the web search.
Any ideas?
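One possible approach, sketched purely as an assumption about how the server could be extended (none of these names exist in the repo): index the local text files yourself and return hits in the same shape the search server already produces, i.e. a list of document dicts with url/title/content fields, which could then be merged with the web results before responding:

```python
from pathlib import Path

def search_local_files(directory: str, query: str, n: int) -> list:
    """Naive keyword search over *.txt files, returning documents shaped
    like the search server's results so they can be merged with web hits."""
    query_terms = query.lower().split()
    hits = []
    for path in Path(directory).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Score by raw term frequency; good enough for a small corpus.
        score = sum(text.lower().count(t) for t in query_terms)
        if score > 0:
            hits.append({
                "url": path.as_uri(),
                "title": path.stem,
                "content": text[:2000],  # truncate like a web snippet
                "score": score,
            })
    hits.sort(key=lambda d: d["score"], reverse=True)
    return hits[:n]
```

For anything beyond a handful of files, a real local index (e.g. Whoosh or an embedding-based retriever) would scale much better than this linear scan.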

ParlAI_SearchEngine license

Hi. Thanks for making this project public.

I want to fork this repository and add files, but no license is specified. Could you please specify a license? Or is it under default copyright (all rights reserved)?

[Fix] Making port an integer

Thanks for making this repo!

I have been using it and noticed that I needed to make a small fix to make it work.

The port needs to be converted to an integer; otherwise it throws an error:

def _parse_host(host: str) -> Tuple[str, int]:
    splitted = host.split(":")
    hostname = splitted[0]
    port = splitted[1] if len(splitted) > 1 else _DEFAULT_PORT
    port = int(port)  # the port must be an integer
    return hostname, port
