phergie-irc-plugin-react-url's Introduction

phergie-irc-plugin-react-url's People

Contributors

clue, elazar, matthewtrask, meroje, pschwisow, scrutinizer-auto-fixer, sitedyno, svpernova09, wyrihaximus

phergie-irc-plugin-react-url's Issues

Feature Request: Ignore some URLs

When third-party bots share a channel with Phergie, the URL plugin can worsen the signal-to-noise ratio by responding to the URLs those bots post:

travis-ci
12:58 phergie/plugin-dns#14 (version-2-updates - 672b7b9 : Joe Ferguson): The build passed.
12:58 Change view : https://github.com/phergie/plugin-dns/compare/a8c7322d1833...672b7b9d0af8
12:58 Build details : https://travis-ci.org/phergie/plugin-dns/builds/99308672
12:58 travis-ci left the room.
Phergie
12:58 [http://gsc.io/u/11] Travis CI - Test and Deploy Your Code with Confidence
12:58 [http://gsc.io/u/10] Comparing a8c7322d1833...672b7b9d0af8 · phergie/plugin-dns · GitHub

It would be useful to be able to ignore some URLs based on specific components (e.g. hostname). This could be done by implementing a FilterInterface with a single method that accepts a URL and returns a boolean value indicating whether that URL should be handled. To maintain BC, a default implementation that returns true for all URLs could be provided. This would give the end user maximum flexibility and customization.
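A minimal sketch of that idea, written in Python purely for illustration (the plugin itself is PHP). The names `UrlFilter`, `AllowAllFilter`, and `IgnoreHostsFilter` are hypothetical and not part of the plugin; only the "one method, URL in, boolean out" shape comes from the request above.

```python
from typing import Protocol
from urllib.parse import urlsplit


class UrlFilter(Protocol):
    """Hypothetical filter contract: return True if the URL should be handled."""

    def filter(self, url: str) -> bool: ...


class AllowAllFilter:
    """Default that preserves existing behavior: every URL is handled."""

    def filter(self, url: str) -> bool:
        return True


class IgnoreHostsFilter:
    """Rejects URLs whose hostname is in a configured set (illustrative only)."""

    def __init__(self, ignored_hosts):
        self.ignored_hosts = {host.lower() for host in ignored_hosts}

    def filter(self, url: str) -> bool:
        # urlsplit() returns the hostname lowercased, or None if there is none.
        host = urlsplit(url).hostname or ""
        return host not in self.ignored_hosts


# Assumed configuration: skip the hosts that the CI bot links to.
url_filter = IgnoreHostsFilter(["travis-ci.org", "github.com"])
```

With this configuration, the two links posted by travis-ci in the log above would be skipped, while URLs on other hosts would still be handled.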

301 and 302 responses not being handled

Looks like 301 Moved Permanently and 302 Found responses are not being handled.

From #phpc on Freenode:

3:48 terratoma: http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriages
3:48 Phergie: [http://gsc.io/u/57]

Note the lack of a title in the output.

When I hit the URL as it's given above, I get this:

04:09:24 ~ $ curl -v "http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriage"
* Adding handle: conn: 0x7f91b380aa00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f91b380aa00) send_pipe: 1, recv_pipe: 0
* About to connect() to www.rawstory.com port 80 (#0)
*   Trying 104.239.182.155...
* Connected to www.rawstory.com (104.239.182.155) port 80 (#0)
> GET /2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriage HTTP/1.1
> User-Agent: curl/7.30.0
> Host: www.rawstory.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
* Server nginx is not blacklisted
< Server: nginx
< Content-Type: text/html
< Date: Fri, 01 May 2015 21:10:42 GMT
< Keep-Alive: timeout=20
< Location: http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriage/
< X-Type: default
< Connection: keep-alive
< Set-Cookie: X-Mapping-fjhppofk=8D623D1BB3EE0A25628811CAA06CFFB8; path=/
< Content-Length: 178
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host www.rawstory.com left intact

But if I append a /, to make the URL consistent with what's shown in the Location response header above, I get a 200 OK response and the body contains a title:

04:10:43 ~ $ curl -v "http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriages/" 2>&1 | grep '<title>'
<title>  Texas GOP lawmaker: &#8216;What is going on in Baltimore&#8217; is because of too many gay marriages</title>

(Originally opened by @elazar on a different repo, moving issue to this repo)
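One way to handle this is to follow the Location header until a non-redirect response arrives. The sketch below is Python for illustration only, not the plugin's code; `fetch` is a hypothetical callable standing in for whatever HTTP client is used, and the hop limit of 5 is an arbitrary assumption.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302}


def resolve_redirects(url, fetch, max_redirects=5):
    """Follow 301/302 responses until a non-redirect response is returned.

    `fetch` is a hypothetical callable that takes a URL and returns a
    (status_code, headers_dict, body) tuple.
    """
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(url)
        location = headers.get("Location")
        if status not in REDIRECT_STATUSES or not location:
            return url, status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

The hop limit matters: without it, two URLs that redirect to each other would keep the bot fetching indefinitely.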

Parser handles XXX scene dirnames as URLs

18:22:35 <%someone> GirlsDoPorn.E157.21.Years.Old.XXX.720p.WMV-KTR
18:22:35 <+Phergie> [ http://GirlsDoPorn.E157.21.Years.Old.XXX/ ]  

Yes, I'm in weird channels. My guess is that Twitter_Extractor matches the .xxx TLD. I'm not sure how to change this without altering Twitter_Extractor.

(Originally opened by @hashworks on a different repo, moving issue to this repo)
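A possible workaround that leaves the extractor untouched is to post-filter scheme-less candidates and keep only those that span an entire whitespace-delimited token. The following is a rough Python sketch of that heuristic, not the plugin's or Twitter_Extractor's actual logic; the `CANDIDATE` pattern is an assumption and deliberately simplistic.

```python
import re

# Scheme-less candidate: dot-separated labels ending in an alphabetic "TLD".
# This pattern is an illustrative assumption, not the extractor's real regex.
CANDIDATE = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def schemeless_hosts(message):
    """Return scheme-less host candidates that cover a whole token."""
    hosts = []
    for token in message.split():
        # fullmatch() rejects tokens where a host-like string is only a
        # prefix, e.g. a release name that continues after ".XXX".
        if CANDIDATE.fullmatch(token):
            hosts.append(token)
    return hosts
```

This is a narrow heuristic: it would also drop hosts followed by trailing punctuation, and a release name that happens to end in a purely alphabetic segment would still pass.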

Try HEAD requests before GET requests

GET requests can fetch large resources that consume lots of bandwidth unnecessarily.

Example:

2015-06-12 15:05:56 DEBUG [Url][557b3ba416813]Found url: http://upload.wikimedia.org/wikipedia/commons/4/4f/Funny_Car_AAA.JPG []
...snip...
2015-06-12 15:05:56 DEBUG [Url][557b3ba416813]Download complete (after 0.18480515480042s):

Implement a strategy that tries HEAD requests first so that the resource body isn't downloaded; much of the desired information is generally in the response headers anyway.

There are instances where this is not the case. For example, if chunked transfer encoding is used, the size of the resource won't be available. While most responses should include a content type header, it's possible they may not. Finally, some resources won't support HEAD (in which case a 405 response should be returned). So, use GET as a fallback in cases where desired information isn't available in HEAD responses.

(Originally opened by @elazar on a different repo, moving issue to this repo)
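The HEAD-then-GET strategy described above could look roughly like this. It is a Python sketch for illustration, not the plugin's implementation; `request` is a hypothetical callable, and the choice of Content-Type and Content-Length as the "desired information" is an assumption based on the cases listed above.

```python
def fetch_metadata(url, request):
    """Try HEAD first and fall back to GET when HEAD is not enough.

    `request` is a hypothetical callable: request(method, url) returns a
    (status_code, headers_dict) tuple. Header names are assumed to be in
    canonical case.
    """
    status, headers = request("HEAD", url)
    head_unsupported = status == 405
    # Chunked responses carry no Content-Length; some omit Content-Type.
    missing_info = "Content-Type" not in headers or "Content-Length" not in headers
    if head_unsupported or missing_info:
        status, headers = request("GET", url)
        return "GET", status, headers
    return "HEAD", status, headers
```

The returned method name makes it easy to log how often the fallback is actually needed.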

GuzzleHttp\Exception\RequestException in resolveCallback throwing fatal error

From the PHP log:

PHP Catchable fatal error:  Argument 1 passed to Phergie\Irc\Plugin\React\Url\Plugin::Phergie\Irc\Plugin\React\Url\{closure}() must be an instance of GuzzleHttp\Message\Response, instance of GuzzleHttp\Exception\RequestException given, called in /home/phergie/phergie-freenode/vendor/phergie/phergie-irc-plugin-http/src/Request.php on line 98 and defined in /home/phergie/phergie-freenode/vendor/phergie/phergie-irc-plugin-react-url/src/Plugin.php on line 180

This occurs with the current tagged versions of both plugins (http and url) on version 2 of the bot.

URL that caused the issue: http://www.senzati.com/jet-sprinter/

Some entities aren't decoded in titles from HTML responses

elazar 10:22 Speaking of, if anyone wants to take on some low-hanging fruit: https://github.com/phergie/phergie-irc-plugin-react-twitter/issues/7
Phergie 10:22 [http://gsc.io/u/35] Entities aren&#39;t decoded · Issue #7 · phergie/phergie-irc-plugin-react-twitter · GitHub

Seems like they should be, but as the output above indicates, they aren't. :(

Related: phergie/phergie-irc-plugin-react-twitter#7

(Originally opened by @elazar on a different repo, moving issue to this repo)
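For illustration, here is what decoding the title before output looks like in Python using the standard library's `html.unescape`. The helper name `decode_title` is hypothetical, and the plugin itself would need the PHP equivalent.

```python
import html


def decode_title(raw_title):
    """Decode HTML entities (named and numeric) and trim whitespace."""
    return html.unescape(raw_title).strip()


print(decode_title("Entities aren&#39;t decoded"))
```

Numeric references such as `&#8216;` and `&#8217;` are decoded the same way, so the curly quotes in titles come through as real characters.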

http -> https redirect results in endless loop

When posting a domain with an http -> https 301 redirect, the plugin falls into an endless loop:

2015-02-07 22:15:40 DEBUG [email protected] :hashworks!~hashworks@0Quoten3Ossi PRIVMSG moonbasetest :hashworks.net []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Found url: hashworks.net []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Corrected url: http://hashworks.net/ []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Emitting: http.request []
2015-02-07 22:15:40 DEBUG [Http]Creating new HttpClient []
2015-02-07 22:15:40 DEBUG [Http]Requesting DNS Resolver []
2015-02-07 22:15:40 DEBUG [Dns]dns.resolver called []
2015-02-07 22:15:40 DEBUG [Dns]Creating new Resolver []
2015-02-07 22:15:40 DEBUG [Http]DNS Resolver received []
2015-02-07 22:15:40 DEBUG [Http]Requesting DNS Resolver []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Sending request []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Emitting: url.host.all []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Writing body []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Response received []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Reponse (after 0.12431311607361s): 301 []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Data received []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Request done []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Download complete (after 0.12470602989197s): 0 in length length []
2015-02-07 22:15:41 DEBUG [email protected] PRIVMSG moonbasetest :[ http://hashworks.net/ ] []
2015-02-07 22:15:41 DEBUG [email protected] :[email protected] PRIVMSG moonbasetest :[ http://hashworks.net/ ] []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Found url: http://hashworks.net/ []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Emitting: http.request []
2015-02-07 22:15:41 DEBUG [Http]Existing HttpClient found, using it []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Sending request []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Emitting: url.host.all []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Writing body []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Response received []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Reponse (after 0.11247301101685s): 301 []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Data received []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Request done []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Download complete (after 0.11286997795105s): 0 in length length []
<REPEAT>

This happens because the bot's reply contains the URL prefixed with http://, so the plugin picks up its own message and processes the URL again, over and over.

(Originally opened by @hashworks on a different repo, moving issue to this repo)
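One simple guard against this kind of loop is to skip any message whose sender is the bot itself. The sketch below is Python for illustration only; the function names are hypothetical, and the plain `lower()` comparison is an approximation of IRC's nickname case rules.

```python
def nick_from_prefix(prefix):
    """Extract the nick from an IRC prefix such as ':nick!user@host'."""
    return prefix.lstrip(":").split("!", 1)[0]


def should_process(prefix, own_nick):
    """Return False for messages sent by the bot itself."""
    # Assumption: a simple case-insensitive comparison is close enough.
    return nick_from_prefix(prefix).lower() != own_nick.lower()
```

This only covers the self-echo case shown in the log; handling the 301 itself is a separate fix.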
