Comments (47)
@jarun Your TODO list. We could link it from README if you like it. Anything I'm missing at the moment?
from googler.
Linking from readme is very much necessary. This covers for the time... now that we already have a deb package. I'll take care of the doc stuff.
from googler.
Right, deb package... It's a WIP, so it should still be listed. Added to the list with a link to the PR.
from googler.
I think we could safely cross out Windows installation, because I just tried and it doesn't even work...
For one, apparently fcntl and termios don't work on systems that are not Unix or Unix-like, and we rely on those to get terminal size. On Python 3.3+ there's os.get_terminal_size (which does work on Windows; I've written a cross-platform progress bar module based on that), but we need to support 2.7, so tough luck.
Then, readline is not available. Shouldn't be too surprising either. Goodbye line editing.
At that point I shut down my VM out of frustration, so I don't know if there are other problems...
Seeing that no one has ever reported this, it's safe to assume that we have no Windows users at all.
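For reference, a minimal sketch of a terminal size lookup that degrades gracefully. It assumes that falling back to 80x24 is acceptable when nothing else works; the function name terminal_size is hypothetical and not part of googler.

```python
import os
import struct
import sys


def terminal_size(default=(80, 24)):
    """Return (columns, rows), or `default` when the size cannot be determined."""
    # Python 3.3+: os.get_terminal_size works on Windows as well as Unix.
    if hasattr(os, "get_terminal_size"):
        try:
            size = os.get_terminal_size()
            if size.columns > 0 and size.lines > 0:
                return (size.columns, size.lines)
        except (OSError, ValueError):
            pass
    # Python 2.7 on Unix-like systems: ask the tty driver directly.
    try:
        import fcntl
        import termios
        packed = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, b"\0" * 8)
        rows, cols = struct.unpack("hhhh", packed)[:2]
        if rows > 0 and cols > 0:
            return (cols, rows)
    except Exception:
        # fcntl/termios are missing (Windows) or stdout is not a terminal.
        pass
    return default
```

On Python 2.7 under Windows this simply returns the default, which is cruder than a real query but avoids the import failure.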
from googler.
I updated the readme accordingly. :)
If someone really wants it, he/she can bypass these small problems. We didn't have readline at some point and I loved googler nonetheless. It saved me a lot of time even then. ;)
Many thanks for pointing it out. We don't wanna misguide our users. If you do have a licensed win VM can you try out buku as well? It uses readline, but anything else?
from googler.
Will try. I do have licenses for all Windows releases since XP... But do you realize Microsoft offer official VM images that don't require a license and are typically good for 60 days (or maybe 30) from initial boot? https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/linux/. (Looks like they have taken down XP images, but I have archived XP download URLs too, here: https://gist.github.com/zmwangx/e728c56f428bc703c6f6)
from googler.
Okay, so I tried googler on Windows again without fcntl, termios and readline.
- Neither cmd nor PowerShell supports ANSI escape sequences, so colors don't work; you get raw sequences with ^[ displayed as a boxed question mark.
- Neither cmd nor PowerShell supports Unicode (what year is it, again?), at least not by default, so even with -C you fail most of the time, because U+2013 en-dash and U+2014 em-dash are everywhere.
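One possible workaround for the Unicode failures, sketched under the assumption that losing the odd dash is better than crashing: replace anything the console encoding cannot represent before printing. The helper name console_safe is hypothetical.

```python
def console_safe(text, encoding="ascii"):
    """Replace characters that `encoding` cannot represent with '?'."""
    return text.encode(encoding, "replace").decode(encoding)


# An en-dash cannot be encoded as ASCII, so it is replaced.
print(console_safe(u"2010\u20132016"))  # prints 2010?2016
```

In practice the encoding would come from something like sys.stdout.encoding rather than being hard-coded.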
from googler.
Interestingly, I just noticed
googler -n 3 google
1 Google
https://www.google.com/
Search the world's information, including webpages, images, videos and more. Google has many special features to help you find
exactly what you're looking ...
2 Google (@google) | Twitter
https://twitter.com/google?ref_src=twsrc^google|twcamp^serp|twgr^author
That https://twitter.com/google?ref_src=twsrc^google|twcamp^serp|twgr^author is very weird...
The link in source is https://twitter.com/google?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor. We probably should not percent-decode, since the decoded string isn't a valid URL?
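To make the difference concrete, here is a small sketch that uses only the standard library's unquote on the URL from the page source. It shows how decoding rewrites the query string.

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2.7

raw = ("https://twitter.com/google"
       "?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor")
decoded = unquote(raw)

# %5E becomes ^ and %7C becomes |, so the printed URL differs from the source.
print(decoded)
```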
from googler.
As for Buku, on a fairly vanilla Windows 10 install, there's no HOME environment variable.
from googler.
Neither cmd nor PowerShell supports Unicode
I donno what to say.
We probably should not percent-decode, since the decoded string isn't a valid URL?
Should be fine. But we need to check whether it is valid or not too.
As for Buku, on a fairly vanilla Windows 10 install, there's no HOME environment variable.
True, and when it's not available it should create the DB file in the same dir.
from googler.
We probably should not percent-decode, since the decoded string isn't a valid URL?
Should be fine.
Sorry, you mean "percent-decode should be fine", or "not percent-decode should be fine"?
I was wrong in saying https://twitter.com/google?ref_src=twsrc^google|twcamp^serp|twgr^author is not a valid URL. It actually is, because I just checked RFC 3986 again, and ^ is not reserved. Which makes it even more problematic: what we're printing is an entirely different URL that doesn't work (try it, you'll get HTTP 400).
True, and when it's not available it should create the DB file in the same dir.
When it's not available, you get an exception when you try to os.path.join(os.getenv('HOME'), ...), because os.getenv returns None. A reliable solution might be os.path.expanduser('~'), but I don't think it's worth it to adapt just for Windows.
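A minimal sketch of the expanduser approach. The function name data_dir and the .local/share layout are assumptions for illustration only, not Buku's actual code.

```python
import os


def data_dir(app_name="buku"):
    """Build a per-user data directory without relying on the HOME variable."""
    # expanduser returns a string even when HOME is unset; in the worst case
    # it returns "~" unchanged, so os.path.join never receives None.
    home = os.path.expanduser("~")
    return os.path.join(home, ".local", "share", app_name)


print(data_dir())
```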
from googler.
Which makes it even more problematic: what we're printing is an entirely different URL that doesn't work (try it, you'll get HTTP 400).
Yes, I checked it the first time. It fails. I meant that we need to fix it. Show it as it comes to us.
When it's not available, you get an exception
I will check it out. Seems like I need to download a Windows image ;). Thanks for the VM image links.
from googler.
Yes, I checked it the first time. It fails. I meant that we need to fix it.
The fix is trivial enough:
diff --git a/googler b/googler
index f7e0e5c..ba6f387 100755
--- a/googler
+++ b/googler
@@ -39,7 +39,6 @@ if sys.version_info > (3,):
from urllib.parse import (
urljoin,
quote_plus as url_quote_plus,
- unquote as url_unquote,
)
from http.client import HTTPSConnection
@@ -50,7 +49,6 @@ else:
import HTMLParser
from urllib import (
quote_plus as url_quote_plus,
- unquote as url_unquote,
)
from urlparse import urljoin
from httplib import HTTPSConnection
@@ -159,8 +157,7 @@ class GoogleParser(HTMLParser.HTMLParser):
if self.url != "":
if self.url.find("://", 0, 12) >= 0:
index = len(self.results) + 1
- self.results.append(Result(index, self.title,
- url_unquote(self.url),
+ self.results.append(Result(index, self.title, self.url,
self.text))
else:
skipped += 1
Basically, just don't unquote.
However, I'm not sure if it will have side effects. unquote was introduced in ff58e20, but the commit message is very brief and I'm not quite sure what problem it fixes. Double quote is not a reserved character (again per RFC 3986), and webbrowser.open('https://example.com/"') works just fine. Can you give an example of not unquoting leading to problems?
Interestingly enough, although https://twitter.com/google?ref_src=twsrc^google|twcamp^serp|twgr^author is a valid URL by itself, webbrowser.open does something smartass to encode it correctly (or wrongly, I would say, and happens to land on the expected page). No luck when I use the same URL in Chrome address bar.
from googler.
However, I'm not sure if it will have side effects.
Yes, I tried the same just now. Works. The original bug was:
$ ./googler -n1 hello world
1 "Hello, World!" program - Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/%22Hello,_World!%22_program
A "Hello, World!" program is a computer program that outputs "Hello, World!" on a display device, often standard output. Being a very simple program in most ...
Note the %22 for ". I am all ears for your opinion here.
or wrongly, I would say, and happen to land on the expected page
Doesn't work for me when I try to open result 2.
from googler.
https://en.wikipedia.org/wiki/%22Hello,_World!%22_program
That URL works for me... What's the problem?
Doesn't work for me when I try to open result 2.
Then it's OS X doing the smartass thing. webbrowser.open is an AppleScript wrapper on OS X.
from googler.
That URL works for me... What's the problem?
Trying to be perfect if possible ;). I'll add https://github.com/jarun/Buku/blob/master/buku#L796.
from googler.
Trying to be perfect if possible ;)
I would say a working implementation trumps a pretty but broken one...
I'll add https://github.com/jarun/Buku/blob/master/buku#L796.
No strong objection.
By the way, I'll be out for an hour or two. Won't be able to reply until I get back.
from googler.
No strong objection.
Wait, no. On second thought " isn't equivalent to %22 per RFC 3986 (correct me if I'm wrong). It works with Wikipedia, but it doesn't necessarily work everywhere. I don't think it's the right thing to do.
(I'll try to write a proof-of-concept web app that handles " and %22 differently when I get back.)
from googler.
No strong objection.
OK then.
By the way, I'll be out for an hour or two. Won't be able to reply until I get back.
Enjoy your day!
from googler.
Wait, no. On second thought...
Sorry, I pushed it before seeing this. Feel free to check out a better way of handling this.
from googler.
Back to RFCs.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
The percent-encoding mechanism (Section 2.1) is a frequent source of
variance among otherwise identical URIs. In addition to the case
normalization issue noted above, some URI producers percent-encode
octets that do not require percent-encoding, resulting in URIs that
are equivalent to their non-encoded counterparts. These URIs should
be normalized by decoding any percent-encoded octet that corresponds
to an unreserved character, as described in Section 2.3.
IRIs are defined similarly to URIs in [RFC3986], but the class of
unreserved characters is extended by adding the characters of the UCS
(Universal Character Set, [ISO10646]) beyond U+007F, subject to the
limitations given in the syntax rules below and in section 6.1.
Therefore, U+0022 Quotation Mark isn't even an allowed character in either URI or IRI. Which is very obvious because URI/IRIs should be embeddable in HTML, and HTML attributes are wrapped in double quotes. Many (if not most) modern browsers are smart enough to automatically quote the quotation mark when it appears in an actually-invalid URI, but there's no guarantee that this will work in all browsers. Erring on the safe side, I would not do this. (Not to mention that " is just one arbitrary, if reasonable, character to decode; other people might want to have other characters decoded.)
from googler.
To elaborate a bit on my last point: if you support ", then it's a perfectly reasonable request to also support %3C (<) and %3E (>), both of which are not valid URI characters, again obviously for interoperability with HTML. And maybe other characters too.
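The grammar quoted above can be turned into a quick check. This is only a sketch: it tests each character against the reserved and unreserved sets plus the percent sign, and does not validate percent-escapes or overall URI structure. The name illegal_uri_chars is hypothetical.

```python
import string

# Character classes copied from the RFC 3986 grammar quoted in this thread.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
UNRESERVED = string.ascii_letters + string.digits + "-._~"
# "%" is allowed only as the start of a percent-escape; that is not checked here.
ALLOWED = set(GEN_DELIMS + SUB_DELIMS + UNRESERVED + "%")


def illegal_uri_chars(uri):
    """Return the sorted characters in `uri` that RFC 3986 does not allow raw."""
    return sorted(set(uri) - ALLOWED)


print(illegal_uri_chars('https://en.wikipedia.org/wiki/"Hello,_World!"_program'))
# prints ['"']
```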
from googler.
OK OK. Consider it gone.
from googler.
BTW, if you have a collection of soothing traditional Cantonese music (lyrics-less is what I'm looking for), do share.
from googler.
BTW, if you have a collection of soothing traditional Cantonese music (lyrics-less is what I'm looking for), do share.
Missed that... Unfortunately I don't. My early training in music leans on the (Western) classical side, and I mostly listen to Chinese/South Korean pop music these days; either case, no lyrics-less traditional Cantonese music.
from googler.
I would like to add support for sitelinks.
I'll implement this sooner or later.
from googler.
Awesome! Please add to the list.
from googler.
See chatroom for some questions.
from googler.
I think googler has accumulated enough complexity (1270 lines, close to 1000 if you take out comments) to the point that changes to one part of the program may break another part subtly, and since our test script only tests the core functionality, and worse, only watches for obvious failures, we risk introducing regressions. e159a44 is an example, although that's an embarrassingly simple one easily caught by static analysis.
Therefore, I'm thinking about unit tests. But in order to write unit tests, we first need to make googler importable. Which means wrapping up bare code into functional units and having a main that is only run when __name__ == '__main__'. We also need to reduce the reliance on globals, which is easy in some cases, e.g., GoogleParser, which is the linchpin of googler, only uses news, which could easily be an init parameter; and slightly harder in other cases, but still very doable. (Also, reducing reliance doesn't necessarily mean absolutely no globals: debug can certainly be a global, and so can colorize and the like, which don't make much of a difference for testing purposes.)
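A rough sketch of the shape this could take. ResultParser is a hypothetical stand-in for GoogleParser, and the argument handling is deliberately simplistic; the point is only that the global becomes a constructor parameter and nothing runs on import.

```python
import sys


class ResultParser(object):
    """Hypothetical stand-in for GoogleParser; `news` is no longer a global."""

    def __init__(self, news=False):
        self.news = news


def main(argv=None):
    """Entry point; only called when the file is run as a script."""
    if argv is None:
        argv = sys.argv[1:]
    # Simplistic flag handling, for illustration only.
    return ResultParser(news="--news" in argv)


if __name__ == "__main__":
    main()
```

With this layout a test can construct ResultParser(news=True) directly, without touching module state.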
Once we have the code contained, we can stop relying on Google leniently allowing us a few hundred queries. We can easily build up a couple thousand or more responses to a wide range of queries over a day or two, then do whatever we want with those queries. (And we can update the response repertoire once in a while; the test script can also do a few realtime queries to make sure there's no breaking change on Google's part.) The interactive parts are certainly somewhat harder to test, but I'm sure there are ways to stub things out and test them given a little bit more thought.
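To illustrate testing against stored responses, here is a minimal sketch. CANNED_RESPONSE is made-up HTML, not a real Google response, and LinkParser is a toy parser rather than googler's; a real suite would load saved responses from files.

```python
import unittest

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

# Made-up stand-in for a saved search response.
CANNED_RESPONSE = '<div><h3><a href="https://www.google.com/">Google</a></h3></div>'


class LinkParser(HTMLParser):
    """Toy parser that collects href values from anchor tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class CannedResponseTest(unittest.TestCase):
    def test_extracts_link(self):
        parser = LinkParser()
        parser.feed(CANNED_RESPONSE)
        self.assertEqual(parser.links, ["https://www.google.com/"])
```

Such a test makes no network request, so it can be run as often as needed.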
This will be a pretty significant undertaking, and I don't think either of us will have time to do this soon, but just want to put this idea out for scrutiny.
from googler.
It will be a nice improvement but we can't do this ourselves. We should add this in ToDo. Please link to your comment above.
from googler.
I don't feel too strongly about this, but here's an idea: since the colors chosen by us don't always look nice in all color schemes (honestly it doesn't even look so good with my slightly localized Solarized Dark: [screenshot]), we should offer a way to customize it. An option, --colors, and an env var to make up for the lack of config file (one executable file + no config file is great and we're not gonna break that, but reading from env should be okay). As for the actual format, I think we can take a page from either GNU LS_COLORS (ls/dircolors) or BSD LSCOLORS. I would prefer BSD because it's shorter and more straightforward, but dircolors supports 256 colors, and it may be familiar to as many or even more people.
from googler.
By the way, isn't that a cute screenshot?
from googler.
Your hold on generating beautiful images/videos is unparalleled. BTW, are you into photography?
from googler.
BTW, are you into photography?
Not at all...
from googler.
Try it
from googler.
As a stay-at-home type of person and selfie hater, the main channels are closed. I do occasionally take a shot when I see something beautiful though.
More on topic, what do you say about colors? Please reply at your leisure.
from googler.
Do you mean colour presets? That would be a valuable addition. But if we want users to fiddle around with it (custom colours), we will be concentrating more on colours than other features.
A set of defined presets would be great.
from googler.
BTW, we need a new asciinema with the new prompt. Please add the prompt help as well. Your latest change makes it way more organized.
from googler.
I am planning a new release next weekend. Please let me know if you are fine with that. Can we pull in preset colours by that time?
from googler.
Do you mean colour presets?
No, because implementing color presets is actually more work for us. With BSD-style LSCOLORS, the user only needs to supply a five-letter string (which is inherently a five-element list), representing:
- Index color;
- Title color;
- URL color;
- Metadata/abstract color;
- Prompt color;
and that's all. Our default is also a five-letter string. Then we use a tiny color map (BSD has 16 colors + default, we should add reverse video too, so 18), and bam, done.
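A sketch of how such a string might be parsed. It assumes the BSD letter convention (a to h for the eight base colors, uppercase for bold, x for default) and maps x to a plain reset; reverse video is left out, and the names FIELDS and parse_colors are hypothetical.

```python
# The five fields, in the order listed above.
FIELDS = ("index", "title", "url", "abstract", "prompt")


def build_color_map():
    """Map BSD-style letters to ANSI SGR escape sequences."""
    # Assumption: "x" (default) is rendered as a plain reset.
    cmap = {"x": "\x1b[0m"}
    for offset, letter in enumerate("abcdefgh"):
        cmap[letter] = "\x1b[%dm" % (30 + offset)            # normal
        cmap[letter.upper()] = "\x1b[1;%dm" % (30 + offset)  # bold
    return cmap


COLOR_MAP = build_color_map()


def parse_colors(spec):
    """Turn a five-letter string into a field -> escape sequence dict."""
    if len(spec) != len(FIELDS):
        raise ValueError("expected %d letters, got %d" % (len(FIELDS), len(spec)))
    try:
        return dict((field, COLOR_MAP[ch]) for field, ch in zip(FIELDS, spec))
    except KeyError as exc:
        raise ValueError("unknown color letter: %r" % exc.args[0])
```

For example, parse_colors("GEcxh") would give a bold cyan index, a bold blue title, a green URL, default-colored abstracts and a light grey prompt.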
In order to have presets, you basically need to do all of the above, AND you need to be a good designer, AND even then you can't make everyone happy. (I know no one has complained thus far, just like I didn't, but maybe it's because it's too small an issue; it's always nice to have the customizability there.) I'm not a designer, although I am somewhat into visual design, so there's that.
from googler.
we need a new asciinema with the new prompt.
I'll do that prior to the release.
I am planning a new release next weekend.
No problem.
from googler.
because color presets is actually more work for us
I get it now. The five-letter string makes sense. I'm good.
from googler.
Added to the top, will do it when I have time, probably during the weekend or even before that when I don't feel like getting other work done...
from googler.
No hurry :)
from googler.
By the way, what do you think about rolling the todo list thread? With
> document.getElementsByClassName('timeline-comment-wrapper').length
45
comments and
> document.body.offsetHeight
13054
vertical pixels, this one is getting kind of a pain to scroll. Can we start a new thread (copy over top post while getting rid of archived items) once in a while?
from googler.
sure!
from googler.
Roll. New thread at #83.
from googler.