
gglsbl's People

Contributors

afilipovich, amateurastronerd, automationator, danbla, gerard-mailtrack, gjedeer, monperrus, ribbles, yoitofes


gglsbl's Issues

SafeBrowsingListTestCase Wrong

When I make a copy of SafeBrowsingListTestCase and run it, the test fails. Could you please tell me the reason? Thank you. I ran it on Python 3.10.6.
F

FAIL: test_canonicalize (__main__.SafeBrowsingListTestCase)

Traceback (most recent call last):
File "/home/fujiao/Workspace/code/test.py", line 177, in test_canonicalize
self.assertEqual(URL(nu).canonical, cu)
AssertionError: 'http://3279880203/blah' != 'http://195.127.0.11/blah'


Ran 1 test in 0.001s

FAILED (failures=1)
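
For context, the assertion expects the bare decimal host to be rewritten as a dotted quad. A minimal sketch of the conversion the test encodes (not the library's actual canonicalizer, just the arithmetic it implies):

  import socket
  import struct

  def decimal_host_to_dotted_quad(host):
      # 3279880203 == (195 << 24) | (127 << 16) | (0 << 8) | 11
      return socket.inet_ntoa(struct.pack('!I', int(host)))

  print(decimal_host_to_dotted_quad('3279880203'))  # -> 195.127.0.11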

Quota exceeded when using gglsbl-rest

Traceback (most recent call last):
File "/data1/antiphish/antiphish/gglrsbl/gglsbl-rest/app.py", line 29, in _lookup
return sbl.lookup_url(url)
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/gglsbl/client.py", line 131, in lookup_url
list_names = self._lookup_hashes(url_hashes)
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/gglsbl/client.py", line 180, in _lookup_hashes
self._sync_full_hashes(matching_prefixes.keys())
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/gglsbl/client.py", line 103, in _sync_full_hashes
fh_response = self.api_client.get_full_hashes(hash_prefixes, client_state)
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/gglsbl/protocol.py", line 43, in wrapper
r = func(*args, **kwargs)
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/gglsbl/protocol.py", line 150, in get_full_hashes
response = self.service.fullHashes().find(body=request_body).execute()
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/data1/antiphish/anaconda2/lib/python2.7/site-packages/googleapiclient/http.py", line 842, in execute
raise HttpError(resp, content, uri=self.uri)
HttpError: <HttpError 429 when requesting https://safebrowsing.googleapis.com/v4/fullHashes:find?alt=json&key=xxx0mXbAXx5AXUTc5frwo5NOwsHZdSgwrvg returned "Quota exceeded for quota group 'UpdateAPIGroup' and limit 'CLIENT_PROJECT-1d' of service 'safebrowsing.googleapis.com' for consumer 'project_number:612173722054'.">

Adding custom threat lists to database

I am integrating gglsbl as a backend for checking URLs that are sent in our customers' messaging. This library has been extremely helpful in doing so - props for that.

A question I have is whether anyone has attempted to integrate data from other sources into their database. For instance, we will be checking URLs against feeds such as OpenPhish, and industry-specific feeds we've gained access to.

It's not obvious from the code whether the functions for converting a URL into an entry that can be added to the database are present, though it seems possible by combining the function that computes a URL's hashes with the functions that add threat entries to the database.

One area of confusion for me is how I might compute "threat prefixes". Are those just intended to be the first four bytes of each hash in a URL's hash list?
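
For what it's worth, v4 threat prefixes are the leading bytes (usually four) of the full SHA-256 hash of each canonical URL expression. A sketch using the URL helper visible in the tracebacks elsewhere on this page (treat the attribute names as assumptions):

  from gglsbl.protocol import URL

  def hash_prefixes(url, prefix_len=4):
      # URL(url).hashes yields the full SHA-256 digest of every
      # canonical permutation (host/path expression) of the URL
      for full_hash in URL(url).hashes:
          yield full_hash[:prefix_len], full_hash

  for prefix, full in hash_prefixes('http://example.com/some/path'):
      print(prefix.hex(), full.hex())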

Site not detected

The site http://0-facebook.com/q reports as unsafe on the transparency report, but isn't blacklisted when I run gglsbl. I initialized a new DB just now & it still says safe.

Do you have a good explanation for this difference?

Backend Error

I'm running gglsbl under gglsbl-rest, but the traceback indicates an issue on the gglsbl side. If my assumption is wrong I'll reopen this report against gglsbl-rest.

Intermittently I see the following error:

ERROR in app: exception handling [http://example.com/index.php]
Traceback (most recent call last):
File "/root/gglsbl-rest/app.py", line 28, in app_lookup
sbl = SafeBrowsingList(gsb_api_key, active['name'], True)
File "build/bdist.linux-x86_64/egg/gglsbl/client.py", line 31, in __init__
self.api_client = SafeBrowsingApiClient(api_key, discard_fair_use_policy=discard_fair_use_policy)
File "build/bdist.linux-x86_64/egg/gglsbl/protocol.py", line 68, in __init__
self.service = build('safebrowsing', 'v4', developerKey=developer_key)
File "build/bdist.linux-x86_64/egg/oauth2client/_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "build/bdist.linux-x86_64/egg/googleapiclient/discovery.py", line 237, in build
raise e
HttpError: <HttpError 503 when requesting https://www.googleapis.com/discovery/v1/apis/safebrowsing/v4/rest returned "Backend Error">

Environment:

cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core)

python --version
Python 2.7.5

cat /usr/lib/python2.7/site-packages/easy-install.pth
<snip>
./gglsbl-v1.4.5-py2.7.egg
./google_api_python_client-1.6.3-py2.7.egg
./oauth2client-4.1.2-py2.7.egg
</snip>

Is this something that can be fixed or is it Google's fault?
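
In the meantime, a workaround sketch that retries client construction on transient 5xx errors, mirroring the call in app.py above and assuming HttpError exposes the HTTP status via resp.status (as google-api-python-client does):

  import time
  from googleapiclient.errors import HttpError
  from gglsbl import SafeBrowsingList

  def build_sbl_with_retry(api_key, db_path, retries=5, base_delay=2):
      # the discovery call inside SafeBrowsingList.__init__ occasionally
      # returns 503; back off exponentially and try again
      for attempt in range(retries):
          try:
              return SafeBrowsingList(api_key, db_path, True)
          except HttpError as e:
              if e.resp.status >= 500 and attempt < retries - 1:
                  time.sleep(base_delay * 2 ** attempt)
              else:
                  raise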

Question on GSB removal

Hi, I may be asking in the wrong place, but this seems like the best venue since this library is built on the GSB API.
I'm trying to analyze GSB and its behavior, but I don't understand why and which URLs GSB removes, and why many of them reappear. Also, almost 80% of entries remain in GSB when I do a full update.
This library has helped me understand GSB a lot better, so I would be thankful to anyone here who could help answer my question.

Malicious site is not detected

I was testing the implementation, and while it works fine for most of the URLs I checked, there is one in particular that passes every time while being blocked by the Lookup API (I tested it in Chrome). All URLs were checked with gglsbl_client.py, but for this particular one not even a hash is created. I thought you might want to take a look, as it might be a broader issue.

here is the malicious Url:
http://www.paypal.com.verifyaccount-update.com/us/webapps/m812/home

403 Error on Sync

When I try to sync the database with this snippet:

from gglsbl import SafeBrowsingList
sbl = SafeBrowsingList("secret")
sbl.update_hash_prefix_cache()

I get an exception: urllib2.HTTPError: HTTP Error 403: Forbidden
I added a bit of logging, so the output is:

init
Next query will be delayed 158 seconds

Opening SQLite DB ./gsb_v3.db
Retrieving prefixes
Sleeping for 158 seconds

performing api call to https://safebrowsing.google.com/safebrowsing/downloads?appver=0.1&pver=3.0&key=...&client=api with payload: goog-malware-shavar;
googpub-phish-shavar;

Traceback (most recent call last):
  File "sync.py", line 3, in <module>
    sbl.update_hash_prefix_cache()
  File "C:\Users\SK\Programming\gglsbl\gglsbl\client.py", line 28, in update_hash_prefix_cache
    response = self.prefixListProtocolClient.retrieveMissingChunks(existing_chunks=existing_chunks)
  File "C:\Users\SK\Programming\gglsbl\gglsbl\protocol.py", line 222, in retrieveMissingChunks
    raw_data = self._fetchData(existing_chunks)
  File "C:\Users\SK\Programming\gglsbl\gglsbl\protocol.py", line 205, in _fetchData
    response = self.apiCall(url, payload)
  File "C:\Users\SK\Programming\gglsbl\gglsbl\protocol.py", line 62, in apiCall
    response = urllib2.urlopen(request)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Any idea what's going on?
EDIT: I'm sure the API key is correct, by the way; Google's reference implementation works.

KeyError: 'matches'

File "build/bdist.linux-x86_64/egg/gglsbl/client.py", line 92, in _sync_full_hashes
KeyError: 'matches'

My Python code for looking up multiple URLs is as follows:

from gglsbl import SafeBrowsingList

sbl = SafeBrowsingList('API_KEY')  # placeholder for the real API key

input_file = open('alexa.csv', 'r')  # alexa.com's top 1MM domain list, sanitized so it's one domain per line

for line in input_file:
    sbl.lookup_url(str.rstrip(line))

Unable to open the database file

2018-06-20 12:07:15,988 INFO Opening SQLite DB /tmp/gsb_v4.db
2018-06-20 12:07:15,989 ERROR Can not get schema version, it is probably outdated.
2018-06-20 12:07:15,989 WARNING Cache schema is not compatible with this library version. Re-creating sqlite DB /tmp/gsb_v4.db
Traceback (most recent call last):
File "/usr/local/bin/gglsbl_client.py", line 4, in
import('pkg_resources').run_script('gglsbl==1.4.11', 'gglsbl_client.py')
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/init.py", line 654, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/init.py", line 1441, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/gglsbl-1.4.11-py2.7.egg/EGG-INFO/scripts/gglsbl_client.py", line 86, in

File "/usr/local/lib/python2.7/dist-packages/gglsbl-1.4.11-py2.7.egg/EGG-INFO/scripts/gglsbl_client.py", line 70, in main

File "build/bdist.linux-x86_64/egg/gglsbl/client.py", line 33, in init
File "build/bdist.linux-x86_64/egg/gglsbl/storage.py", line 65, in init
File "build/bdist.linux-x86_64/egg/gglsbl/storage.py", line 98, in init_db
sqlite3.OperationalError: unable to open database file

I am getting this type of issue; does anyone know how to resolve it?
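
sqlite3 reports "unable to open database file" when the directory holding the DB path is missing or not writable by the current user. A quick preflight check (/tmp/gsb_v4.db is the library default):

  import os

  db_path = '/tmp/gsb_v4.db'
  db_dir = os.path.dirname(db_path) or '.'

  if not os.path.isdir(db_dir):
      raise RuntimeError('directory does not exist: %s' % db_dir)
  if not os.access(db_dir, os.W_OK):
      raise RuntimeError('directory is not writable: %s' % db_dir)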

SafeBrowsingListTestCase FAIL

python -m unittest -v tests.py
test_canonicalize (tests.SafeBrowsingListTestCase) ... FAIL
test_permutations (tests.SafeBrowsingListTestCase) ... ok

======================================================================
FAIL: test_canonicalize (tests.SafeBrowsingListTestCase)

Traceback (most recent call last):
File "/home/super/anaconda3/lib/python3.7/site-packages/gglsbl/tests.py", line 109, in test_canonicalize
self.assertEqual(URL(nu).canonical, cu)
AssertionError: "http://b'http/\\x01\\x80.com/'" != 'http://%01%80.com/'

+ http://%01%80.com/

Ran 2 tests in 0.003s

FAILED (failures=1)

When I try to sync the database, I get the following error:

from gglsbl import SafeBrowsingList
sbl = SafeBrowsingList("secret key for 3.0")
sbl.update_hash_prefix_cache()
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/gglsbl-0.4-py2.7.egg/gglsbl/client.py", line 26, in update_hash_prefix_cache
response = self.prefixListProtocolClient.retrieveMissingChunks(existing_chunks=existing_chunks)
File "/usr/local/lib/python2.7/dist-packages/gglsbl-0.4-py2.7.egg/gglsbl/protocol.py", line 220, in retrieveMissingChunks
raw_data = self._fetchData(existing_chunks)
File "/usr/local/lib/python2.7/dist-packages/gglsbl-0.4-py2.7.egg/gglsbl/protocol.py", line 203, in _fetchData
response = self.apiCall(url, payload)
File "/usr/local/lib/python2.7/dist-packages/gglsbl-0.4-py2.7.egg/gglsbl/protocol.py", line 60, in apiCall
response = urllib2.urlopen(request)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

My key is for protocol 3.0.
Any help?

Phishing list processing

Got the database file gsb_v3.db fully updated. Checking GSB malware URLs reports them correctly (e.g. hxxp://www.hypnotherapie-kinderen.nl/cv2QKCwx.php?id=15384739). Checking phishing URLs always reports clean. Test URIs:
hxxp://sogecanjou.com/3e5fe5eef229335a35068e4091591341/
hxxp://ttb.ddsoftultimate.com/download/request/52387a805f1c1e8274000009/4vEoIKcf',%20'4vEoIKcf',%20'Player.exe'
hxxp://paulrappandco.com.au/pictures/googledocs/sss/
Phishing warnings do appear in the browser or via the Lookup API (the GSB key is valid).

gglsbl_client.py is not working under python 3.4.5

I think it doesn't work under Python versions below 3.5. There is an error in utils.py:

File "/usr/lib/python3.4/site-packages/gglsbl-1.4.5+5.g1c91278-py3.4.egg/gglsbl/utils.py", line 8, in to_hex_3
    return v.hex()
AttributeError: 'bytes' object has no attribute 'hex'
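
bytes.hex() was indeed only added in Python 3.5. A portable fallback sketch for older interpreters:

  import binascii

  def to_hex(v):
      # bytes.hex() exists only on Python >= 3.5; binascii works everywhere
      try:
          return v.hex()
      except AttributeError:
          return binascii.hexlify(v).decode('ascii')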

IndexError after upgrading to 1.4.6

I am using gglsbl-rest for queries at a large scale, and after upgrading to 1.4.6 I've started seeing the following errors in my logs:

[2017-09-30 14:23:40,537] ERROR in app: exception handling [https://vignette.wikia.nocookie.net/gameofthrones/images]
Traceback (most recent call last):
File "/root/gglsbl-flask/app.py", line 37, in app_lookup
sbl = SafeBrowsingList(api_key, active['name'], True)
File "/usr/local/lib/python2.7/site-packages/gglsbl/client.py", line 33, in __init__
self.storage = SqliteStorage(db_path)
File "/usr/local/lib/python2.7/site-packages/gglsbl/storage.py", line 60, in __init__
if not self.check_schema_version():
File "/usr/local/lib/python2.7/site-packages/gglsbl/storage.py", line 74, in check_schema_version
v = dbc.fetchall()[0][0]
IndexError: list index out of range

Still trying to get steps that reliably reproduce this issue, will post them here if I get them.
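
The failing line takes the first row of the schema-version query without checking for an empty result, which can happen if another process is re-creating the database concurrently. A defensive sketch of the same check (names assumed from the traceback):

  def check_schema_version(dbc):
      # dbc is a sqlite3 cursor that has just executed the
      # schema-version query; an empty result can occur when the
      # metadata table exists but has not been populated yet
      rows = dbc.fetchall()
      if not rows:
          return False  # treat a missing version as an incompatible schema
      return rows[0][0]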

GSB not returning correct results

Hello,

(I realize this is probably not the right place to ask this question, but I can't delete this issue anymore, so hopefully someone will be able to answer me.)

I've followed the instructions to install this Python library, which then sets up a 1.7 GB database. Whenever I try any URL, even ones that are supposed to be blacklisted like ianfette, it keeps saying the URL is clean. Am I missing a step here?

Thank you!

Web Risk API

Apparently Google changed the license overnight on how the Safe Browsing API can or cannot be used:

Effective immediately, Safe Browsing API is for non-commercial use only. Existing contractual
agreements will be honored, and academic researchers, NGOs, operating systems, and browsers
can continue to use the Safe Browsing API at no cost. To detect malicious URLs for commercial
purposes—meaning for sale or revenue-generating purposes—you can use Web Risk API, which is
updated to meet the needs unique to enterprises.

I haven't looked into this Web Risk API; would it make sense to include support for it in gglsbl, or is it different enough to warrant a separate implementation?

sqlite3.IntegrityError: UNIQUE constraint failed

Hello,

I'm using this library to check URLs against Google Safe Browsing.
I have collected responses from get_threats_update into JSON and then tried to insert them into the DB.
However, after inserting into the DB for a while, it raises this integrity error at line 323 in storage.py (when inserting a hash prefix with the populate_hash_prefix_list function):
sqlite3.IntegrityError: UNIQUE constraint failed: hash_prefix.value, hash_prefix.threat_type, hash_prefix.platform_type, hash_prefix.threat_entry_type

For this insert, would it be possible to add INSERT OR IGNORE, or does it need the integrity check?
What should happen when the integrity check fails? Is the client supposed to do a full update instead of a partial update?

Thank you in advance.
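
For illustration, the suggested change would look roughly like this; the column list is inferred from the UNIQUE constraint in the error message, not taken from the actual schema:

  def insert_hash_prefix(dbc, value, threat_type, platform_type, threat_entry_type):
      # OR IGNORE turns a duplicate (value, threat_type, platform_type,
      # threat_entry_type) row into a no-op instead of an IntegrityError
      dbc.execute(
          "INSERT OR IGNORE INTO hash_prefix "
          "(value, threat_type, platform_type, threat_entry_type) "
          "VALUES (?, ?, ?, ?)",
          (value, threat_type, platform_type, threat_entry_type))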

Place link to Python 3 version in README

Hi,
I created a Python 3 (only) version of this library that can be found here (and also changed a bunch of other things, like unit tests and metadata for lookups). A small hint in the README would be useful for people who want to use the library with Python 3.

Best Regards,
Stefan

Error in storage.py:cleanup_full_hashes()

It seems the query that cleans up full_hash entries expired more than X seconds ago is incorrect. It should be:
DELETE FROM full_hash WHERE expires_at < datetime(current_timestamp, '-{} SECONDS')
instead of what is in the current code:
DELETE FROM full_hash WHERE expires_at=datetime(current_timestamp, '{} SECONDS')

How can I skip the verification step against Google's server?

Dear sir:
I have a problem: getting a VPN so that I can reach Google reliably is too difficult, so how can I skip the process of verifying with Google's server when a URL is already in the local database blacklist?
Thank you very much for the help.

HTTP Error 413: Request Entity Too Large

I'm able to run update_hash_prefix_cache() a number of times to update the local database, but on about the 8th update Google's servers return an HTTP 413 error. It looks like the request payload may be too large.

minimumWaitDuration should be tracked separately for get_threats_update and get_full_hashes

There is currently a single wait period for all calls; however, the docs state that minimumWaitDuration is separate per call type:

https://developers.google.com/safe-browsing/v4/reference/rest/v4/threatListUpdates/fetch

The minimum duration the client must wait before issuing any update request. If this field is not set clients may update as soon as they want.

https://developers.google.com/safe-browsing/v4/reference/rest/v4/fullHashes/find

The minimum duration the client must wait before issuing any find hashes request. If this field is not set, clients can issue a request as soon as they want.

Also, get_threats_lists has handling for minimumWaitDuration, but the docs do not mention this response field for the threatLists.list call:

https://developers.google.com/safe-browsing/v4/reference/rest/v4/threatLists/list
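
A sketch of what separate tracking could look like, with one deadline per call type instead of a single shared one (class and method names are illustrative, not the library's):

  import time

  class WaitTracker:
      def __init__(self):
          # one independent deadline per Safe Browsing endpoint
          self.next_allowed = {'threatListUpdates': 0.0, 'fullHashes': 0.0}

      def delay(self, call_type):
          # block only as long as this endpoint's own minimumWaitDuration requires
          time.sleep(max(0.0, self.next_allowed[call_type] - time.time()))

      def record(self, call_type, min_wait_seconds):
          self.next_allowed[call_type] = time.time() + min_wait_seconds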

getting HTTP Error 403: Forbidden for a valid server/browser key

I know this error popped up a couple of times before, but that was due to the old format of API keys:

File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

I'm using the current key format, from the Developer Console, created as described here:
https://developers.google.com/safe-browsing/v3/update-guide#GettingStarted

I'm trying to execute:

gglsbl_client.py --api-key ... --onetime

but keep getting 403 Forbidden (I tried both the Server and the Browser key). Can someone enlighten me on this? TIA!

insert warning

I get many of the following warning messages while downloading the database:

WARNING Failed to insert chunk because of columns value, chunk_number, list_name, chunk_type are not unique

Does it mean my database is inconsistent?

Error when querying too fast: Insufficient tokens for quota 'UpdateAPIGroup' and limit 'CLIENT_PROJECT-100s' of service 'safebrowsing.googleapis.com'

I have the error below. What solution would you recommend?

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    r=sbl.lookup_url(i)
  File "gglsbl/client.py", line 119, in lookup_url
    list_names = self._lookup_hashes(url_hashes)
  File "gglsbl/client.py", line 165, in _lookup_hashes
    self._sync_full_hashes(matching_prefixes.keys())
  File "gglsbl/client.py", line 88, in _sync_full_hashes
    fh_response = self.api_client.get_full_hashes(hash_prefixes, client_state)
  File "gglsbl/protocol.py", line 43, in wrapper
    r = func(*args, **kwargs)
  File "gglsbl/protocol.py", line 151, in get_full_hashes
    response = self.service.fullHashes().find(body=request_body).execute()
  File "/home/martin/.local/lib/python3.5/site-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/martin/.local/lib/python3.5/site-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 429 when requesting https://safebrowsing.googleapis.com/v4/fullHashes:find?alt=json&key=AIzaSyBND-B5coxYowdEc9iiUadKmDk60jhBSgg returned "Insufficient tokens for quota 'UpdateAPIGroup' and limit 'CLIENT_PROJECT-100s' of service 'safebrowsing.googleapis.com' for consumer 'api_key:'.">
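
The proper fix is to respect the fair-use delays the API returns (i.e. do not construct the client with discard_fair_use_policy=True), but as a stopgap a backoff wrapper helps; this sketch again assumes HttpError carries the status in resp.status:

  import time
  from googleapiclient.errors import HttpError

  def lookup_with_backoff(sbl, url, retries=5, base_delay=1.0):
      for attempt in range(retries):
          try:
              return sbl.lookup_url(url)
          except HttpError as e:
              if e.resp.status == 429 and attempt < retries - 1:
                  time.sleep(base_delay * 2 ** attempt)  # exponential backoff
              else:
                  raise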

Can't update DB: process is killed prematurely

Python 2.7 on Ubuntu 16 Server

 gglsbl_client.py --api-key 'API_KEY' --onetime
2018-02-20 10:23:08,495 WARNING Circumventing request frequency throttling is against Safe Browsing API policy.
2018-02-20 10:23:08,649 INFO Opening SQLite DB /tmp/gsb_v4.db
2018-02-20 10:23:08,862 INFO Cleaning up full_hash entries expired more than 43200 seconds ago.
2018-02-20 10:23:22,160 INFO Storing 40807 entries of hash prefix list MALWARE/IOS/URL
2018-02-20 10:23:22,838 INFO Local cache checksum matches the server: 95ac5bd2f9429a9653d14ff5cb8673ebcd2f9c287c34cf7e5aaa0814d9e4e132
2018-02-20 10:23:24,934 INFO Storing 468600 entries of hash prefix list MALWARE/OSX/URL
2018-02-20 10:23:35,361 INFO Local cache checksum matches the server: 3876c9ae6517164006c72ef3d22fd2655f3f532a70df49447f3953a8ddbf3385
2018-02-20 10:23:37,149 INFO Storing 1090026 entries of hash prefix list SOCIAL_ENGINEERING/OSX/URL
Killed

The failure is repeatable and occurs at exactly this point. I am using the master branch.

Set sqlite database to WAL mode

As briefly discussed on #28 I think it would be a good idea to set the sqlite database to WAL mode. This would allow the update process to happen concurrently with queries, and that would bode well for multiprocess use of gglsbl, such as what we do in gglsbl-rest.

That would involve:

  • Removing the current PRAGMA synchronous = 0 entry;
  • Issuing a PRAGMA journal_mode=WAL when the database is created;
  • Perhaps exposing a method on the client class that allows a WAL checkpoint to be performed. In gglsbl-rest, for example, there is a regular cron job that could execute a PASSIVE checkpoint at regular intervals.

If this was implemented, I think gglsbl-rest could do away with maintaining two separate databases and instead work with a single one. That would simplify things tremendously, reduce disk space usage and possibly eliminate race conditions in the database-switching logic.
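
For reference, the pragmas involved are plain sqlite3 calls; a minimal sketch of the bullet points above:

  import sqlite3

  db = sqlite3.connect('/tmp/gsb_v4.db')
  # WAL (in place of PRAGMA synchronous = 0) lets readers proceed
  # while the updater writes
  db.execute('PRAGMA journal_mode=WAL')
  # the kind of checkpoint a periodic cron job could issue
  db.execute('PRAGMA wal_checkpoint(PASSIVE)')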

URLs with Unicode cause exception

I am getting an exception when the URLs have Unicode characters in them:

Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gglsbl import SafeBrowsingList
>>> sbl = SafeBrowsingList('<redacted>')
>>> sbl.lookup_url(u"http://www.google.com/?q=áéíóú'")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/gglsbl/client.py", line 111, in lookup_url
    url_hashes = URL(url).hashes
  File "/usr/local/lib/python2.7/site-packages/gglsbl/protocol.py", line 158, in __init__
    self.url = str(url)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 25-29: ordinal not in range(128)

I'm not sure what the best way to handle this is: either make the protocol handler Unicode-aware, or perhaps percent-encode any non-ASCII characters.
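
A sketch of the percent-encoding option, shown for Python 3 (the reporter's environment is Python 2, where the failing str(url) call would need the same treatment):

  from urllib.parse import quote

  def to_ascii_url(url):
      # percent-encode non-ASCII characters (as UTF-8 bytes) while
      # leaving the URL delimiters intact
      return quote(url, safe="/:?&=#%~@!$'()*+,;[]")

  print(to_ascii_url("http://www.google.com/?q=áéíóú'"))
  # -> http://www.google.com/?q=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA'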

This package does not work on Windows

client.py, line 21:

def __init__(self, api_key, db_path='/tmp/gsb_v4.db', discard_fair_use_policy=False, 

This causes a sqlite error, because Windows has no /tmp directory for the default path '/tmp/gsb_v4.db'.

I just changed the path to './gsb_v4.db' and it works fine.
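
A portable alternative to hardcoding either path is to build it from the platform's temp directory (db_path is the keyword shown in the signature above; the API key is a placeholder):

  import os
  import tempfile

  from gglsbl import SafeBrowsingList

  # /tmp/gsb_v4.db on Linux, %TEMP%\gsb_v4.db on Windows
  db_path = os.path.join(tempfile.gettempdir(), 'gsb_v4.db')
  sbl = SafeBrowsingList('API_KEY', db_path=db_path)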

URLs with no protocol

When I submit URLs with no protocol I'm getting an exception on lookup_url:

>>> sbl.lookup_url('www.google.com:443/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/gglsbl/client.py", line 112, in lookup_url
    list_names = self._lookup_hashes(url_hashes)
  File "/usr/local/lib/python2.7/site-packages/gglsbl/client.py", line 122, in _lookup_hashes
    full_hashes = list(full_hashes)
  File "/usr/local/lib/python2.7/site-packages/gglsbl/protocol.py", line 163, in hashes
    for url_variant in self.url_permutations(self.canonical):
  File "/usr/local/lib/python2.7/site-packages/gglsbl/protocol.py", line 188, in canonical
    host = full_unescape(url_parts.hostname)
  File "/usr/local/lib/python2.7/site-packages/gglsbl/protocol.py", line 171, in full_unescape
    uu = urllib.unquote(u)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1234, in unquote
    bits = s.split('%')
AttributeError: 'NoneType' object has no attribute 'split'

Is it required to have a full and well-formed URL to perform the query? If so, maybe it would be better to check for that explicitly and raise a ValueError instead.
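
Until such a check exists, a caller-side guard is straightforward; a sketch that defaults the scheme before the lookup:

  def ensure_scheme(url, default='http'):
      # urlparse yields hostname=None for scheme-less inputs like
      # 'www.google.com:443/', which is what crashes full_unescape()
      if '://' not in url:
          url = '%s://%s' % (default, url)
      return url

  # sbl.lookup_url(ensure_scheme('www.google.com:443/'))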

URL permutations not covering all cases

When looking up http://malware.testing.google.test/testing/malware using the Python implementation, it returns NOT blacklisted.
When performing the same lookup on the Google lookup page, it is listed.
Maybe when generating the permutations, a trailing slash should be appended to URLs like http://foo.com/bar/file, because that's what Google seems to do on their lookup page.
When looking up http://malware.testing.google.test/testing/malware/ with Python, you see it is blacklisted.
Link to gglsbl3 issue
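
As a caller-side workaround, both forms can be probed; a sketch:

  def lookup_both_forms(sbl, url):
      # try the URL as given, then with a trailing slash, since only
      # the 'path/' variant may be present in the list
      result = sbl.lookup_url(url)
      if not result and not url.endswith('/'):
          result = sbl.lookup_url(url + '/')
      return result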

500 hash limit per API request to /v4/fullHashes:find

This came up in google/safebrowsing#87. I believe it affects this project as well.

In short, there is a 500 hash limit on fullHashes:find requests. Unfortunately this is not currently documented in the developer docs (I'll make sure they're fixed soon).

I just wanted to give you a heads up before someone using this client runs into the problem. This limit has been in place for well over a year, so it seems it hasn't caused much trouble until now.
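
For client code holding more than 500 candidate prefixes, batching the requests is simple; a sketch against the get_full_hashes call seen in the tracebacks above:

  def batches(prefixes, size=500):
      # fullHashes:find accepts at most 500 hash prefixes per request
      for i in range(0, len(prefixes), size):
          yield prefixes[i:i + size]

  # for batch in batches(list_of_prefixes):
  #     api_client.get_full_hashes(batch, client_state)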

fair_use_delay infinite sleep

The expression sleep_for = self.next_request_no_sooner_than - time.time() can yield a negative value, leading to an infinite sleep.
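
A minimal sketch of the clamp, keeping the attribute name from the report:

  import time

  def fair_use_delay(self):
      # clamp at zero: a deadline already in the past must not produce
      # a negative argument to time.sleep()
      sleep_for = max(0, self.next_request_no_sooner_than - time.time())
      time.sleep(sleep_for)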
