Comments (20)

lc commented on August 16, 2024

Hey @uBadRequest,

For the "could not marshal json" errors, you might be running out of memory. I can add HTTP request retries for fetching the Common Crawl index.
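
A minimal sketch of what such a retry could look like in Go. The helper names `retry` and `flaky` are hypothetical and not taken from gau; a real version would wrap the HTTP request for the index inside `fn`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls fn up to attempts times, sleeping delay between failed tries.
// It returns nil on the first success, or a wrapped copy of the last error.
// Hypothetical helper for illustration only.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Do not sleep after the final failed attempt.
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flaky returns a function that fails its first n calls and then succeeds.
// It stands in for a request that is temporarily rejected.
func flaky(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("temporary failure")
		}
		return nil
	}
}

func main() {
	// Succeeds on the third try.
	fmt.Println(retry(3, 10*time.Millisecond, flaky(2)))
	// Still failing after three tries, so an error is returned.
	fmt.Println(retry(3, 10*time.Millisecond, flaky(5)))
}
```

A fixed delay keeps the sketch short; a growing delay between attempts is a common alternative when the remote side is rate-limiting.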

from gau.

lc commented on August 16, 2024

@uBadRequest,

Can you share the command you were using to run this in parallel? It might be caused by a memory leak on my part; I'll investigate.

lc commented on August 16, 2024

@uBadRequest,

I think I may have fixed the issue. Can you re-install this new version and test?

iTestAndroid commented on August 16, 2024

Same for me, even with your last commit:
"could not unmarshal json from commoncrawl: invalid character '<' looking for beginning of value"

lc commented on August 16, 2024

@iTestAndroid,

That is an expected error when CommonCrawl rate-limits you. The fix was meant to help with the panic.
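
To illustrate where the message comes from: when a server answers with an HTML page instead of JSON, Go's `encoding/json` fails on the very first byte, `<`. The sketch below reproduces that; `decodeErr` and `looksLikeHTML` are hypothetical helper names, and the HTML body is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeErr tries to decode body as a JSON object and returns the
// error text, or "" if decoding succeeded.
func decodeErr(body []byte) string {
	var v map[string]interface{}
	if err := json.Unmarshal(body, &v); err != nil {
		return err.Error()
	}
	return ""
}

// looksLikeHTML reports whether the body starts with '<' after trimming
// whitespace, which usually means an error page rather than JSON.
func looksLikeHTML(body []byte) bool {
	return strings.HasPrefix(strings.TrimSpace(string(body)), "<")
}

func main() {
	// A made-up error page of the kind a rate limiter might return.
	htmlBody := []byte("<html><body>Too Many Requests</body></html>")

	fmt.Println(decodeErr(htmlBody))
	// invalid character '<' looking for beginning of value

	fmt.Println(looksLikeHTML(htmlBody))
	// true
}
```

Checking the HTTP status code or the first byte before decoding would let a client report "rate-limited" instead of a JSON syntax error.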

iTestAndroid commented on August 16, 2024

@lc
Can I disable CommonCrawl? Can I limit pages?

lc commented on August 16, 2024

@iTestAndroid, I thought about implementing this recently. I can add a flag that lets you specify which sources you want to use.

iTestAndroid commented on August 16, 2024

Sure. I tried to remove CC since I only want OTX, but I think I broke some other parts of the code. A switch would help. Alternatively, when it hits the CC limit, it could continue instead of breaking.

lc commented on August 16, 2024

Hey @iTestAndroid,
If you only want URLs from OTX you can alternatively use this tool: http://github.com/lc/otxurls

I'll add limiting the fetchers to my todo list.

iTestAndroid commented on August 16, 2024

Great, thanks.
Also I commented out CommonCrawl and I still got this:
"error in parsing JSON from alienvault: invalid character '<' looking for beginning of value"
and
"could not decoding response from wayback machine: net/http: request canceled (Client.Timeout exceeded while reading body)"

iTestAndroid commented on August 16, 2024

@lc Am I missing something?
I tried otxurls and I still got this:

2020/04/23 18:00:54 Could not decode json: invalid character '<' looking for beginning of value

lc commented on August 16, 2024

Yeah, that happens when AlienVault's OTX does not respond with JSON. You might just be getting IP-blocked or rate-limited. Maybe try from another IP.

iTestAndroid commented on August 16, 2024

@lc
But when I manually open the URL in my browser, I can clearly see the JSON data:
https://otx.alienvault.com/api/v1/indicators/hostname/google.com/url_list?limit=50&page=1

lc commented on August 16, 2024

Many concurrent requests might be getting blocked, though I'm not 100% sure. Either way, the error handling is working as intended.

iTestAndroid commented on August 16, 2024

@lc
Can we slow it down so it works? One thread at a time or something?

lc commented on August 16, 2024

@iTestAndroid, it currently runs on one thread, but I could add an option to set the delay between the fetchers.
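
A rough sketch of what a delay between fetchers could look like, assuming the fetchers run one after another. The `fetcher` type, `runWithDelay`, and `demoFetchers` are hypothetical names for illustration and do not reflect gau's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// fetcher is a hypothetical stand-in for one URL source.
type fetcher struct {
	name  string
	fetch func() ([]string, error)
}

// runWithDelay runs the fetchers sequentially and pauses for delay
// before every fetcher except the first. A failing fetcher is reported
// and skipped so the remaining ones still run.
func runWithDelay(fetchers []fetcher, delay time.Duration) []string {
	var urls []string
	for i, f := range fetchers {
		if i > 0 {
			time.Sleep(delay)
		}
		got, err := f.fetch()
		if err != nil {
			fmt.Printf("%s: %v\n", f.name, err)
			continue
		}
		urls = append(urls, got...)
	}
	return urls
}

// demoFetchers returns two fake sources that each yield one URL.
func demoFetchers() []fetcher {
	return []fetcher{
		{name: "first", fetch: func() ([]string, error) {
			return []string{"https://example.com/a"}, nil
		}},
		{name: "second", fetch: func() ([]string, error) {
			return []string{"https://example.com/b"}, nil
		}},
	}
}

func main() {
	urls := runWithDelay(demoFetchers(), 500*time.Millisecond)
	fmt.Println(urls)
}
```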

iTestAndroid commented on August 16, 2024

@lc
Sure, but I'm not 100% sure that is the problem with the feed. Also, if you read 200 pages and page 201 fails or returns invalid JSON, don't break. Maybe handle the error and return the list of all URLs the code was able to capture so far?
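
The behaviour asked for here could be sketched in Go roughly as follows: keep what earlier pages returned and hand it back together with the error from the failing page. `fetchPage`, `collect`, and `fakeFetch` are hypothetical names, and the failing page number is made up for the demo.

```go
package main

import (
	"errors"
	"fmt"
)

// fetchPage is a hypothetical stand-in for one paginated request.
type fetchPage func(page int) ([]string, error)

// collect walks pages 1..maxPages. When a page fails it stops, but it
// returns every URL gathered so far alongside the error, so the caller
// can still print the partial results.
func collect(fetch fetchPage, maxPages int) ([]string, error) {
	var urls []string
	for page := 1; page <= maxPages; page++ {
		batch, err := fetch(page)
		if err != nil {
			return urls, fmt.Errorf("page %d: %w", page, err)
		}
		urls = append(urls, batch...)
	}
	return urls, nil
}

// fakeFetch returns one URL per page and fails on page 3, imitating a
// provider that starts answering with HTML instead of JSON.
func fakeFetch(page int) ([]string, error) {
	if page == 3 {
		return nil, errors.New("invalid character '<' looking for beginning of value")
	}
	return []string{fmt.Sprintf("https://example.com/page%d", page)}, nil
}

func main() {
	urls, err := collect(fakeFetch, 5)
	for _, u := range urls {
		fmt.Println(u) // the URLs from pages 1 and 2 are still printed
	}
	if err != nil {
		fmt.Println("stopped early:", err)
	}
}
```

Whether to stop at the first bad page or skip it and try the next one is a design choice; stopping is shown here because a rate-limited provider will usually keep failing.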

lc commented on August 16, 2024

Hey @iTestAndroid @uBadRequest,

I just merged a great pull request that should fix these issues.

Closing this now.

lc commented on August 16, 2024

@iTestAndroid, I just added the -providers flag as well, so you can now specify which providers you want to fetch URLs from.
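
As an illustration of how a flag like this can be handled in Go, here is a small sketch using the standard `flag` package. The thread does not state the value format, so the comma-separated list and the provider names `wayback`, `otx`, and `commoncrawl` are assumptions, and `parseProviders` is a hypothetical helper rather than gau's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseProviders splits a comma-separated list into lower-cased names,
// dropping surrounding whitespace and empty entries.
// The comma-separated format is an assumption made for this sketch.
func parseProviders(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		p = strings.ToLower(strings.TrimSpace(p))
		if p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	// Default value and provider names are assumed, not taken from gau.
	providers := fs.String("providers", "wayback,otx,commoncrawl",
		"comma-separated list of providers to use")

	// A fixed argument list keeps the demo self-contained; a real CLI
	// would pass os.Args[1:] here.
	if err := fs.Parse([]string{"-providers", "otx, wayback"}); err != nil {
		fmt.Println("flag error:", err)
		return
	}

	for _, p := range parseProviders(*providers) {
		fmt.Println("would fetch from:", p)
	}
}
```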

iTestAndroid commented on August 16, 2024

All working well now. Thank you
