atenreiro / opensquat

The openSquat is an open-source tool for detecting domain look-alikes by searching for newly registered domains that might be impersonating legit domains and brands.

Home Page: https://opensquat.com

License: GNU General Public License v3.0

Python 100.00%
threat-intelligence typosquatting cybersquatting phishing-detection phishing osint security-tools domain-squatting threat-hunting phishing-domains

opensquat's People

Contributors

atenreiro, htanskanen, l3str4nge, maaaaz, sporetec, t0sc4n

opensquat's Issues

How to use Virustotal?

Hello,

After I run opensquat how do I check the results.txt against virustotal within opensquat?

Thank you

Daily domains not updated

Hello,

Daily domains list not updated for the last two days. I did a pull but that did not fix the issue.

Thank you

[ROADMAP] Social media squatting

Hello, I have a question about the roadmap item on social media squatting. I could work on this, but I need some tips :) How can we detect social media squatting?
The only thing that comes to my mind is to somehow detect usernames the same way as domains.

Daily feed is down

Seems Opensquat is no longer able to pull the daily feed

+---------- Checking Domain Squatting ----------+
[*] Downloading fresh domain list from https://raw.githubusercontent.com/CERT-MZ/projects/master/Domain-squatting/domain-names.txt
[ERROR] domain-names.txt not found,trying the weekly file.
[*] Downloading fresh domain list from https://raw.githubusercontent.com/CERT-MZ/projects/master/Domain-squatting/domain-names-week.txt
[ERROR] File not found! Contact the authors or try again later. Exiting...

Latest feeds

Hello,

For the last two days, OS is not pulling the latest feeds.

(Possible bug) Double DNS reputation check

Hi!

I was reading through your source code, and noticed that in the app.py file, there is a duplicate DNS reputation check in the _process_levenshtein function.

A DNS check happens, even if the flag is not set, as can be seen here:

        elif (leven_dist <= self.confidence_level) and homograph_domain:
            self.on_homograph_detected(
                keyword,
                domains,
                self.confidence[leven_dist]
                )
            self.dns_reputation(domains)

            #  DNS Validation
            if(self.dns_validation):
                self.dns_reputation(domains)

i.e. if self.dns_validation is True, then dns_reputation gets called twice.

Cheers.
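
One possible way to remove the duplicate is sketched below with a stand-in class. Only the names dns_validation and dns_reputation come from the snippet above; the Checker class, its on_match method, and the call counter are hypothetical and exist only to show a single, flag-guarded call.

```python
class Checker:
    """Hypothetical stand-in for the class that owns _process_levenshtein."""

    def __init__(self, dns_validation):
        self.dns_validation = dns_validation
        self.dns_calls = 0  # counts lookups so the behaviour can be observed

    def dns_reputation(self, domain):
        # Placeholder for the real DNS reputation lookup.
        self.dns_calls += 1

    def on_match(self, domain):
        # The unconditional call is dropped; only the guarded one remains.
        if self.dns_validation:
            self.dns_reputation(domain)


enabled = Checker(dns_validation=True)
enabled.on_match("example.com")

disabled = Checker(dns_validation=False)
disabled.on_match("example.com")
```

With the flag set the lookup runs once; without it, not at all.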

PEP8/CI/CD

Hello,
Do you have any ideas about PEP 8 compliance checks in CI/CD (regarding the tasks in the project roadmap)?

My proposal is a standard stack: black, flake8, and pytest running on Travis CI.

Database issue

Hello,

There is something wrong with the latest database set of today 17-5-2021

I pulled the data and noticed that some domains looked very familiar:

digi-dtoeslag-nl.icu

digi-dtoeslagen-nl.icu
digi-toeslagen-nl.icu
digid-toeslagen-nl.icu
digidtoeslagen-nl.icu
digitoeslagen-nl.icu

When I checked these domain names I noticed that they were registered on 06-05-2021
More importantly, I already submitted these indicators to Pulsedive on 10-05-2021
See example https://pulsedive.com/indicator/?iid=24489603 or https://pulsedive.com/indicator/?iid=24489595

From what I sampled, the .com TLD seems to be working fine, but other TLDs have older data, including .NET (same registry operator as .COM)

During this weekend I noticed that the daily number of fresh domain names was much smaller than usual: 28k and 44k. I pull data every day, and in my experience the count is usually steady at around 100-200k and only occasionally drops below 100k.

Crash when using csv output

Hi there,

When I run the program with -t csv in the command, it always crashes:
python3 opensquat.py --phishing phishing_domains.txt --dns --ct --subdomains --portcheck -c 4 --vt -p day -t csv -o results.csv

No problem with json or txt type.

This is the error I get :


+---------- Checking Domain Squatting ----------+
[*] Checking for the latest feeds...
[*] You have the latest feeds

[*] keywords: mykeywords.txt
[*] keywords total: 1
[*] Total domains: 137,271
[*] Threshold: very low confidence

[*] Verifying keyword: bluesky [ 1 / 1 ]
[+] suspicious certificate detected between bluesky and bluemountainbluesky.online
[+] suspicious certificate detected between bluesky and blueskyb-consultancy.com
[+] suspicious certificate detected between bluesky and blueskydogtoys.com
[+] suspicious certificate detected between bluesky and blueskye.site
[+] suspicious certificate detected between bluesky and blueskyit.net
[+] suspicious certificate detected between bluesky and blueskymyk.com

[*] Total found: 6

+---------- Checking for Subdomains ----------+
[*] bluemountainbluesky
 \_ VirusTotal might be throttling/blocking
[*] blueskyb-consultancy
 \_ VirusTotal might be throttling/blocking
[*] blueskydogtoys
 \_ VirusTotal might be throttling/blocking
[*] blueskye
 \_ VirusTotal might be throttling/blocking
[*] blueskyit
 \_ VirusTotal might be throttling/blocking
[*] blueskymyk
 \_ VirusTotal might be throttling/blocking
[*] Total found: 0

+---------- VirusTotal ----------+
[*] Total found: 0

+---------- Checking Phishing sites ----------+
[*] Downloading fresh Phishing DB from https://raw.githubusercontent.com/mitchellkrogza/Phishing.Database/master/phishing-domains-ACTIVE.txt
[*] Download volume: 0.59 MB

[*] Verifying keyword: bluesky [ 1 / 1 ]
  \_ Similarity detected between bluesky and blueskyint.com.np
  \_ Similarity detected between bluesky and blueskyparts.com
  \_ Similarity detected between bluesky and blueskypropertiesinc.com

+---------- Domains with open webserver ports ----------+
[*] Total found: 0
Traceback (most recent call last):
  File "opensquat.py", line 150, in <module>
    output.SaveFile().main(args.output, args.type, file_content)
  File "/home/user/cert/opensquat/opensquat/output.py", line 113, in main
    self.as_csv()
  File "/home/user/cert/opensquat/opensquat/output.py", line 68, in as_csv
    file_csv.close()
AttributeError: '_csv.writer' object has no attribute 'close'
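
The traceback points at close() being called on the csv.writer object. A csv.writer has no close method; it is the underlying file that needs closing. The sketch below shows one way to write it, under the assumption that the results are a plain list of domain strings. The save_csv function is hypothetical and is not the project's output.py code.

```python
import csv


def save_csv(path, domains):
    """Write one domain per row (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends
    # for files handed to csv.writer.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for domain in domains:
            writer.writerow([domain])
    # Leaving the with-block closes the file; the writer needs no cleanup.
```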

Error While Installing Requirements

ERROR: Could not find a version that satisfies the requirement black (from -r requirements.txt (line 11)) (from versions: none)
ERROR: No matching distribution found for black (from -r requirements.txt (line 11))

Feeds not updating

Hello,

Just an FYI: the daily feed has not been updated for the last 3 days.

Release Notes 1.97

v1.97 (2020-10-18)

  • Domains can be checked to see if they are blacklisted by any VirusTotal engine
  • You can now filter results using different args
  • Minor change: the number of domains is now shown with a thousands separator

When using VirusTotal, the website throttles the queries if a large number of queries in a short time is detected.
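
A common way to cope with this kind of throttling is to wait and retry with a growing delay. The sketch below is generic and makes no VirusTotal calls: query is a caller-supplied function (hypothetical) that is assumed to return None when the request was throttled, and the sleep parameter exists so the delay can be replaced in tests.

```python
import time


def query_with_backoff(query, domain, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call query(domain); on a throttled (None) reply, wait and retry.

    The delay doubles after every throttled attempt.
    """
    delay = base_delay
    for attempt in range(retries):
        result = query(domain)
        if result is not None:
            return result
        if attempt < retries - 1:
            sleep(delay)
            delay *= 2
    # Every attempt was throttled.
    return None
```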

No attribute port check?

I am running the latest version, but I am getting this error:

[*] Verifying keyword: nescafe [ 101 / 102 ]

Progress: 98.5 %

[*] Verifying keyword: mastercard [ 102 / 102 ]

Progress: 99.0 %
Progress: 99.6 %
Traceback (most recent call last):
  File "opensquat.py", line 78, in <module>
    or args.portscheck:
AttributeError: 'Namespace' object has no attribute 'portscheck'
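
For context, argparse derives the attribute name from the long option, so a --portcheck flag (the spelling used in the commands quoted in other issues here) is read as args.portcheck, and args.portscheck does not exist. The parser below is a standalone sketch, not the project's own argument setup.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--portcheck", action="store_true")
args = parser.parse_args(["--portcheck"])

# The attribute name mirrors the flag: "portcheck", not "portscheck".
port_check_enabled = args.portcheck
has_misspelled_name = hasattr(args, "portscheck")
```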

Use of simple regex commands in keywords.txt

Please include regex features like ^ and $ to be implemented in keywords.txt as it has to start (^string) or end (string$) with a given string value. Some short keywords (usually abbreviations) used as domain names or page titles produce a lot of false positives when found anywhere in the resulting domain names.
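
A minimal sketch of what such anchored keywords could look like, assuming the keyword is matched against the first label of the domain (the part before the first dot). The keyword_matches function is hypothetical; everything in the keyword other than a leading ^ or trailing $ is treated as literal text.

```python
import re


def keyword_matches(keyword, domain):
    """Match keyword against the domain's first label, honouring ^ and $."""
    label = domain.split(".")[0]
    prefix = "^" if keyword.startswith("^") else ""
    suffix = "$" if keyword.endswith("$") else ""
    core = keyword[len(prefix):len(keyword) - len(suffix)]
    # Escape the literal part so only the two anchors act as regex syntax.
    pattern = prefix + re.escape(core) + suffix
    return re.search(pattern, label) is not None
```

With this, a keyword such as ^ing would match ing-login.com but not booking.com, which is the kind of false positive the request describes.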

Quad 9 results

Hello,

When I run the daily scan I have around 1500+ suspicious domain names.
After checking with Quad9 I get around 700 malicious domain names.

How can I get an overview/export of those malicious domain names so I can copy and paste them?
Screenshot from 2021-02-28 10-55-06

As you can see this is pretty hard to put in a spreadsheet and filter out the malicious stuff

Daily new domains

Hello,

It seems after the new update the number of daily domain names dropped below 3000
daily
domains?

Target response information

Feature request: Site response information

It would be useful to have information about the potentially squatted site, such as:

  • Response code
  • Whether the keyword appears in the response HTML
  • Whether phishing-related keywords are present in the response HTML
  • SSL connection errors

I've developed a very simple solution locally and I think it would be a useful addition to the project.
If the feature request is accepted, I can make a pull request
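
The requester's own solution is not shown, so the following is only a rough sketch of how the four items could be collected with the standard library. The function names, the report keys, and the PHISHING_TERMS list are all hypothetical. Network errors other than HTTP status errors and SSL failures are silently ignored here, which a real implementation would probably want to report.

```python
import ssl
import urllib.error
import urllib.request

PHISHING_TERMS = ("login", "password", "verify")  # illustrative list only


def inspect_html(html, keyword, terms=PHISHING_TERMS):
    """Report whether the keyword and any phishing-related terms occur."""
    text = html.lower()
    return {
        "keyword_in_html": keyword.lower() in text,
        "phishing_terms": [term for term in terms if term in text],
    }


def probe(domain, keyword, timeout=5):
    """Fetch https://<domain> and summarise the response."""
    report = {"status": None, "ssl_error": None}
    try:
        with urllib.request.urlopen("https://" + domain, timeout=timeout) as resp:
            report["status"] = resp.status
            body = resp.read().decode("utf-8", errors="replace")
            report.update(inspect_html(body, keyword))
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies still carry a response code.
        report["status"] = err.code
    except urllib.error.URLError as err:
        # TLS failures surface as URLError with an SSLError as the reason.
        if isinstance(err.reason, ssl.SSLError):
            report["ssl_error"] = str(err.reason)
    return report
```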

Suspicious certificate results not showing in results.txt

Is your feature request related to a problem? Please describe.
When I run opensquat on my keywords with the command below, even if it finds a "suspicious certificate" in the "Checking Domain Squatting" section, nothing is written to results.txt. Maybe I missed something, but is there a way to have these results written somewhere?

python3 opensquat.py --phishing phishing_domains.txt --dns --ct --subdomains --portcheck -c 4 --vt -p month

Many thanks for your great tool

Looking for contributors

Hello everyone,

I am looking for code contributors for:

  • Feature 1: Add "AND" logical condition for keywords search (example: google+login)
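
As a starting point for contributors, the AND condition could be sketched as plain substring checks, assuming + separates terms that must all occur somewhere in the domain. The matches_all function is hypothetical and ignores the similarity scoring the tool applies to single keywords.

```python
def matches_all(expression, domain):
    """True when every '+'-separated term occurs in the domain."""
    name = domain.lower()
    terms = [term for term in expression.lower().split("+") if term]
    return bool(terms) and all(term in name for term in terms)
```

For example, google+login would match login-google-secure.com but not google-mail.com.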

Selecting Top level domain

I don't know how you get your feed of registered domain names, but it appears to be mostly .com domains. It would be a nice feature to add more TLDs, so that you can search for specific countries.

Describe the solution you'd like
If you are able to get more datasets per TLD, you could add an option to choose which TLDs to look at (e.g. --TLD com,fr,de). Based on the TLDs specified, opensquat would download the corresponding datasets and then work as before.
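
Until per-TLD datasets exist, a similar effect could be approximated by filtering the list that is already downloaded. The sketch below assumes the option value is a comma-separated string as in the example above. The filter_by_tld function is hypothetical, and it only looks at the last label, so suffixes such as co.uk would need extra handling.

```python
def filter_by_tld(domains, tld_option):
    """Keep only domains whose last label is in the comma-separated option."""
    wanted = {
        tld.strip().lower().lstrip(".")
        for tld in tld_option.split(",")
        if tld.strip()
    }
    return [d for d in domains if d.lower().rsplit(".", 1)[-1] in wanted]
```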

Travis-CI Failing

Hi @mateuszz0000

I don't understand why Travis-CI is failing, barely changed the code. Any clue?

pluggy.manager.PluginValidationError: Plugin 'pytest_cov' could not be loaded: (pytest 4.3.1 (/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages), Requirement.parse('pytest>=4.6'))!
The command "pytest --cov=opensquat --cov=tests --cov=docs/src --cov-report=term-missing tests" exited with 1.

JaroWinkler method not working properly

After pull #9, when running the application with Jaro-Winkler method, it's only showing results from direct string comparison and not using the Jaro-Winkler calculation.

Comparing the two method results:

Jaro-Winkler
[image: JaroWinkler]

Levenshtein
[image: Levenshtein]
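
For reference when comparing the two methods, here is a textbook Jaro-Winkler similarity in plain Python. It is not the project's implementation (which may rely on a library); the prefix weight of 0.1 and the four-character prefix cap are the conventional defaults, assumed here.

```python
def jaro(s1, s2):
    """Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)
```

Unlike a direct substring comparison, this returns a graded score, so a keyword and a slightly altered domain label still score close to 1.0.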

Multiple Errors

Traceback (most recent call last):
  File "/home/USER/opensquat/opensquat.py", line 67, in <module>
    file_content = app.Domain().main(
  File "/home/USER/opensquat/opensquat/app.py", line 596, in main
    return self.worker()
  File "/home/USER/opensquat/opensquat/app.py", line 364, in worker
    self._process_doppelgagner_only(
  File "/home/USER/opensquat/opensquat/app.py", line 404, in _process_doppelgagner_only
    if not ct.CRTSH.check_certificate(domains):
  File "/home/USER/opensquat/opensquat/ct.py", line 49, in check_certificate
    for table in soup.find_all("table")[1]:
IndexError: list index out of range
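
The last frame indexes [1] on the result of find_all("table"), which fails whenever the fetched page contains fewer than two tables. A guard along the following lines would avoid the crash. The second_table helper is hypothetical, and what the caller should do when it returns None (skip the domain, retry, report an error) is left open.

```python
def second_table(tables):
    """Return the second item of a find_all-style result, or None if absent."""
    return tables[1] if len(tables) > 1 else None


# Plain lists stand in for what soup.find_all("table") would return.
missing = second_table([])
present = second_table(["<table 0>", "<table 1>"])
```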

keywords.txt =
facebook
google
instagram
microsoft
fedex
amazon
twitter

Add whoisds datasource

Is your feature request related to a problem? Please describe.
We need to pull in more domains. Let's use whoisds.com.

Describe the solution you'd like

import base64

import requests


def getNewlyRegisteredDomains(self, date, output_path):
    # Intended as a method of the downloader class, hence the self parameter.
    # The URL embeds the base64-encoded file name "<date>.zip".
    base64_date = base64.b64encode((date + '.zip').encode()).decode()
    url = 'https://www.whoisds.com/whois-database/newly-registered-domains/' + base64_date + '/nrd'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Trident/7.0; rv:11.0) like Gecko'}
    result = requests.get(url, headers=headers)
    # Save the archive only on HTTP 200 and tell the caller whether it worked.
    if result.status_code == 200:
        with open(output_path, 'wb') as f:
            f.write(result.content)
        return True
    return False

WHOIS

Describe the solution you'd like
For each returned domain result, include a second column of WHOIS output covering org, registrar, dates, name servers, IP address, and country.

Issue with the daily feed

Hello,

It looks like the daily domain names list is stuck in a loop. I have been getting the same domain name results for the last 2 days.

When I look at https://phishydomains.com/, the number of domain names yesterday was 126735.
But when I pull the domain names list, it is 92460.

[*] Checking for the latest feeds...
[*] You have the latest feeds

[*] keywords: keywords.txt
[*] keywords total: 95
[*] Total domains: 92460
[*] Threshold: high confidence

[*] Verifying keyword: financien [ 1 / 95 ]
[+] Found afhandelingen-financien.site
[+] Found belgiumfinancien.site
[+] Found financien-betaling.online
[+] Found financieninbeeld.com
[+] Found inbeslagname-financien.site

Thanks.

RapidAPI question

Mate, do you have plans to set up other endpoints with RapidAPI? Maybe the domain list feed?

Experimental Support for DNS

A new branch has been created with experimental support for Quad9. openSquat will check with Quad9 whether the domain has been flagged as malicious.

I have chosen DoH (DNS over HTTPS) over traditional DNS (port 53), even though traditional DNS is faster by design. Some ISPs and corporate networks block external DNS providers, so I opted for DoH. Some networks may block DoH as well.

The same branch has support for Cloudflare DNS, but that code is still unfinished. Cisco Umbrella will also be supported later on.

Usage:
ec2-user:~/environment (quad9) $ python main.py --dns_validation quad9
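
To illustrate the idea rather than the branch's actual code, the sketch below builds a DoH lookup URL and interprets a JSON answer. It makes two assumptions that should be checked against the provider's documentation: that the resolver serves the JSON DoH format at the URL shown, and that it answers NXDOMAIN for names it blocks. NXDOMAIN alone cannot distinguish a blocked name from one that does not exist, so a real check would need a second, non-filtering lookup for comparison. All names in the block are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a resolver that serves the JSON DoH format at this URL.
DOH_JSON_URL = "https://dns.quad9.net:5053/dns-query"

NXDOMAIN = 3  # DNS response code for "name does not exist"


def build_query_url(domain, record_type="A"):
    """Build the lookup URL for one domain."""
    return DOH_JSON_URL + "?" + urlencode({"name": domain, "type": record_type})


def is_nxdomain(answer):
    """answer is the decoded JSON body; 'Status' carries the DNS response code."""
    return answer.get("Status") == NXDOMAIN
```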

Domains-names-txt not found

Downloading the fresh domain list seems to be broken when using -p day for the daily download.
Executing without -p day still works.

Daily feed.

Hello,

Seems the daily feed is no longer updating.
