
netblocktool's Introduction

NetblockTool

Find netblocks owned by a company

Overview

  • Use NetblockTool to easily dump a unique list of IP addresses belonging to a company and its subsidiaries.
  • All data gathering is passive. No traffic is ever sent to the target company.
  • Sources include the ARIN API, ARIN's GUI search functionality, and Google dorking. Company subsidiaries are retrieved from the SEC's public EDGAR database.

Quick Run

git clone https://github.com/NetSPI/NetblockTool.git
cd NetblockTool && pip3 install -r requirements.txt
python3 NetblockTool.py -v Company

Output

Results are written to a CSV named Company.csv, where Company is the provided company name. Truncated output for a scan of Google is shown below.

(Screenshot: example NetblockTool CSV output)
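As a rough post-processing sketch (the column names Network, Organization, and Confidence below are assumptions for illustration, not the tool's documented header), the CSV can be filtered by confidence score:

# Hypothetical sketch: filter a NetblockTool CSV by confidence score.
# The column names used here are assumptions about the output format.
import csv

with open("Company.csv", newline="") as handle:
    rows = list(csv.DictReader(handle))

for row in rows:
    if float(row["Confidence"]) >= 50:
        print(row["Network"], row["Organization"], row["Confidence"])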

How does this script work?

In-depth information on the tool and how it works can be found here. A simplified sketch of the deduplication and confidence-scoring steps follows the list below.

  • A target company is provided
  • Google dorking is used to find netblocks
  • Traffic is sent that simulates a user searching ARIN's database for the company name
  • All ARIN links from the previous database query are found, visited, and processed
  • Duplicate networks are removed
  • Each netblock is given a confidence score
  • Netblocks are sorted by confidence score and written to a CSV
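The sketch below illustrates the deduplication and confidence-scoring idea using the netaddr and fuzzywuzzy packages from requirements.txt. It is a simplified approximation, not NetblockTool's exact implementation:

# Simplified sketch (not NetblockTool's exact logic) of deduplicating netblocks
# and scoring them against the target company name.
import netaddr
from fuzzywuzzy import fuzz

def dedup_and_score(netblocks, target):
    # netblocks: list of (cidr_string, registered_org_name) tuples
    org_by_net = {str(netaddr.IPNetwork(cidr)): org for cidr, org in netblocks}
    # cidr_merge drops exact duplicates and merges adjacent/overlapping networks
    unique = netaddr.cidr_merge([netaddr.IPNetwork(cidr) for cidr, _ in netblocks])
    scored = []
    for net in unique:
        org = org_by_net.get(str(net), "")
        # Fuzzy match between the registered org name and the target company name
        confidence = fuzz.partial_ratio(org.lower(), target.lower())
        scored.append((str(net), org, confidence))
    # Highest-confidence netblocks first, mirroring the CSV ordering
    return sorted(scored, key=lambda row: row[2], reverse=True)

print(dedup_and_score([("192.0.2.0/24", "Example LLC"),
                       ("192.0.2.0/24", "Example LLC"),
                       ("198.51.100.0/25", "Other Org")], "Example"))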

Common Use Cases

Simple run. Get results from Google dorking and the ARIN database:

python3 NetblockTool.py Company

Include the verbose flag to print status updates:

python3 NetblockTool.py -v Company

Extract netblocks owned by your target company’s subsidiaries:

python3 NetblockTool.py -v Company -s

Extract point of contact information:

python3 NetblockTool.py -v Company -p

Get as much information as possible, including netblocks found using wildcard queries, points of contact, geolocation data, and physical addresses:

python3 NetblockTool.py -wpgav Company -so

Help

$ ./NetblockTool.py
usage:
  _   _      _   _     _            _    _____           _
 | \ | | ___| |_| |__ | | ___   ___| | _|_   _|__   ___ | |
 |  \| |/ _ \ __| '_ \| |/ _ \ / __| |/ / | |/ _ \ / _ \| |
 | |\  |  __/ |_| |_) | | (_) | (__|   <  | | (_) | (_) | |
 |_| \_|\___|\__|_.__/|_|\___/ \___|_|\_\ |_|\___/ \___/|_|

./NetblockTool.py [options] {target company}
    Find netblocks owned by a company

Positional arguments:
    {target} Target company (exclude "Inc", "Corp", etc.)

Optional arguments:
    Common Options:
    -l        List mode; argument is a file with list of companies, one per line
    -o        File name to write data to (no extension, default is target name)
    -v        Verbose mode
    -q        Quiet mode
    -h        Print this help message

    Data Retrieval & Processing:
    -n        Don't perform thorough wildcard queries (query = target)
    -ng       Don't perform Google Dorking queries
    -w        Perform more thorough complete wildcard queries (query = *target*). Note
                  that this option may return significantly more false positives.
    -c        Company name if different than target (may affect accuracy of confidence
                  scores, use carefully; exclude "Inc", "Corp", etc.)
    -e        Only return results greater than a given confidence score
    -p        Retrieve and write point of contact information to a text file. Note that
                  retrieval of PoC information will likely take some time.
    -4        Only return IPv4 netblocks
    -6        Only return IPv6 netblocks

    Company Subsidiaries:
    -s        Fetch subsidiary information and return netblocks of all subsidiaries in
                  addition to initial target
    -sn       Company name to use when fetching subsidiaries
    -sp       Use alternate parsing method when fetching subsidiary information; use
                  if the default method isn't working as expected
    -so       Write subsidiary information to a text file (CompanyName_subsidiaries.txt)

    Physical Location:
    -g        Retrieve geolocation data (if available)
    -a        Write netblock address information to output
    -ag       Write netblock address information to output but only if it contains a
                  given string

Examples:
    python NetblockTool.py -v Google
    python NetblockTool.py -so -wv Facebook -o Results
    python NetblockTool.py -gavl companies.txt

This script isn't working

Check the following (a quick dependency import check is sketched below the list):

  • Are all of the dependencies listed in requirements.txt installed?
  • Is the edgar folder in this repository in the same folder as the NetblockTool.py script?
  • Is the script printing "Google CAPTCHA detected"? If so, you may need to change your public IP or wait ~60 minutes to retrieve Google dorking results.
  • Are you using Python 3.7 or newer?
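As a quick sanity check (a minimal sketch; run it from the cloned NetblockTool directory so the bundled edgar folder is importable), the dependencies from requirements.txt can be verified like this:

# Quick import check for NetblockTool's dependencies; run from the repo root
# so the bundled edgar/ folder is on the import path.
import importlib

for module in ("netaddr", "bs4", "lxml", "requests", "fuzzywuzzy", "tqdm", "edgar"):
    try:
        importlib.import_module(module)
        print(module, "OK")
    except ImportError as exc:
        print(module, "MISSING:", exc)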

netblocktool's People

Contributors

n7wera, venominfosec


netblocktool's Issues

Missing requirements

Dear developers,
I wanted to leave a note about requirements.txt: after building v2.1.0, then installing and starting NetblockTool in an IPFire environment, I got some ModuleNotFoundErrors for edgar, requests, and idna.

Hope this is helpful for you.

Best,

Erik

Project dependencies may have API risk issues

Hi. In NetblockTool, inappropriate dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project is using:

netaddr
bs4
lxml
requests
fuzzywuzzy
tqdm

A version constraint of == introduces a risk of dependency conflicts because the dependency scope is too strict.
A constraint with no upper bound (or *) introduces a risk of missing-API errors, because the latest version of a dependency may remove some APIs.

After further analysis, in this project:
The version constraint of the requests dependency can be changed to >=2.4.0,<=2.15.1.
The version constraint of the tqdm dependency can be changed to >=4.36.0,<=4.64.0.

The above suggestions reduce dependency conflicts as much as possible while still allowing the latest versions that do not introduce API errors in the project.
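For example, based on the suggested ranges above, the pinned entries in requirements.txt would look like:

requests>=2.4.0,<=2.15.1
tqdm>=4.36.0,<=4.64.0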

The current project invokes all of the following methods.

Methods called from requests:
requests.packages.urllib3.disable_warnings
requests.post
requests.get
Methods called from tqdm:
tqdm.tqdm
All other methods called in the project:
round
sub_list.join.lower.replace.replace
context_ref.startswith
test_string.rstrip
ranges.append
temp.replace.lower
result_list3.append
sub_list.str.lower
edgar.Company.get_documents
last_row.getchildren.getchildren
netaddr.IPAddress
replace
hyphen_check_string.split.lower
company.lower.rstrip.startswith
company.get_all_filings.xpath
blacklist.append
document.getchildren
potential_nets.append
get_nethandle
self.get_10Ks
process_company_name
random.uniform
zip
str.replace
re.sub
s.datetime.strptime.date
len
int.lower
companies_page_path.open.read
sys.exit
url.replace.split
f.endswith
retrieved_pocs.append
process_potential_company
result_list.clear
result_list.append
all_companies_page.content.decode
sub_list.join.lower
address.split.rstrip
get_org_address_info
possible_companies.append
self._group_document_type
Company.get_request
name.lower
url.replace.replace
netaddr.iprange_to_cidrs
haystack.split
lxml.etree.XMLParser
arg_target.replace.replace
tqdm.tqdm
return_string.replace.replace
csv.writer.writerow
tag.text.split
ranges.sort
last_row.getchildren.getchildren.replace
process_output_file
tree.find_class.find_class
a.text.str.strip
blacklist6.append
str.split
cls._clean_text_
get_asn_subnets
Company.get_documents
arg_address_grep.lower.lower
USER_AGENT.edgar.Edgar.find_company_name
get_asn_info
content_page.find_class.getchildren
process_dedup_filter.append
process_poc_output
elem.attrib.get
self._document_urls.append
last_row.getchildren.getchildren.split
re.findall
ranges6.sort
elem.text_content
html.text_content
self.attrib.get
os.path.dirname
tag.text.str.count
edgar.Company.replace
edgar.Company.title
company.lower
links.append
main
company.lower.rstrip
sorted
get_asn_subnets.append
grouped.append
time.sleep
socket.inet_aton
warnings.filterwarnings
properties.get
super
hyphen_check_string.split.split
context_ref_to_date_text
returnName.replace.replace
company.lower.rstrip.endswith
netaddr.IPNetwork
tag.text.split.lower
page.xpath.getchildren
int.replace
tag.text.str.split.split
process_dedup_filter
document.Documents
XBRLElement
elem.getchildren
result.append
process_ip_count
elem.tag.find
get_google_networks
temp.lower.split
cls.get_request
process_company_extension.append
elem.xpath
dict
address.split.split
argparse.ArgumentParser.add_argument
arin_org_addresses.append
company_name.lower.check_string.lower.split.str.isalpha
e.isalnum
Company.get_all_filings
format
enumerate
company.lower.replace
isinstance
get_statistics
all
self.all_companies_dict.items
urllib.parse.quote_plus
argparse.ArgumentParser
self.__parse_base_elem__
process_output_addresses
item.endswith
context.getchildren
edgar.Company
get_net_info
all_companies_page.content.decode.split
self._get
self.get_company_info
join
companyInfo.getchildren.getchildren
no_ip_data.append
ranges6.append
words.lower.lower
company_name.lower
href.split.split
get_org_poc_info
self.__get_text_from_list__
lxml.html.fromstring.find_class
company.replace.replace
company_name.replace.lower
lxml.html.fromstring.xpath
sub_list.append
lxml.html.fromstring
table.getchildren.getchildren
val.text_content
process_duplicate_ranges
requests.get.xpath
get_key_tag
findnth
sub_company_list.append
edgar.Company.get_all_filings
all_tags.count
process_output_name
edgar.Company.lower
USER_AGENT.edgar.Edgar.get_cik_by_company_name
process_netblock_confidence
get_poc_info
bs4.BeautifulSoup.findAll
Company
html.unescape
bs4.BeautifulSoup
poc_list_unique.append
line.split.lstrip
result_list2.append
item.split
names_list.append
range
fuzzywuzzy.fuzz.partial_ratio
XBRL.is_parsable
context_ref.split
company.lower.endswith
str
XBRL.clean_tag
sub_list.split
datetime.datetime.strptime
unique_tags.append
csv.writer
join.isalnum
target.lower
get_customer_info
os.path.isfile
process_geolocation
context_ref.find
self.child.text.replace.strip
child.attrib.get
edgar.Company.rstrip
recent.append
doc.tostring.decode
tag.text.str.split.rstrip
s.datetime.strptime.date.strftime
unique_companies.append
check_string.lower.split
os.path.basename
data_file.write
self.get_all_filings
tag.text.str.split
get_usage
requests.packages.urllib3.disable_warnings
elem.xpath.getchildren
asn_dups.append
process_addresses
company.title.rstrip
arin_pocs.append
self.__parse_context__
elem.name.lower
req.text.encode
potential_name.replace
open
sorted.append
int
main.append
int.append
name_check.lower
dups.append
i.parsed.str.rstrip
list_companies.append
get_arin_objects
list
words.lower.split
html.text_content.replace
ip.split.split
requests.post
row.getchildren
operator.itemgetter
process_output_name.endswith
get_ip_coordinates
doc.getchildren
lxml.etree.fromstring
super.__init__
sub_list.join.lower.replace
get_subsidiaries
lxml.etree.tostring
self.child.getchildren
set
process_url_encode
all_tags.append
float
edgar.Edgar
Company.__get_documents_from_element__
json.loads
self.context_ref.get
self.get_filings_url
warnings.catch_warnings
company_strings.append
tuple
Company.get_request.find_class
self.getchildren
cls.get_request.xpath
print
glob.glob
row.getchildren.getchildren
i.isdigit
url.replace.endswith
TXTML.get_HTML_from_document
requests.get
process_company_name.lower
argparse.ArgumentParser.parse_args
re.match
int.split
bs4.BeautifulSoup.find_all
all_companies_array_rev.append
self.child.text.replace
input
csv_headers.append
company_name.replace
basic_ranges.append
process_confidence_threshold
process_company_extension
address_list.count
Edgar.split_raw_string_to_cik_name

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

Tagging a release

I would like to contribute by packaging RPMs for this project to facilitate easy installation on CentOS/RHEL.

However, no release versions have been tagged yet. Can you please consider tagging a release?

Great work on NetblockTool.

Parsing issues with Subsidiaries

There are a few different errors related to subsidiary parsing. Below are some examples.

Example 1

python3 NetblockTool.py -v "[REDACTED]"
  _   _      _   _     _            _    _____           _
 | \ | | ___| |_| |__ | | ___   ___| | _|_   _|__   ___ | |
 |  \| |/ _ \ __| '_ \| |/ _ \ / __| |/ / | |/ _ \ / _ \| |
 | |\  |  __/ |_| |_) | | (_) | (__|   <  | | (_) | (_) | |
 |_| \_|\___|\__|_.__/|_|\___/ \___|_|\_\ |_|\___/ \___/|_|
 
[*] Retrieving networks using Google Dorking for[REDACTED] (usually < 30 pages)
  [*] Status: Scrape complete                    
[*] Retrieving ARIN objects using keyword [REDACTED]*
[*] Processing 133 retrieved ARIN objects
Traceback (most recent call last):est/customer/C05765541                     
  File "/home//[REDACTED]/NetblockTool/NetblockTool.py", line 1003, in get_customer_info
    return_street = parsed['customer']['streetAddress']['line']['$']
TypeError: list indices must be integers or slices, not str

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home//[REDACTED]/NetblockTool/NetblockTool.py", line 2033, in <module>
    results = main(arg_target, arg_company_name, query, arg_verbose, arg_threshold, arg_geo, arg_address, arg_address_grep, arg_version, arg_quiet, arg_poc, arg_no_google)
  File "/home//[REDACTED]/NetblockTool/NetblockTool.py", line 199, in main
    return_cust = get_customer_info(item)
  File "/home//[REDACTED]/NetblockTool/NetblockTool.py", line 1012, in get_customer_info
    address += ', '+str(parsed['customer']['streetAddress']['line'][i]['$']).rstrip()
KeyError: '$'
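A hypothetical sketch of a more defensive parse, based only on the traceback above: the 'line' field appears to be sometimes a single object and sometimes a list of objects, so the ['$'] lookup fails when it is a list.

# Hypothetical fix sketch (an assumption, not the project's code): normalize
# ARIN's 'streetAddress'/'line' field, which may be one object or a list.
def extract_street_lines(parsed):
    line = parsed.get('customer', {}).get('streetAddress', {}).get('line', [])
    if isinstance(line, dict):   # single address line
        line = [line]
    return [str(entry.get('$', '')).rstrip() for entry in line if isinstance(entry, dict)]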

Example 2

OS: WSL Kali & Kali 2021.1
Command:  python3 NetblockTool.py -v "[REDACTED]" -s -sn "[REDACTED]" -sp
Error: 
  _   _      _   _     _            _    _____           _
 | \ | | ___| |_| |__ | | ___   ___| | _|_   _|__   ___ | |
 |  \| |/ _ \ __| '_ \| |/ _ \ / __| |/ / | |/ _ \ / _ \| |
 | |\  |  __/ |_| |_) | | (_) | (__|   <  | | (_) | (_) | |
 |_| \_|\___|\__|_.__/|_|\___/ \___|_|\_\ |_|\___/ \___/|_|
[*] Getting subsidiary information for [REDACTED]
  [*] Gathering company information for [REDACTED] from EDGAR database
Traceback (most recent call last):
  File "/home/[REDACTED]/Downloads/NetblockTool/NetblockTool.py", line 2080, in <module>
    for result in get_subsidiaries(arg_subsid_name, arg_verbose, arg_subsid_alt, arg_quiet):
  File "/home/[REDACTED]/Downloads/NetblockTool/NetblockTool.py", line 1487, in get_subsidiaries
    temp.append(edgar.Edgar().getCikByCompanyName(company))
  File "/home/[REDACTED]/.local/lib/python3.9/site-packages/edgar/edgar.py", line 31, in __init__
    all_companies_array[i] = (item_arr[0], item_arr[1])
IndexError: list index out of range
Using the tool without the subsidiary flags works as intended.

Error messages when executing the tool

[*] Retrieving networks using Google Dorking for facebook (usually < 30 pages)
Traceback (most recent call last):
  File "NetblockTool.py", line 1725, in get_google_networks
    network = str(netaddr.IPNetwork(tag.text.split(' ')[0]))
AttributeError: module 'netaddr' has no attribute 'IPNetwork'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "NetblockTool.py", line 2033, in <module>
    results = main(arg_target, arg_company_name, query, arg_verbose, arg_threshold, arg_geo, arg_address, arg_address_grep, arg_version, arg_quiet, arg_poc, arg_no_google)
  File "NetblockTool.py", line 164, in main
    google_networks = get_google_networks(target, verbose, quiet)
  File "NetblockTool.py", line 1744, in get_google_networks
    except netaddr.AddrFormatError:
AttributeError: module 'netaddr' has no attribute 'AddrFormatError'
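One possible cause (an assumption, not confirmed in the issue) is that a different module named netaddr is shadowing the real package, or the install is broken. A quick check:

# Check which netaddr module is actually being imported and whether it exposes
# the attributes NetblockTool needs.
import netaddr

print(netaddr.__file__)
print(hasattr(netaddr, "IPNetwork"), hasattr(netaddr, "AddrFormatError"))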

Folder

Can a flag be added to specify an output directory?

Search by mnt-by

Is it possible to search by mnt-by?

My mail server is being attacked from tons of IP blocks whose mnt-by is "ashitt". I've tried a search on the RIPE database query page, but I'm not coming up with anything. Each time I look at the spam, it comes from IP blocks with 50-100 addresses, all from the same block. If I could get a list of all IP blocks with that mnt-by value, I could probably keep this person or botnet from attacking.

Possible extension for NetblockTool ?

Hi all,
since I am currently testing some tools for blocking companies via an IPFire firewall, I wanted to run some tests on gathering CIDRs, ASNs, and IPs. I used NetblockTool-2.1.0 and libloc (location 0.9.16) for this. A quick test gathering Cloudflare's CIDRs with both tools delivers the following results:

$ wc -l cloudflare_cidrs_*
 2201 cloudflare_cidrs_libloc
  154 cloudflare_cidrs_netblocktool

I am not 100% sure how this difference comes about. If you are interested in testing this on your side, libloc should be available for Debian and Fedora; it also comes with native Python bindings, so I thought I would leave you a note about this.

Hope this is a useful contribution.

Best,

Erik

Cannot search netblocks for what's apparently an unlisted Amazon subsidiary

When I try to use NetblockTool to search for Amazon Technologies Inc. (AT-88-Z), with or without the "Inc", I cannot produce any netblocks, even though they hold dozens of /16 and even several /9 netblocks.

When I search for Amazon and include subsidiaries, the tool does not output those netblocks. It is a non-trivial portion of the IPv4 space that is apparently missed. I'll try to dig into why they're missed, but I thought I should bring it to your attention.
