Comments (5)
Thanks for your work here.
Does this actually result in a bug in bbot? I'm aware these regexes aren't perfect, but they shouldn't be used for validation, only for event type detection. The actual validation happens later via ipaddress and urllib.
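To illustrate the kind of library-based validation meant here, a minimal sketch using only the standard library. The function name `host_kind` is hypothetical and is not part of bbot; it only shows how urllib and ipaddress can make the final decision once something has been detected as a URL.

```python
import ipaddress
from urllib.parse import urlsplit


def host_kind(url):
    """Return ("ip", address) or ("dns", hostname) for the host part of a URL.

    Hypothetical helper for illustration only, not bbot code.
    """
    # urlsplit() strips the brackets around IPv6 literals and lowercases the host
    hostname = urlsplit(url).hostname
    if hostname is None:
        raise ValueError(f"no host found in {url!r}")
    try:
        # ipaddress does the strict parsing; no regex involved
        return "ip", ipaddress.ip_address(hostname)
    except ValueError:
        # anything that is not a valid IPv4/IPv6 address is treated as a DNS name
        return "dns", hostname
```

For example, `host_kind("http://[2001:db8::1]:8080/x")` yields an `IPv6Address`, while `host_kind("https://example.com/")` falls through to the DNS branch.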
These regexes were designed for speed and simplicity. In cases where a full RFC-compliant regex is required, I'd much rather offload it to an official library (others have already written better validation):
import ipaddress

try:
    ipaddress.ip_address(data)
    # it's an IP
except ValueError:
    # it's a DNS name
    pass
So we can avoid situations like this:
from bbot.
No existing module currently uses ipv6_regex directly. It only seems to be used as part of the open port and URL regexes, so perhaps it is used indirectly.
I've started using it directly, though, as I'm also interested in detecting as many target-related IP addresses as possible, particularly where IPs are used directly instead of DNS names. That is uncommon, but it does occur, especially within internal networks.
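As a rough sketch of that use case (the pattern and the name `extract_ips` are hypothetical and not taken from bbot): a deliberately loose regex pulls candidates out of free text, and ipaddress discards anything that isn't a real address.

```python
import ipaddress
import re

# Hypothetical, intentionally permissive candidate pattern: any run of hex
# digits, dots and colons. This is NOT bbot's ipv6_regex.
candidate_regex = re.compile(r"[0-9A-Fa-f:.]{2,}")


def extract_ips(text):
    """Return the set of valid IPv4/IPv6 addresses found in free text."""
    found = set()
    for match in candidate_regex.finditer(text):
        # drop a trailing full stop picked up from sentence punctuation
        candidate = match.group().rstrip(".")
        try:
            found.add(ipaddress.ip_address(candidate))
        except ValueError:
            # hex-looking words, port suffixes, out-of-range octets, etc.
            continue
    return found
```

The regex only has to be cheap and over-inclusive here, since every candidate is checked by ipaddress before it is kept.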
I totally agree you'll want to avoid having to manage and maintain regex patterns; offloading that to a central library that's going to do a better job would be ideal.
That said... making patterns available for modules to use via bbot/core/helpers/regexes.py seems to be the current approach to providing a simple and reliable interface to do that?
~/bbot$ grep -C 4 _regex\. bbot/core/helpers/dns.py
results.add((rdtype, self._clean_dns_record(record.target)))
elif rdtype == "TXT":
for s in record.strings:
s = self.parent_helper.smart_decode(s)
for match in dns_name_regex.finditer(s):
start, end = match.span()
host = s[start:end]
results.add((rdtype, host))
elif rdtype == "NSEC":
~/bbot$
~/bbot$ grep -E '_regex\.(match|find)' bbot/modules/*.py
bbot/modules/azure_tenant.py: matches = self.helpers.regexes.uuid_regex.findall(authorization_endpoint)
bbot/modules/azure_tenant.py: found_domains = list(set(self.d_xml_regex.findall(r.text)))
bbot/modules/digitorus.py: for match in extract_regex.finditer(content):
bbot/modules/git.py: if getattr(result, "status_code", 0) == 200 and "[core]" in text and not self.fp_regex.match(text):
bbot/modules/httpx.py: if tempdir.is_dir() and self.httpx_tempdir_regex.match(tempdir.name):
bbot/modules/__init__.py: if e.is_dir() and dir_regex.match(e.name) and not e.name == "modules":
bbot/modules/massdns.py: digits = self.digit_regex.findall(d)
bbot/modules/rapiddns.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/riddler.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/sslcert.py: if issuer.emailAddress and self.helpers.regexes.email_regex.match(issuer.emailAddress):
bbot/modules/sslcert.py: if subject.emailAddress and self.helpers.regexes.email_regex.match(subject.emailAddress):
bbot/modules/viewdns.py: if self.date_regex.match(table_cells[1].text.strip()):
bbot/modules/virustotal.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
~/bbot$
ipaddress is only used by ipneighbor:
~/bbot$ grep ipaddress bbot/modules/*.py
bbot/modules/ipneighbor.py:import ipaddress
bbot/modules/ipneighbor.py: network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)
~/bbot$
None of them seem to use get_event_type() as the test modules do, though perhaps that's the best validation step after any form of extraction?
~/bbot$ grep get_event_type bbot/modules/*.py
~/bbot$
from bbot.
Ah okay, I'm starting to see your use case. Are you wanting to extract IP addresses from HTTP responses, etc.?
from bbot.
I should mention we have lots of helpers for converting to IP addresses/networks, parsing, validation, etc. that don't require you to import anything. From inside a module, these are available under self.helpers.
from bbot.
#1399 has been merged into dev.
from bbot.