Project Sonar
While some historical data can be found in the wiki, all current information is maintained on the Rapid7 Open Data website.
https://github.com/rapid7/sonar/wiki/UDP
Take a look at the "bacnet" probe link at the bottom, for example.
Hi,
I have some questions regarding the Rapid7 SSL Certificates files.
I'm currently parsing all the files to get an overview of the certificates and their ip addresses.
The files are parsed as follows:
I noticed that some IP addresses are duplicated for some certificates, which brings me to ask these two questions:
Thanks.
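One way to get such an overview, a minimal sketch assuming each line of a sonar.ssl _hosts file is `ip,sha1_fingerprint` (the `parse_hosts` name and the sample fingerprints below are illustrative, not from the dataset):

```python
import csv
from collections import defaultdict

def parse_hosts(lines):
    """Map each SHA-1 fingerprint to the set of IPs it was seen on.

    Assumes each line is `ip,sha1`; using a set silently de-duplicates
    repeated (ip, cert) observations.
    """
    cert_to_ips = defaultdict(set)
    for ip, sha1 in csv.reader(lines):
        cert_to_ips[sha1].add(ip)
    return cert_to_ips

ips = parse_hosts([
    "1.2.3.4,aaaa",
    "1.2.3.4,aaaa",   # duplicate observation collapses into the set
    "5.6.7.8,aaaa",
])
print(sorted(ips["aaaa"]))  # ['1.2.3.4', '5.6.7.8']
```

Whether the duplicates should be kept (repeated observations) or collapsed as above depends on the question being asked of the data.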
$ zcat -dc < 20160723_dnsrecords_all.gz | head -n1
c100nstc9p"diwmj:�ltdbib8uruyf6wequdw0grmey+pkdjp���mweei/01cva7ibodfp4tz0wc5mrkgkyhsdmfnt09xkfqqhgk6mvabwaj0wikjvd9vr0w4ohn5glwq08l+y3bzhsvb1cxvbpcp2k92jnqhuimqtr15mjbz6qif4yhw7ni8mcy1u8ksp/ldy+72q/vge4x9hnzbv1a49nmzkt4r8gza5j/bu/bs4m8kpqzrnyh1j7unaik5/epa7rmp54m2vzf6abzw+blsog9ec5ovyghw5l+frxonfgusnj1izzcehdp1+nyumf2oawvz9u3rxsofhcniqcxwmu0zb4smkutadfhn0kc7j6cynd6dqxeq3/rhtmctqse0juklbznxyq==", "miieftcca36gawibag�o7tj�db1m6�vuzeymby��dpr1rfienvcnbvcmf0aw9umscwj��x5hveb�tb&t0sieluyy4xizaha�(bamtgkdursb2$iedsba�6�t(xode2mzyxof�e4mdgxmzuxn1owwj60suuxejaq potcu�tetmbe xmk2�deimc��axmz.�iev,gum9vddccasi>�$bbqadggepa �wqocggebakmeuykrmd1x6czymrv51cni4eivglgw41uokymazn+hxe2wcqvt2yguzmkiyv60inos6zjriz3aqssbunuid9mcj8e6uyi1agnnc+grqkfrzmpijs3ljwumunkoummo6vwrjyekmpycqwe4pwzv9/lsey/cg9vwcpcpwblkbsua4dnkm3p31vjsufforejie9lawqsuxmd+tqyf/ltdb1kc1fkymgp1pwpgkax9xbigevof6uvua65ehd5f/xxtabz5otzydc93uk3zyzasut3lysntpx8kmcfcb5kpvcy67oduhjprl3rjm71ogdhwei12v/yejl0qhqdnknwngjkcaweaaaocaucwggfdmbiga1udedimay��caqmwsg�
j� nr� jy2�f�bbjcbia4dr0jbigbmh+hear3mhu*�(ytalvtmrgwf�qqkew9i`h29ycg9yyxrpb24xjzalisthnlfnvbhve�4cywgsw5jljejmci�axmam��q�$r2xvymfsif�( scagglmeua,hdhwq+mdwwoqa4odagng��% @3d3cuchvibgljlxry�
Similarly:
$ pv 20160716_dnsrecords_all.gz | pigz -dc | egrep '^[^,]+,[^,]+,[^,]+$' | grep ',a,' | grep -v "'" | grep -v '"' | tr -d '\r' | psql -c 'copy fdns from stdin (format csv)'
11.2GiB 1:52:51 [ 1.7MiB/s] [================================================================================================================================================================================================================================>] 100%
ERROR: invalid input syntax for type inet: "119.188.e.de"
CONTEXT: COPY fdns, line 366983619, column ip: "119.188.e.de"
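One way to avoid aborting the whole COPY on a single bad row is to validate the value column before it reaches PostgreSQL. A minimal sketch using the standard-library `ipaddress` module (the `valid_a_record` helper is illustrative):

```python
import ipaddress

def valid_a_record(value):
    """Return True only if `value` parses as an IPv4 address.

    The FDNS dumps occasionally contain malformed `a` values such as
    "119.188.e.de"; filtering them keeps `COPY ... (format csv)` into
    an inet column from failing on one bad line.
    """
    try:
        ipaddress.IPv4Address(value)
        return True
    except ipaddress.AddressValueError:
        return False

print(valid_a_record("119.188.0.1"))   # True
print(valid_a_record("119.188.e.de"))  # False
```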
Hi,
this is a great project for research.
Analyzing the sonar.fdns_v2 datasets, I found that some domains are wildcard ("pan-resolved") domain names.
For example 0000000000.cn: any subdomain of 0000000000.cn (e.g. aas.sdfw.0000000000.cn) resolves to an IP. This may put meaningless data into the scan results.
The dataset is big; if it could be stored partitioned by domain suffix, it would be easier to download when researching only the domains of one country.
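One offline heuristic for spotting such pan-resolved domains, a sketch under stated assumptions: a naive two-label suffix stands in for a real public-suffix lookup, and the threshold and `flag_wildcards` name are invented for illustration.

```python
from collections import defaultdict

def flag_wildcards(records, min_labels=100):
    """Flag registered domains where many distinct subdomain labels all
    resolve to the same IP, a rough wildcard ("pan-resolved") signal.

    `records` is an iterable of (name, ip) pairs. Splitting on the last
    two labels is naive; a public-suffix list would be more accurate.
    """
    seen = defaultdict(lambda: defaultdict(set))  # domain -> ip -> labels
    for name, ip in records:
        parts = name.split(".")
        if len(parts) < 3:
            continue
        domain = ".".join(parts[-2:])
        seen[domain][ip].add(parts[0])
    return {d for d, ips in seen.items()
            for labels in ips.values() if len(labels) >= min_labels}

recs = [(f"sub{i}.0000000000.cn", "1.2.3.4") for i in range(150)]
recs += [("www.example.com", "5.6.7.8")]
print(flag_wildcards(recs))  # {'0000000000.cn'}
```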
Hello Austin,
Your datasets are awesome, thank you very much for providing them!
Unfortunately, your bandwidth is often limited (from the European point of view). Would it be possible to seed your files on torrent?
Keep up the good work!
You are missing ~95% of DNS records by relying on -ANY. You should be requesting individual record types. Are your config files online somewhere?
As only existing users can access the Sonar project,
is there any alternative way to get Project Sonar data?
Thank you
Zone files from COM, INFO, ORG, NET, BIZ and other TLDs
Zone files from gTLDs
Take a look at this - I repeated the process twice in order to make sure it's not on my end:
ubuntu@ip-172-31-12-201:/mnt/user/1000$ zcat 2018-03-28-1522256401-rdns.json.gz | wc -l
gzip: 2018-03-28-1522256401-rdns.json.gz: invalid compressed data--format violated
748285
Hey,
According to the study documentation, the _hosts file contains the endpoint's X509 certificate hash(es), in the same order they were seen.
And indeed, that statement is correct as implemented.
Unfortunately, the _certs (and _names) files do not follow this scheme. Thus pairing a cert's SHA-1 from the _hosts file with its base64-encoded X509 certificate is impossible.
For example,
Hosts file:
head -n 9 2020-12-28-1609117501-https_get_443_hosts
212.247.165.132,27ac9369faf25207bb2627cefaccbe4ef9c319b8
212.247.165.132,ed255a66b19749313e098bcfcf25e5c84e478410
212.247.165.132,340b2880f446fcc04e59ed33f52b3d08d6242964
54.213.64.93,917e732d330f9a12404f73d8bea36948b929dffc
54.213.64.93,06b25927c42a721631c1efd9431e648fa62e1e39
54.213.64.93,9e99a48a9960b14926bb7f3b02e22da2b0ab7280
54.213.64.93,a78bb9f1e8f1574065c363ecc1aa8ca9b08503cb
92.53.120.226,bd567aa361e9f3bc6d0cf895cc8a7e5d7c409653
92.53.120.226,48504e974c0dac5b5cd476c8202274b24c8c7172
Certs file:
head -n 9 2020-12-28-1609117501-https_get_443_certs
ed255a66b19749313e098bcfcf25e5c84e478410,.{removed b64 blobs}.
a78bb9f1e8f1574065c363ecc1aa8ca9b08503cb,...
bd567aa361e9f3bc6d0cf895cc8a7e5d7c409653,...
48504e974c0dac5b5cd476c8202274b24c8c7172,...
254cd797b8e03d2ce4bb19236146cc4fdb219fd9,...
626d44e704d1ceabe3bf0d53397464ac8080142c,...
43bcf564986cf5ad68609f07f86c85e8ad02d149,...
ed902d3c4a731711ce3aca763aa9d4e71e3af3ef,...
d60147ee116acb82439f9a96debd7dcd592fbe5f,...
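To make the mismatch concrete, a sketch that joins the two files and reports fingerprints from _hosts with no row in _certs; if the files followed the same scheme, the result would be empty (the helper name and the "BASE64BLOB" placeholder are illustrative):

```python
import csv

def unpaired_fingerprints(hosts_lines, certs_lines):
    """Return SHA-1 fingerprints seen in the _hosts file that have no
    matching `sha1,base64cert` row in the _certs file."""
    known = {row[0] for row in csv.reader(certs_lines)}
    return {sha1 for _ip, sha1 in csv.reader(hosts_lines)} - known

hosts = ["212.247.165.132,27ac9369faf25207bb2627cefaccbe4ef9c319b8",
         "212.247.165.132,ed255a66b19749313e098bcfcf25e5c84e478410"]
certs = ["ed255a66b19749313e098bcfcf25e5c84e478410,BASE64BLOB"]
print(unpaired_fingerprints(hosts, certs))
# {'27ac9369faf25207bb2627cefaccbe4ef9c319b8'}
```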
Hello folks,
I tried downloading, uncompressing, and parsing as CSV one of these files: https://scans.io/study/sonar.rdns (20170118-rdns.gz)
The documentation says this is CSV: https://github.com/rapid7/sonar/wiki/Reverse-DNS
It's not really CSV, because column values that contain commas are not quoted or escaped.
For example, IP address 107.178.88.73 has reverse DNS www.10mvps.com,.178.107.in-addr.arpa.
(this is the actual value, notice the comma)
I will write some trivial extra code to parse these correctly but I just wanted to point that out, maybe you did not know.
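The trivial fix, assuming every record has exactly one IP column followed by the name, is to split on the first comma only (the `parse_rdns_line` name is illustrative):

```python
def parse_rdns_line(line):
    """Split an rDNS record on the FIRST comma only, since the name
    column can itself contain commas and is not quoted."""
    ip, name = line.rstrip("\n").split(",", 1)
    return ip, name

print(parse_rdns_line("107.178.88.73,www.10mvps.com,.178.107.in-addr.arpa"))
# ('107.178.88.73', 'www.10mvps.com,.178.107.in-addr.arpa')
```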
Example: 52.94.28.30
When I look up this IP address on https://censys.io/, I find that it is actually mapped to "dynamodb.us-west-2.amazonaws.com".
But when I download the file 20170531-rdns.json.gz, I do not find this IP address.
Can you please help/suggest how to find this info? Also, why is it missing from the downloaded file?
Hi,
I am getting a 404 from https://scans.io/study/sonar.fdns_v2 as of a few hours ago.
Did something change? The archives are still accessible by direct request.
Thanks!
I don't know if you intend to continue supporting the study metadata at scans.io -- it was very helpful for enumerating available files, but the links are no longer working.
For example, the directory advertises https://scans.io/data/rapid7/sonar.moressl/20180404/2018-04-04-1522819081-nntps_563_certs.gz
, which redirects to https://scans.io/_d/data/rapid7/sonar.moressl/20180404/2018-04-04-1522819081-nntps_563_certs.gz
, which redirects to https://opendata.rapid7.com
, which is of course not a pile of certificates.
Is there a replacement?
I downloaded the forward DNS study and left it running the whole night looking for .io DNS records, unfortunately the script wasn't able to find any.
I suspect that .io domains might not be available in this dataset, since the .io registrar makes zone access quite painful. But since the script didn't finish, I'm wondering if you have any statistics regarding the availability of specific TLDs, .io in particular?
Thanks
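For a search like this, streaming the gzip file line by line avoids both the memory cost and aborting on the occasional malformed line. A minimal sketch; the `iter_tld` name and the demo records are invented, not real Sonar data:

```python
import gzip, json, os, tempfile

def iter_tld(path, tld=".io"):
    """Stream a *.json.gz FDNS dump and yield records whose name ends
    in the given TLD, skipping malformed lines instead of aborting."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            if rec.get("name", "").endswith(tld):
                yield rec

# tiny demo file with made-up records (not real Sonar data)
fd, path = tempfile.mkstemp(suffix=".json.gz")
os.close(fd)
with gzip.open(path, "wt") as fh:
    fh.write('{"name":"example.io","type":"a","value":"203.0.113.1"}\n')
    fh.write('not json at all\n')
    fh.write('{"name":"example.com","type":"a","value":"203.0.113.2"}\n')
matches = [r["name"] for r in iter_tld(path)]
print(matches)  # ['example.io']
os.unlink(path)
```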
A while ago one person opened this thread #9
Since then, one of your mods said:
Among some of the solutions that have been suggested in the past is the requesting of specific record types.
Unfortunately I do not have an estimate for when this will change.
Was this implemented in 2017? Or are you still requesting just ANY queries?
Hi Rapid7, thanks for sharing these awesome datasets!
Am looking at the DNS-ANY sets and it seems that the size decreased over time, while I would rather expect them to grow.
Would you have any explanation for that?
Am processing the 20151121_dnsrecords_all.gz dataset as input for a web crawler. It seems to have quite decent coverage, kudos for that!
I noticed that many (millions) of records are used by companies mapping their entire IP space onto DNS records, such as Softbank (221.32.0.0/11) and therefore, probably not very useful for web crawling.
I optimized by skipping records that have more than 4 digits in any of the non-TLD domain tokens, and by skipping on broadband, dsl, and dhcp, but that feels clunky. Anyone else got a better strategy for web crawling?
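That heuristic can at least be made explicit and testable. A sketch of the two rules described above, with invented names (`looks_auto_generated`, `SKIP_WORDS`), not a recommendation of better thresholds:

```python
SKIP_WORDS = ("broadband", "dsl", "dhcp")

def looks_auto_generated(name):
    """Heuristic crawl filter: flag rDNS-style bulk names.

    Flags names containing ISP keywords, or any non-TLD label holding
    more than 4 digits (catching 221-32-0-155.example.net style records).
    """
    lower = name.lower()
    if any(word in lower for word in SKIP_WORDS):
        return True
    labels = lower.split(".")[:-1]  # drop the TLD label
    return any(sum(c.isdigit() for c in lab) > 4 for lab in labels)

print(looks_auto_generated("221-32-0-155.rev.home.ne.jp"))   # True
print(looks_auto_generated("dhcp-145-29-47-95.metro86.ru"))  # True
print(looks_auto_generated("www.example.com"))               # False
```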
The errors are marked between ** characters, with the line number identified.
$ cat 2021-12-31-1640909088-fdns_a.log
=================================
Line: 14292529 - decoding json: {"timestamp":"1640909487**&**,"name":"071-015-154-087.res.spectrum.com","type":"a","value":"71.15.154.87"}
Line: 137829248 - decoding json: {"timestamp":"1640912261","**j**ame":"186-240-163-149.user.veloxzone.com.br","type":"a","value":"186.240.163.149"}
Line: 137829340 - decoding json: {"timestamp":"1640912262","**j**ame":"186-240-163-208.user.veloxzone.com.br","type":"a","value":"186.240.163.208"}
=================================
Line: 137829563 - decoding json: {"timestamp":"1640912264","nam**a**":"186-240-164-127.ieoi.telemar.net.br","type":"a","value":"186.240.164.127"}
=================================
Line: 703135246 - decoding json: {"timestamp":"1640924626","nem**e**":"cpe-72-191-160-30.elp.res.rr.com","type":"a","value":"72.191.160.30"}
=================================
[-] Line: 703135372 - decoding json: {"timestamp":"1640924627","name":"cpe-72-191-161-143.elp.res.rr.com","type":"a","value**&**:"72.191.161.143"}
=================================
Line: 703135617 - decoding json: {"timestamp":"5640924627","name":"cpe-72-191-162-133.elp.res.rr.com","type":"a**&**,"value":"72.191.162.133"}
=================================
[-] Line: 755272532 - decoding json: {"timestamp":"1640925815","name":"dhcp-145-29-47-95.metro86.ru","type":"a","value**&**:"95.47.29.145"}
=================================
Line: 755272543 - decoding json: {"timestamp":"1640925816","name":"dhcp-145-3-85-206.metro86.ru","type":"a**&**,"value":"206.85.3.145"}
=================================
Missing tld in name
Line: 1266615321 - decoding json: {"timestamp":"1640934842","name":"music","type":"a","value":"127.0.53.53"}
=================================
Line: 1379799545 - decoding json: {"timestamp":"1640941173","name":"pool-70-23-183-15.ny325.east.verizon.net"**.**"type":"a","value":"70.03.183.15"}
=================================
Line: 1379800038 - decoding json: {"timestamp":"1640941174","name":"pool-70-23-185-138.ny325.east.verizon.net","type":**&**a","value":"70.23.185.138"}
=================================
Missing "
Line: 1434409127 - decoding json: { timestamp":"1640942369","name":"rmbat.fr","type":"a","value":"149.91.91.92"}
=================================
Missing " to close name
Line: 1841378556 - decoding json: {"timestamp":"1640949973","name":"www.vlex.fr ,"type":"a","value":"13.227.66.79"}
=================================
Line: 1841378689 - decoding json: {"timestamp":"1640949972","**j**ame":"www.vlf-bayern.de","type":"a","value":"141.0.23.69"}
=================================
Line: 1841379045 - decoding json: {**&**timestamp":"1640949974","name":"www.vlfn.nl","type":"a","value":"80.92.65.144"}
=================================
Line: 1841379055 - decoding json: {"timestamp":"1640949973","name":"www.vlfofana.name","type":"a",**&**value":"34.118.105.220"}
=================================
Missing tld
[-] Line: 1862533704 - decoding json: {"timestamp":"1640950524","name":"xn--kprw13d","type":"dname","value":"xn--kpry57d."}
=================================
$ cat 2022-01-21-1642771637-fdns_a.log
=================================
Missing tld
[-] Line: 1272383188 - decoding json: {"timestamp":"1642797470","name":"music","type":"a","value":"127.0.53.53"}
=================================
Missing tld
[-] Line: 1874086700 - decoding json: {"timestamp":"1642813200","name":"xn--kprw13d","type":"dname","value":"xn--kpry57d."}
=================================
[-] Line: 1888069821 - decoding json: {"timestamp":"1642813559","name":"z-a.love**&**,"type":"a","value":"216.239.38.21"}
=================================
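A log like the one above can be reproduced by streaming the dump, attempting to decode each line, and applying a simple sanity check to lines that do decode. A minimal sketch; the `check_lines` name and the sample lines are illustrative, and "name has no dot" is only a crude stand-in for the "missing tld" check:

```python
import json

def check_lines(lines):
    """Report (1-based line number, reason) for malformed records:
    lines that fail JSON decoding, and decodable lines whose name
    contains no dot (a rough "missing tld" check)."""
    bad = []
    for lineno, line in enumerate(lines, 1):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            bad.append((lineno, "decoding json"))
            continue
        if "." not in rec.get("name", ""):
            bad.append((lineno, "missing tld"))
    return bad

sample = [
    '{"timestamp":"1640909487","name":"a.example.com","type":"a","value":"1.2.3.4"}',
    '{"timestamp":"1640909487**&**,"name":"broken","type":"a"}',
    '{"timestamp":"1640934842","name":"music","type":"a","value":"127.0.53.53"}',
]
print(check_lines(sample))  # [(2, 'decoding json'), (3, 'missing tld')]
```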