Comments (6)
Here's a simple script that should hopefully help:
import sys

import hyperscan


def parse_patterns(filename):
    # Each non-comment line holds three tab-separated columns:
    # pattern id, expression, pattern name.
    patterns = []
    with open(filename) as f:
        for line in f:
            if line.startswith('#'):
                continue
            columns = [chunk.strip() for chunk in line.split('\t')]
            if len(columns) != 3:
                continue
            columns[0] = int(columns[0])
            columns[1] = columns[1].encode()
            patterns.append(columns)
    return patterns


def match_handler(pattern_id, start, end, flags, context):
    # Called once per reported match; context is the dict passed to db.scan.
    pattern_name = context['names'][pattern_id]
    substr = context['input'][start:end]
    print(f'[match] {pattern_name}: `{substr}`')


def main(args):
    if len(args) != 2:
        print(f'usage: {sys.argv[0]} [patterns file] [input]')
        sys.exit(1)
    pattern_filename, input_str = args
    patterns = parse_patterns(pattern_filename)
    db = hyperscan.Database()
    pattern_ids, expressions, names = zip(*patterns)
    db.compile(expressions=expressions, ids=pattern_ids)
    # Map pattern ids to names so the handler can label each match.
    context = {
        'input': input_str,
        'names': {
            pattern_id: names[i]
            for i, pattern_id in enumerate(pattern_ids)
        },
    }
    db.scan(
        input_str,
        match_event_handler=match_handler,
        context=context,
    )


if __name__ == '__main__':
    main(sys.argv[1:])
python-hyperscan on master [$?] is 📦 v0.1.5 via 🐍 v3.8.2rc2+ (hyperscan-nTr-HRfE-py3.8)
❯ python hsmatch.py patterns.txt git://foo.git
[match] GitRepoPattern: `git://foo.git`
❯ python hsmatch.py patterns.txt foo.onion
[match] OnionDomain: `foo.onion`
The unit tests should be another resource you can use, as they cover most of the Python interface.
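For reference, here is a minimal sketch of the patterns file layout that parse_patterns in the script above expects: one pattern per line, with id, expression, and name separated by tabs. The two expressions are hypothetical stand-ins (the thread never shows the real patterns file); only the names GitRepoPattern and OnionDomain come from the output above. The parser is the same logic, adapted to read from any iterable of lines so it runs without a file on disk and without hyperscan installed.

```python
import io

# Hypothetical patterns file: id <TAB> expression <TAB> name.
# The expressions are illustrative assumptions, not the real ones.
SAMPLE = (
    "# id\texpression\tname\n"
    "0\tgit://[\\w./-]+\tGitRepoPattern\n"
    "1\t[\\w.-]+\\.onion\tOnionDomain\n"
    "this line has no tabs and is skipped\n"
)


def parse_patterns(lines):
    """Same parsing rules as the script, but over an iterable of lines."""
    patterns = []
    for line in lines:
        if line.startswith('#'):
            continue  # comment line
        columns = [chunk.strip() for chunk in line.split('\t')]
        if len(columns) != 3:
            continue  # malformed or blank line
        patterns.append((int(columns[0]), columns[1].encode(), columns[2]))
    return patterns


patterns = parse_patterns(io.StringIO(SAMPLE))
# zip(*patterns) splits the rows into the three parallel tuples
# that the script hands to db.compile and to the names lookup.
pattern_ids, expressions, names = zip(*patterns)
print(pattern_ids)
print(expressions)
print(names)
```

The expressions are encoded to bytes because the script passes them to db.compile as bytes.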
from python-hyperscan.
Awesome, I'll try it now and keep you updated.
Thanks again
I tried the following input:
python main.py patterns.txt "foo.onion git://foo.git"
and it returned:
[match] OnionDomain: `foo.onion`
[match] GitRepoPattern: `foo.onion git://foo.git`
Is there an easy way to get only the substring that matched the pattern?
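See Start of Match in the Hyperscan documentation. By default Hyperscan reports only where a match ends and passes 0 as the start offset, which is why the substring began at the start of the input.
Replace the db.compile line with this:
db.compile(
    expressions=expressions,
    ids=pattern_ids,
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
)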
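To see why the start offset matters, here is a small sketch that uses Python's standard re module as a stand-in for Hyperscan (so it runs without hyperscan installed). The pattern is a hypothetical one chosen for illustration. It compares slicing the input from offset 0, which is what the handler effectively does when no start of match is reported, with slicing from the real start of the match.

```python
import re

text = "foo.onion git://foo.git"

# Hypothetical pattern; re is used only to obtain start and end offsets.
m = re.search(r"git://[\w.]+", text)

# Without a start-of-match flag the handler receives start == 0,
# so the slice runs from the beginning of the input to the match end.
end_only = text[0:m.end()]

# With the leftmost start of match reported, the slice is the match itself.
with_som = text[m.start():m.end()]

print(repr(end_only))
print(repr(with_som))
```

The first slice reproduces the unexpected `foo.onion git://foo.git` output from the question, and the second gives the substring that was actually wanted.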
It works better.
python main.py patterns.txt "foo.onion git://foo.git"
[match] OnionDomain: `foo.onion`
[match] GitRepoPattern: `git://foo.git`
But I tried the following:
python main.py patterns.txt "[email protected] 1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36v https://twitter.com/x0rxkov random test sentence https://twitter.com/twitter"
and got this output:
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRo`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4m`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4mi`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4miY`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4miY3`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36`
[match] BtcAddressPattern: `1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36v`
Is my Bitcoin regex written incorrectly?
Cheers,
X
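Try adding a \b at the end of the BTC pattern. Hyperscan reports a match at every offset where the pattern can end, so a bounded repeat produces one match per valid length; the trailing \b keeps only the match that ends on a word boundary.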
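The effect of the trailing \b can be sketched with Python's standard re module. This is an emulation, not Hyperscan itself: Python's re returns a single greedy match, so the loop below tries every repeat count separately to mimic a matcher that reports each possible end offset. The BTC pattern is an assumption, since the thread never shows the real one.

```python
import re

# Hypothetical Base58 character class and length bounds (assumed).
BASE58 = "[a-km-zA-HJ-NP-Z1-9]"


def end_offsets(text, start, suffix=""):
    """Collect every end offset at which the assumed pattern can match."""
    ends = []
    for n in range(25, 34):
        # One fixed repeat count per attempt, e.g. [13][...]{25}
        pattern = re.compile(f"[13]{BASE58}{{{n}}}{suffix}")
        m = pattern.match(text, start)
        if m:
            ends.append(m.end())
    return ends


text = "1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36v https://twitter.com/x0rxkov"

without_anchor = end_offsets(text, 0)
with_anchor = end_offsets(text, 0, suffix=r"\b")

print(len(without_anchor))  # one entry per valid length
print([text[:end] for end in with_anchor])
```

Without the anchor, every length from 26 to 34 characters yields a match, like the nine lines in the output above. With \b, only the full address survives because it is the only candidate followed by a non-word character.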