Comments (4)
These two flags should definitely work together. Please post a test case with the following information:
- The full expression, flags and mode passed to hs_compile(),
- The data being scanned,
- The locations of the matches you are receiving, and where you expect them to be.
from hyperscan.
These are my flags:
flags.push_back(HS_FLAG_DOTALL | HS_FLAG_SOM_LEFTMOST);
This is my pattern to extract JavaScript:
1:/<script[^/][^>]>(.?)</script[^>]>|<javascript[^/][^>]>(.?)</javascript[^>]>/
This is my file:
<script type="text/javascript">//<![CDATA[si_ST=new Date;//]]></script>
lablablab
<script type="text/javascript">//<![CDATA[
_G.HT=new Date;
//]]></script></html>
lablablab
I want to use HS_FLAG_DOTALL so that . also matches newlines, and HS_FLAG_SOM_LEFTMOST to get the start offset of each match.
Note: I HTML-escaped my file and my pattern to paste them here. For your test, please use http://www.freeformatter.com/html-escape.html#ad-output to convert them back to unescaped form.
from hyperscan.
I think your escaped markup might have stripped some characters from your pattern. I'm assuming from your description that this is what it should look like (on GitHub, indenting with four spaces makes the Markdown support render text as code without formatting):
/<script[^/][^>]*>(.*?)</script[^>]*>|<javascript[^/][^>]*>(.*?)</javascript[^>]*>/
I think the issue here is that you are assuming backtracking semantics, whereas Hyperscan provides automata semantics. This means that instead of providing one "best match" that takes into account greedy/ungreedy repeats, alternation ordering, etc., as PCRE does, Hyperscan delivers all possible matches for a given regex. In these semantics, there is no difference between .* and .*?.
This is a fundamental difference from the way that a backtracking matcher like PCRE operates. We have a more detailed description of it in the Semantics section of the Hyperscan developer reference.
In this particular case, this is why SOM_LEFTMOST is always reporting a from offset of zero: your .*? repeats in DOTALL mode will match any sequence of characters, so the leftmost start of any match is the first occurrence of a match for <script[^/][^>]*>, which is the <script type="text/javascript"> at offset 0 of your file.
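To make the "all matches, leftmost start" behaviour concrete, here is a toy model in plain C++. It is not Hyperscan code and does not call the library: the function name all_matches_leftmost and the reduction of the pattern to OPEN.*CLOSE with literal strings are assumptions made only for illustration. Every occurrence of the closing literal after the first opening literal yields a match, and each one reports the same leftmost start.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of automata semantics for OPEN.*CLOSE in DOTALL mode, where OPEN
// and CLOSE are plain strings. Hypothetical helper, not part of Hyperscan.
std::vector<std::pair<std::size_t, std::size_t>>
all_matches_leftmost(const std::string &data, const std::string &open,
                     const std::string &close) {
    assert(!open.empty() && !close.empty());
    std::vector<std::pair<std::size_t, std::size_t>> matches;  // (from, to)

    // The leftmost possible start is the first occurrence of OPEN.
    const std::size_t first_open = data.find(open);
    if (first_open == std::string::npos) return matches;

    // Every later CLOSE ends a match, because .* can span anything in between.
    std::size_t pos = data.find(close, first_open + open.size());
    while (pos != std::string::npos) {
        matches.emplace_back(first_open, pos + close.size());
        pos = data.find(close, pos + 1);
    }
    return matches;
}
```

For the input "<s>a</s>..<s>b</s>" with "<s>" and "</s>", this returns (0, 8) and (0, 18): two matches, both starting at offset 0, which mirrors the from offset of zero described above.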
I would suggest that the easiest way to use Hyperscan to extract the data between two script tags would be to split your pattern up into four patterns:
1:/<script[^/][^>]*>/
2:/</script[^>]*>/
3:/<javascript[^/][^>]*>/
4:/</javascript[^>]*>/
You can then track the offsets at which patterns 1 and 2 match, extracting the data between them, and similarly for 3 and 4.
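A minimal sketch of that offset tracking, under some assumptions: the TagTracker struct, the close_tag helper and the enum names are hypothetical, and callbacks are assumed to arrive in increasing end-offset order. The on_match function has the same parameter list as Hyperscan's match_event_handler, so it could be passed to hs_scan() with a TagTracker as the context pointer. For the from argument of the closing-tag patterns (2 and 4) to be meaningful, those patterns would need HS_FLAG_SOM_LEFTMOST; the opening-tag patterns only use to.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Pattern IDs, matching the numbering of the four patterns above.
enum : unsigned int { SCRIPT_OPEN = 1, SCRIPT_CLOSE = 2, JS_OPEN = 3, JS_CLOSE = 4 };

using Offset = unsigned long long;
constexpr Offset NO_OPEN = ~0ULL;  // sentinel: no opening tag is pending

// Hypothetical per-scan state, handed to the callback through `context`.
struct TagTracker {
    Offset script_body = NO_OPEN;  // end offset of a <script ...> tag not yet closed
    Offset js_body = NO_OPEN;      // same for <javascript ...>
    std::vector<std::pair<Offset, Offset>> bodies;  // [begin, end) of each body
};

// Records a body range when a closing tag follows a pending opening tag.
static void close_tag(Offset &pending, Offset from, TagTracker &t) {
    if (pending == NO_OPEN || from < pending) return;  // stray closing tag: ignore
    t.bodies.emplace_back(pending, from);
    pending = NO_OPEN;
}

// Same parameters as Hyperscan's match_event_handler.
int on_match(unsigned int id, Offset from, Offset to, unsigned int flags,
             void *context) {
    (void)flags;
    assert(context != nullptr);
    TagTracker &t = *static_cast<TagTracker *>(context);
    switch (id) {
    case SCRIPT_OPEN:  if (t.script_body == NO_OPEN) t.script_body = to; break;
    case SCRIPT_CLOSE: close_tag(t.script_body, from, t); break;
    case JS_OPEN:      if (t.js_body == NO_OPEN) t.js_body = to; break;
    case JS_CLOSE:     close_tag(t.js_body, from, t); break;
    default:           break;
    }
    return 0;  // zero tells Hyperscan to continue scanning
}
```

After the scan, each pair in bodies gives the byte range between an opening tag and the next matching closing tag, so the script text can be copied out of the scanned buffer with those offsets. This sketch pairs each opening tag with the first closing tag that follows it and ignores nested or unbalanced tags.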
from hyperscan.
Dear jviiret,
Thank you very much for your attention and explanation ;)
from hyperscan.