commonregex's People

Contributors

8enmann, arvindraj17, aufdemrand, everpeace, hack4money, hskang9, james2doyle, jasonkessler, madisonmay, qw1mb0, ralic, talyssonoc, vasilcovsky

commonregex's Issues

Reading from CSV error - TypeError: expected string or bytes-like object

Traceback (most recent call last):
  File "csv_parse.py", line 14, in <module>
    parsed_text = CommonRegex({row[2]})
  File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 53, in __init__
    setattr(self, key, method())
  File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 39, in regex_method
    return [x.strip() for x in self.regex.findall(text or self.obj.text)]
TypeError: expected string or bytes-like object
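
A note on the likely cause: in Python, `{row[2]}` builds a one-element set, so the regex receives a set instead of a string, and `findall` rejects it with this `TypeError`. The sketch below reproduces that with the standard library only. The `EMAIL` pattern and the `emails_per_row` helper are hypothetical stand-ins for illustration, not part of commonregex.

```python
import csv
import io
import re

# Hypothetical stand-in pattern, used only to have something to search with.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def emails_per_row(csv_text, column):
    """Return the e-mail matches found in one column of each CSV row."""
    results = []
    for row in csv.reader(io.StringIO(csv_text)):
        cell = row[column]  # a plain str
        # Passing {cell} here would hand findall a set and raise TypeError.
        results.append(EMAIL.findall(cell))
    return results
```

Under that reading, the call in the traceback would become `CommonRegex(row[2])`, without the braces.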

Regex for HTML/XHTML

I am adding support for identifying tags and their attributes, as well as replacing attribute values, to make the library more awesome. I am halfway there and need some testing.

empty string results in object names being assigned

>>> test = CommonRegex('')
>>> test.dates
<function regex.__call__.<locals>.regex_method at 0x000000000389F048>
>>> test.dates()
[]
>>> test = CommonRegex('asdasd')
>>> test.dates
[]
>>> test.dates()
Traceback (most recent call last):
  File "", line 1, in
TypeError: 'list' object is not callable

This is on Python 3.3.3 on Windows 7 x64.

I won't be surprised if this is my failure to understand a concept :)
The obvious workaround is to detect empty strings before passing them to CommonRegex.
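
Judging only from the transcript, an attribute such as `dates` is a list when the input is non-empty and a callable that returns a list when the input is empty. A caller can tolerate both shapes with a small helper. This is a sketch; the `matches` function is a hypothetical name, not something commonregex provides.

```python
def matches(parsed, name):
    """Return the match list for `name`, whether the attribute is
    already a list or is still a callable that produces one."""
    value = getattr(parsed, name)
    return value() if callable(value) else value
```

With this, `matches(test, "dates")` would return a list for both `CommonRegex('')` and `CommonRegex('asdasd')`, assuming the behaviour shown in the transcript.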

costs much cpu resource & time

Thanks for building commonregex; it is very convenient for our work, but I still have a question.
Why does it take so much time? My CPU temperature also rises rapidly. (My content is 105851 bytes long.)

Lazy version

You apply every regex when an instance of the class is created. IMO it would be much better to do this lazily, using something like cached properties.
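
One way the lazy approach could look, as a minimal sketch: each pattern runs only when its attribute is first read, and the result is cached on the instance. `LazyExtractor` and its two patterns are hypothetical and deliberately simple; they are not commonregex's own class or regexes. `functools.cached_property` needs Python 3.8 or newer.

```python
import re
from functools import cached_property


class LazyExtractor:
    """Hypothetical illustration of lazy, cached regex extraction."""

    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    _TIME = re.compile(r"\b\d{1,2}:[0-5]\d\b")

    def __init__(self, text):
        self.text = text  # no regex runs here

    @cached_property
    def emails(self):
        # Runs on first access only; the list is then stored on the instance.
        return self._EMAIL.findall(self.text)

    @cached_property
    def times(self):
        return self._TIME.findall(self.text)
```

Construction is then cheap regardless of input size, and a caller who only reads `emails` never pays for the other patterns.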

Support for non-US phone numbers

Cool as this already is, it would be even cooler if it supported non-US phone numbers. I'd try and do it myself, but given how little I currently know about regular expressions I'd probably be more of a hindrance than a help.

AttributeError: 'CommonRegex' object has no attribute 'ssn_number'

I am attempting to parse the following test-data.txt with version commonregex==1.5.4:

2523088780
social security number: 428-34-4474
this is far less expensive than the alternative
114 jeffery street
usa
from commonregex import CommonRegex

with open('./test-data.txt') as data:
    parsed_text = CommonRegex(data.read())

and I receive this error:

>>> parsed_text.ssn_number
AttributeError: 'CommonRegex' object has no attribute 'ssn_number'

There are no problems with emails, phones, etc.:

>>> parsed_text.emails

Appreciate it

AttributeError: 'CommonRegex' object has no attribute - ssn_number and zip_codes

Hi, nice tool. I have been testing it and it works well. I'd love to find out whether it is just me or there is a bug somewhere when using the zip_codes and ssn_number options. This is my script; it works fine as long as the last two lines are commented out.

from commonregex import CommonRegex
print("STARTING COMMON REGEX")
parsed_text = CommonRegex("""SOME TEXT HERE""")
print(parsed_text.times)
print(parsed_text.emails)
print(parsed_text.links)
print(parsed_text.phones)
print(parsed_text.street_addresses)
print(parsed_text.btc_addresses)
print(parsed_text.credit_cards)
print(parsed_text.prices)
print(parsed_text.ipv6s)
print(parsed_text.ips)
print(parsed_text.dates)
#print(parsed_text.zip_codes)
#print(parsed_text.ssn_number)

Thank you!

PO Boxes not working

I got this error on the most recent version of Commonregex (1.5.4):
AttributeError: 'CommonRegex' object has no attribute 'po_boxes'

My code just creates a parser object and tries to retrieve po_boxes.
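
Until the missing attributes (`ssn_number`, `zip_codes`, `po_boxes`) are sorted out, a script can probe for each attribute instead of failing on the first absent one. This is a sketch of a caller-side workaround; `available_fields` is a hypothetical helper, not a commonregex function.

```python
def available_fields(parsed, names):
    """Map each requested attribute name to its value,
    skipping names the object does not define."""
    return {name: getattr(parsed, name) for name in names if hasattr(parsed, name)}
```

For example, `available_fields(parsed_text, ["emails", "po_boxes"])` would return only the entries the installed version actually exposes.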

Return position of matched text

Instead of returning an array of literals, return an array of objects of matched text and start position inside the original parsed text. Also, it would be good to have a list of all matched texts sorted by their position on the original text.
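
A rough sketch of what this request could look like, using `re.finditer` so that each match keeps its start offset. The `Hit` tuple, the `PATTERNS` table, and `find_with_positions` are all hypothetical names, and the two patterns are simple placeholders rather than commonregex's regexes.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    kind: str   # which pattern matched
    text: str   # the matched text
    start: int  # offset in the original string


# Placeholder patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "time": re.compile(r"\b\d{1,2}:[0-5]\d\b"),
}


def find_with_positions(text):
    """Return every match from every pattern, sorted by position."""
    hits = [
        Hit(kind, m.group(), m.start())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda hit: hit.start)
```

Sorting the combined list by `start` gives the single position-ordered view the request asks for.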

Price Regex Bug

There seems to be an issue parsing prices. The extracted price figure is sometimes truncated for values in the thousands or greater.

Example:
cr.CommonRegex("$6000").prices returns ["$600"] instead of ["$6000"]
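
The truncation is what one would expect from a pattern that matches one to three digits and then only whole groups of three. The `GROUPED` pattern below is assumed for illustration and is not confirmed to be the one commonregex uses; `RELAXED` is one possible adjustment, not the project's fix.

```python
import re

# Assumed shape: 1-3 leading digits, then optional groups of exactly 3 digits.
GROUPED = re.compile(r"[$]\s?[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{1,2})?")

# Possible adjustment: any number of leading digits, comma groups still allowed.
RELAXED = re.compile(r"[$]\s?[+-]?[0-9]+(?:,[0-9]{3})*(?:\.[0-9]{1,2})?")

print(GROUPED.findall("$6000"))  # ['$600']
print(RELAXED.findall("$6000"))  # ['$6000']
```

With `GROUPED`, the greedy `[0-9]{1,3}` consumes `600`, the group then needs three more digits but only one remains, and the match ends there without backtracking.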

IPv6 Support

Can we get IPv6 support on .ip? Wouldn't mind if it were a .ipv6, actually, to keep it separate.

CommonRegex("2001:0db8::ff00:0042:8329").ip
[]
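
As a stopgap outside the library, the standard `ipaddress` module can validate candidate tokens, which avoids writing an IPv6 regex by hand. This is a sketch; `find_ipv6` is a hypothetical helper, and splitting on whitespace is a simplifying assumption about how addresses appear in the text.

```python
import ipaddress


def find_ipv6(text):
    """Return whitespace-separated tokens that parse as IPv6 addresses."""
    found = []
    for token in text.split():
        candidate = token.strip(".,;()[]<>\"'")  # drop surrounding punctuation
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # not an IP address at all
        if address.version == 6:
            found.append(candidate)
    return found
```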

Street Address can't parse following

3015 POE RD., HOUSTON, TX 77051

Do I need to do something first, or is this something that can be corrected in the street parser?

Oh, the call I am making is:

for sa in street_address.finditer(res):
    res = sa.string
    break

This works for other addresses; this is the first one it has failed on. Also, unlike RE, it stopped on a \n and ignored the next line. Do I need to pre-parse the text?
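
One detail worth checking in the loop above: on a `re` match object, `.string` is the entire input that was searched, while `.group()` is the matched portion. The sketch below shows the difference with a deliberately simple, hypothetical pattern; it is not the `street_address` pattern that commonregex ships.

```python
import re

# Hypothetical pattern for illustration only.
street_address = re.compile(
    r"\d{1,4} [\w ]{1,20}\b(?:road|rd|street|st)\b\.?", re.IGNORECASE
)

text = "Ship to:\n3015 POE RD., HOUSTON, TX 77051\nThanks"

match = next(street_address.finditer(text), None)
if match is not None:
    whole_input = match.string   # the full text passed to finditer
    matched_part = match.group() # only the part the pattern matched
```

If the goal is to keep just the address, `sa.group()` is probably what the loop wants.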

Verbose regex?

Is it possible to maintain a .md file with verbose regexes so contributors can understand it better and maybe improve or add features?
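
For reference, Python's `re.VERBOSE` flag already allows whitespace and comments inside a pattern, so annotated regexes could live next to the code as well as in a `.md` file. A small sketch with a hypothetical time pattern (not one of commonregex's):

```python
import re

TIME = re.compile(
    r"""
    \b\d{1,2}        # hour, one or two digits
    :                # separator
    [0-5]\d          # minutes, 00-59
    (?:\s?[ap]m)?    # optional am/pm suffix
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(TIME.findall("Meet at 9:30 am or 14:05"))  # ['9:30 am', '14:05']
```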
