Python developer, freelancer and open source enthusiast
dmitriiweb / extract-emails
Extract emails from a given website
License: MIT License
I don’t have much knowledge about Python, so I don’t know how to start using it.
Add changelog part to docs
Need to add more examples of usage to Quick Start part
Hi @dmitriiweb, can you help with the issue below? After I pip installed the extract-emails module, I copied the example code and ran it, but I keep getting this error:
from extract_emails import DefaultFilterAndEmailFactory as Factory
ImportError: cannot import name 'DefaultWorker' from 'extract_emails'
Hi!
Thanks for this awesome project. I've already used it for the extraction of emails, but can this also be used for the extraction of social media (LinkedIn) accounts?
Thanks!
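As a hedged illustration only (extract-emails is described as extracting emails, and nothing here is part of its API), LinkedIn profile and company URLs could be pulled out of page HTML with a regular expression. The pattern and `find_linkedin_urls` are hypothetical and cover only the common `/in/` and `/company/` URL shapes.

```python
import re
from typing import Dict, List

# Hypothetical pattern for public LinkedIn profile/company URLs (illustration only).
LINKEDIN_URL = re.compile(
    r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/(?:in|company)/[A-Za-z0-9_-]+/?",
    re.IGNORECASE,
)


def find_linkedin_urls(html: str) -> List[str]:
    """Return unique LinkedIn URLs in order of first appearance."""
    seen: Dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for match in LINKEDIN_URL.finditer(html):
        seen.setdefault(match.group(0), None)
    return list(seen)


page = (
    '<a href="https://www.linkedin.com/in/jane-doe/">Jane</a> '
    '<a href="https://linkedin.com/company/acme">Acme</a>'
)
print(find_linkedin_urls(page))
```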
Create CONTRIBUTING.md file with rules and tips
Add support for playwright
https://playwright.dev/python/docs/intro/
Saving as CSV doesn't append to the CSV file when running in a loop
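A minimal sketch of append-friendly saving, assuming the CSV is written with Python's standard `csv` module (how extract-emails itself writes the file isn't shown here): open the file in append mode and write the header only when the file is new or empty. `append_rows` and the column names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def append_rows(path: Path, rows: List[Dict[str, str]], fieldnames: List[str]) -> None:
    """Append rows to a CSV file, writing the header only when the file is new or empty."""
    write_header = not path.exists() or path.stat().st_size == 0
    # Mode "a" appends on every call; mode "w" would truncate the file each time.
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


# Simulate a loop over several websites, saving after each one.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "emails.csv"
    for site, email in [("a.example", "info@a.example"), ("b.example", "hello@b.example")]:
        append_rows(out, [{"website": site, "email": email}], ["website", "email"])
    with out.open(newline="", encoding="utf-8") as f:
        saved = list(csv.DictReader(f))

print(saved)
```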
Need to add tests for uncovered parts of the code
Hi,
Again, thanks for your work. I am experimenting with it before scaling up. It works well, but I get an error, apparently when it doesn't find any emails.
I ran this simple script for testing:
from extract_emails import ExtractEmails
em = ExtractEmails("http://www.formationgrowthhacking.com/", depth=None, print_log=False, ssl_verify=False, user_agent=None, request_delay=0.0)
emails = em.emails
print(emails)
I get these errors:
Traceback (most recent call last):
File "C:/Users/Nino/PycharmProjects/EmailVerif/github_extract-email/extract_emails/myextrator.py", line 4, in <module>
em = ExtractEmails("http://www.formationgrowthhacking.com/", depth=None, print_log=False, ssl_verify=False, user_agent=None, request_delay=0.0)
File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 30, in __init__
self.extract_emails(url)
File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 43, in extract_emails
self.extract_emails(new_url)
File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 43, in extract_emails
self.extract_emails(new_url)
File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 43, in extract_emails
self.extract_emails(new_url)
[Previous line repeated 30 more times]
File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 36, in extract_emails
self.get_all_links(r.text)
File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 59, in get_all_links
tree = html.fromstring(page)
File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\lxml\html\__init__.py", line 875, in fromstring
doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\lxml\html\__init__.py", line 761, in document_fromstring
value = etree.fromstring(html, parser, **kw)
File "src\lxml\etree.pyx", line 3234, in lxml.etree.fromstring
File "src\lxml\parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
Could you please help me to fix this issue?
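The traceback shows `r.text` (a Unicode string) reaching `lxml.html.fromstring`, and lxml's message asks for bytes input or markup without an encoding declaration. With requests, passing `r.content` (bytes) instead of `r.text` is the simplest change. For code that only has a string, a hedged sketch is to strip a leading XML declaration first; `prepare_markup` is a hypothetical helper, and this sketch does not call lxml itself.

```python
import re
from typing import Union

# An XML declaration such as <?xml version="1.0" encoding="utf-8"?> at the very start of a page.
_XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>", re.IGNORECASE)


def prepare_markup(page: Union[str, bytes]) -> Union[str, bytes]:
    """Return input that lxml.html.fromstring should accept.

    Bytes pass through unchanged, so lxml can use the encoding the page declares.
    For str input, a leading XML declaration is removed, because lxml rejects
    Unicode strings that still carry an encoding declaration.
    """
    if isinstance(page, bytes):
        return page
    return _XML_DECLARATION.sub("", page, count=1)


text_page = '<?xml version="1.0" encoding="utf-8"?>\n<html><body>hi</body></html>'
print(repr(prepare_markup(text_page)))  # '\n<html><body>hi</body></html>'
```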
How does the depth parameter work? I assumed that a value of 1 would mean searching only www.example.com and 2 would also include www.example.com/contactus. But it doesn't seem to work; the log even says URLs = 1. Can you please help? Many thanks
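The depth semantics the reporter expected can be sketched as a breadth-first crawl. This is an assumption about intended behaviour, not a description of how extract-emails interprets `depth`: here depth 1 visits only the start page, depth 2 adds the pages it links to, and so on. The site map is a hypothetical in-memory stand-in for real HTTP requests; an iterative queue like this also avoids the deep recursion visible in the earlier traceback.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def crawl(start: str, get_links: Callable[[str], Iterable[str]], depth: int) -> List[str]:
    """Breadth-first crawl: depth=1 visits only `start`, depth=2 adds pages it links to, etc."""
    visited: List[str] = [start]
    seen = {start}
    queue: Deque[Tuple[str, int]] = deque([(start, 1)])
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # this page is at the maximum depth, so its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                visited.append(link)
                queue.append((link, level + 1))
    return visited


# Hypothetical site map standing in for fetching and parsing real pages.
START = "https://www.example.com/"
SITE = {
    START: ["https://www.example.com/contactus", "https://www.example.com/about"],
    "https://www.example.com/contactus": ["https://www.example.com/team"],
}


def get_links(url: str) -> List[str]:
    return SITE.get(url, [])


for d in (1, 2, 3):
    print(d, crawl(START, get_links, d))
```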
Hello,
After running pip install extract_emails and trying your sample code, I got the error below. What could be the problem? Thanks in advance. Also, by the way, pip install extract_emails[all] is not working. Why?
Traceback (most recent call last):
File "/Users/user/My Drive/emailextractfromURL/olu.py", line 3, in <module>
from extract_emails import DefaultFilterAndEmailFactory as Factory
File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/__init__.py", line 2, in <module>
from .factories import (
File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/factories/__init__.py", line 1, in <module>
from .base_factory import BaseFactory
File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/factories/base_factory.py", line 6, in <module>
from extract_emails.link_filters import LinkFilterBase
File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/link_filters/__init__.py", line 1, in <module>
from .contact_link_filter import ContactInfoLinkFilter
File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/link_filters/contact_link_filter.py", line 7, in <module>
class ContactInfoLinkFilter(LinkFilterBase):
File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/link_filters/contact_link_filter.py", line 53, in ContactInfoLinkFilter
contruct_candidates: list[str] | None = None,
TypeError: unsupported operand type(s) for |: 'types.GenericAlias' and 'NoneType'
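`list[str] | None` in a signature is evaluated when the function is defined, and the `X | None` union syntax (PEP 604) only works at runtime from Python 3.10, while this traceback comes from a Python 3.9 environment. Running on Python 3.10 or newer avoids this particular error. For code you control, `from __future__ import annotations` (or `Optional[List[str]]` from `typing`) keeps the same signature working on 3.9, as this sketch with a hypothetical `filter_links` function shows.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated at def time

# Without the import above, Python 3.9 raises
# TypeError: unsupported operand type(s) for |: 'types.GenericAlias' and 'NoneType'
# while defining this function, because of the "list[str] | None" annotation.


def filter_links(links: list[str], candidates: list[str] | None = None) -> list[str]:
    """Hypothetical filter mirroring the failing signature: keep links containing a candidate word."""
    words = candidates if candidates is not None else ["contact", "about"]
    return [link for link in links if any(word in link.lower() for word in words)]


print(filter_links(["https://x.example/contact", "https://x.example/blog"]))
```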
Hi @dmitriiweb, can you please help identify what's wrong? I tried running your example from the docs, and I've pip installed the latest extract_emails v5.0.2, but I'm not getting any response or output. It seems there's an issue somewhere.
Did you also run the example on your end?
Hi,
Thanks for your work.
I installed it and followed your instructions. I get these errors:
Traceback (most recent call last):
File "C:/Users/Nino/PycharmProjects/EmailVerif/emailverif.py", line 8, in <module>
em = ExtractEmails(url, depth=None, print_log=False, ssl_verify=True, user_agent=None, request_delay=0.0)
File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 31, in __init__
self.extract_emails(url)
File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 38, in extract_emails
self.get_emails(r.text)
File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 53, in get_emails
domains = self.get_domains()
File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 61, in get_domains
with open(DOMAINS_FAIL, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Nino\\PycharmProjects\\EmailVerif\\venv\\lib\\site-packages\\extract_emails\\top_level_domains.pkl'
Process finished with exit code 1
Could you provide the missing file please?
Kind regards
Hello
How can I extract emails from a list of websites using your tool?
Thanks
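One way to process a list of websites, sketched with only the standard library rather than extract-emails' own classes: fetch each page, apply an email regex, and keep going when a site fails. `emails_from_sites`, `fetch` and the simplified regex are hypothetical; the fetcher is injectable so the example runs without network access. It reads only each listed page, without following links or rendering JavaScript.

```python
import re
import urllib.request
from typing import Callable, Dict, Iterable, List

# Simplified, illustrative email pattern (not the one extract-emails uses).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page with the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def emails_from_sites(urls: Iterable[str], get_page: Callable[[str], str] = fetch) -> Dict[str, List[str]]:
    """Map each URL to the sorted, de-duplicated emails found on that page."""
    results: Dict[str, List[str]] = {}
    for url in urls:
        try:
            page = get_page(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            print(f"skipping {url}: {exc}")
            results[url] = []
            continue
        results[url] = sorted(set(EMAIL_RE.findall(page)))
    return results


# Offline demo: a dict stands in for the network.
fake_pages = {
    "https://a.example/": "Write to info@a.example or sales@a.example",
    "https://b.example/": "no emails here",
}
print(emails_from_sites(fake_pages, fake_pages.__getitem__))
```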
Hey, I hope you see this, as I'm relying on your tool for an important project of mine.
I've tried many ways to run "pip install extract_emails", and I even tried running setup.py, but they all give me the same error:
So far I have tried Python 3.7, and then 3.6 after I realized that's what the requirements listed, but with the same result. I've tried many different solutions, but none have worked so far. Do you think you could help me out with this?
Add support for httpx
https://www.python-httpx.org
The ExtractEmails class is returning email addresses ending in ".co" instead of ".co.uk". Could this be an issue with the regex?
Example URL: http://www.aubreypark.co.uk/
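The symptom is consistent with a pattern that matches only one short label after the last dot of the domain. The regex extract-emails actually uses is not shown here, so `NAIVE` below is a hypothetical pattern that reproduces the symptom, and `MULTI_LABEL` shows one way to accept any number of dot-separated labels. The address is a placeholder, not one taken from the example site.

```python
import re

# Hypothetical "naive" pattern: exactly one 2-3 character label after the domain name.
NAIVE = re.compile(r"[\w.]+@\w+\.\w{2,3}")
# Allows any number of dot-separated labels, so multi-part suffixes such as .co.uk survive.
# A trailing sentence period is not captured because each label needs at least one character.
MULTI_LABEL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

text = "Contact us at info@example.co.uk."
print(NAIVE.findall(text))        # ['info@example.co']
print(MULTI_LABEL.findall(text))  # ['info@example.co.uk']
```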
Need to add descriptions and examples of how to create and use custom elements (filters, browser, factories, etc.)
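A hedged sketch of what such documentation might illustrate: a pluggable "browser" that turns a URL into HTML (a backend wrapping requests, httpx or Playwright, as requested in the issues above, could fill this role) and a custom link filter. All names here (`HtmlSource`, `fetch_html`, `KeywordLinkFilter`, `collect`) are hypothetical and do not mirror extract-emails' actual base classes.

```python
from typing import Dict, Iterable, List, Protocol


class HtmlSource(Protocol):
    """Hypothetical 'browser' interface: anything that can turn a URL into HTML."""

    def fetch_html(self, url: str) -> str:
        ...


class KeywordLinkFilter:
    """Hypothetical custom link filter: keeps links whose URL contains any keyword."""

    def __init__(self, keywords: Iterable[str]) -> None:
        self.keywords = [k.lower() for k in keywords]

    def apply(self, links: Iterable[str]) -> List[str]:
        return [link for link in links if any(k in link.lower() for k in self.keywords)]


class InMemorySource:
    """Toy HtmlSource backed by a dict; a real one might wrap requests, httpx or Playwright."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def fetch_html(self, url: str) -> str:
        return self.pages.get(url, "")


def collect(source: HtmlSource, links: Iterable[str], link_filter: KeywordLinkFilter) -> Dict[str, str]:
    """Fetch only the links that the filter keeps, returning {url: html}."""
    return {url: source.fetch_html(url) for url in link_filter.apply(links)}


source = InMemorySource({
    "https://example.com/contact": "Write to info@example.com",
    "https://example.com/blog": "Latest posts",
})
pages = collect(source, ["https://example.com/contact", "https://example.com/blog"], KeywordLinkFilter(["contact"]))
print(pages)
```

Because `HtmlSource` is a structural `Protocol`, any class with a matching `fetch_html` method fits without inheriting from a base class.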