This is an integration project that wraps Selenium in two layers of encapsulation, so one shared codebase can crawl different websites and perform different task types.
It currently provides these features:
- logging
- debugging
- new robots only need to implement two functions
- screenshots on exceptions
- an email API that sends reports to yourself
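The exception-screenshot feature can be sketched as a small decorator. The `driver.save_screenshot` call follows Selenium's WebDriver API, but the wrapper itself is a hypothetical illustration under assumed names, not the repository's actual implementation:

```python
import datetime
import functools

def screenshot_on_error(func):
    """On any exception, save a screenshot named after the failure time,
    then re-raise. Assumes the wrapped method's object has a `driver`
    attribute exposing Selenium's save_screenshot(path)."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception:
            stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
            self.driver.save_screenshot(f"error_{stamp}.png")
            raise  # keep the original traceback for the caller
    return wrapper
```

A robot's `run_task` could then be decorated so every crash leaves a PNG of the page state behind for debugging.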
This project uses Python, Git, and Chrome. Go check them out if you don't have them installed locally.
git clone https://github.com/touero/carp.git
We recommend using a Python virtual environment:
python -m venv venv
Activate the virtual environment:
source ./venv/bin/activate # Unix
.\venv\Scripts\activate # Windows
Install the packages:
pip install -r requirements.txt
Quit the virtual environment:
deactivate # Unix
.\Scripts\deactivate.bat # Windows
- default_config in run.py is used to configure tasks
- the default email config is in config/smtp.yaml
- if you want to use the email API, please build a new yaml of your own
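Building the report email from a parsed config might look like the sketch below. The field names (`host`, `port`, `sender`, `password`, `receiver`) are assumptions for illustration, since the actual schema lives in the repository's config/smtp.yaml:

```python
import email.message

# Hypothetical contents of config/smtp.yaml (field names are assumptions):
#   host: smtp.example.com
#   port: 465
#   sender: me@example.com
#   password: app-password
#   receiver: me@example.com

def build_report(config, subject, body):
    """Build the notification email from a parsed config dict,
    e.g. the result of yaml.safe_load() on config/smtp.yaml."""
    msg = email.message.EmailMessage()
    msg["From"] = config["sender"]
    msg["To"] = config["receiver"]
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

# Actually sending it would use the stdlib smtplib, roughly:
#   with smtplib.SMTP_SSL(config["host"], config["port"]) as s:
#       s.login(config["sender"], config["password"])
#       s.send_message(build_report(config, "carp report", "task done"))
```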
python run.py
As you can see, there are relatively few crawlers so far, so please contribute; contributions should follow Python PEP 8.
If you want to add your own robot:
- Create ***_robot.py in the package named robots.
- Add the task type and task URL in constants.py.
- Register both of them in robots and urls in RobotMaster.
- Override the __str__ and run_task functions in your robot.
- If you want to use the email API, set email=True in local_runner and fill in your config/smtp.yaml.
- Set task_type in local_runner and run it.
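The robot-side steps above can be sketched as a minimal module. The method names `__str__` and `run_task` follow the conventions described, but the base class and task body here are hypothetical placeholders, not the repository's real RobotMaster wiring:

```python
class BaseRobot:
    """Hypothetical stand-in for the project's shared robot base class."""
    def __init__(self, url):
        self.url = url

class ExampleRobot(BaseRobot):
    """A new robot only needs to override __str__ and run_task."""

    def __str__(self):
        return f"ExampleRobot({self.url})"

    def run_task(self):
        # A real robot would drive Selenium here, e.g.
        # self.driver.get(self.url), then parse the page.
        return f"crawled {self.url}"
```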
For example, a real run using the email API:
python run.py -y config/smtp.yaml
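The -y flag could be wired with argparse roughly as below; this is a guess at how run.py reads the option, and the flag's long name and default are assumptions:

```python
import argparse

def parse_args(argv=None):
    """Parse the command line; -y points at an SMTP config file
    (long name --yaml is an assumption for this sketch)."""
    parser = argparse.ArgumentParser(description="Run carp crawling tasks.")
    parser.add_argument("-y", "--yaml", default=None,
                        help="path to an SMTP config such as config/smtp.yaml")
    return parser.parse_args(argv)
```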
- Python — All Algorithms implemented in Python.
- Selenium — A browser automation framework and ecosystem.
Currently there is only a Chrome driver; if you add support for another driver, please submit a PR.
I hope to add much more content to this repo!
Feel free to dive in! Open an issue or submit PRs.
This project follows the Python PEP 8 style guide.
This project exists thanks to all the people who contribute.