No longer updated; the project has been integrated into Virtual Judge.
- Random User-Agent support (see the middleware sketch below)
- Simulated login support
- Form submission support
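Random User-Agent rotation in Scrapy is normally implemented as a downloader middleware. A minimal sketch, assuming a USER_AGENT_LIST setting and the class name RandomUserAgentMiddleware (both are illustrative, not necessarily what this project uses):

import random

class RandomUserAgentMiddleware:
    # Illustrative middleware: picks a random User-Agent for every request.
    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is an assumed settings key holding candidate strings.
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers['User-Agent'] = random.choice(self.user_agents)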
- sudo pip install virtualenvwrapper
- add the following lines to your ~/.bashrc:
if [ -f /usr/local/bin/virtualenvwrapper.sh ]; then
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
fi
- source ~/.bashrc
- git clone https://github.com/Junnplus/OnlineJudge_Crawler_Core.git && cd OnlineJudge_Crawler_Core
- mkvirtualenv OJCC
- pip install -r requirements.txt
Crawl all existing problems from `origin_oj`:
scrapy crawl `origin_oj`_init
Example:
scrapy crawl poj_init
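The `_init` spiders can also be started from Python with the same CrawlerProcess pattern used for the other spiders below; full_crawl is a hypothetical helper name, not part of the project:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging

def full_crawl(origin_oj):
    # Run the `<origin_oj>_init` spider, which fetches every existing problem.
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    process = CrawlerProcess(get_project_settings())
    process.crawl(origin_oj + '_init')
    process.start()

full_crawl('poj')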
scrapy crawl `origin_oj`_problem -a problem_id=''
- argument
    - problem_id
Example:
scrapy crawl poj_problem -a problem_id='1000'
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging
settings = get_project_settings()

def problem_crawl(origin_oj, problem_id):
    # Run the `<origin_oj>_problem` spider for a single problem.
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    process = CrawlerProcess(settings)
    process.crawl(origin_oj + '_problem', problem_id=problem_id)
    process.start()
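For example, the call below fetches POJ problem 1000, equivalent to the scrapy command above (note that CrawlerProcess.start() can only be called once per process):

problem_crawl('poj', '1000')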
The submitted source code must be base64-encoded.
scrapy crawl `origin_oj`_submit -a problem_id='' -a language='' -a source='' -a username='' -a password=''
- argument
    - problem_id
    - language
        - default: g++
    - source
        - base64-encoded
    - username
    - password
Language support on each OJ:
origin_oj | language |
---|---|
POJ | gcc, g++, java, pascal, c, c++, fortran |
HDU_OJ | gcc, g++, java, pascal, c, c++, c# |
SDUT_OJ | gcc, g++, java, pascal, go, lua, dao, perl, ruby, haskell, python2, python3 |
FZU_OJ | gcc, g++, java, pascal, c, c++ |
Example:
scrapy crawl sdut_submit -a problem_id='1000' -a language='gcc' -a source='I2luY2x1ZGUgPHN0ZGlvLmg+CgppbnQgbWFpbigpCnsKICAgIGludCBhLGI7CiAgICBzY2FuZigiJWQgJWQiLCZhLCAmYik7CiAgICBwcmludGYoIiVkXG4iLGErYik7CiAgICByZXR1cm4gMDsKfQ==' -a username='sdutacm1' -a password='sdutacm'
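The long source value in the example above is just the A+B program encoded with base64; Python's standard base64 module can produce it (whitespace in this sketch may differ slightly from the exact string above):

import base64

# Plain C source for the A+B problem.
source_code = r'''#include <stdio.h>

int main()
{
    int a, b;
    scanf("%d %d", &a, &b);
    printf("%d\n", a + b);
    return 0;
}'''

# base64-encode the source so it can be passed as the `source` argument.
encoded = base64.b64encode(source_code.encode('utf-8')).decode('ascii')
print(encoded)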
# ... (imports and settings as in the example above)
def code_submit(origin_oj, problem_id, language, source, username, password):
    # Run the `<origin_oj>_submit` spider; `source` must already be base64-encoded.
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    process = CrawlerProcess(settings)
    process.crawl(origin_oj + '_submit', problem_id=problem_id, language=language,
                  source=source, username=username, password=password)
    process.start()
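For example, submitting the encoded A+B source from the base64 sketch above to SDUT problem 1000:

code_submit('sdut', '1000', 'gcc', encoded, 'sdutacm1', 'sdutacm')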
scrapy crawl `origin_oj`_user -a username='' -a password=''
- argument
    - username
    - password
Example:
scrapy crawl sdut_user -a username='sdutacm1' -a password='sdutacm'
# ... (imports and settings as in the example above)
def account_info(origin_oj, username, password):
    # Run the `<origin_oj>_user` spider to fetch the account's information.
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    process = CrawlerProcess(settings)
    process.crawl(origin_oj + '_user', username=username, password=password)
    process.start()
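For example, the same query as the scrapy command above:

account_info('sdut', 'sdutacm1', 'sdutacm')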
To fetch account information without a password, switch to the `account-non-pw` branch:
git checkout account-non-pw
scrapy crawl `origin_oj`_user -a username=''
- argument
    - username (for SDUT, pass the user id instead of the username)
Example:
scrapy crawl sdut_user -a username='15940'
scrapy crawl poj_user -a username='sdutacm1'
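On the account-non-pw branch the same spider can be driven from Python without a password; a minimal sketch reusing the imports and settings above (public_account_info is a hypothetical helper name):

def public_account_info(origin_oj, username):
    # Run the `<origin_oj>_user` spider with only the username (user id for SDUT).
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    process = CrawlerProcess(settings)
    process.crawl(origin_oj + '_user', username=username)
    process.start()

public_account_info('poj', 'sdutacm1')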