major-scrapy-spiders's People

Contributors

talhashraf

major-scrapy-spiders's Issues

raise KeyError("Spider not found: %s" % spider_name)

Hi talhashraf,
Thanks for your project. I ran the facebook crawl but got this error:

Traceback (most recent call last):
  File "c:\python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "c:\python27\lib\site-packages\scrapy\cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\python27\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "c:\python27\lib\site-packages\scrapy\cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "c:\python27\lib\site-packages\scrapy\commands\crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "c:\python27\lib\site-packages\scrapy\spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: https://m.facebook.com/PhanKhanhHung?refid=46'
'sld' is not recognized as an internal or external command,
operable program or batch file.
'fref' is not recognized as an internal or external command,
operable program or batch file.

Can you help me solve this error?
Thanks so much!

KeyError: 'Spider not found: mss/spiders/instagram'

Hi,

I'm trying to run the app but keep getting this error:

root@2031f496dec5:/# scrapy crawl instagram
2018-03-08 12:56:53 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: mss)
2018-03-08 12:56:53 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'mss', 'NEWSPIDER_MODULE': 'mss.spiders', 'SPIDER_MODULES': ['mss.spiders'], 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'mss/spiders/instagram'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python3.6/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 167, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 195, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 199, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: mss/spiders/instagram'

KeyError: 'Spider not found: facebook'

(scrapy-splash) user@user-desktop:~/mss$ scrapy crawl facebook
2022-10-11 08:39:55 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: mss)
2022-10-11 08:39:55 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'mss', 'NEWSPIDER_MODULE': 'mss.spiders', 'SPIDER_MODULES': ['mss.spiders'], 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0'}
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'facebook'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/scrapy-splash/bin/scrapy", line 8, in <module>
    sys.exit(execute())
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/crawler.py", line 167, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/crawler.py", line 195, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/crawler.py", line 199, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: facebook'
(scrapy-splash) user@user-desktop:~/mss$ ls
data mss README.md requirements.txt scrapy.cfg
(scrapy-splash) user@user-desktop:~/mss$ cd mss/
(scrapy-splash) user@user-desktop:~/mss/mss$ ls
__init__.py items.py pipelines.py __pycache__ settings.py spiders utils
(scrapy-splash) user@user-desktop:~/mss/mss$ cd spiders/
(scrapy-splash) user@user-desktop:~/mss/mss/spiders$ scrapy crawl facebook
2022-10-11 08:40:41 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: mss)
2022-10-11 08:40:41 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'mss', 'NEWSPIDER_MODULE': 'mss.spiders', 'SPIDER_MODULES': ['mss.spiders'], 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0'}
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'facebook'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/scrapy-splash/bin/scrapy", line 8, in <module>
    sys.exit(execute())
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/crawler.py", line 167, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/crawler.py", line 195, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/crawler.py", line 199, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/home/user/anaconda3/envs/scrapy-splash/lib/python3.8/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: facebook'

Running on conda with Python 3.8 on Ubuntu 22.04 LTS, kernel 5.15.0-48-generic.
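
A note on the lookup that fails in the tracebacks above: `spiderloader.py` does `self._spiders[spider_name]`, and Scrapy fills that mapping using each spider class's `name` attribute, not the file or folder name. The sketch below is a toy stand-in, not Scrapy's real class, and `SpiderRegistry` and `FacebookSpider` are hypothetical names. The value `"fb"` is an assumption taken from the `[fb]` prefix in the log of the "Crawled 0 pages" issue further down this page; the real name in this repository may differ.

```python
class SpiderRegistry:
    """Toy name-to-class lookup (hypothetical, not Scrapy's SpiderLoader)."""

    def __init__(self, spider_classes):
        # Classes are keyed by their `name` attribute, not by module path.
        self._spiders = {cls.name: cls for cls in spider_classes}

    def list(self):
        # Names that can be passed on the command line.
        return sorted(self._spiders)

    def load(self, spider_name):
        try:
            return self._spiders[spider_name]
        except KeyError:
            # Same message shape as in the tracebacks above.
            raise KeyError("Spider not found: {}".format(spider_name))


class FacebookSpider:  # hypothetical stand-in for the project's spider
    name = "fb"  # assumed from the `[fb]` log prefix, not confirmed


registry = SpiderRegistry([FacebookSpider])
print(registry.list())               # registered names
print(registry.load("fb").__name__)  # lookup by registered name succeeds

try:
    registry.load("facebook")        # module-style name is not a key
except KeyError as exc:
    print(exc)
```

In a real project, running `scrapy list` from the directory that contains `scrapy.cfg` prints the names that are actually registered, which is probably the quickest way to check what `scrapy crawl` will accept.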

ValueError: Unknown string format

I've installed all the required packages, but when I run "scrapy crawl GooglePlayStore", it says:

2017-11-17 10:53:11 [scrapy.core.scraper] ERROR: Spider error processing <GET https://play.google.com/store/apps/details?id=com.FDGEntertainment.TowerBoxing.gp> (referer: https://play.google.com/store/apps/collection/promotion_3001b85_impossible_games?clp=SjcKKAoicHJvbW90aW9uXzMwMDFiODVfaW1wb3NzaWJsZV9nYW1lcxAHGAMSC0dBTUVfQVJDQURF:S:ANO1ljJlLEA)
Traceback (most recent call last):
  File "c:\users\jarvan\miniconda3\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\Scrapy_data\mss\mss\spiders\google\playstore.py", line 80, in parse_app
    if last_updated else ''),
  File "c:\users\jarvan\miniconda3\lib\site-packages\dateutil\parser.py", line 1182, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "c:\users\jarvan\miniconda3\lib\site-packages\dateutil\parser.py", line 559, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

I want to know how to fix this error.
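
One possible workaround, offered as a sketch rather than the project's actual fix: the traceback shows `dateutil` rejecting the text that `parse_app` passes in as `last_updated`. A plausible cause (an assumption, since the page content is not shown) is a date string in a layout or language the parser does not recognize. The example below swaps `dateutil` for the standard library's `datetime.strptime` so it runs without extra packages. The function name `parse_last_updated` and the candidate formats are hypothetical. It returns an empty string instead of raising, mirroring the `if last_updated else ''` fallback visible in the traceback.

```python
from datetime import datetime

# Assumed layouts; adjust to whatever the page actually serves.
CANDIDATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")


def parse_last_updated(text):
    """Return an ISO date string, or '' when the text cannot be parsed."""
    if not text:
        return ''
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            # This layout did not match; try the next one.
            continue
    # Nothing matched: keep the crawl going instead of failing the item.
    return ''


print(parse_last_updated("November 17, 2017"))
print(parse_last_updated("not a date"))
```

The same guard works with `dateutil` itself: wrapping the existing `parse(...)` call in `try` / `except ValueError` and falling back to `''` keeps one unparseable date from failing the whole item.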

Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

Hi talhashraf,

I ran the facebook crawl but didn't get any results.
This is what is printed on the console when I run the script:

2016-12-10 01:04:17+0700 [scrapy] INFO: Scrapy 0.24.4 started (bot: mss)
2016-12-10 01:04:17+0700 [scrapy] INFO: Optional features available: ssl, http11, django
2016-12-10 01:04:17+0700 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'mss.spiders', 'FEED_URI': 'data.csv', 'SPIDER_MODULES': ['mss.spiders'], 'BOT_NAME': 'mss', 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0', 'FEED_FORMAT': 'csv'}
2016-12-10 01:04:17+0700 [scrapy] INFO: Enabled extensions: FeedExporter, LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2016-12-10 01:04:19+0700 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-12-10 01:04:19+0700 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-12-10 01:04:19+0700 [scrapy] INFO: Enabled item pipelines:
2016-12-10 01:04:19+0700 [fb] INFO: Spider opened
2016-12-10 01:04:19+0700 [fb] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-10 01:04:19+0700 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-10 01:04:19+0700 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2016-12-10 01:04:20+0700 [fb] DEBUG: Crawled (200) <GET https://m.facebook.com/> (referer: None)
2016-12-10 01:04:20+0700 [fb] DEBUG: Redirecting (302) to <GET https://m.facebook.com/home.php?refsrc=https%3A%2F%2Fm.facebook.com%2F&refid=8&_rdr> from <POST https://m.facebook.com/login.php?refsrc=https%3A%2F%2Fm.facebook.com%2F&lwv=100&login_try_number=1&refid=8>
2016-12-10 01:04:21+0700 [fb] DEBUG: Crawled (200) <GET https://m.facebook.com/home.php?refsrc=https%3A%2F%2Fm.facebook.com%2F&refid=8&_rdr> (referer: https://m.facebook.com/)
2016-12-10 01:04:21+0700 [fb] INFO: Closing spider (finished)
2016-12-10 01:04:21+0700 [fb] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 1763,
         'downloader/request_count': 3,
         'downloader/request_method_count/GET': 2,
         'downloader/request_method_count/POST': 1,
         'downloader/response_bytes': 26811,
         'downloader/response_count': 3,
         'downloader/response_status_count/200': 2,
         'downloader/response_status_count/302': 1,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2016, 12, 9, 18, 4, 21, 979000),
         'log_count/DEBUG': 5,
         'log_count/INFO': 7,
         'request_depth_max': 1,
         'response_received_count': 2,
         'scheduler/dequeued': 3,
         'scheduler/dequeued/memory': 3,
         'scheduler/enqueued': 3,
         'scheduler/enqueued/memory': 3,
         'start_time': datetime.datetime(2016, 12, 9, 18, 4, 19, 66000)}
2016-12-10 01:04:21+0700 [fb] INFO: Spider closed (finished)

How can I fix this problem?
Thanks very much!
