
scraping-ebay's People

Contributors

ax6, bradbase, cpatrickalves, dependabot[bot]


scraping-ebay's Issues

Error on a normal run

System: macOS 10.12.6, Anaconda and pip with the required packages, Python 3.7
I have tried both the plain scrapy crawl ebay -o products.csv and the search-string scrapy crawl ebay -o products.csv -a search="Xbox one X" invocations.
Both give me the following output and a 0 KB CSV file:

(base) COMPUTER:scraping-ebay-master USERNAME$ scrapy crawl ebay -o products.csv
2019-09-12 00:35:58 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: scraping_ebay)
2019-09-12 00:35:58 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.1, w3lib 1.20.0, Twisted 19.7.0, Python 3.7.3 (default, Mar 27 2019, 16:54:48) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Darwin-16.7.0-x86_64-i386-64bit
2019-09-12 00:35:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'scraping_ebay', 'FEED_FORMAT': 'csv', 'FEED_URI': 'products.csv', 'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['scraping_ebay.spiders']}
2019-09-12 00:35:58 [scrapy.extensions.telnet] INFO: Telnet Password: be0a708b17f43b19
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-09-12 00:35:58 [scrapy.core.engine] INFO: Spider opened
2019-09-12 00:35:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-09-12 00:35:58 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-09-12 00:35:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2019-09-12 00:35:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com> (referer: None)
2019-09-12 00:35:59 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET http://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=nintendo+switch+console&_ipg=200>
2019-09-12 00:35:59 [scrapy.core.engine] INFO: Closing spider (finished)
2019-09-12 00:35:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/exception_count': 1, 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1, 'downloader/request_bytes': 663, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 45542, 'downloader/response_count': 2, 'downloader/response_status_count/200': 2, 'elapsed_time_seconds': 1.179395, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 9, 11, 22, 35, 59, 912975), 'log_count/DEBUG': 3, 'log_count/INFO': 10, 'memusage/max': 50532352, 'memusage/startup': 50532352, 'request_depth_max': 1, 'response_received_count': 2, 'robotstxt/forbidden': 1, 'robotstxt/request_count': 1, 'robotstxt/response_count': 1, 'robotstxt/response_status_count/200': 1, 'scheduler/dequeued': 2, 'scheduler/dequeued/memory': 2, 'scheduler/enqueued': 2, 'scheduler/enqueued/memory': 2, 'start_time': datetime.datetime(2019, 9, 11, 22, 35, 58, 733580)}
2019-09-12 00:35:59 [scrapy.core.engine] INFO: Spider closed (finished)
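The "Forbidden by robots.txt" DEBUG line above is the tell: the project runs with ROBOTSTXT_OBEY: True (visible in the overridden settings), so Scrapy itself drops the search request and the feed file stays empty. A minimal, hedged workaround, assuming you have checked eBay's terms of use first, is to override that setting in settings.py or per spider:

```python
# settings.py fragment (sketch): eBay's robots.txt disallows /sch/i.html,
# and with ROBOTSTXT_OBEY = True Scrapy ignores the request before it is
# ever sent. Flipping the flag lets the crawl proceed; whether you should
# is a policy question, not a technical one.
ROBOTSTXT_OBEY = False


# Alternatively, scoped to a single spider instead of the whole project
# (hypothetical class name; in the real project this attribute would go on
# the existing EbaySpider class):
class EbaySpiderSettingsSketch:
    # Scrapy reads custom_settings from the spider class before crawling.
    custom_settings = {"ROBOTSTXT_OBEY": False}
```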

Scraper creates an empty CSV file

Hello, what a nice, comprehensive scraper! Unfortunately, when I run it in the straightforward way, I get an empty CSV/JSON file in return. The script runs fine on both Debian and macOS, but without any results like the examples in the data folder.
Regards

products.json output is always empty

Hello, nice work! I tried to test it with several inputs, but there is a problem: no output is ever produced.

scrapy crawl ebay -o products.json -a search="Samsung galaxy s7"

I get the following result:

2019-04-18 08:37:48 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scraping_ebay)
2019-04-18 08:37:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.4.8 (default, Feb 5 2018, 11:23:17) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 19.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.3, Platform Linux-3.10.0-957.10.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
2019-04-18 08:37:48 [scrapy.crawler] INFO: Overridden settings: {'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'scraping_ebay', 'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'SPIDER_MODULES': ['scraping_ebay.spiders'], 'FEED_FORMAT': 'json', 'FEED_URI': 'products.json'}
2019-04-18 08:37:48 [scrapy.extensions.telnet] INFO: Telnet Password: 4fde495aabaaad3c
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-04-18 08:37:48 [scrapy.core.engine] INFO: Spider opened
2019-04-18 08:37:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-18 08:37:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-18 08:37:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2019-04-18 08:37:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com> (referer: None)
2019-04-18 08:37:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200> from <GET http://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200>
2019-04-18 08:37:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200> (referer: None)
2019-04-18 08:37:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=2> (referer: https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200)
[... identical Crawled (200) DEBUG lines continue for _pgn=3 through _pgn=36 ...]
2019-04-18 08:38:48 [scrapy.extensions.logstats] INFO: Crawled 28 pages (at 28 pages/min), scraped 0 items (at 0 items/min)
2019-04-18 08:39:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=37> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=36)
2019-04-18 08:39:13 [ebay] DEBUG: eBay products collected successfully !!!
2019-04-18 08:39:13 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-18 08:39:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 29140, 'downloader/request_count': 40, 'downloader/request_method_count/GET': 40, 'downloader/response_bytes': 3222681, 'downloader/response_count': 40, 'downloader/response_status_count/200': 39, 'downloader/response_status_count/301': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 4, 18, 12, 39, 13, 928618), 'log_count/DEBUG': 41, 'log_count/INFO': 10, 'memusage/max': 164339712, 'memusage/startup': 47673344, 'request_depth_max': 37, 'response_received_count': 39, 'robotstxt/request_count': 1, 'robotstxt/response_count': 1, 'robotstxt/response_status_count/200': 1, 'scheduler/dequeued': 39, 'scheduler/dequeued/memory': 39, 'scheduler/enqueued': 39, 'scheduler/enqueued/memory': 39, 'start_time': datetime.datetime(2019, 4, 18, 12, 37, 48, 939034)}
2019-04-18 08:39:13 [scrapy.core.engine] INFO: Spider closed (finished)
products.json is empty!!!
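Since the log shows 39 pages crawled but 0 items scraped, the requests themselves are fine and the failure is almost certainly in the item selectors: eBay periodically renames its listing markup, and a selector that matches nothing makes parse() yield nothing, so the feed file is created but stays empty. A minimal stdlib illustration (both class names below are made up for the example):

```python
# Why "Crawled 39 pages ... scraped 0 items" still produces an empty
# products.json: the feed exporter only writes what the selectors find.
# If eBay renamed a listing class, the old selector finds zero nodes.
import re

old_selector = re.compile(r'class="lvtitle"')  # what the spider still looks for (hypothetical)
current_html = '<h3 class="s-item__title">Samsung Galaxy S7</h3>'  # what eBay serves now (hypothetical)

items = old_selector.findall(current_html)
print(len(items))  # 0 -> parse() yields no items, the exported file has 0 rows
```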

Not getting good 'stars' and 'ratings' values returned

Hi,

It seems that when I run this excellent tool, I get values of "0", "Wat", and "Pre" for "Stars", and "Ratings" is always 0.

(Additionally, it would be nice if seller information were returned: number of items sold or % positive reviews.)
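Values like "Wat" and "Pre" suggest the stars selector is slicing unrelated text (perhaps "Watchers" or "Pre-Owned", though that is a guess). A defensive guard that only accepts numeric ratings would at least keep garbage out of the export:

```python
# Hypothetical cleanup helper: accept a rating only if the first token of
# the scraped text parses as a number; otherwise return None instead of
# exporting fragments like "Wat" or "Pre".
def clean_rating(raw):
    """Return a float rating, or None if the scraped text is not numeric."""
    if raw is None or not raw.strip():
        return None
    first_token = raw.strip().split()[0]
    try:
        return float(first_token)
    except ValueError:
        return None

print(clean_rating("4.5 out of 5 stars"))  # 4.5
print(clean_rating("Pre"))                 # None
```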

Merge duplicate scraper code

The files in scrapers/ contain almost the same code.

This means if features are added to one (like adding seller information) they aren't added to the other.

It would be good to pull the shared code out so both spiders reuse it, or to extract and parameterize the differences between them.
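The refactor suggested above could be sketched like this (class and attribute names are hypothetical, and Scrapy plumbing is omitted so the sketch stands alone): shared logic lives once in a base class, and each marketplace spider overrides only the constants that differ.

```python
# Sketch of de-duplicating the scrapers: one base class owns the shared
# URL-building/parsing logic, subclasses supply only what differs.
class BaseEbaySpider:
    domain = "ebay.com"  # overridden per marketplace

    def search_url(self, query):
        # Shared search-URL construction, written exactly once.
        return (f"https://www.{self.domain}/sch/i.html?"
                f"_nkw={query.replace(' ', '+')}&_ipg=200")

class EbayDeSpider(BaseEbaySpider):
    domain = "ebay.de"  # the only line that differs for eBay Germany

print(EbayDeSpider().search_url("nintendo switch"))
```

With this shape, a feature added to the base class (such as seller information) is picked up by every marketplace spider automatically.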

How to scrape the datetime and watchers info?

Thank you for the app! It helps me a lot!

By the way, I see the datetime request in the ebay.py file inside the spiders folder; however, the data isn't fetched by the .extract_first() call. Is there any way to fix this?

How to add an auction end date

Hello,

Great app, but I couldn't add the auction end date. I tried the lines below and it doesn't work; please correct me:

endedDate = product.xpath('.//*[@Class="s-item__time-end"]/text()').extract_first()

"EndedDate":endedDate,

Thanks!
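One likely culprit in the snippet above, worth checking: XPath attribute names are case-sensitive, so @Class never matches the lowercase class attribute eBay actually emits, while @class does. A small stdlib demonstration of the difference (the sample HTML is made up):

```python
# XPath attribute tests are case-sensitive: "@Class" matches nothing,
# "@class" matches the attribute as eBay emits it.
from xml.etree import ElementTree as ET

html = '<div><span class="s-item__time-end">Ended: May 3</span></div>'
root = ET.fromstring(html)

print(root.findall('.//*[@Class="s-item__time-end"]'))        # [] -> no match
print(root.findall('.//*[@class="s-item__time-end"]')[0].text)
```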

Empty results for ebay.de

Hey,
great work! I've created a new spider for eBay Germany (ebay.de), but I don't get any results.

Here are the changes I made for the new spider, compared to the original one for ebay.com:

name = "ebay_de"
allowed_domains = ["ebay.de"]
start_urls = ["https://www.ebay.de"]
...
yield scrapy.Request("http://www.ebay.de/sch/i.html?_from=R40&_trksid=" + trksid + "&_nkw=" + self.search_string.replace(' ','+') + "&_ipg=200", callback=self.parse_link)

Input:
scrapy crawl ebay_de -o products_de.csv -a search="MacBook Pro 13 2016"

Output:
2019-11-23 22:50:17 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: scraping_ebay)
2019-11-23 22:50:17 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 2.7.17rc1 (default, Oct 10 2019, 10:26:01) - [GCC 9.2.1 20191008], pyOpenSSL 19.1.0 (OpenSSL 1.1.1c 28 May 2019), cryptography 2.6.1, Platform Linux-5.3.0-23-generic-x86_64-with-Ubuntu-19.10-eoan
2019-11-23 22:50:17 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'FEED_FORMAT': 'csv', 'SPIDER_MODULES': ['scraping_ebay.spiders'], 'FEED_URI': 'products_de.csv', 'BOT_NAME': 'scraping_ebay'}
2019-11-23 22:50:17 [scrapy.extensions.telnet] INFO: Telnet Password: 383b88df45692b23
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats']
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-11-23 22:50:17 [scrapy.core.engine] INFO: Spider opened
2019-11-23 22:50:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-11-23 22:50:17 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-11-23 22:50:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.de> (referer: None)
2019-11-23 22:50:18 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200> from <GET http://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200>
2019-11-23 22:50:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200> (referer: None)
2019-11-23 22:50:20 [ebay_de] DEBUG: eBay products collected successfully !!!
2019-11-23 22:50:20 [scrapy.core.engine] INFO: Closing spider (finished)
2019-11-23 22:50:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 1327, 'downloader/request_count': 3, 'downloader/request_method_count/GET': 3, 'downloader/response_bytes': 108803, 'downloader/response_count': 3, 'downloader/response_status_count/200': 2, 'downloader/response_status_count/301': 1, 'elapsed_time_seconds': 2.400946, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 11, 23, 21, 50, 20, 273904), 'log_count/DEBUG': 4, 'log_count/INFO': 10, 'memusage/max': 54169600, 'memusage/startup': 54169600, 'request_depth_max': 1, 'response_received_count': 2, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'start_time': datetime.datetime(2019, 11, 23, 21, 50, 17, 872958)}
2019-11-23 22:50:20 [scrapy.core.engine] INFO: Spider closed (finished)
products_de.csv is empty

Thanks!

Show more fields

Is there any way to also show the shipping price, import charges, and country/region?
