
Comments (3)

eliasdabbas commented on May 23, 2024

Thanks.
The log looks fine: the spider stopped because it had finished crawling the URLs.

If you get a specific error, feel free to open another issue.

from advertools.

eliasdabbas commented on May 23, 2024

Thanks, @mirfan899. I tried those URLs and didn't run into any issues. Only 3 were forbidden by robots.txt rules, and there were a few JSON-LD issues here and there.
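
For context on the JSON-LD issues: the `JSONDecodeError: Invalid control character` tracebacks in the log come from `json.loads` rejecting a raw control character (such as a literal newline or tab) inside a JSON string. Below is a minimal sketch of that behaviour using a made-up payload, not one taken from the crawled pages. The `parse_json_ld` helper is hypothetical and only illustrates the standard library's `strict` option; it is not how advertools handles these cases.

```python
import json

# Hypothetical JSON-LD payload with a raw newline inside a string value,
# similar in spirit to what some pages embed in
# <script type="application/ld+json"> tags.
raw = '{"@type": "Product", "description": "Boxing\ngloves"}'


def parse_json_ld(text):
    """Try strict parsing first, then fall back to tolerating control characters."""
    try:
        return json.loads(text), "strict"
    except json.JSONDecodeError:
        # strict=False allows control characters inside JSON strings.
        return json.loads(text, strict=False), "lenient"


data, mode = parse_json_ld(raw)
print(mode, repr(data["description"]))
```

The lenient fallback only helps with control characters. A payload that is not JSON at all (the `Expecting value` error in the log) would still fail.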

Your logs show that 89 (out of 100) status codes are 200.
One of the URLs seems to have timed out.
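
One way to reproduce this kind of tally is to count the `Crawled (<status>)` lines in the Scrapy log. The sketch below runs on four lines copied from the log in this thread. In practice you would read the whole log from a file. The `count_statuses` helper and the regular expression are assumptions made for illustration, and they rely on the default Scrapy log format shown here.

```python
import re
from collections import Counter

# A few lines copied from the log in this thread.
log_lines = [
    "2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.t3.com/robots.txt> (referer: None)",
    "2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.si.com/showcase/fitness/best-boxing-gloves> (referer: None)",
    "2023-05-05 13:52:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.t3.com/features/best-boxing-gloves> (referer: None)",
    "2023-05-05 13:52:28 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.quora.com/What-companies-make-the-best-quality-boxing-gloves>",
]

# Matches the status code and URL in "Crawled (200) <GET https://...>" lines.
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET (\S+)>")


def count_statuses(lines):
    """Count status codes of page responses, skipping robots.txt fetches."""
    counts = Counter()
    for line in lines:
        match = CRAWLED.search(line)
        if match and not match.group(2).endswith("/robots.txt"):
            counts[int(match.group(1))] += 1
    return counts


status_counts = count_statuses(log_lines)
print(status_counts)
```

Requests blocked by robots.txt never produce a `Crawled` line, so they do not show up in this count.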

Please check whether your code has any special settings that might cause some domains to block it (these URLs come from many different domains, and a few might have issues).
Also check the output file to see which URLs were scraped and which weren't.
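
A rough sketch of that check, assuming the output is a JSON lines file in which every row has a `url` key. The `find_missing` helper and the sample data are made up so that the example runs on its own. With real data you would pass the list read from `urls.txt` and the path to `pages.jl`.

```python
import json
import tempfile
from pathlib import Path


def find_missing(requested_urls, jsonlines_path):
    """Return requested URLs that have no row in the crawl output file."""
    crawled = set()
    with open(jsonlines_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                crawled.add(json.loads(line)["url"])
    # Strip whitespace so URLs read with readlines() still compare equal.
    return [u.strip() for u in requested_urls if u.strip() not in crawled]


# Tiny made-up stand-in for pages.jl.
sample = Path(tempfile.mkdtemp()) / "pages.jl"
sample.write_text(
    '{"url": "https://example.com/a", "status": 200}\n'
    '{"url": "https://example.com/b", "status": 403}\n',
    encoding="utf-8",
)

requested = [
    "https://example.com/a\n",
    "https://example.com/b\n",
    "https://example.com/c\n",
]
missing = find_missing(requested, sample)
print(missing)
```

A URL that was redirected may be recorded under its final address, so it can look missing in a comparison like this even though it was crawled.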


mirfan899 commented on May 23, 2024

Here is my code.

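import advertools as adv
import pandas as pd

# Read one URL per line, stripping newlines and skipping blank lines.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# Crawl only the listed URLs (no link following); the output is a JSON lines file.
adv.crawl(urls, "pages.jl", follow_links=False)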

Okay, here is the output of running the code. It takes around 13 minutes to complete.

/home/irfan/.pyenv/versions/TES/bin/python /home/irfan/PycharmProjects/TES-SAAS/tests/scprapping.py 
2023-05-05 13:52:16 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: scrapybot)
2023-05-05 13:52:16 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.4.0, Python 3.7.9 (default, Jan 23 2022, 07:32:51) - [GCC 7.5.0], pyOpenSSL 22.0.0 (OpenSSL 3.0.3 3 May 2022), cryptography 37.0.2, Platform Linux-5.4.0-148-generic-x86_64-with-debian-bullseye-sid
2023-05-05 13:52:16 [scrapy.crawler] INFO: Overridden settings:
{'ROBOTSTXT_OBEY': True,
 'SPIDER_LOADER_WARN_ONLY': True,
 'USER_AGENT': 'advertools/0.13.2'}
2023-05-05 13:52:16 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2023-05-05 13:52:16 [scrapy.extensions.telnet] INFO: Telnet Password: 548495f04ca5182a
2023-05-05 13:52:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2023-05-05 13:52:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-05-05 13:52:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-05-05 13:52:16 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-05-05 13:52:16 [scrapy.core.engine] INFO: Spider opened
2023-05-05 13:52:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-05-05 13:52:17 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-05-05 13:52:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sweetscienceoffighting.com/robots.txt> (referer: None)
2023-05-05 13:52:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.rollingstone.com/robots.txt> (referer: None)
2023-05-05 13:52:18 [filelock] DEBUG: Attempting to acquire lock 140224916848720 on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 13:52:18 [filelock] DEBUG: Lock 140224916848720 acquired on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 13:52:18 [filelock] DEBUG: Attempting to release lock 140224916848720 on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 13:52:18 [filelock] DEBUG: Lock 140224916848720 released on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.t3.com/robots.txt> (referer: None)
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.si.com/robots.txt> (referer: None)
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearpatrol.com/robots.txt> (referer: None)
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.verywellfit.com/robots.txt> (referer: None)
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.shape.com/robots.txt> (referer: None)
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.si.com/showcase/fitness/best-boxing-gloves> (referer: None)
2023-05-05 13:52:18 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.si.com/showcase/fitness/best-boxing-gloves>
2023-05-05 13:52:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingglovesreviews.com/robots.txt> (referer: None)
2023-05-05 13:52:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.t3.com/features/best-boxing-gloves> (referer: None)
2023-05-05 13:52:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.t3.com/features/best-boxing-gloves>
2023-05-05 13:52:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.rollingstone.com/product-recommendations/lifestyle/best-boxing-gloves-1234690811/> (referer: None)
2023-05-05 13:52:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/robots.txt> (referer: None)
2023-05-05 13:52:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.rollingstone.com/product-recommendations/lifestyle/best-boxing-gloves-1234690811/>
2023-05-05 13:52:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bleacherreport.com/robots.txt> (referer: None)
2023-05-05 13:52:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://expertboxing.com/robots.txt> (referer: None)
2023-05-05 13:52:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.shape.com/fitness/gear/best-boxing-gloves> (referer: None)
2023-05-05 13:52:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.verywellfit.com/best-boxing-gloves-4158917> (referer: None)
2023-05-05 13:52:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sweetscienceoffighting.com/best-boxing-gloves/> (referer: None)
2023-05-05 13:52:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.shape.com/fitness/gear/best-boxing-gloves>
2023-05-05 13:52:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thekarateblog.com/robots.txt> (referer: None)
2023-05-05 13:52:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxupnation.com/robots.txt> (referer: None)
2023-05-05 13:52:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.verywellfit.com/best-boxing-gloves-4158917>
2023-05-05 13:52:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sweetscienceoffighting.com/best-boxing-gloves/>
2023-05-05 13:52:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingglovesreviews.com/top-ten-boxing-gloves/> (referer: None)
2023-05-05 13:52:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://boxingglovesreviews.com/top-ten-boxing-gloves/>
2023-05-05 13:52:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxupnation.com/blogs/news/my-top-5-favorite-boxing-glove-brands-and-why> (referer: None)
2023-05-05 13:52:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://boxupnation.com/blogs/news/my-top-5-favorite-boxing-glove-brands-and-why>
2023-05-05 13:52:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/robots.txt> (referer: None)
2023-05-05 13:52:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.tabletenniscoach.me.uk/robots.txt> (referer: None)
2023-05-05 13:52:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearpatrol.com/fitness/g40446087/best-boxing-gloves/> (referer: None)
2023-05-05 13:52:23 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.gearpatrol.com/fitness/g40446087/best-boxing-gloves/>
2023-05-05 13:52:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://myboxinglife.com/robots.txt> (referer: None)
2023-05-05 13:52:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://wayofmartialarts.com/robots.txt> (referer: None)
2023-05-05 13:52:23 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (failed 1 times): 429 Unknown Status
2023-05-05 13:52:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bleacherreport.com/articles/1286577-breaking-down-different-brands-of-boxing-gloves-worn-by-the-pros> (referer: None)
2023-05-05 13:52:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bleacherreport.com/articles/1286577-breaking-down-different-brands-of-boxing-gloves-worn-by-the-pros>
2023-05-05 13:52:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thekarateblog.com/best-boxing-gloves/> (referer: None)
2023-05-05 13:52:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://thekarateblog.com/best-boxing-gloves/>
2023-05-05 13:52:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.tabletenniscoach.me.uk/sport-equipment-guides/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 13:52:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.tabletenniscoach.me.uk/sport-equipment-guides/best-boxing-gloves-for-beginners/>
2023-05-05 13:52:25 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (failed 2 times): 429 Unknown Status
2023-05-05 13:52:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://expertboxing.com/best-boxing-gloves-review> (referer: None)
2023-05-05 13:52:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=tWoucO2nIlE> (referer: None)
2023-05-05 13:52:25 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.dickssportinggoods.com/robots.txt> (referer: None)
2023-05-05 13:52:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hayabusafight.com/robots.txt> (referer: None)
2023-05-05 13:52:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://expertboxing.com/best-boxing-gloves-review>
2023-05-05 13:52:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://myboxinglife.com/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 13:52:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blog.joinfightcamp.com/robots.txt> (referer: None)
2023-05-05 13:52:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=tWoucO2nIlE>
2023-05-05 13:52:26 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (failed 3 times): 429 Unknown Status
2023-05-05 13:52:26 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (referer: None)
2023-05-05 13:52:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://myboxinglife.com/best-boxing-gloves-for-beginners/>
2023-05-05 13:52:26 [scrapy.core.scraper] DEBUG: Scraped from <429 https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131>
2023-05-05 13:52:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://wayofmartialarts.com/best-boxing-gloves-worth-your-money/> (referer: None)
2023-05-05 13:52:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://wayofmartialarts.com/best-boxing-gloves-worth-your-money/>
2023-05-05 13:52:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=rHepbZOCxfY> (referer: None)
2023-05-05 13:52:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://revgear.com/robots.txt> (referer: None)
2023-05-05 13:52:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=rHepbZOCxfY>
2023-05-05 13:52:27 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.dickssportinggoods.com/o/best-boxing-gloves-for-pad-work> (referer: None)
2023-05-05 13:52:27 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.dickssportinggoods.com/o/best-boxing-gloves-for-pad-work>
2023-05-05 13:52:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blog.joinfightcamp.com/boxing-equipment/how-to-choose-the-best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 13:52:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2023-05-05 13:52:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://blog.joinfightcamp.com/boxing-equipment/how-to-choose-the-best-boxing-gloves-for-beginners/>
2023-05-05 13:52:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hayabusafight.com/products/t3-boxing-gloves> (referer: None)
2023-05-05 13:52:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.walmart.com/robots.txt> (referer: None)
2023-05-05 13:52:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.hayabusafight.com/products/t3-boxing-gloves>
2023-05-05 13:52:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/t/Boxing-Gloves/30102/bn_1943751> (referer: None)
2023-05-05 13:52:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.quora.com/robots.txt> (referer: None)
2023-05-05 13:52:28 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.quora.com/What-companies-make-the-best-quality-boxing-gloves>
2023-05-05 13:52:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://made4fighters.com/robots.txt> (referer: None)
2023-05-05 13:52:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ebay.com/t/Boxing-Gloves/30102/bn_1943751>
2023-05-05 13:52:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://m.timesofindia.com/robots.txt> (referer: None)
2023-05-05 13:52:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.everlast.com/robots.txt> (referer: None)
2023-05-05 13:52:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://revgear.com/gear/boxing-gloves/> (referer: None)
2023-05-05 13:52:29 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms?from=mdr> from <GET https://m.timesofindia.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms>
2023-05-05 13:52:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://revgear.com/gear/boxing-gloves/>
2023-05-05 13:52:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.walmart.com/c/lists/top-rated-boxing-gloves> (referer: None)
2023-05-05 13:52:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves> (referer: None)
2023-05-05 13:52:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesboxing.com/robots.txt> (referer: None)
2023-05-05 13:52:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.titleboxing.com/robots.txt> (referer: None)
2023-05-05 13:52:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.walmart.com/c/lists/top-rated-boxing-gloves>
2023-05-05 13:52:30 [seo_spider] ERROR: Invalid control character at: line 20 column 226 (char 698) 200 https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 20 column 226 (char 698)
2023-05-05 13:52:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves>
2023-05-05 13:52:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.everlast.com/fight/boxing/gloves> (referer: None)
2023-05-05 13:52:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://timesofindia.indiatimes.com/robots.txt> (referer: None)
2023-05-05 13:52:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.everlast.com/fight/boxing/gloves>
2023-05-05 13:52:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sanabulsports.com/robots.txt> (referer: None)
2023-05-05 13:52:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bravose.com/robots.txt> (referer: None)
2023-05-05 13:52:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msmfightshop.com/robots.txt> (referer: None)
2023-05-05 13:52:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sanabulsports.com/blogs/news/the-best-boxing-gloves-for-training> (referer: None)
2023-05-05 13:52:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sanabulsports.com/blogs/news/the-best-boxing-gloves-for-training>
2023-05-05 13:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msmfightshop.com/blogs/news/top-3-boxing-gloves-in-the-world> (referer: None)
2023-05-05 13:52:33 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.msmfightshop.com/blogs/news/top-3-boxing-gloves-in-the-world>
2023-05-05 13:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesboxing.com/> (referer: None)
2023-05-05 13:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.titleboxing.com/gloves> (referer: None)
2023-05-05 13:52:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cletoreyesboxing.com/>
2023-05-05 13:52:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.titleboxing.com/gloves>
2023-05-05 13:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://anthonyjoshua.com/robots.txt> (referer: None)
2023-05-05 13:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-professionals/articleshow/97128538.cms> (referer: None)
2023-05-05 13:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-professionals/articleshow/97128538.cms>
2023-05-05 13:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bravose.com/collections/training-gloves> (referer: None)
2023-05-05 13:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bravose.com/collections/training-gloves>
2023-05-05 13:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://mmagearaddict.com/robots.txt> (referer: None)
2023-05-05 13:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://anthonyjoshua.com/blogs/news/anthony-joshua-how-to-choose-the-best-boxing-gloves> (referer: None)
2023-05-05 13:52:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://anthonyjoshua.com/blogs/news/anthony-joshua-how-to-choose-the-best-boxing-gloves>
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ringsport.com.au/robots.txt> (referer: None)
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tufwear-germany.de/robots.txt> (referer: None)
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms?from=mdr> (referer: None)
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://issuu.com/robots.txt> (referer: None)
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://yokkao.com/robots.txt> (referer: None)
2023-05-05 13:52:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms?from=mdr>
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1> (referer: None)
2023-05-05 13:52:37 [seo_spider] ERROR: Invalid control character at: line 5 column 19 (char 78) 200 https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 5 column 19 (char 78)
2023-05-05 13:52:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1>
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://topboxer.com/robots.txt> (referer: None)
2023-05-05 13:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://mmagearaddict.com/best-boxing-gloves/> (referer: None)
2023-05-05 13:52:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://mmagearaddict.com/best-boxing-gloves/>
2023-05-05 13:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tufwear-germany.de/en/blogs/news/was-sind-die-besten-boxhandschuhe-der-boxhandschuh-guide-fur-deinen-kauf> (referer: None)
2023-05-05 13:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://issuu.com/punchequipment/docs/get_the_best_boxing_gloves_for_a_winning_performan> (referer: None)
2023-05-05 13:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://tufwear-germany.de/en/blogs/news/was-sind-die-besten-boxhandschuhe-der-boxhandschuh-guide-fur-deinen-kauf>
2023-05-05 13:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://issuu.com/punchequipment/docs/get_the_best_boxing_gloves_for_a_winning_performan>
2023-05-05 13:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://nypost.com/robots.txt> (referer: None)
2023-05-05 13:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nakmuaywholesale.com/robots.txt> (referer: None)
2023-05-05 13:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://warriorpunch.com/robots.txt> (referer: None)
2023-05-05 13:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://yokkao.com/pages/boxing-gloves-guide> (referer: None)
2023-05-05 13:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://yokkao.com/pages/boxing-gloves-guide>
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://topboxer.com/collections/boxing-gloves> (referer: None)
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.infinitudefight.com/robots.txt> (referer: None)
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 10 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 14 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 16 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 35 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 42 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 43 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 44 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 45 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 46 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 47 without any user agent to enforce it on.
2023-05-05 13:52:39 [protego] DEBUG: Rule at line 69 without any user agent to enforce it on.
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://origympersonaltrainercourses.co.uk/robots.txt> (referer: None)
2023-05-05 13:52:39 [seo_spider] ERROR: Invalid control character at: line 15 column 21 (char 385) 200 https://topboxer.com/collections/boxing-gloves
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 15 column 21 (char 385)
2023-05-05 13:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://topboxer.com/collections/boxing-gloves>
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.infinitudefight.com/buy-the-best-boxing-gloves/> (referer: None)
2023-05-05 13:52:39 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.infinitudefight.com/buy-the-best-boxing-gloves/>
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cashkaro.com/robots.txt> (referer: None)
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://nypost.com/article/best-boxing-equipment-per-experts/> (referer: None)
2023-05-05 13:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://kdvr.com/robots.txt> (referer: None)
2023-05-05 13:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://nypost.com/article/best-boxing-equipment-per-experts/>
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nakmuaywholesale.com/top-3-boxing-gloves-for-small-hands-2022/> (referer: None)
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.popsugar.com/robots.txt> (referer: None)
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://warriorpunch.com/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 13:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.nakmuaywholesale.com/top-3-boxing-gloves-for-small-hands-2022/>
2023-05-05 13:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://warriorpunch.com/best-boxing-gloves-for-beginners/>
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.expertreviews.co.uk/robots.txt> (referer: None)
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://branded.disruptsports.com/robots.txt> (referer: None)
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cashkaro.com/blog/best-boxing-gloves-in-india/201246> (referer: None)
2023-05-05 13:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cashkaro.com/blog/best-boxing-gloves-in-india/201246>
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://branded.disruptsports.com/blogs/blog/which-boxing-gloves-to-buy-for-beginners> (referer: None)
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.expertreviews.co.uk/health-and-grooming/1407584/best-boxing-gloves> (referer: None)
2023-05-05 13:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://branded.disruptsports.com/blogs/blog/which-boxing-gloves-to-buy-for-beginners>
2023-05-05 13:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightquality.com/robots.txt> (referer: None)
2023-05-05 13:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.expertreviews.co.uk/health-and-grooming/1407584/best-boxing-gloves>
2023-05-05 13:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.popsugar.com/fitness/Best-Boxing-Gloves-Women-45472473> (referer: None)
2023-05-05 13:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.popsugar.com/fitness/Best-Boxing-Gloves-Women-45472473>
2023-05-05 13:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://kdvr.com/reviews/br/sports-fitness-br/boxing-br/best-title-boxing-gloves/> (referer: None)
2023-05-05 13:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves> (referer: None)
2023-05-05 13:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://kdvr.com/reviews/br/sports-fitness-br/boxing-br/best-title-boxing-gloves/>
2023-05-05 13:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightingadvice.com/robots.txt> (referer: None)
2023-05-05 13:52:41 [seo_spider] ERROR: Expecting value: line 1 column 1 (char 0) 200 https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
2023-05-05 13:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves>
2023-05-05 13:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flipkart.com/robots.txt> (referer: None)
2023-05-05 13:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.healthyprinciples.co.uk/robots.txt> (referer: None)
2023-05-05 13:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightquality.com/2018/10/12/best-custom-gloves/> (referer: None)
2023-05-05 13:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.k2promos.com/robots.txt> (referer: None)
2023-05-05 13:52:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://fightquality.com/2018/10/12/best-custom-gloves/>
2023-05-05 13:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightingadvice.com/best-boxing-gloves-under-200/> (referer: None)
2023-05-05 13:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://breakinggrips.com/robots.txt> (referer: None)
2023-05-05 13:52:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://fightingadvice.com/best-boxing-gloves-under-200/>
2023-05-05 13:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.healthyprinciples.co.uk/best-boxing-gloves-for-kids-review/> (referer: None)
2023-05-05 13:52:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.healthyprinciples.co.uk/best-boxing-gloves-for-kids-review/>
2023-05-05 13:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.k2promos.com/best-beginner-boxing-gloves/> (referer: None)
2023-05-05 13:52:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.k2promos.com/best-beginner-boxing-gloves/>
2023-05-05 13:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.proboxingequipment.com/robots.txt> (referer: None)
2023-05-05 13:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mmahive.com/robots.txt> (referer: None)
2023-05-05 13:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flipkart.com/sports/boxing/boxing-gloves/pr?sid=abc%2Cppq%2Cbb6&page=2> (referer: None)
2023-05-05 13:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bwsgym.com/robots.txt> (referer: None)
2023-05-05 13:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.flipkart.com/sports/boxing/boxing-gloves/pr?sid=abc%2Cppq%2Cbb6&page=2>
2023-05-05 13:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dontwasteyourmoney.com/robots.txt> (referer: None)
2023-05-05 13:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.proboxingequipment.com/Boxing-Gloves_c_196.html> (referer: None)
2023-05-05 13:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.proboxingequipment.com/Boxing-Gloves_c_196.html>
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bwsgym.com/etiquette-produit/best-boxing-gloves/> (referer: None)
2023-05-05 13:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bwsgym.com/etiquette-produit/best-boxing-gloves/>
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://glovesaddict.com/robots.txt> (referer: None)
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.reddit.com/robots.txt> (referer: None)
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mmahive.com/best-boxing-gloves-for-wrist-support/> (referer: None)
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dontwasteyourmoney.com/products/hawk-sports-heavy-bag-boxing-gloves/> (referer: None)
2023-05-05 13:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mmahive.com/best-boxing-gloves-for-wrist-support/>
2023-05-05 13:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.dontwasteyourmoney.com/products/hawk-sports-heavy-bag-boxing-gloves/>
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.bestproducts.com/robots.txt> (referer: None)
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://middleeasy.com/robots.txt> (referer: None)
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.reddit.com/r/amateur_boxing/comments/2ykhau/the_top_15_best_boxing_gloves_ranking_the_best/> (referer: None)
2023-05-05 13:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://absolutelymartialarts.com/robots.txt> (referer: None)
2023-05-05 13:52:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/robots.txt> (failed 1 times): 429 Unknown Status
2023-05-05 13:52:47 [py.warnings] WARNING: /home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/scrapy/core/engine.py:276: ScrapyDeprecationWarning: Passing a 'spider' argument to ExecutionEngine.download is deprecated
  return self.download(result, spider) if isinstance(result, Request) else result

2023-05-05 13:52:47 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.reddit.com/r/amateur_boxing/comments/2ykhau/the_top_15_best_boxing_gloves_ranking_the_best/>
2023-05-05 13:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.momjunction.com/robots.txt> (referer: None)
2023-05-05 13:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://breakinggrips.com/best-kids-boxing-gloves/> (referer: None)
2023-05-05 13:52:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://breakinggrips.com/best-kids-boxing-gloves/>
2023-05-05 13:52:48 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/robots.txt> (failed 2 times): 429 Unknown Status
2023-05-05 13:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://middleeasy.com/reviews/gear/gloves-cardio-kickboxing/> (referer: None)
2023-05-05 13:52:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://middleeasy.com/reviews/gear/gloves-cardio-kickboxing/>
2023-05-05 13:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://absolutelymartialarts.com/best-boxing-gloves-beginners/> (referer: None)
2023-05-05 13:52:50 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.fightingking.com/robots.txt> (failed 3 times): 429 Unknown Status
2023-05-05 13:52:50 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.fightingking.com/robots.txt> (referer: None)
2023-05-05 13:52:50 [protego] DEBUG: Rule at line 2 without any user agent to enforce it on.
2023-05-05 13:52:50 [protego] DEBUG: Rule at line 6 without any user agent to enforce it on.
2023-05-05 13:52:50 [protego] DEBUG: Rule at line 10 without any user agent to enforce it on.
2023-05-05 13:52:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://absolutelymartialarts.com/best-boxing-gloves-beginners/>
2023-05-05 13:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mightyfighter.com/robots.txt> (referer: None)
2023-05-05 13:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.momjunction.com/articles/best-boxing-gloves-for-kids_00514921/> (referer: None)
2023-05-05 13:52:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.momjunction.com/articles/best-boxing-gloves-for-kids_00514921/>
2023-05-05 13:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wbcme.co.uk/robots.txt> (referer: None)
2023-05-05 13:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.stylecraze.com/robots.txt> (referer: None)
2023-05-05 13:52:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 1 times): 429 Unknown Status
2023-05-05 13:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://glovesaddict.com/best-boxing-gloves-on-amazon/> (referer: None)
2023-05-05 13:52:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://glovesaddict.com/best-boxing-gloves-on-amazon/>
2023-05-05 13:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.bestproducts.com/fitness/equipment/g1009/boxing-gloves-mitts/> (referer: None)
2023-05-05 13:52:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.bestproducts.com/fitness/equipment/g1009/boxing-gloves-mitts/>
2023-05-05 13:52:52 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 2 times): 429 Unknown Status
2023-05-05 13:52:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://linealboxing.com/robots.txt> (referer: None)
2023-05-05 13:52:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://smartmma.com/robots.txt> (referer: None)
2023-05-05 13:52:53 [protego] DEBUG: Rule at line 1 without any user agent to enforce it on.
2023-05-05 13:52:54 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 3 times): 429 Unknown Status
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (referer: None)
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.stylecraze.com/articles/best-heavy-bag-gloves/> (referer: None)
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wbcme.co.uk/ringside/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 13:52:54 [scrapy.core.scraper] DEBUG: Scraped from <429 https://www.fightingking.com/boxing-gloves-brands-reviews/>
2023-05-05 13:52:54 [seo_spider] ERROR: Invalid control character at: line 28 column 64 (char 1740) 200 https://www.stylecraze.com/articles/best-heavy-bag-gloves/
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 28 column 64 (char 1740)
2023-05-05 13:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stylecraze.com/articles/best-heavy-bag-gloves/>
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mightyfighter.com/top-10-best-boxing-gloves/> (referer: None)
2023-05-05 13:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.wbcme.co.uk/ringside/best-boxing-gloves-for-beginners/>
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fullcontactway.com/robots.txt> (referer: None)
2023-05-05 13:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mightyfighter.com/top-10-best-boxing-gloves/>
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://linealboxing.com/best-boxing-glove-brands-2022/> (referer: None)
2023-05-05 13:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://linealboxing.com/best-boxing-glove-brands-2022/>
2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.attacktheback.com/robots.txt> (referer: None)
2023-05-05 13:52:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.kreedon.com/robots.txt> (referer: None)
2023-05-05 13:52:55 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.kreedon.com/best-boxing-gloves-brands/>
2023-05-05 13:52:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesuk.com/robots.txt> (referer: None)
2023-05-05 13:52:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.attacktheback.com/best-cheap-boxing-gloves/> (referer: None)
2023-05-05 13:52:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fullcontactway.com/best-sparring-gloves/> (referer: None)
2023-05-05 13:52:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blackbeltmag.com/robots.txt> (referer: None)
2023-05-05 13:52:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.attacktheback.com/best-cheap-boxing-gloves/>
2023-05-05 13:52:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fullcontactway.com/best-sparring-gloves/>
2023-05-05 13:52:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fitnessbaddies.com/robots.txt> (referer: None)
2023-05-05 13:52:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bestreviews.com/robots.txt> (referer: None)
2023-05-05 13:52:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.boxingison.com/robots.txt> (referer: None)
2023-05-05 13:52:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesuk.com/blogs/news/what-are-the-best-boxing-gloves-for-beginners> (referer: None)
2023-05-05 13:52:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cletoreyesuk.com/blogs/news/what-are-the-best-boxing-gloves-for-beginners>
2023-05-05 13:52:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blackbeltmag.com/best-boxing-gloves> (referer: None)
2023-05-05 13:52:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://blackbeltmag.com/best-boxing-gloves>
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://smartmma.com/best-boxing-gloves-for-heavy-bag/> (referer: None)
2023-05-05 13:52:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://smartmma.com/best-boxing-gloves-for-heavy-bag/>
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bestreviews.com/sports-fitness/boxing/best-boxing-gloves> (referer: None)
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msn.com/robots.txt> (referer: None)
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pragmaticmom.com/robots.txt> (referer: None)
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fitnessbaddies.com/amateur-boxing-gloves/> (referer: None)
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://thewiredshopper.com/robots.txt> (referer: None)
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 28 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 37 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 38 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 39 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 40 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 41 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 42 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 43 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 44 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 45 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 46 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 47 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 48 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 49 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 50 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 51 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 52 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 53 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 54 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 55 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 56 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 57 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 58 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 59 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 60 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 61 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 67 without any user agent to enforce it on.
2023-05-05 13:52:59 [protego] DEBUG: Rule at line 72 without any user agent to enforce it on.
2023-05-05 13:52:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bestreviews.com/sports-fitness/boxing/best-boxing-gloves>
2023-05-05 13:52:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fitnessbaddies.com/amateur-boxing-gloves/>
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.boxingear.com/robots.txt> (referer: None)
2023-05-05 13:52:59 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://thewiredshopper.com/best-boxing-gloves-to-buy/> (referer: None)
2023-05-05 13:53:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msn.com/en-gb/lifestyle/rf-best-products-uk/best-boxing-gloves-for-men-12oz-reviews> (referer: None)
2023-05-05 13:53:00 [scrapy.core.scraper] DEBUG: Scraped from <403 https://thewiredshopper.com/best-boxing-gloves-to-buy/>
2023-05-05 13:53:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowkickmma.com/robots.txt> (referer: None)
2023-05-05 13:53:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.msn.com/en-gb/lifestyle/rf-best-products-uk/best-boxing-gloves-for-men-12oz-reviews>
2023-05-05 13:53:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.standard.co.uk/robots.txt> (referer: None)
2023-05-05 13:53:00 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://sites.google.com/view> from <GET https://www.boxingear.com/shop-2/grant-gloves/lace-up/best-boxing-gloves-for-sparring-grant-gloves/>
2023-05-05 13:53:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pragmaticmom.com/2019/11/best-boxing-gloves-for-women/> (referer: None)
2023-05-05 13:53:01 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pragmaticmom.com/2019/11/best-boxing-gloves-for-women/>
2023-05-05 13:53:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.boxingison.com/best-boxing-gloves-for-training-and-sparring/> (referer: None)
2023-05-05 13:53:01 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.boxingison.com/best-boxing-gloves-for-training-and-sparring/>
2023-05-05 13:53:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingready.com/robots.txt> (referer: None)
2023-05-05 13:53:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowkickmma.com/best-boxing-gloves/> (referer: None)
2023-05-05 13:53:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gloveworx.com/robots.txt> (referer: None)
2023-05-05 13:53:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.sportsdirect.com/robots.txt> (referer: None)
2023-05-05 13:53:01 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowkickmma.com/best-boxing-gloves/>
2023-05-05 13:53:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.standard.co.uk/shopping/esbest/health-fitness/fitness-wear/best-womens-boxing-gloves-for-beginners-a4272321.html> (referer: None)
2023-05-05 13:53:02 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.standard.co.uk/shopping/esbest/health-fitness/fitness-wear/best-womens-boxing-gloves-for-beginners-a4272321.html>
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dmarge.com/robots.txt> (referer: None)
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://brawlbros.com/robots.txt> (referer: None)
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sites.google.com/robots.txt> (referer: None)
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dmarge.com/best-boxing-gloves> (referer: None)
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.sportsdirect.com/boxing/boxing-gloves> (referer: None)
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://sites.google.com/view> (referer: None)
2023-05-05 13:53:02 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.dmarge.com/best-boxing-gloves>
2023-05-05 13:53:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://themmaguru.com/robots.txt> (referer: None)
2023-05-05 13:53:02 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.sportsdirect.com/boxing/boxing-gloves>
2023-05-05 13:53:02 [scrapy.core.scraper] DEBUG: Scraped from <404 https://sites.google.com/view>
2023-05-05 13:53:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nytimes.com/robots.txt> (referer: None)
2023-05-05 13:53:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gloveworx.com/blog/how-choose-best-boxing-gloves-beginners/> (referer: None)
2023-05-05 13:53:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thechamplair.com/robots.txt> (referer: None)
2023-05-05 13:53:03 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.gloveworx.com/blog/how-choose-best-boxing-gloves-beginners/>
2023-05-05 13:53:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://brawlbros.com/best-boxing-gloves-on-amazon/> (referer: None)
2023-05-05 13:53:03 [scrapy.core.scraper] DEBUG: Scraped from <200 https://brawlbros.com/best-boxing-gloves-on-amazon/>
2023-05-05 13:53:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://themmaguru.com/best-youth-boxing-gloves/> (referer: None)
2023-05-05 13:53:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://findbestboxinggloves.com/robots.txt> (referer: None)
2023-05-05 13:53:04 [scrapy.core.scraper] DEBUG: Scraped from <200 https://themmaguru.com/best-youth-boxing-gloves/>
2023-05-05 13:53:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearhungry.com/robots.txt> (referer: None)
2023-05-05 13:53:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hiconsumption.com/robots.txt> (referer: None)
2023-05-05 13:53:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thechamplair.com/sports/best-beginners-boxing-gloves/> (referer: None)
2023-05-05 13:53:05 [scrapy.core.scraper] DEBUG: Scraped from <200 https://thechamplair.com/sports/best-beginners-boxing-gloves/>
2023-05-05 13:53:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://findbestboxinggloves.com/best-boxing-gloves-for-heavy-bag-the-complete-guide/> (referer: None)
2023-05-05 13:53:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nytimes.com/video/style/1194840632119/gear-test-boxing-gloves.html> (referer: None)
2023-05-05 13:53:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hiconsumption.com/best-boxing-gloves/> (referer: None)
2023-05-05 13:53:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://findbestboxinggloves.com/best-boxing-gloves-for-heavy-bag-the-complete-guide/>
2023-05-05 13:53:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.nytimes.com/video/style/1194840632119/gear-test-boxing-gloves.html>
2023-05-05 13:53:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://hiconsumption.com/best-boxing-gloves/>
2023-05-05 13:53:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearhungry.com/best-boxing-gloves/> (referer: None)
2023-05-05 13:53:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.gearhungry.com/best-boxing-gloves/>
2023-05-05 13:53:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingready.com/ringside/best-boxing-gloves-wrist-support/> (referer: None)
2023-05-05 13:53:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://boxingready.com/ringside/best-boxing-gloves-wrist-support/>
2023-05-05 13:53:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hungry4fitness.co.uk/robots.txt> (referer: None)
2023-05-05 13:53:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hungry4fitness.co.uk/post/10-best-boxing-mitts-an-ultimate-guide> (referer: None)
2023-05-05 13:53:08 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.hungry4fitness.co.uk/post/10-best-boxing-mitts-an-ultimate-guide>
2023-05-05 13:53:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 196 pages/min), scraped 97 items (at 97 items/min)
2023-05-05 13:54:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 13:54:40 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://skilspo.com/robots.txt> (failed 1 times): TCP connection timed out: 110: Connection timed out.
2023-05-05 13:55:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 13:56:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 13:56:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://skilspo.com/robots.txt> (failed 2 times): TCP connection timed out: 110: Connection timed out.
2023-05-05 13:57:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 13:58:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 13:59:02 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://skilspo.com/robots.txt> (failed 3 times): TCP connection timed out: 110: Connection timed out.
2023-05-05 13:59:02 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET https://skilspo.com/robots.txt>: TCP connection timed out: 110: Connection timed out.
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 49, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.internet.error.TCPTimedOutError: TCP connection timed out: 110: Connection timed out.
2023-05-05 13:59:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:00:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:01:13 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://skilspo.com/gb/blog/1_how-to-choose-the-best-boxing-gloves.html> (failed 1 times): TCP connection timed out: 110: Connection timed out.
2023-05-05 14:01:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:02:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:03:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:03:24 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://skilspo.com/gb/blog/1_how-to-choose-the-best-boxing-gloves.html> (failed 2 times): TCP connection timed out: 110: Connection timed out.
2023-05-05 14:04:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:05:17 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 14:05:35 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://skilspo.com/gb/blog/1_how-to-choose-the-best-boxing-gloves.html> (failed 3 times): TCP connection timed out: 110: Connection timed out.
2023-05-05 14:05:35 [seo_spider] ERROR: <twisted.python.failure.Failure twisted.internet.error.TCPTimedOutError: TCP connection timed out: 110: Connection timed out.>
2023-05-05 14:05:35 [scrapy.core.scraper] DEBUG: Scraped from TCP connection timed out: 110: Connection timed out.
2023-05-05 14:05:35 [scrapy.core.engine] INFO: Closing spider (finished)
2023-05-05 14:05:35 [scrapy.extensions.feedexport] INFO: Stored jl feed (98 items) in: pages.jl
2023-05-05 14:05:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 8,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 2,
 'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 6,
 'downloader/request_bytes': 60432,
 'downloader/request_count': 210,
 'downloader/request_method_count/GET': 210,
 'downloader/response_bytes': 6041008,
 'downloader/response_count': 204,
 'downloader/response_status_count/200': 183,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 1,
 'downloader/response_status_count/403': 9,
 'downloader/response_status_count/404': 1,
 'downloader/response_status_count/429': 9,
 'elapsed_time_seconds': 798.663859,
 'feedexport/success_count/FileFeedStorage': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 5, 5, 9, 5, 35, 694921),
 'httpcompression/response_bytes': 34027780,
 'httpcompression/response_count': 174,
 'item_scraped_count': 98,
 'log_count/DEBUG': 356,
 'log_count/ERROR': 12,
 'log_count/INFO': 24,
 'log_count/WARNING': 1,
 'memusage/max': 231342080,
 'memusage/startup': 142450688,
 'response_received_count': 196,
 'retry/count': 10,
 'retry/max_reached': 5,
 'retry/reason_count/429 Unknown Status': 6,
 'retry/reason_count/twisted.internet.error.TCPTimedOutError': 4,
 "robotstxt/exception_count/<class 'twisted.internet.error.TCPTimedOutError'>": 1,
 'robotstxt/forbidden': 2,
 'robotstxt/request_count': 100,
 'robotstxt/response_count': 99,
 'robotstxt/response_status_count/200': 94,
 'robotstxt/response_status_count/403': 4,
 'robotstxt/response_status_count/429': 1,
 'scheduler/dequeued': 108,
 'scheduler/dequeued/memory': 108,
 'scheduler/enqueued': 108,
 'scheduler/enqueued/memory': 108,
 'start_time': datetime.datetime(2023, 5, 5, 8, 52, 17, 31062)}
2023-05-05 14:05:35 [scrapy.core.engine] INFO: Spider closed (finished)
2023-05-05 14:05:36,013 | INFO | utils.py:160 | _init_num_threads | NumExpr defaulting to 6 threads.

Process finished with exit code 0
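
For a log this long, it can help to reduce it to a per-status count of the pages that were actually requested. The log shows the spider closing normally (`finish_reason: finished`, 98 items stored in `pages.jl`), and the long idle stretch after 13:53 is the retries against skilspo.com timing out. Below is a minimal sketch that tallies the `Crawled (...)` lines and lists the URLs Scrapy gave up on. It assumes the log format shown above; `summarize_log` and the two regex names are hypothetical helpers written for this illustration, not part of advertools or Scrapy.

```python
import re
from collections import Counter

# Patterns for the two kinds of log lines we care about (format assumed
# from the log above; these names are illustrative, not library API).
CRAWLED = re.compile(r"DEBUG: Crawled \((\d{3})\) <GET (\S+)>")
GAVE_UP = re.compile(r"ERROR: Gave up retrying <GET (\S+)>")


def summarize_log(lines):
    """Count response statuses for page requests and collect abandoned URLs.

    robots.txt requests are skipped so the counts reflect only the URLs
    that were passed to the crawler.
    """
    statuses = Counter()
    gave_up = []
    for line in lines:
        match = CRAWLED.search(line)
        if match:
            if not match.group(2).endswith("/robots.txt"):
                statuses[int(match.group(1))] += 1
            continue
        match = GAVE_UP.search(line)
        if match and not match.group(1).endswith("/robots.txt"):
            gave_up.append(match.group(1))
    return statuses, gave_up


# A few lines copied from the log above, used as sample input.
sample = [
    "2023-05-05 13:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightingadvice.com/robots.txt> (referer: None)",
    "2023-05-05 13:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightingadvice.com/best-boxing-gloves-under-200/> (referer: None)",
    "2023-05-05 13:52:54 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 3 times): 429 Unknown Status",
    "2023-05-05 13:52:54 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (referer: None)",
]

statuses, gave_up = summarize_log(sample)
print(dict(statuses))
print(gave_up)
```

To run it on a full crawl, pass the lines of a saved log file instead of `sample`. Pages that timed out never produce a `Crawled` line, so they only show up in the abandoned list.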

from advertools.
