
mef-scraper's People

Contributors

ed1123


mef-scraper's Issues

MEF_2 Scraper not working at all

Ran the MEF_2 scraper with scrapy crawl mef_2, including the S3 feed-export configuration. The spider ran and even created the output file on S3, but it didn't gather any data.
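
For context, the feed export that produced the S3 upload visible in the log would look roughly like the sketch below. This is only an assumption inferred from the log (bucket rpa.dev, key prefix rpa_output/mef_2/output/, timestamped .xlsx name); the repository's actual settings.py may differ, and an .xlsx feed would need a custom exporter since Scrapy does not ship one.

    # Hypothetical sketch of the Scrapy feed-export settings implied by the log.
    # The real settings.py in mef-scraper is not shown in this issue; bucket,
    # prefix and format are inferred from the PutObject request further down.
    AWS_ACCESS_KEY_ID = "..."        # real credentials come from the environment
    AWS_SECRET_ACCESS_KEY = "..."

    FEEDS = {
        # %(time)s expands to e.g. 2022-01-12T03-19-27, matching the uploaded key.
        "s3://rpa.dev/rpa_output/mef_2/output/%(time)s.xlsx": {
            # 'xlsx' is not a built-in Scrapy feed format, so the project
            # presumably registers a custom exporter via FEED_EXPORTERS;
            # 'csv' or 'jsonlines' would work out of the box.
            "format": "xlsx",
        },
    }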

Logs:

2022-01-12 03:19:27 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: aliaxis)
2022-01-12 03:19:27 [scrapy.utils.log] INFO: Versions: lxml 4.7.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.10 (default, Nov 26 2021, 20:14:08) - [GCC 9.3.0], pyOpenSSL 21.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.11.0-1025-aws-x86_64-with-glibc2.29
2022-01-12 03:19:27 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-01-12 03:19:27 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'aliaxis',
'NEWSPIDER_MODULE': 'aliaxis.spiders',
'SPIDER_MODULES': ['aliaxis.spiders']}
2022-01-12 03:19:27 [scrapy.extensions.telnet] INFO: Telnet Password: 1684106a3389090c
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-call.apigateway to before-call.api-gateway
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..autoscaling.CreateLaunchConfiguration.complete-section to docs..auto-scaling.CreateLaunchConfiguration.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..logs.CreateExportTask.complete-section to docs..cloudwatch-logs.CreateExportTask.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..cloudsearchdomain.Search.complete-section to docs..cloudsearch-domain.Search.complete-section
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/endpoints.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event choose-service-name: calling handler <function handle_service_name_alias at 0x7ffad743f5e0>
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/s3/2006-03-01/service-2.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7ffad74e89d0>
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7ffad74e8790>
2022-01-12 03:19:27 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/_retry.json
2022-01-12 03:19:27 [botocore.client] DEBUG: Registering retry handlers for service: s3
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-01-12 03:19:27 [scrapy.core.engine] INFO: Spider opened
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-call.apigateway to before-call.api-gateway
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..autoscaling.CreateLaunchConfiguration.complete-section to docs..auto-scaling.CreateLaunchConfiguration.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..logs.CreateExportTask.complete-section to docs..cloudwatch-logs.CreateExportTask.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..cloudsearchdomain.Search.complete-section to docs..cloudsearch-domain.Search.complete-section
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/endpoints.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event choose-service-name: calling handler <function handle_service_name_alias at 0x7ffad743f5e0>
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/s3/2006-03-01/service-2.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7ffad74e89d0>
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7ffad74e8790>
2022-01-12 03:19:27 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/_retry.json
2022-01-12 03:19:27 [botocore.client] DEBUG: Registering retry handlers for service: s3
2022-01-12 03:19:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-01-12 03:19:27 [py.warnings] WARNING: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/scrapy/spidermiddlewares/offsite.py:65: URLWarning: allowed_domains accepts only domains, not URLs. Ignoring URL entry https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/default.aspx in allowed_domains.
warnings.warn(message, URLWarning)

2022-01-12 03:19:27 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2022-01-12 03:19:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/Navegar_2.aspx?_tgt=xls&_uhc=yes&0=&31=&y=2021&cpage=1&psize=1000000> (referer: None)
2022-01-12 03:19:28 [scrapy.core.engine] INFO: Closing spider (finished)
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7ffad74609d0>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7ffad745adc0>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7ffad7461310>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7ffad745ad30>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7ffad6df5310>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7ffad6df53d0>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x7ffad745ab80>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7ffad75d3c10>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7ffad74600d0>
2022-01-12 03:19:28 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7ffad6df5310>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function inject_api_version_header_if_needed at 0x7ffad7461430>
2022-01-12 03:19:28 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) with params: {'url_path': '/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Botocore/1.23.33 Python/3.8.10 Linux/5.11.0-1025-aws', 'Content-MD5': '1B2M2Y8AsgTpgAmY7PhCfg==', 'Expect': '100-continue'}, 'body': <tempfile._TemporaryFileWrapper object at 0x7ffad6b2ea60>, 'url': 'https://s3.amazonaws.com/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7ffad70d7820>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'rpa.dev'}}}
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7ffad70d7790>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <bound method S3EndpointSetter.set_signer of <botocore.utils.S3EndpointSetter object at 0x7ffad6df5460>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7ffad6b2e460>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x7ffad745aa60>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7ffad6df5460>>
2022-01-12 03:19:28 [botocore.utils] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2022-01-12 03:19:28 [botocore.utils] DEBUG: Checking for DNS compatible bucket for: https://s3.amazonaws.com/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx
2022-01-12 03:19:28 [botocore.utils] DEBUG: Not changing URI, bucket is not DNS compatible: rpa.dev
2022-01-12 03:19:28 [botocore.auth] DEBUG: Calculating signature using v4 auth.
2022-01-12 03:19:28 [botocore.auth] DEBUG: CanonicalRequest:
PUT
/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx

content-md5:1B2M2Y8AsgTpgAmY7PhCfg==
host:s3.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20220112T031928Z

content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2022-01-12 03:19:28 [botocore.auth] DEBUG: StringToSign:
AWS4-HMAC-SHA256
20220112T031928Z
20220112/us-east-1/s3/aws4_request
bb4ddbbecc3c4ba398451831abf5e99d6e1ad3760b228f72b4c8ca3e42f4edf8
2022-01-12 03:19:28 [botocore.auth] DEBUG: Signature:
c794a13fb293b4d8b24ef2f909fdcbb075057a1675cd10e4e3dbc2f3081e23d4
2022-01-12 03:19:28 [botocore.endpoint] DEBUG: Sending http request: <AWSPreparedRequest stream_output=False, method=PUT, url=https://s3.amazonaws.com/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx, headers={'User-Agent': b'Botocore/1.23.33 Python/3.8.10 Linux/5.11.0-1025-aws', 'Content-MD5': b'1B2M2Y8AsgTpgAmY7PhCfg==', 'Expect': b'100-continue', 'X-Amz-Date': b'20220112T031928Z', 'X-Amz-Content-SHA256': b'UNSIGNED-PAYLOAD', 'Authorization': b'AWS4-HMAC-SHA256 Credential=AKIAX2RPQC4SOXKMSZEI/20220112/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=c794a13fb293b4d8b24ef2f909fdcbb075057a1675cd10e4e3dbc2f3081e23d4', 'Content-Length': '0'}>
2022-01-12 03:19:28 [botocore.httpsession] DEBUG: Certificate path: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/cacert.pem
2022-01-12 03:19:28 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): s3.amazonaws.com:443
2022-01-12 03:19:28 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2022-01-12 03:19:28 [botocore.awsrequest] DEBUG: 100 Continue response seen, now sending request body.
2022-01-12 03:19:28 [urllib3.connectionpool] DEBUG: https://s3.amazonaws.com:443 "PUT /rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx HTTP/1.1" 200 0
2022-01-12 03:19:28 [botocore.parsers] DEBUG: Response headers: {'x-amz-id-2': 'O0i7fSYr7+J7qWdJ8PPDu4GeVfEM2ncQSvChuZLcxDcqsYy3SOKxZnLPkChrnHX8SYyUEDOpVSc=', 'x-amz-request-id': '44XMJPR899XP81R2', 'Date': 'Wed, 12 Jan 2022 03:19:29 GMT', 'ETag': '"d41d8cd98f00b204e9800998ecf8427e"', 'Server': 'AmazonS3', 'Content-Length': '0'}
2022-01-12 03:19:28 [botocore.parsers] DEBUG: Response body:
b''
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7ffad6df52b0>
2022-01-12 03:19:28 [botocore.retryhandler] DEBUG: No retry needed.
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7ffad6df5310>>
2022-01-12 03:19:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 319,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 5863,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 0.653967,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 1, 12, 3, 19, 28, 366189),
'log_count/DEBUG': 75,
'log_count/INFO': 10,
'log_count/WARNING': 1,
'memusage/max': 81354752,
'memusage/startup': 81354752,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2022, 1, 12, 3, 19, 27, 712222)}
2022-01-12 03:19:28 [scrapy.core.engine] INFO: Spider closed (finished)
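
Two things stand out in the log: the URLWarning about allowed_domains containing a full URL, and the stats showing one request crawled (200) but 0 items scraped before the file was uploaded. Fixing allowed_domains removes the warning (the entry is currently ignored, so offsite filtering simply isn't applied); the empty output itself points at the parse callback yielding nothing for that response rather than at the S3 upload. Below is a hypothetical sketch of the allowed_domains fix, assuming a spider shaped like the one the crawl command and warning imply; the real spider and parsing logic live in the repository.

    import scrapy


    class Mef2Spider(scrapy.Spider):
        # Hypothetical reconstruction -- only the spider name and URLs are
        # taken from the issue; everything else is illustrative.
        name = "mef_2"

        # Before (triggers URLWarning; the entry is ignored by OffsiteMiddleware):
        # allowed_domains = ["https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/default.aspx"]

        # After: allowed_domains takes bare domains only.
        allowed_domains = ["apps5.mineco.gob.pe"]

        start_urls = [
            "https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/"
            "Navegar_2.aspx?_tgt=xls&_uhc=yes&0=&31=&y=2021&cpage=1&psize=1000000"
        ]

        def parse(self, response):
            # Placeholder: the real parsing logic is in the repository. The
            # request above returned 200 but the stats report 0 items scraped,
            # so this callback (or the exporter feeding the .xlsx) is where the
            # data is being lost, not the S3 upload itself.
            ...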
