ed1123 / mef-scraper
License: MIT License
Ran the MEF_2 scraper with scrapy crawl mef_2, including the S3 configuration. The spider ran and even created the output file on S3, but it didn't gather any data.
Logs:
2022-01-12 03:19:27 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: aliaxis)
2022-01-12 03:19:27 [scrapy.utils.log] INFO: Versions: lxml 4.7.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.10 (default, Nov 26 2021, 20:14:08) - [GCC 9.3.0], pyOpenSSL 21.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.11.0-1025-aws-x86_64-with-glibc2.29
2022-01-12 03:19:27 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-01-12 03:19:27 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'aliaxis',
'NEWSPIDER_MODULE': 'aliaxis.spiders',
'SPIDER_MODULES': ['aliaxis.spiders']}
2022-01-12 03:19:27 [scrapy.extensions.telnet] INFO: Telnet Password: 1684106a3389090c
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-call.apigateway to before-call.api-gateway
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..autoscaling.CreateLaunchConfiguration.complete-section to docs..auto-scaling.CreateLaunchConfiguration.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..logs.CreateExportTask.complete-section to docs..cloudwatch-logs.CreateExportTask.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..cloudsearchdomain.Search.complete-section to docs..cloudsearch-domain.Search.complete-section
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/endpoints.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event choose-service-name: calling handler <function handle_service_name_alias at 0x7ffad743f5e0>
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/s3/2006-03-01/service-2.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7ffad74e89d0>
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7ffad74e8790>
2022-01-12 03:19:27 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/_retry.json
2022-01-12 03:19:27 [botocore.client] DEBUG: Registering retry handlers for service: s3
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-01-12 03:19:27 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-01-12 03:19:27 [scrapy.core.engine] INFO: Spider opened
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-call.apigateway to before-call.api-gateway
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..autoscaling.CreateLaunchConfiguration.complete-section to docs..auto-scaling.CreateLaunchConfiguration.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..logs.CreateExportTask.complete-section to docs..cloudwatch-logs.CreateExportTask.complete-section
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Changing event name from docs..cloudsearchdomain.Search.complete-section to docs..cloudsearch-domain.Search.complete-section
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/endpoints.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event choose-service-name: calling handler <function handle_service_name_alias at 0x7ffad743f5e0>
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/s3/2006-03-01/service-2.json
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7ffad74e89d0>
2022-01-12 03:19:27 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7ffad74e8790>
2022-01-12 03:19:27 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2022-01-12 03:19:27 [botocore.loaders] DEBUG: Loading JSON file: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/data/_retry.json
2022-01-12 03:19:27 [botocore.client] DEBUG: Registering retry handlers for service: s3
2022-01-12 03:19:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-01-12 03:19:27 [py.warnings] WARNING: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/scrapy/spidermiddlewares/offsite.py:65: URLWarning: allowed_domains accepts only domains, not URLs. Ignoring URL entry https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/default.aspx in allowed_domains.
warnings.warn(message, URLWarning)
2022-01-12 03:19:27 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2022-01-12 03:19:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/Navegar_2.aspx?_tgt=xls&_uhc=yes&0=&31=&y=2021&cpage=1&psize=1000000> (referer: None)
2022-01-12 03:19:28 [scrapy.core.engine] INFO: Closing spider (finished)
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7ffad74609d0>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7ffad745adc0>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7ffad7461310>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7ffad745ad30>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7ffad6df5310>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7ffad6df53d0>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x7ffad745ab80>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7ffad75d3c10>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7ffad74600d0>
2022-01-12 03:19:28 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7ffad6df5310>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function inject_api_version_header_if_needed at 0x7ffad7461430>
2022-01-12 03:19:28 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) with params: {'url_path': '/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Botocore/1.23.33 Python/3.8.10 Linux/5.11.0-1025-aws', 'Content-MD5': '1B2M2Y8AsgTpgAmY7PhCfg==', 'Expect': '100-continue'}, 'body': <tempfile._TemporaryFileWrapper object at 0x7ffad6b2ea60>, 'url': 'https://s3.amazonaws.com/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7ffad70d7820>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'rpa.dev'}}}
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7ffad70d7790>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <bound method S3EndpointSetter.set_signer of <botocore.utils.S3EndpointSetter object at 0x7ffad6df5460>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7ffad6b2e460>>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x7ffad745aa60>
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7ffad6df5460>>
2022-01-12 03:19:28 [botocore.utils] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2022-01-12 03:19:28 [botocore.utils] DEBUG: Checking for DNS compatible bucket for: https://s3.amazonaws.com/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx
2022-01-12 03:19:28 [botocore.utils] DEBUG: Not changing URI, bucket is not DNS compatible: rpa.dev
2022-01-12 03:19:28 [botocore.auth] DEBUG: Calculating signature using v4 auth.
2022-01-12 03:19:28 [botocore.auth] DEBUG: CanonicalRequest:
PUT
/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx
content-md5:1B2M2Y8AsgTpgAmY7PhCfg==
host:s3.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20220112T031928Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2022-01-12 03:19:28 [botocore.auth] DEBUG: StringToSign:
AWS4-HMAC-SHA256
20220112T031928Z
20220112/us-east-1/s3/aws4_request
bb4ddbbecc3c4ba398451831abf5e99d6e1ad3760b228f72b4c8ca3e42f4edf8
2022-01-12 03:19:28 [botocore.auth] DEBUG: Signature:
c794a13fb293b4d8b24ef2f909fdcbb075057a1675cd10e4e3dbc2f3081e23d4
2022-01-12 03:19:28 [botocore.endpoint] DEBUG: Sending http request: <AWSPreparedRequest stream_output=False, method=PUT, url=https://s3.amazonaws.com/rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx, headers={'User-Agent': b'Botocore/1.23.33 Python/3.8.10 Linux/5.11.0-1025-aws', 'Content-MD5': b'1B2M2Y8AsgTpgAmY7PhCfg==', 'Expect': b'100-continue', 'X-Amz-Date': b'20220112T031928Z', 'X-Amz-Content-SHA256': b'UNSIGNED-PAYLOAD', 'Authorization': b'AWS4-HMAC-SHA256 Credential=AKIAX2RPQC4SOXKMSZEI/20220112/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=c794a13fb293b4d8b24ef2f909fdcbb075057a1675cd10e4e3dbc2f3081e23d4', 'Content-Length': '0'}>
2022-01-12 03:19:28 [botocore.httpsession] DEBUG: Certificate path: /home/ubuntu/envs/aliaxis_envs/env1/lib/python3.8/site-packages/botocore/cacert.pem
2022-01-12 03:19:28 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): s3.amazonaws.com:443
2022-01-12 03:19:28 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2022-01-12 03:19:28 [botocore.awsrequest] DEBUG: 100 Continue response seen, now sending request body.
2022-01-12 03:19:28 [urllib3.connectionpool] DEBUG: https://s3.amazonaws.com:443 "PUT /rpa.dev/rpa_output/mef_2/output/2022-01-12T03-19-27.xlsx HTTP/1.1" 200 0
2022-01-12 03:19:28 [botocore.parsers] DEBUG: Response headers: {'x-amz-id-2': 'O0i7fSYr7+J7qWdJ8PPDu4GeVfEM2ncQSvChuZLcxDcqsYy3SOKxZnLPkChrnHX8SYyUEDOpVSc=', 'x-amz-request-id': '44XMJPR899XP81R2', 'Date': 'Wed, 12 Jan 2022 03:19:29 GMT', 'ETag': '"d41d8cd98f00b204e9800998ecf8427e"', 'Server': 'AmazonS3', 'Content-Length': '0'}
2022-01-12 03:19:28 [botocore.parsers] DEBUG: Response body:
b''
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7ffad6df52b0>
2022-01-12 03:19:28 [botocore.retryhandler] DEBUG: No retry needed.
2022-01-12 03:19:28 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7ffad6df5310>>
2022-01-12 03:19:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 319,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 5863,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 0.653967,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 1, 12, 3, 19, 28, 366189),
'log_count/DEBUG': 75,
'log_count/INFO': 10,
'log_count/WARNING': 1,
'memusage/max': 81354752,
'memusage/startup': 81354752,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2022, 1, 12, 3, 19, 27, 712222)}
2022-01-12 03:19:28 [scrapy.core.engine] INFO: Spider closed (finished)
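Two details in the log above can be checked mechanically. The sketch below is only a triage aid, not a fix. It assumes that, for a plain single-part PutObject like this one, the ETag is the MD5 hex digest of the body, and it confirms that both the Content-MD5 header and the ETag correspond to a zero-byte upload (consistent with 'Content-Length': '0'). It also reduces the URL that triggered the URLWarning to the bare host name that allowed_domains expects. The helper names is_empty_payload and to_domain are hypothetical, introduced here for illustration. The log does not show why the callback yielded no items, and since the start request was crawled with a 200, the URLWarning is probably not the cause on its own.

```python
import base64
import hashlib
from urllib.parse import urlparse

# Values copied verbatim from the log above.
CONTENT_MD5_HEADER = "1B2M2Y8AsgTpgAmY7PhCfg=="
ETAG = "d41d8cd98f00b204e9800998ecf8427e"
ALLOWED_DOMAINS_ENTRY = (
    "https://apps5.mineco.gob.pe/bingos/seguimiento_pi/Navegador/default.aspx"
)


def is_empty_payload(content_md5_b64: str, etag_hex: str) -> bool:
    """Return True if both checksums equal the MD5 of zero bytes.

    Assumption: the ETag is a plain MD5 hex digest, which normally holds
    for single-part uploads without KMS encryption.
    """
    empty_digest = hashlib.md5(b"").digest()
    return (
        base64.b64decode(content_md5_b64) == empty_digest
        and etag_hex.strip('"').lower() == empty_digest.hex()
    )


def to_domain(entry: str) -> str:
    """Reduce a URL (or bare host) to the host name for allowed_domains."""
    # urlparse only fills in the host when a scheme or leading // is present.
    parsed = urlparse(entry if "://" in entry else "//" + entry)
    return parsed.hostname or entry


if __name__ == "__main__":
    print(is_empty_payload(CONTENT_MD5_HEADER, ETAG))
    print(to_domain(ALLOWED_DOMAINS_ENTRY))
```

If the first check holds, the feed exporter uploaded an empty file, which matches the stats dump showing one response received and no scraped items. The second value is what the spider's allowed_domains list would contain in place of the full URL.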