scrape_dark_sky_nick3499

The Dark Sky API will no longer be available after December 31st, 2022.

Scrape weather data: Python, BeautifulSoup, requests, datetime, dateutil.relativedelta

Shebang Line

#!/usr/bin/env python

On Unix-like systems, the kernel reads the two characters #! (the shebang) at the start of an executable file as a magic number, which marks the text file (app.py) as a script to be run by the interpreter named on the rest of the line.

>>> hex(ord('#'))
'0x23'
>>> hex(ord('!'))
'0x21'

To the kernel, the shebang is the byte pair 0x23 0x21. The characters that follow it are read as the path of the program that runs the script; here that is /usr/bin/env, which looks up python on the PATH. In other words, once app.py has execute permission, it can be started by entering the following in the CLI of a Unix-like terminal emulator:

$ ./app.py
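
As a rough illustration of that check (not how the kernel is actually implemented), the sketch below reads the first line of a file and reports the interpreter line only when the file begins with the bytes 0x23 0x21. The read_shebang helper and the temporary demo file are hypothetical names made up for this example.

```python
import os
import tempfile


def read_shebang(path):
    """Return the text after '#!' on the first line, or None if there is no shebang."""
    with open(path, 'rb') as fh:
        first_line = fh.readline()
    if first_line[:2] != bytes([0x23, 0x21]):  # the two magic bytes '#!'
        return None
    return first_line[2:].decode('utf-8').strip()


# Write a throwaway script so the example does not depend on any real file.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write('#!/usr/bin/env python\nprint("hello")\n')

interpreter = read_shebang(tmp.name)
os.remove(tmp.name)
print(interpreter)
```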

Module Documentation

'''Scrape weather data from Dark Sky for personal/non-commercial use.
Review DarkSky's TOS darksky.net/tos'''
>>> __doc__
"Scrape weather data from Dark Sky for personal/non-commercial use.\nReview DarkSky's TOS darksky.net/tos"

The module documentation is stored as a string in the module's __doc__ attribute.
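
A docstring can also be read without importing or running the module. The sketch below uses the standard library's ast module on a source string; SOURCE is a stand-in for the top of app.py, not the actual file.

```python
import ast

# Stand-in for the first lines of app.py (assumption for this example).
SOURCE = (
    "'''Scrape weather data from Dark Sky for personal/non-commercial use.\n"
    "Review DarkSky's TOS darksky.net/tos'''\n"
    "print('scraper code would follow here')\n"
)

module = ast.parse(SOURCE)             # parse only; nothing is executed
docstring = ast.get_docstring(module)  # the module-level docstring, or None
print(repr(docstring))
```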

Import Modules

from datetime import datetime
from datetime import timedelta
from bs4 import BeautifulSoup
from requests import get
from figlet import get_figlet

A Python virtual environment should be used to develop and run the app so that its dependencies are installed separately from the system's Python packages. Otherwise, installing a new module could change the version of a package that the system relies on.

The requirements.txt file lists the modules along with their specific versions, but the latest versions should also work.
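
One way to confirm that the app is running inside a virtual environment is to compare sys.prefix with sys.base_prefix, which differ when a venv is active. The in_virtualenv helper below is an illustrative name, not part of app.py.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running from a venv-style environment."""
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print(f'Using virtual environment at {sys.prefix}')
else:
    print('No virtual environment detected.')
```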

Figlet

print(get_figlet())
figlet = '''\x1b[38;2;140;28;32m
   _____                             _____             _       _____ _
  / ____|                           |  __ \           | |     / ____| |
 | (___   ___ _ __ __ _ _ __   ___  | |  | | __ _ _ __| | __ | (___ | | ___   _
  \___ \ / __| '__/ _` | '_ \ / _ \ | |  | |/ _` | '__| |/ /  \___ \| |/ / | | |\x1b[38;2;119;121;174m
  ____) | (__| | | (_| | |_) |  __/ | |__| | (_| | |  |   <   ____) |   <| |_| |
 |_____/ \___|_|  \__,_| .__/ \___| |_____/ \__,_|_|  |_|\_\ |_____/|_|\_\___, |\x1b[38;2;140;28;32m
                       | |                                                 __/ |
                       |_|                                                |___/\x1b[0m'''

The get_figlet() function comes from the figlet module, which keeps the ASCII art separate from the scraper code in app.py.

ref. figlet.org examples.

The \x1b[38;2;140;28;32m string is an ANSI escape sequence that sets the foreground color to the 24-bit RGB value (140, 28, 32).
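
The sequence follows the pattern ESC [ 38 ; 2 ; R ; G ; B m, which selects a 24-bit foreground color on terminals that support it. A minimal sketch of a helper that builds such sequences is shown below; fg_rgb and RESET are hypothetical names, not part of app.py or the figlet module.

```python
def fg_rgb(red, green, blue):
    """Build the ANSI escape sequence for a 24-bit foreground color."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError('each color channel must be in the range 0-255')
    return f'\x1b[38;2;{red};{green};{blue}m'


RESET = '\x1b[0m'  # returns the terminal to its default colors

print(f'{fg_rgb(140, 28, 32)}Scrape Dark Sky{RESET}')
```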

Request Data

REQ = get('https://darksky.net/forecast/40.9322,-73.899/us12/en')

The requests.get() function sends an HTTP GET request to Dark Sky and returns a response object that holds the page's HTML.
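
The URL above appears to follow the pattern /forecast/latitude,longitude/units/language. That reading is an assumption based on this one URL, not on documented behavior. Under that assumption, a hypothetical forecast_url helper could build the address for another location:

```python
BASE_URL = 'https://darksky.net/forecast'


def forecast_url(latitude, longitude, units='us12', language='en'):
    """Build a forecast page URL; the path layout is assumed from the README's URL."""
    return f'{BASE_URL}/{latitude},{longitude}/{units}/{language}'


url = forecast_url(40.9322, -73.899)
print(url)
```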

Parse Tree

SOUP = BeautifulSoup(REQ.text, 'html5lib')

The instruction above instantiates BeautifulSoup. REQ.text is the HTML markup of the response, and 'html5lib' names the Python HTML parsing module used to build the parse tree.

3-Color Theme

THEME = {
    'c1': '\x1b[38;2;140;28;32m',
    'c2': '\x1b[38;2;119;121;174m',
    'c3': '\x1b[38;2;213;122;100m',
    'rset': '\x1b[0m'}  # 3-color theme; `rset` resets color to default

The escape sequences above are used in string formatting, and they establish a 3-color theme. For example, f"{THEME['c2']}{s1[0]:<12}{THEME['rset']}" sets the color of the 'Current' string.
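
The :<12 format spec left-aligns the label in a 12-character field. The sketch below suggests why the color codes sit outside the padded field: escape sequences count toward a string's length even though they are not displayed, so padding an already-colored string would not line up the visible text. The temperature value is a placeholder made up for the example.

```python
THEME = {
    'c2': '\x1b[38;2;119;121;174m',
    'rset': '\x1b[0m'}

label = 'Current'
value = '52°'  # placeholder, not scraped data

# Pad the bare label first, then wrap it in color codes.
line = f"{THEME['c2']}{label:<12}{THEME['rset']}{value}"
print(line)

# Padding an already-colored string counts the invisible escape characters,
# so no spaces are added here and the value column shifts left.
colored = f"{THEME['c2']}{label}{THEME['rset']}"
misaligned = f"{colored:<12}{value}"
print(misaligned)
```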

Titles

PG_TITLE = SOUP.title.string.strip()

SOUP.title.string gets the string from the title tag, and the strip() method removes leading and trailing whitespace, including newline characters (\n).
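
A short sketch of what strip() does to a title string; the text is a placeholder, not the real page title.

```python
raw_title = '\n  Weather forecast page title  \n'  # placeholder text
clean_title = raw_title.strip()

# Only the ends are trimmed; spaces between words are left alone.
print(repr(raw_title))
print(repr(clean_title))
```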

print(f" {THEME['c1']}Weather data scraped from:{THEME['rset']} {PG_TITLE}")
print(f"{THEME['c1']}―――――――――――――――――――――――――――――――――――――{THEME['rset']}")
print(f" {THEME['c1']}Current conditions:{THEME['rset']}")  # subtitles
print(f" {THEME['c1']}Forecast:{THEME['rset']}")  # weekly forecast; temps/conditions

Subtitles are used to label sections. In this case, there is a Current conditions section followed by a Forecast section.

Current Conditions

CURR_COND_STR_1 = [
    ['Current', 'summary swap', ''],
    ['Feels like', 'feels-like-text', 'F'],
    ['Low', 'low-temp-text', 'F'],
    ['High', 'high-temp-text', 'F']]

for s1 in CURR_COND_STR_1:
    print(f" {THEME['c2']}{s1[0]:<12}{THEME['rset']}\
{SOUP.find('span', {'class': s1[1]}).string}{s1[2]}")

CURR_COND_STR_1 is assigned a list of nested lists, each of which holds a label string, a class value string, and a unit suffix (which may be empty). A for loop iterates over CURR_COND_STR_1 to print the current weather data. The same is done with CURR_COND_STR_2.

FORECAST_TODAY = SOUP.find(
    'span', {'class': 'currently__summary next swap'}).string.strip()
FORECAST_WEEK = SOUP.find('div', {'id': 'week'}).contents[1].contents[0].strip()
print(f" {THEME['c3']}Forecast today:{THEME['rset']} {FORECAST_TODAY}")
print(f" {THEME['c3']}Forecast week:{THEME['rset']} {FORECAST_WEEK}")

Finally, two lines are printed for today's and this week's forecasts.

Forecast

for i in range(0, 8):
    min_temp = SOUP.find('a', {'data-day': str(i)}).contents[3].contents[1].string
    max_temp = SOUP.find('a', {'data-day': str(i)}).contents[3].contents[5].string
    weekday_str = (datetime.now() + timedelta(days=i)).strftime('%a')
    wthr_day = SOUP.find(
        'a', {'data-day': str(i)}).contents[1].find(
            'span', {'class': 'skycon'}).img['alt'].split(' ')[0].replace(
                '-', ' ')  # condition
    print(f" {THEME['c2']}{weekday_str:<5}{THEME['rset']}{THEME['c3']}{'L':<2}\
{THEME['rset']}{min_temp:<5}{THEME['c3']}{'H':<2}{THEME['rset']}{max_temp:<5}\
{wthr_day}")  # print temps/conditions

The for loop above is used to print 8 lines of forecast temps along with general forecast descriptions (based on img alt text).

SOUP.find('a', {'data-day': str(i)}).contents[3].contents[1].string

The instruction above finds the a tag whose data-day attribute matches the loop index, then navigates through its children to get the minimum temperature string, e.g. '42°'.
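
The odd-numbered indexes come from the way BeautifulSoup counts children: whitespace between tags is kept as text nodes, so element children can sit at positions 1, 3, 5, and so on. The sketch below shows this on a hand-written fragment. The markup and class names are invented to mirror the nesting that the indexes imply; they are not a copy of Dark Sky's page. It uses the built-in 'html.parser' instead of 'html5lib' to keep the example free of the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; only the nesting matters for the example.
HTML = '''<a data-day="0">
<span class="day-details"><span class="skycon"><img alt="clear-day Icon"></span></span>
<span class="temp-range">
<span class="min">42°</span>
<span class="bar"></span>
<span class="max">57°</span>
</span>
</a>'''

soup = BeautifulSoup(HTML, 'html.parser')
day = soup.find('a', {'data-day': '0'})

# Even positions hold the newline text nodes between tags.
print([type(child).__name__ for child in day.contents])

temp_range = day.contents[3]              # second element child of the a tag
min_temp = temp_range.contents[1].string  # first element child
max_temp = temp_range.contents[5].string  # third element child
print(min_temp, max_temp)
```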

(datetime.now() + timedelta(days=i)).strftime('%a')

Within the loop, (datetime.now() + timedelta(days=i)).strftime('%a') advances the abbreviated weekday by one day with each iteration, e.g. Mon, Tue, Wed, etc.
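
The same date arithmetic can be tried on its own. In the sketch below, weekday_labels is a hypothetical helper, and a fixed start date replaces datetime.now() so the result is reproducible. Note that %a is locale-dependent; the English abbreviations assume Python's default locale.

```python
from datetime import datetime, timedelta


def weekday_labels(start, days=8):
    """Return abbreviated weekday names for `days` consecutive days from `start`."""
    return [(start + timedelta(days=i)).strftime('%a') for i in range(days)]


# 26 December 2022 was a Monday; app.py uses datetime.now() instead.
labels = weekday_labels(datetime(2022, 12, 26))
print(labels)
```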

SOUP.find('a', {'data-day': str(i)}).contents[1].find('span', {'class': 'skycon'}).img['alt'].split(' ')[0].replace('-', ' ')

The instruction above illustrates the parsing gymnastics required to hunt down the alt attribute of an img tag, format it, and store it in wthr_day (see below).
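
The string handling at the end of that chain can be looked at in isolation. In the sketch below, condition_from_alt is a hypothetical helper and the alt values are invented examples of the "hyphenated-name Icon" shape that the split and replace calls imply.

```python
def condition_from_alt(alt_text):
    """Keep the first space-separated token and turn its hyphens into spaces."""
    return alt_text.split(' ')[0].replace('-', ' ')


# Invented alt values, not taken from the live page.
for alt in ('partly-cloudy-day Icon', 'rain Icon'):
    print(condition_from_alt(alt))
```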

print(f" {THEME['c2']}{weekday_str:<5}{THEME['rset']}{THEME['c3']}{'L':<2}{THEME['rset']}{min_temp:<5}{THEME['c3']}{'H':<2}{THEME['rset']}{max_temp:<5}{wthr_day}")

The instruction above prints the low/high temps along with the wthr_day variable manipulated from the img tag.

Sunrise/Sunset Times

SUNRISE_TIME = SOUP.find('span', {'class':'sunrise swip'}).contents[3].string
SUNSET_TIME = SOUP.find('span', {'class':'sunset swap'}).contents[3].string
print(f" {THEME['c2']}Sunrise:{THEME['rset']} {SUNRISE_TIME}{THEME['c2']} | Sunset:{THEME['rset']} {SUNSET_TIME}")

Finally, the sunrise/sunset times are printed.

SOUP.find('span', {'class':'sunrise swip'}).contents[3].string

SOUP.find('span', {'class':'sunrise swip'}) gets the specific span tag with sunrise swip class, then contents[3] navigates to the time string.
