johndpope / utl_web_scraping_top_cnn_stories

This project forked from rogerjdeangelis/utl_web_scraping_top_cnn_stories

Web scraping top CNN stories. Keywords: sas sql join merge big data analytics macros oracle teradata mysql sas communities stackoverflow statistics artificial intelligence AI Python R Java Javascript WPS Matlab SPSS Scala Perl C C# Excel MS Access JSON graphics maps NLP natural language processing machine learning igraph DOSUBL DOW loop stackoverflow SAS community.

License: MIT License

SAS 100.00%

utl_web_scraping_top_cnn_stories

Web scraping top CNN stories

github
https://github.com/rogerjdeangelis/utl_web_scraping_top_cnn_stories

see
https://tinyurl.com/y8o2n36h
https://stackoverflow.com/questions/50998136/how-to-extract-news-web-scrapings-into-csv-file-and-how-to-append-new-records


INPUT (CNN Page)
=================

 http://money.cnn.com
                               ___ _ __  _ __    _ __ ___   ___  _ __   ___ _   _
                              / __| '_ \| '_ \  | '_ ` _ \ / _ \| '_ \ / _ \ | | |
                             | (__| | | | | | | | | | | | | (_) | | | |  __/ |_| |
                              \___|_| |_|_| |_| |_| |_| |_|\___/|_| |_|\___|\__, |
   *** We Want These Top Stories ***                                         |___/
   =================================
+----------------------------------------+                                    +---------------------+
| TOP STORIES                            |                                    | MOST ACTIVE STOCKS  |
|  The weird reason that mighty Amazon   |    GLENN BECK WALKS OFF CNN        |                     |
|                                        |                                    |  S&P 500  S&P 1500  |
|  Xiaomi wants to raise over $6 billion |    INTERVIEW OVER QUESTIONS        |   ---------------   |
|                                        |                                    |                     |
|  Jurassic World sequel crosses $700    |    ON THE MEDIA                    |   FCX  +0.08/0.49%  |
|                                        |                                    |                     |
|  Why GE may need to stop paying its    |                                    |   AKS +0.21/4.50%   |
|                                        |                                    |                     |
|  Toyota updates the Century, car of    |                                    |   CLF +0.40/4.65%   |
| ...                                    |                                    | ...                 |
|                                        |                                    |                     |
+----------------------------------------+                                    +---------------------+


 EXAMPLE OUTPUT

 WORK.WANTWPS total obs=39

 INDEX    TOPSTORIES

    0     The weird reason that mighty Amazon isn't in the Dow
    1     Xiaomi wants to raise over $6 billion in Hong Kong IPO
    2     Jurassic World sequel crosses $700 million at global box office
    3     Why GE may need to stop paying its 119-year old dividend
    4     Toyota updates the Century, car of choice for Japan's elites
    5     One innovative solution to the health care worker shortage
    6     What higher wages means for Domino's and McDonald's
    7     Apple promises free repairs for faulty MacBook keyboards
    8     'Jurassic World' sequel has big opening day amid a surging box office
    9     What's behind Tom Arnold's bizarre anti-Trump media blitz


PROCESS  (working code - two lines of code?)
=======================================

Key Text
<span class="aa8468e9 e01d3fdb">The weird reason that mighty Amazon isn&#x27;t in the Dow<!-- -->
</span>
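
A minimal sketch of what an HTML parser does with the Key Text above. It uses Python's standard-library html.parser as a stand-in for BeautifulSoup so that it runs without third-party packages; the names SpanText and span_text are hypothetical. It illustrates that the entity &#x27; is decoded to an apostrophe and that the empty comment <!-- --> contributes no text.

```python
from html.parser import HTMLParser

# The Key Text span, copied from the section above.
KEY_TEXT = (
    '<span class="aa8468e9 e01d3fdb">'
    "The weird reason that mighty Amazon isn&#x27;t in the Dow<!-- -->\n"
    "</span>"
)


class SpanText(HTMLParser):
    """Hypothetical helper: collect the text found inside <span> elements."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &#x27; becomes '
        self.depth = 0      # number of <span> tags currently open
        self.parts = []     # text fragments seen inside a span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Comments are routed to handle_comment, so <!-- --> never arrives here.
        if self.depth:
            self.parts.append(data)


def span_text(html):
    """Return the whitespace-trimmed text inside the spans of `html`."""
    parser = SpanText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


headline = span_text(KEY_TEXT)
print(headline)
```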

d = soup(requests.get('http://money.cnn.com/').text, 'html.parser');
articles = list(filter(None, [i.text for i in d.find_all('span', {'class':re.compile(r'^\w+ _\w+|^\w+$')})]))[2:];
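
A sketch of how the two filters in the second line behave, using only the re module and invented sample data. It assumes that BeautifulSoup tries a compiled pattern against each individual class token and then against the whole class string; class_matches is a hypothetical stand-in for that rule, not BeautifulSoup's own code. Only the class string 'aa8468e9 e01d3fdb' comes from the Key Text; the other values are made up.

```python
import re

# Same pattern as in the one-liner, written as a raw string.
pattern = re.compile(r'^\w+ _\w+|^\w+$')


def class_matches(class_attr):
    """Hypothetical stand-in for the assumed matching rule:
    try each class token on its own, then the whole attribute string."""
    candidates = class_attr.split() + [class_attr]
    return any(pattern.search(c) for c in candidates)


print(class_matches('aa8468e9 e01d3fdb'))  # class from the Key Text
print(class_matches('nav-item'))           # invented class containing a hyphen

# filter(None, ...) drops empty strings; [2:] then skips the first two survivors.
texts = ['', 'first', 'second', 'Story A', '', 'Story B']  # invented sample
articles = list(filter(None, texts))[2:]
print(articles)
```

Under that assumption, a span whose classes are plain word tokens is selected, while a class containing a hyphen is not.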


OUTPUT
======

WORK.WANTWPS total obs=39

 INDEX    TOPSTORIES

    0     The weird reason that mighty Amazon isn't in the Dow
    1     Xiaomi wants to raise over $6 billion in Hong Kong IPO
    2     Jurassic World sequel crosses $700 million at global box office
    3     Why GE may need to stop paying its 119-year old dividend
    4     Toyota updates the Century, car of choice for Japan's elites
    5     One innovative solution to the health care worker shortage
    6     What higher wages means for Domino's and McDonald's
    7     Apple promises free repairs for faulty MacBook keyboards
    8     'Jurassic World' sequel has big opening day amid a surging box office
    9     What's behind Tom Arnold's bizarre anti-Trump media blitz
   ...
   33     Drought woes? This tech can literally make it rain
   34     Airbus: Brexit chaos threatens our future in UK
   35     Report: Prosecutors subpoena National Enquirer records in Michael Cohen investigation
   36     Rachel Maddow breaks down in tears while discussing border crisis
   37     Nearly a quarter of Americans have no emergency savings
   38     Top bitcoin exchange says over $30 million in cryptocurrencies stolen


*                _               _       _
 _ __ ___   __ _| | _____     __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \   / _` |/ _` | __/ _` |
| | | | | | (_| |   <  __/  | (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___|   \__,_|\__,_|\__\__,_|

;

  http://money.cnn.com

 *          _       _   _
 ___  ___ | |_   _| |_(_) ___  _ __
/ __|/ _ \| | | | | __| |/ _ \| '_ \
\__ \ (_) | | |_| | |_| | (_) | | | |
|___/\___/|_|\__,_|\__|_|\___/|_| |_|

;

%utl_submit_wps64("
options set=PYTHONHOME 'C:\Progra~1\Python~1.5\';
options set=PYTHONPATH 'C:\Progra~1\Python~1.5\\lib\';
libname wrk sas7bdat '%sysfunc(pathname(work))';
proc python;
submit;
from bs4 import BeautifulSoup as soup;
import requests, re;
import pandas as pd;
import csv;
d = soup(requests.get('http://money.cnn.com/').text, 'html.parser');
articles = list(filter(None, [i.text for i in d.find_all('span', {'class':re.compile(r'^\w+ _\w+|^\w+$')})]))[2:];
want=pd.DataFrame(articles);
want.columns = ['TopStories'];
want.reset_index(inplace=True);
endsubmit;
import python=want data=wrk.wantwps;
run;quit;
");
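
The Stack Overflow question linked at the top asks how to write the scraped stories to a CSV file and append new records; the solution above stops at the SAS data set. A minimal sketch of that step, using only the standard csv module. The function name append_new_stories and the file name top_stories.csv are hypothetical, and the two headlines are taken from the OUTPUT listing.

```python
import csv
import os
import tempfile


def append_new_stories(path, stories):
    """Append headlines not already in the CSV at `path`; return how many were added."""
    seen = set()
    file_exists = os.path.exists(path)
    if file_exists:
        with open(path, newline='', encoding='utf-8') as f:
            seen = {row['TopStories'] for row in csv.DictReader(f)}

    # dict.fromkeys keeps the original order and drops repeats within `stories`.
    new = [s for s in dict.fromkeys(stories) if s not in seen]

    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['TopStories'])
        if not file_exists:
            writer.writeheader()  # header only when the file is first created
        writer.writerows({'TopStories': s} for s in new)
    return len(new)


# Usage in a temporary directory: the second call adds only the unseen headline.
path = os.path.join(tempfile.mkdtemp(), 'top_stories.csv')
first = append_new_stories(path, ["Airbus: Brexit chaos threatens our future in UK"])
second = append_new_stories(path, [
    "Airbus: Brexit chaos threatens our future in UK",
    "Nearly a quarter of Americans have no emergency savings",
])
print(first, second)
```

Comparing on the headline text keeps repeated runs from duplicating rows; a real job might prefer a more stable key, such as the article URL, if one is scraped.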

LOG

1         options set=PYTHONHOME 'C:\Progra~1\Python~1.5\';
2         options set=PYTHONPATH 'C:\Progra~1\Python~1.5\\lib\';
3         libname wrk sas7bdat 'e:\saswork\wrk\_TD6532_BEAST_';
NOTE: Library wrk assigned as follows:
      Engine:        SAS7BDAT
      Physical Name: e:\saswork\wrk\_TD6532_BEAST_

4         proc python;
5         submit;
6         from bs4 import BeautifulSoup as soup
7         import requests, re
8         import pandas as pd
9         import csv
10        d = soup(requests.get('http://money.cnn.com/').text, 'html.parser')
11        articles = list(filter(None, [i.text for i in d.find_all('span', {'class
12        want=pd.DataFrame(articles)
13        want.columns = ['TopStories']
14        want.reset_index(inplace=True)
15        endsubmit;

NOTE: Submitting statements to Python:


16        import python=want data=wrk.wantwps;
NOTE: Creating data set 'WRK.wantwps' from Python data frame 'want'
NOTE: Data set "WRK.wantwps" has 39 observation(s) and 2 variable(s)

17        run;
NOTE: Procedure python step took :
      real time : 1.301
      cpu time  : 0.015
