
Comments (6)

epogrebnyak commented on June 24, 2024
from bs4 import BeautifulSoup
import urllib.request

url = 'http://www.dohod.ru/ik/analytics/dividend/rtkmp'

# *requests* fails on SSL, using *urllib.request*
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8")

# following https://gist.github.com/phillipsm/0ed98b2585f0ada5a769
#           https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
soup = BeautifulSoup(html, 'lxml')

# identify the table we need (the third one on the page)
table = soup.find_all('table')[2]

# print the rows
for html_row in table.find_all('tr'):
    row = [column.text.strip() for column in html_row.find_all('td')]
    print(row)

from poptimizer_old.

epogrebnyak commented on June 24, 2024
# following https://gist.github.com/phillipsm/0ed98b2585f0ada5a769
#           https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all

from bs4 import BeautifulSoup
from bs4.element import Tag
import pandas as pd
import urllib.request
from pathlib import Path


def get_text(url: str) -> str:
    # *requests* fails on SSL, using *urllib.request*
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def yield_rows(table: Tag):
    # *table* is a bs4 Tag for a <table> element, not a string
    for html_row in table.find_all('tr'):
        row = [column.text.strip() for column in html_row.find_all('td')]
        try:
            yield dict(DATE=pd.to_datetime(row[0]), DIVIDEND=row[2])
        except (IndexError, ValueError):
            # rows with too few <td> cells or a non-date first cell end up here
            print("Not parsed:", row)


def make_dataframe(table: Tag) -> pd.DataFrame:
    # DataFrame.append was removed in pandas 2.0, so build the frame from all rows at once
    df = pd.DataFrame(list(yield_rows(table)), columns=['DATE', 'DIVIDEND'])
    return df.set_index('DATE').drop_duplicates()


if __name__ == '__main__':
    url = 'http://www.dohod.ru/ik/analytics/dividend/rtkmp'
    # dirty cache
    cache = Path('temp.txt')
    if cache.exists():
        html = cache.read_text(encoding='utf-8')
    else:
        html = get_text(url)
        cache.write_text(html, encoding='utf-8')
    # identify table
    soup = BeautifulSoup(html, 'lxml')
    table = soup.find_all('table')[2]
    df = make_dataframe(table)
    print(df)
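
The "dirty cache" step in the script above could be pulled out into a small helper. This is only a sketch: the name `cached_text` and the `fetch` callback are hypothetical and not part of the posted code, and it assumes the page is UTF-8 text.

```python
from pathlib import Path
from typing import Callable


def cached_text(path: Path, fetch: Callable[[], str]) -> str:
    """Return the file content if *path* exists, else call *fetch* and save its result."""
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch()  # only called on a cache miss
    path.write_text(text, encoding="utf-8")
    return text


# Possible use with the get_text() function from the comment above:
# html = cached_text(Path('temp.txt'), lambda: get_text(url))
```

Passing the download as a callback keeps the helper independent of whether `urllib.request` or `requests` does the fetching.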
    


epogrebnyak commented on June 24, 2024
Not parsed: []
Not parsed: ['07.07.2018 (прогноз)', '01.08.2018', '6.38', '8.49%']
           DIVIDEND
DATE               
2017-07-07     5.39
2016-08-07     5.92
2015-03-07     4.05
2014-07-14     4.85
2013-04-30      4.1
2012-04-28      4.7
2011-10-05   0.4344
2010-09-21     1.67
2010-07-05      2.1
2009-04-14     2.91
2008-04-22     3.88
2007-04-27     2.96
2006-05-05     3.72
2005-06-05     2.97
2004-11-05     3.25
2003-06-15     1.27
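
The unparsed row above suggests that the site writes dates as DD.MM.YYYY. If that holds for the whole table (an assumption, not something confirmed in this thread), a parser without an explicit format may read ambiguous values such as '01.08.2018' month-first. The output also shows the dividends with differing numbers of decimals, which suggests they are still strings. A minimal sketch of a stricter row parser, using only the standard library; `parse_row` and `DATE_FORMAT` are hypothetical names:

```python
from datetime import date, datetime
from typing import Optional, Tuple

DATE_FORMAT = "%d.%m.%Y"  # assumption: the site uses day.month.year


def parse_row(row: list) -> Optional[Tuple[date, float]]:
    """Return (date, dividend) or None when the row cannot be parsed."""
    try:
        day = datetime.strptime(row[0], DATE_FORMAT).date()
        dividend = float(row[2])
    except (IndexError, ValueError):
        return None
    return day, dividend


# Forecast row copied from the output above: the date has a suffix, so it is skipped.
print(parse_row(['07.07.2018 (прогноз)', '01.08.2018', '6.38', '8.49%']))  # None
# Hypothetical row, made up for illustration, with an ambiguous day/month pair.
print(parse_row(['01.08.2018', '', '6.38', '']))  # (datetime.date(2018, 8, 1), 6.38)
```

With an explicit format the forecast row is still rejected, and '01.08.2018' is read as 1 August rather than 8 January.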


WLM1ke commented on June 24, 2024

What is this comment about: `# requests fails on SSL, using urllib.request`? requests seems to work fine for me.


epogrebnyak commented on June 24, 2024

Updated code above


WLM1ke commented on June 24, 2024

I've sketched out a version.
