madisonmay / tomorrow

Magic decorator syntax for asynchronous code in Python

License: MIT License

tomorrow's Introduction

Tomorrow

Magic decorator syntax for asynchronous code in Python 2.7.

Please don't actually use this in production. It's more of a thought experiment than anything else, and it relies heavily on behavior specific to Python's old-style classes. Pull requests, issues, comments, and suggestions are welcome.

Installation

Tomorrow is conveniently available via pip:

pip install tomorrow

or installable via git clone and setup.py

git clone git@github.com:madisonmay/Tomorrow.git
cd Tomorrow
sudo python setup.py install

To ensure Tomorrow is properly installed, you can run the unittest suite from the project root:

nosetests -v

Usage

The tomorrow library enables you to utilize the benefits of multi-threading with minimal concern about the implementation details.

Behind the scenes, the library is a thin wrapper around the Future object in concurrent.futures that resolves the Future whenever you try to access any of its attributes.
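To make that idea concrete, here is a minimal sketch of such a wrapper. It is not the library's source: the names `LazyResult` and `threads_sketch` are hypothetical, and the sketch uses an ordinary new-style class on Python 3 rather than the old-style class behavior the library relies on.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


class LazyResult(object):
    """Hypothetical proxy: waits for the Future when an attribute is read."""

    def __init__(self, future, timeout=None):
        self._future = future
        self._timeout = timeout

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself,
        # so _future and _timeout never reach this method.
        result = self._future.result(self._timeout)
        return getattr(result, name)


def threads_sketch(n, timeout=None):
    """Hypothetical decorator: run calls on a shared pool of n threads."""
    def decorator(func):
        pool = ThreadPoolExecutor(max_workers=n)

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Submit the call and hand back a proxy immediately.
            return LazyResult(pool.submit(func, *args, **kwargs), timeout)
        return wrapper
    return decorator


@threads_sketch(2)
def shout(text):
    return text.upper()


proxy = shout("hello")        # returns right away with a proxy
print(proxy.lower())          # attribute access waits for the result
print(type(proxy).__name__)   # the proxy's own type, not str
```

The call itself never blocks; the wait happens only when something is read from the returned object.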

Enough of the implementation details. Let's look at how little effort it takes to speed up an inefficient chunk of blocking code.

Naive Web Scraper

You've collected a list of URLs and want to download the HTML of the lot. The following is a perfectly reasonable first stab at the task.

For the following examples, we'll be using the top sites from the Alexa rankings.

urls = [
    'http://google.com',
    'http://facebook.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
]

Right then, let's get on to the code.

import time
import requests

def download(url):
    return requests.get(url)

if __name__ == "__main__":

    start = time.time()
    responses = [download(url) for url in urls]
    html = [response.text for response in responses]
    end = time.time()
    print "Time: %f seconds" % (end - start)

More Efficient Web Scraper

Using tomorrow's decorator syntax, we can define a function that executes in multiple threads. Individual calls to download are non-blocking, but we can largely ignore this fact and write code identically to how we would in a synchronous paradigm.

import time
import requests

from tomorrow import threads

@threads(5)
def download(url):
    return requests.get(url)

if __name__ == "__main__":
    start = time.time()
    responses = [download(url) for url in urls]
    html = [response.text for response in responses]
    end = time.time()
    print "Time: %f seconds" % (end - start)

Awesome! With a single line of additional code (and no explicit threading logic) we can now download websites ~10x as efficiently.
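For comparison, the same effect can be reproduced with concurrent.futures alone. The sketch below avoids the network entirely: `fake_download` is a hypothetical stand-in that sleeps instead of making a request, and the example.com URLs are placeholders. The real speedup depends on network latency and the number of threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    time.sleep(0.2)  # stand-in for network latency
    return "<html>%s</html>" % url


urls = ["http://example.com/%d" % i for i in range(5)]

# One request after another.
start = time.time()
sequential = [fake_download(u) for u in urls]
sequential_time = time.time() - start

# All five requests in flight at once.
start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    threaded = list(pool.map(fake_download, urls))
threaded_time = time.time() - start

print("sequential: %.2fs, threaded: %.2fs" % (sequential_time, threaded_time))
```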

You can also pass a timeout argument to avoid hanging on a task that is not guaranteed to return.

import time

from tomorrow import threads

@threads(1, timeout=0.1)
def raises_timeout_error():
    time.sleep(1)

if __name__ == "__main__":
    print raises_timeout_error()
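Because the library wraps concurrent.futures, the timeout presumably corresponds to the timeout accepted by `Future.result`. That mapping is an assumption, but the underlying standard-library behavior looks like this sketch, where `slow` is a hypothetical task:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow():
    time.sleep(0.5)
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow)
    try:
        # Wait at most 0.1 seconds for a task that needs 0.5 seconds.
        future.result(timeout=0.1)
        outcome = "finished in time"
    except TimeoutError:
        outcome = "timed out"
    print(outcome)
```

Note that the timeout only stops the caller from waiting. The worker thread keeps running the task until it finishes.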

How Does it Work?

Feel free to read the source for a peek behind the scenes -- it's less than 50 lines of code.


tomorrow's Issues

cannot import name threads

# coding=utf-8

import time
import requests
from tomorrow import threads

urls = [
    'http://www.baidu.com',
    'http://www.sina.com',
    'http://www.ifeng.com',
    'http://www.13393.com',
    'http://www.bing.com'
]

@threads(5)
def dowload(url):
    return requests.get(url)

if __name__ == '__main__':
    start = time.time()
    responses = [dowload(url) for url in urls]
    html = [response.text for response in responses]
    end = time.time()
    print 'Time:%f seconds' % (end - start)

Traceback (most recent call last):
  File "/home/kaka/workspace/PythonProject/Https/PC/tomorrowdemo.py", line 5, in <module>
    from tomorrow import threads
  File "/home/kaka/workspace/PythonProject/Https/PC/tomorrow.py", line 4, in <module>
ImportError: cannot import name threads
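The second frame of the traceback points at a tomorrow.py inside the reporter's own project directory. That suggests, though the thread does not confirm it, that a local file is shadowing the installed package. A quick way to check where any module name resolves is sketched below; `module_origin` is a hypothetical helper, and `json` is used only because it is always available.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Swap in "tomorrow" to see whether a local file or the installed package wins.
print(module_origin("json"))
```

If the printed path lies inside the project directory rather than site-packages, renaming the local file would be the first thing to try.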

Clarify which versions of Python are supported

My sense is that this will work in Python 3.2 and later but not in Python 2. Is that correct? It would be useful if that was clearly stated in both the readme and in the pypi record.

Will it support retry after raising a timeout exception?

With something like @threads(50, timeout=1), a timeout error is raised easily. How can the function be retried after that?
Thanks for answering.

urls = ['http://p.3.cn/prices/mgets?skuIds=J_1273600'] * 1000
import time
import requests
# from multiprocessing.dummy import Pool
from tomorrow import threads


@threads(50,timeout=0.1)
def download(url):
    return requests.get(url)

if __name__ == "__main__":
    start = time.time()
    responses = [download(url) for url in urls]
    html = [len(response.text) for response in responses]
    print(html)
    end = time.time()
    print("Time: %f seconds" % (end - start))
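The thread does not say whether the library offers a retry option. One way to get the behavior with concurrent.futures directly is sketched here; `call_with_retry` and `flaky` are hypothetical names, and `flaky` simulates a request that is slow only on its first call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=4)


def call_with_retry(func, args=(), timeout=0.1, attempts=3):
    """Hypothetical helper: resubmit func until a result arrives within timeout."""
    for attempt in range(1, attempts + 1):
        future = pool.submit(func, *args)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            # The timed-out call keeps running in its worker thread;
            # it cannot be interrupted, only abandoned.
            if attempt == attempts:
                raise


calls = []


def flaky(delay_first=0.3):
    """Stand-in for a slow request: slow on the first call, fast afterwards."""
    calls.append(1)
    if len(calls) == 1:
        time.sleep(delay_first)
    return "ok"


print(call_with_retry(flaky))  # first attempt times out, a later one succeeds
```

This helper waits on each call in turn, so a batch job would want to submit everything first and retry only the calls that timed out.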


"Syntax Error" in Python version 3.8.1 async name

The tomorrow module raises a SyntaxError in Python 3.8.1 because it defines a function named async, which has been a reserved keyword since Python 3.7. I fixed it locally by renaming the function to _async.

Change

def async(n, base_type, timeout=None):

to

def _async(n, base_type, timeout=None):

And in def threads(n, timeout=None):, change

return async(n, ThreadPoolExecutor, timeout)

to

return _async(n, ThreadPoolExecutor, timeout)
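The keyword clash is easy to confirm from the standard library. This sketch, which assumes Python 3.7 or later, compiles both spellings of the definition without importing tomorrow at all:

```python
import keyword

# True on Python 3.7 and later, where async became a reserved keyword.
print(keyword.iskeyword("async"))


def compiles(source):
    """Report whether a snippet of source code is syntactically valid."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles("def async(n, base_type, timeout=None): pass"))
print(compiles("def _async(n, base_type, timeout=None): pass"))
```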

This code does not work on python3

✔ ~/projects[master ↓·2|…2]
10:25 $ python3 get_urls.py
Traceback (most recent call last):
  File "get_urls.py", line 4, in <module>
    from tomorrow import threads
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tomorrow/__init__.py", line 1, in <module>
    from tomorrow import threads
ImportError: cannot import name 'threads'
✔ ~/projects[master ↓·2|…2]
10:25 $ python get_urls.py
Time: 4.356341 seconds

Old-style classes interfere with returning strings

I have no idea how to solve this one, but when a function decorated with @threads returns a string, using that return value as a string raises an error. Code to reproduce:

import base64
import json

import requests
from tomorrow import threads

def download_nm(delimiter, source, dump):
    images = [line.split(delimiter)[10] for line in open(source)][1:]  # avoid header row
    with open(dump, 'a') as sink:
        for i, image in enumerate(images):
            results = return_image_json(image)  # type(results) == <type 'instance'> 
            sink.write(results)  # Error, expected string or buffer

@threads(16)
def return_image_json(image_link):
    response = requests.get(image_link)
    encoded = "data:%s;base64,%s" % (response.headers['Content-Type'], base64.b64encode(response.content))
    return json.dumps({image_link: encoded}) + '\n'

It looks like moving to new-style classes resolves this issue, but since you're relying on some of the syntax-hacks of old-style classes I'm not sure if this is solvable.

Python 3 compatibility

Errors when running test.py

Does test.py only support Python 2? Running it with a Python 3 interpreter fails because print is used without parentheses.
There are further runtime errors as well:

Error
Traceback (most recent call last):
File "D:\project\Tomorrow\tests\test.py", line 95, in test_future_function
assert true()
TypeError: 'Tomorrow' object is not callable

<tomorrow.tomorrow.Tomorrow object at 0x029885D0>
<tomorrow.tomorrow.Tomorrow object at 0x02988730>

Failure
Traceback (most recent call last):
File "D:\project\Tomorrow\tests\test.py", line 68, in test_shared_executor
assert (N * DELAY) < (end - start) < (2 * N * DELAY)
AssertionError

<tomorrow.tomorrow.Tomorrow object at 0x02988810>
<tomorrow.tomorrow.Tomorrow object at 0x02988930>

Function returns object type of Tomorrow when decorated

A function decorated with @threads returns an object of the class Tomorrow, even when the function itself returns an integer or some other type.

Sample code

from tomorrow import threads

@threads(10)
def test(i):
    return i

for i in range(10):
    print(test(i))

Output

<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x0228FAD0>
<tomorrow.tomorrow.Tomorrow object at 0x01F44B90>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x0228FAD0>
<tomorrow.tomorrow.Tomorrow object at 0x01F44B90>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x0228FAD0>
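One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that the wrapper forwards attribute access through `__getattr__`. On Python 3, and for new-style classes generally, implicit special-method lookups such as those behind `print` or `+` bypass `__getattr__`, so the wrapper shows through. The hypothetical `AttrProxy` class below reproduces the effect without tomorrow:

```python
class AttrProxy(object):
    """Hypothetical proxy that forwards unknown attributes to a wrapped value."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        return getattr(self._value, name)


proxy = AttrProxy(5)

# Explicit attribute access is forwarded to the wrapped integer.
print(proxy.bit_length())

# str() looks up __str__ on the type and skips __getattr__,
# so the default object representation is printed.
print(str(proxy))

# Operators also go through the type, so addition is not forwarded.
try:
    proxy + 1
    addition = "forwarded"
except TypeError:
    addition = "not forwarded"
print(addition)
```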

Document `timeout` argument.

Should probably end up in README.md.

Tomorrow prevents script from terminating

When tomorrow is used, the Python script won't end. Ctrl+C won't stop it; only killing the process altogether will. Needless to say, this is pretty frustrating.

tomorrow pip package installs tests for no real reason

Hello,

After running pip install tomorrow, you will find that the tests directory is installed into your site-packages along with the tomorrow package.

Here are the complete contents of the site-packages/tomorrow-0.2.3-py3.4.egg-info/installed-files.txt file on my Ubuntu box:

../tests/test.py
../tests/__init__.py
../tomorrow/__init__.py
../tomorrow/tomorrow.py
../tests/__pycache__/test.cpython-34.pyc
../tests/__pycache__/__init__.cpython-34.pyc
../tomorrow/__pycache__/__init__.cpython-34.pyc
../tomorrow/__pycache__/tomorrow.cpython-34.pyc
./
SOURCES.txt
dependency_links.txt
top_level.txt
requires.txt
PKG-INFO

Most packages do not distribute their own unit tests. The ones that do use a uniquely named package, so that it never clashes with anything a user project might have.

This is how I found the bug: I was trying to import tests.fixtures in my own project, where the file tests/fixtures.py exists, and I kept getting an ImportError.

Please consider adding an exclude argument to the find_packages call in your setup.py file. Thanks.
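A sketch of what that request could look like is shown below. It is a configuration fragment and not the project's actual setup.py; only the package name and the tests directory come from the report above, and all other metadata is elided.

```python
# setup.py (fragment, not the project's actual file)
from setuptools import setup, find_packages

setup(
    name="tomorrow",
    # Leave the top-level tests package, and anything under it, out of the install.
    packages=find_packages(exclude=["tests", "tests.*"]),
    # ...remaining metadata unchanged...
)
```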
