tomorrow's Introduction

Tomorrow

Magic decorator syntax for asynchronous code in Python 2.7.

Please don't actually use this in production. It's more of a thought experiment than anything else, and relies heavily on behavior specific to Python's old style classes. Pull requests, issues, comments and suggestions welcome.

Installation

Tomorrow is conveniently available via pip:

pip install tomorrow

or installable via git clone and setup.py:

git clone git@github.com:madisonmay/Tomorrow.git
cd Tomorrow
sudo python setup.py install

To ensure Tomorrow is properly installed, you can run the unittest suite from the project root:

nosetests -v 

Usage

The tomorrow library enables you to utilize the benefits of multi-threading with minimal concern about the implementation details.

Behind the scenes, the library is a thin wrapper around the Future object in concurrent.futures that resolves the Future whenever you try to access any of its attributes.
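
To make the idea concrete, here is a minimal sketch of such a wrapper, written for Python 3 and not taken from the library's source. The names LazyResult and threads_sketch are hypothetical; only ThreadPoolExecutor and Future.result come from concurrent.futures.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyResult:
    """Hypothetical stand-in for the library's wrapper object."""

    def __init__(self, future, timeout=None):
        self._future = future
        self._timeout = timeout

    def __getattr__(self, name):
        # Only called for attributes that are not found on the proxy itself,
        # so _future and _timeout never reach this method.
        result = self._future.result(self._timeout)  # blocks until finished
        return getattr(result, name)


def threads_sketch(n, timeout=None):
    """Decorator factory: run the wrapped function on a pool of n threads."""
    pool = ThreadPoolExecutor(max_workers=n)

    def decorator(func):
        def wrapper(*args, **kwargs):
            # Submit the call and hand back a proxy right away.
            return LazyResult(pool.submit(func, *args, **kwargs), timeout)
        return wrapper
    return decorator


@threads_sketch(2)
def shout(text):
    return text.upper()


pending = shout("hello")         # returns immediately with a proxy
first_word = pending.split()[0]  # attribute access waits for the result
```

The call itself never blocks; the wait happens on the first attribute access, which is why the calling code can look synchronous.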

Enough of the implementation details. Let's take a look at how simple it is to speed up an inefficient chunk of blocking code.

Naive Web Scraper

You've collected a list of urls and are looking to download the HTML of the lot. The following is a perfectly reasonable first stab at solving the task.

For the following examples, we'll be using the top sites from the Alexa rankings.

urls = [
    'http://google.com',
    'http://facebook.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
]

Right then, let's get on to the code.

import time
import requests

def download(url):
    return requests.get(url)

if __name__ == "__main__":

    start = time.time()
    responses = [download(url) for url in urls]
    html = [response.text for response in responses]
    end = time.time()
    print "Time: %f seconds" % (end - start)

More Efficient Web Scraper

Using tomorrow's decorator syntax, we can define a function that executes in multiple threads. Individual calls to download are non-blocking, but we can largely ignore this fact and write code identically to how we would in a synchronous paradigm.

import time
import requests

from tomorrow import threads

@threads(5)
def download(url):
    return requests.get(url)

if __name__ == "__main__":
    start = time.time()
    responses = [download(url) for url in urls]
    html = [response.text for response in responses]
    end = time.time()
    print "Time: %f seconds" % (end - start)

Awesome! With a single additional line of code (and no explicit threading logic) we can now download websites roughly 10x faster.

You can also optionally pass in a timeout argument, to prevent hanging on a task that is not guaranteed to return.

import time

from tomorrow import threads

@threads(1, timeout=0.1)
def raises_timeout_error():
    time.sleep(1)

if __name__ == "__main__":
    print raises_timeout_error()
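
The timeout presumably maps onto the timeout of the underlying Future. The following Python 3 sketch uses concurrent.futures directly (slow_task is a made-up stand-in) to show what happens when the wait expires: result() raises TimeoutError, but the task itself keeps running in its thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_task():
    # Hypothetical task that takes longer than we are willing to wait.
    time.sleep(0.5)
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task)
    try:
        future.result(timeout=0.1)  # give up waiting after 0.1 seconds
        timed_out = False
    except TimeoutError:
        timed_out = True
    # The timeout only bounds the wait; the task was not cancelled.
    late_value = future.result()
```

In other words, a timeout keeps the caller from hanging, but it does not stop the work that was already started.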

How Does it Work?

Feel free to read the source for a peek behind the scenes -- it's less than 50 lines of code.

tomorrow's People

Contributors

homebysix, kcbsbo, madisonmay, oooeeee, plucury, stavxyz, weiland

tomorrow's Issues

cannot import name threads

# coding=utf-8

import time
import requests
from tomorrow import threads

urls = [
    'http://www.baidu.com',
    'http://www.sina.com',
    'http://www.ifeng.com',
    'http://www.13393.com',
    'http://www.bing.com'
]

@threads(5)
def dowload(url):
    return requests.get(url)

if __name__ == '__main__':
    start = time.time()
    responses = [dowload(url) for url in urls]
    html = [response.text for response in responses]
    end = time.time()
    print 'Time:%f seconds' % (end - start)

Traceback (most recent call last):
  File "/home/kaka/workspace/PythonProject/Https/PC/tomorrowdemo.py", line 5, in <module>
    from tomorrow import threads
  File "/home/kaka/workspace/PythonProject/Https/PC/tomorrow.py", line 4, in <module>
ImportError: cannot import name threads

Clarify which versions of Python are supported

My sense is that this will work in Python 3.2 and later but not in Python 2. Is that correct? It would be useful if that were clearly stated in both the README and the PyPI record.

Will it support retry after raising a timeout exception?

For example, @threads(50, timeout=1) easily raises a timeout error. How can the function be retried when that happens?
Thanks for answering.

urls = ['http://p.3.cn/prices/mgets?skuIds=J_1273600'] * 1000
import time
import requests
# from multiprocessing.dummy import Pool
from tomorrow import threads


@threads(50,timeout=0.1)
def download(url):
    return requests.get(url)

if __name__ == "__main__":
    start = time.time()
    responses = [download(url) for url in urls]
    html = [len(response.text) for response in responses]
    print(html)
    end = time.time()
    print("Time: %f seconds" % (end - start))
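
Nothing in the README describes built-in retry support, so one option is to handle retries outside the decorator. The Python 3 sketch below uses concurrent.futures directly; run_with_retries and flaky are hypothetical names, and flaky only simulates a request whose first attempt is slow for some inputs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_with_retries(func, items, workers, timeout, attempts=3):
    """Call func on every item in a thread pool, resubmitting timed-out items.

    Returns (results, failed): results keeps the input order, failed lists
    the indices that still had no result after the last attempt.
    """
    results = [None] * len(items)
    pending = list(range(len(items)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(attempts):
            futures = [(i, pool.submit(func, items[i])) for i in pending]
            pending = []
            for i, future in futures:
                try:
                    # The timeout bounds this wait, not the task's run time.
                    results[i] = future.result(timeout=timeout)
                except TimeoutError:
                    pending.append(i)  # try this item again next round
            if not pending:
                break
    return results, pending


# Simulated flaky task: the first call for an even number is slow.
_calls = {}
_lock = threading.Lock()


def flaky(x):
    with _lock:
        _calls[x] = _calls.get(x, 0) + 1
        first_call = _calls[x] == 1
    if first_call and x % 2 == 0:
        time.sleep(0.5)
    return x * 10


results, failed = run_with_retries(flaky, [0, 1, 2, 3], workers=4, timeout=0.2)
```

A timed-out task is not cancelled, so it still occupies a worker until it finishes; with real network calls it is usually worth also passing a timeout to the request itself.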

some codes

"Syntax Error" in Python version 3.8.1 async name

The tomorrow module raises a SyntaxError on Python 3.8.1 because of the function name async, which is a reserved keyword there. I fixed it by renaming the function to _async.

Change def async(n, base_type, timeout=None): to
def _async(n, base_type, timeout=None):

And in def threads(n, timeout=None):

change return async(n, ThreadPoolExecutor, timeout) to
return _async(n, ThreadPoolExecutor, timeout)
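
The error can be reproduced without installing the package. This small check, a sketch for Python 3.7 or later (where async is a reserved keyword), compiles the old and the renamed function header from the issue; the "<sketch>" filename is just a placeholder.

```python
import keyword

old_source = "def async(n, base_type, timeout=None):\n    pass\n"
new_source = old_source.replace("def async", "def _async")

# async has been a reserved keyword since Python 3.7.
is_reserved = keyword.iskeyword("async")

try:
    compile(old_source, "<sketch>", "exec")
    old_compiles = True
except SyntaxError:
    old_compiles = False

# The renamed definition compiles without complaint.
compile(new_source, "<sketch>", "exec")
new_compiles = True
```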

This code does not work on python3

✔ ~/projects[master ↓·2|…2]
10:25 $ python3 get_urls.py
Traceback (most recent call last):
  File "get_urls.py", line 4, in <module>
    from tomorrow import threads
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tomorrow/__init__.py", line 1, in <module>
    from tomorrow import threads
ImportError: cannot import name 'threads'
✔ ~/projects[master ↓·2|…2]
10:25 $ python get_urls.py
Time: 4.356341 seconds
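
The traceback is consistent with an implicit relative import inside the package's __init__.py, which Python 2 allows and Python 3 does not. The sketch below builds two throwaway packages in a temporary directory to show the difference; pkg_implicit, pkg_explicit and make_package are made-up names, not part of tomorrow.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, init_source):
    """Create root/name/ with an __init__.py and a same-named submodule."""
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    with open(os.path.join(pkg_dir, name + ".py"), "w") as f:
        f.write("def threads(n):\n    return n\n")


root = tempfile.mkdtemp()
# Same shape as the traceback: the package imports from its own name.
make_package(root, "pkg_implicit", "from pkg_implicit import threads\n")
# Explicit relative import, valid on both Python 2.7 and Python 3.
make_package(root, "pkg_explicit", "from .pkg_explicit import threads\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module("pkg_implicit")
    implicit_works = True
except ImportError:
    # On Python 3 the name resolves to the package itself, not the submodule.
    implicit_works = False

explicit = importlib.import_module("pkg_explicit")
```

On Python 3 only the explicit form imports cleanly, which suggests a one-line change to the package's __init__.py would be enough for this particular error.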

Old-style classes interfere with returning strings

I have no idea how to solve this one, but when I try to return strings from a function decorated with @threads, the call errors instead of executing. Code to reproduce:

def download_nm(delimiter, source, dump):
    images = [line.split(delimiter)[10] for line in open(source)][1:]  # avoid header row
    with open(dump, 'a') as sink:
        for i, image in enumerate(images):
            results = return_image_json(image)  # type(results) == <type 'instance'> 
            sink.write(results)  # Error, expected string or buffer

@threads(16)
def return_image_json(image_link):
    response = requests.get(image_link)
    encoded = "data:%s;base64,%s" % (response.headers['Content-Type'], base64.b64encode(response.content))
    return json.dumps({image_link: encoded}) + '\n'

It looks like moving to new-style classes resolves this issue, but since you're relying on some of the syntax-hacks of old-style classes I'm not sure if this is solvable.
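
Until that is resolved, one possible workaround is to skip the proxy object and use a thread pool directly, so the caller receives real strings. This is a Python 3 sketch with a hypothetical to_json_line function standing in for the download-and-encode step, and an in-memory buffer standing in for the output file.

```python
import io
import json
from concurrent.futures import ThreadPoolExecutor


def to_json_line(image_link):
    # Stand-in for fetching and encoding the image; returns a plain str.
    return json.dumps({"link": image_link, "size": len(image_link)}) + "\n"


links = ["http://example.com/a.png", "http://example.com/b.png"]
sink = io.StringIO()  # plays the role of the opened dump file

with ThreadPoolExecutor(max_workers=16) as pool:
    # map() runs the calls concurrently and yields results in input order.
    for line in pool.map(to_json_line, links):
        sink.write(line)
```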

run test.py error

  1. Does test.py only support Python 2? I get a runtime error with the Python 3 interpreter because print is used without parentheses.
  2. I also get the runtime errors below.

Error
Traceback (most recent call last):
File "D:\project\Tomorrow\tests\test.py", line 95, in test_future_function
assert true()
TypeError: 'Tomorrow' object is not callable

<tomorrow.tomorrow.Tomorrow object at 0x029885D0>
<tomorrow.tomorrow.Tomorrow object at 0x02988730>

Failure
Traceback (most recent call last):
File "D:\project\Tomorrow\tests\test.py", line 68, in test_shared_executor
assert (N * DELAY) < (end - start) < (2 * N * DELAY)
AssertionError

<tomorrow.tomorrow.Tomorrow object at 0x02988810>
<tomorrow.tomorrow.Tomorrow object at 0x02988930>

Function returns object type of Tomorrow when decorated

The function with the decoration @threads returns an object of the class Tomorrow even when I return an integer or some other type from the function.

Sample code

from tomorrow import threads

@threads(10)
def test(i):
    return i

for i in range(10):
    print(test(i))

Output

<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x0228FAD0>
<tomorrow.tomorrow.Tomorrow object at 0x01F44B90>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x0228FAD0>
<tomorrow.tomorrow.Tomorrow object at 0x01F44B90>
<tomorrow.tomorrow.Tomorrow object at 0x022AA030>
<tomorrow.tomorrow.Tomorrow object at 0x0228FAD0>
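
A likely explanation, assuming the wrapper delegates through __getattr__: on new-style classes (the only kind in Python 3), special methods such as __str__ are looked up on the type and never reach __getattr__, so print shows the wrapper itself. The Proxy class below is a hypothetical illustration, not the library's code.

```python
class Proxy(object):
    """Hypothetical wrapper that forwards unknown attributes to a value."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        return getattr(self._value, name)


p = Proxy(7)

# Ordinary attribute access is forwarded to the wrapped int.
via_attribute = p.real

# str() looks up __str__ on the Proxy type, finds object's default,
# and never consults __getattr__, so we get the wrapper's repr.
printed = str(p)
```

With Python 2 old-style classes the same lookup does go through __getattr__, which is the behavior the README says the library depends on.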

Tomorrow prevents script from terminating

When tomorrow is used, the Python script won't end. Ctrl+C won't stop it; only killing the process altogether will. Needless to say, this is pretty frustrating.

tomorrow pip package installs tests for no real reason

Hello,

After running pip install tomorrow, you will find that the tests directory is installed alongside the tomorrow package in your site-packages.

Here are the complete contents of the site-packages/tomorrow-0.2.3-py3.4.egg-info/installed-files.txt file on my Ubuntu box:

../tests/test.py
../tests/__init__.py
../tomorrow/__init__.py
../tomorrow/tomorrow.py
../tests/__pycache__/test.cpython-34.pyc
../tests/__pycache__/__init__.cpython-34.pyc
../tomorrow/__pycache__/__init__.cpython-34.pyc
../tomorrow/__pycache__/tomorrow.cpython-34.pyc
./
SOURCES.txt
dependency_links.txt
top_level.txt
requires.txt
PKG-INFO

Most packages do not distribute their own unit tests. The ones that do use a uniquely named package, so that it never clashes with anything a user project might have.

This is how I found the bug: I was trying to import tests.fixtures in my own project, where the file tests/fixtures.py exists, and I kept getting an ImportError.

Please consider adding an exclude argument to the find_packages call in your setup.py file. Thanks.
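
For illustration, the effect of the exclude argument can be checked on a throwaway directory layout. This sketch requires setuptools and recreates only the two package directories named in the file list above; in setup.py the equivalent call would be packages=find_packages(exclude=["tests", "tests.*"]).

```python
import os
import tempfile

from setuptools import find_packages

# Recreate the layout from the issue: a tomorrow package next to tests.
root = tempfile.mkdtemp()
for name in ("tomorrow", "tests"):
    os.mkdir(os.path.join(root, name))
    open(os.path.join(root, name, "__init__.py"), "w").close()

# Without exclude, both top-level packages are picked up.
everything = sorted(find_packages(where=root))

# With exclude, only the library package would be shipped.
shipped = find_packages(where=root, exclude=["tests", "tests.*"])
```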

threads(n) get rid of that n!

Hi Madison,
I am very excited about your Python module. Could you change the code so that threads() can be used as a decorator without setting n, so that threads are added to the thread pool dynamically?
Thank you very much.

Example contains two imports of time module.

In the second example where you added Tomorrow to the first example, you had a second import of time after the ifmain. Is this necessary?

I didn't see any explanation of why it was added.
