python-textile's Introduction

python-textile

python-textile is a Python port of Textile, Dean Allen’s humane web text generator.

Installation

pip install textile

Dependencies:

  • html5lib
  • regex (The regex package causes problems with PyPy, and is not installed as a dependency in such environments. If you are upgrading a textile install on PyPy which had regex previously included, you may need to uninstall it.)

Optional dependencies include:

  • PIL/Pillow (for checking image sizes). If needed, install via pip install 'textile[imagesize]'

Usage

>>> import textile
>>> s = """
... _This_ is a *test.*
...
... * One
... * Two
... * Three
...
... Link to "Slashdot":http://slashdot.org/
... """
>>> html = textile.textile(s)
>>> print(html)
	<p><em>This</em> is a <strong>test.</strong></p>

	<ul>
		<li>One</li>
		<li>Two</li>
		<li>Three</li>
	</ul>

	<p>Link to <a href="http://slashdot.org/">Slashdot</a></p>
>>>

Notes:

  • Active development supports Python 3.5 or later.

python-textile's People

Contributors

adam-iris, brad, brondsem, dinkypumpkin, dmishe, hupf, ikirudennis, ivanspengen-ct, jsamsa, kurtraschke, mitya57, rczajka, robhudson, sebix, wbond

python-textile's Issues

Repeating list modifiers output incorrect HTML

If the input is something like this:

text = "First line\n\n** Point 1\n* Point2\nLast line"
print(textile.textile_restricted(text))

The output looks like this:

<p>First line</p>

<p> <ul>
    <li>Point 1</li>
</ul></li>
<ul>
    <li>Point2</li>
</ul><br />Last line</p>

It's the same for #, - and *, and for any number of repeated modifiers (two or more).

Unicode support for links

Take the following textile link example:

"Superhuman":https://en.wikipedia.org/wiki/Superhuman
"Übermensch":https://de.wikipedia.org/wiki/Übermensch

This will be rendered as:

<p><a href="https://en.wikipedia.org/wiki/Superhuman">Superhuman</a><br />
&#8220;Übermensch&#8221;:https://de.wikipedia.org/wiki/Übermensch</p>

The second link will not be parsed correctly, since it contains the Unicode character 'Ü'. This happens because the links() function uses A-Za-z in the regular expression:

        pattern = r'''
            (?P<pre>[\s\[{(]|[%s])?         #leading text
            "                               #opening quote
            (?P<atts>%s)                    #block attributes
            (?P<text>[^"]+?)                #link text
            \s?
            (?:\((?P<title>[^)]+?)\)(?="))? #optional title
            ":                              #closing quote, colon
            (?P<url>(?:ftp|https?)?         #URL
                        (?: :// )?
                        [-A-Za-z0-9+&@#/?=~_()|!:,.;%%]*
                        [-A-Za-z0-9+&@#/=~_()|]
            )
            (?P<post>[^\w\/;]*?)        #trailing text
            (?=<|\s|$)
        ''' % (re.escape(punct), self.c)

        text = re.compile(pattern, re.X).sub(self.fLink, text)

Instead, \w in combination with the re.UNICODE flag should be used:

        pattern = r'''
            (?P<pre>[\s\[{(]|[%s])?         #leading text
            "                               #opening quote
            (?P<atts>%s)                    #block attributes
            (?P<text>[^"]+?)                #link text
            \s?
            (?:\((?P<title>[^)]+?)\)(?="))? #optional title
            ":                              #closing quote, colon
            (?P<url>(?:ftp|https?)?         #URL
                        (?: :// )?
                        [-\w+&@#/?=~()|!:,.;%%]*
                        [-\w+&@#/=~()|]
            )
            (?P<post>[^\w\/;]*?)        #trailing text
            (?=<|\s|$)
        ''' % (re.escape(punct), self.c)

        text = re.compile(pattern, re.X | re.UNICODE).sub(self.fLink, text)

There are other occurrences of the a-z or A-Z expression in the source code; they should probably be replaced too.
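
The difference between the two character classes can be checked in isolation. The following is a minimal sketch using only the standard re module. ASCII_URL and UNICODE_URL are hypothetical names for heavily simplified stand-ins covering only the URL part of the pattern above; they are not the library's actual expression.

```python
import re

# Simplified stand-ins for the URL portion of the link pattern.
# Assumption: the surrounding groups (pre, atts, text, title, post) are
# left out, and "%%" becomes "%" because no string formatting is applied.
ASCII_URL = re.compile(
    r'https?://[-A-Za-z0-9+&@#/?=~_()|!:,.;%]*[-A-Za-z0-9+&@#/=~_()|]')
UNICODE_URL = re.compile(
    r'https?://[-\w+&@#/?=~()|!:,.;%]*[-\w+&@#/=~()|]', re.UNICODE)

url = 'https://de.wikipedia.org/wiki/Übermensch'

# The ASCII class stops in front of the first non-ASCII letter.
ascii_match = ASCII_URL.match(url).group(0)
# \w also matches non-ASCII letters, so the whole URL is consumed.
unicode_match = UNICODE_URL.match(url).group(0)

print(ascii_match)    # https://de.wikipedia.org/wiki/
print(unicode_match)  # https://de.wikipedia.org/wiki/Übermensch
```

On Python 3, str patterns are Unicode-aware by default, so re.UNICODE is redundant there; it is kept to mirror the proposal.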

ImportError on Python 3

Hi, I'm trying to use textile for a project of mine that I want to support Python 3. Importing the textile package triggers an ImportError.

You can reproduce this with tox. Here is my tox.ini file:

[tox]
envlist = py34

[testenv]
deps = nose
       coverage
commands = nosetests

I think the problem is related to relative imports. Here are my results:

matt@eden:~/python-textile$ tox -e py34
GLOB sdist-make: /home/matt/python-textile/setup.py
py34 inst-nodeps: /home/matt/python-textile/.tox/dist/textile-2.1.5.zip
py34 runtests: PYTHONHASHSEED='4057464796'
py34 runtests: commands[0] | nosetests
E
======================================================================
ERROR: Failure: ImportError (No module named 'functions')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/matt/python-textile/.tox/py34/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/matt/python-textile/.tox/py34/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/matt/python-textile/.tox/py34/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/matt/python-textile/.tox/py34/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/matt/python-textile/.tox/py34/lib/python3.4/imp.py", line 245, in load_module
    return load_package(name, filename)
  File "/home/matt/python-textile/.tox/py34/lib/python3.4/imp.py", line 217, in load_package
    return methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1448, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/home/matt/python-textile/textile/__init__.py", line 1, in <module>
    from functions import textile, textile_restricted, Textile
ImportError: No module named 'functions'

Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
textile/__init__              2      1    50%   3
textile/functions           469    469     0%   4-1001
textile/tests/__init__      139    139     0%   2-442
textile/textilefactory       27     27     0%   1-74
textile/tools/__init__        0      0   100%   
textile/tools/imagesize      24     24     0%   1-41
textile/tools/sanitizer      20     20     0%   1-35
-------------------------------------------------------
TOTAL                       681    680     1%   
----------------------------------------------------------------------
Ran 1 test in 0.039s

FAILED (errors=1)
ERROR: InvocationError: '/home/matt/python-textile/.tox/py34/bin/nosetests'
___________________________________ summary ____________________________________
ERROR:   py34: commands failed
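
The relative-import suspicion can be tested without the textile sources. The sketch below builds two throwaway packages in a temporary directory: one uses the implicit form from the traceback (from functions import ...), the other the explicit form (from .functions import ...). The helper try_import and the package names demo_implicit and demo_explicit are made up for this illustration, and the example assumes no top-level module named functions is importable in the environment.

```python
import importlib
import os
import shutil
import sys
import tempfile


def try_import(init_source, package_name):
    """Create a throwaway package with the given __init__.py and import it."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, package_name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, '__init__.py'), 'w') as fh:
        fh.write(init_source)
    # Stand-in for textile/functions.py (hypothetical content).
    with open(os.path.join(pkg_dir, 'functions.py'), 'w') as fh:
        fh.write('def textile(text):\n    return text\n')
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        importlib.import_module(package_name)
        return 'ok'
    except ImportError as exc:
        return 'ImportError: %s' % exc
    finally:
        sys.path.remove(root)
        shutil.rmtree(root, ignore_errors=True)


# Implicit relative import, as in the traceback: fails on Python 3.
implicit = try_import('from functions import textile\n', 'demo_implicit')
# Explicit relative import: works on Python 3 (and on Python 2.5+).
explicit = try_import('from .functions import textile\n', 'demo_explicit')

print(implicit)
print(explicit)
```

If the second import succeeds while the first fails, changing the first line of textile/__init__.py to the explicit form would be a plausible fix, although other modules in the package may need the same change.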

Empty lines in pre block break pre

Sorry, I can't think of a good title.

But if there are empty lines inside the pre, the output is broken. I think the same thing happens with code or notextile blocks too. I will try to look at the code and figure out a fix, but I'm still early in my Python development. :)

[# ] import textile

[# ] s = """<pre>
   ...: xxxxxxxx
   ...: yyyyyyyy
   ...: zzzzzzzz
   ...: </pre>
   ...: 
   ...: <pre>
   ...: aaaaaaa
   ...: 
   ...: bbbbbbb
   ...: 
   ...: ccccccc
   ...: </pre>
   ...: """

[# ] textile.textile(s)
Out[5]: '<pre>\nxxxxxxxx\nyyyyyyyy\nzzzzzzzz\n</pre>\n\n\t<p><pre><br />aaaaaaa</p>\n\n\t<p>bbbbbbb</p>\n\n\t<p>ccccccc<br /></pre></p>'
[# ] !grep version setup.py
    version='2.1.5',
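
One possible direction for a fix, sketched here only as an assumption about the cause: if the text is split into paragraphs on blank lines before the pre blocks are recognised, the blocks could be set aside first and restored afterwards. The function split_blocks and the placeholder format are hypothetical and are not taken from the python-textile source.

```python
import re

PRE_BLOCK = re.compile(r'<pre>.*?</pre>', re.DOTALL)


def split_blocks(text):
    """Split text on blank lines, keeping <pre>...</pre> spans intact."""
    shelf = []

    def shelve(match):
        # Replace each <pre> block with a placeholder that has no newlines.
        shelf.append(match.group(0))
        return '\x00%d\x00' % (len(shelf) - 1)

    protected = PRE_BLOCK.sub(shelve, text)
    blocks = [b for b in re.split(r'\n{2,}', protected.strip()) if b]
    # Put the shelved <pre> blocks back after splitting.
    return [re.sub(r'\x00(\d+)\x00', lambda m: shelf[int(m.group(1))], b)
            for b in blocks]


s = "<pre>\nxxxxxxxx\n</pre>\n\n<pre>\naaaaaaa\n\nbbbbbbb\n</pre>\n"
blocks = split_blocks(s)
print(len(blocks))
print(blocks[1])
```

With this input the second pre block stays in one piece despite its blank line. The same idea would have to cover code and notextile blocks as well.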

Missing tags

The 'upstream' repository for textile has changed over time, and it seems some tags have been lost on the way.

As far as I can tell, version 2.1.4 corresponds to commit 84bdad3 and version 2.1.5 corresponds to commit 4554407.

Could you re-add them?

bad em markup in anchor href attribute

Generally I like how textile lets HTML pass through, but I recently came across a problem along the lines of the following:

import textile
print(textile.textile("""<a href="_link_">label</a>"""))

from which python-textile 2.1.5 produces:

<p><a href="<em>link</em>&#8220;>label</a></p>

which not only is wrong, but introduces a run-away attribute value that swallows parts of the page!
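
A common way to avoid this class of problem is to apply inline markup only to the text between tags. The following is a rough sketch of that idea, not the approach python-textile takes: emphasize, TAG and EM are hypothetical names, and the tag pattern is deliberately naive (it would break on a ">" inside an attribute value).

```python
import re

TAG = re.compile(r'(<[^>]+>)')
EM = re.compile(r'(?<!\w)_(\S.*?)_(?!\w)')


def emphasize(text):
    """Apply _emphasis_ only to text outside of HTML tags."""
    parts = TAG.split(text)
    # With one capturing group, re.split puts the tags at odd indexes.
    return ''.join(part if i % 2 else EM.sub(r'<em>\1</em>', part)
                   for i, part in enumerate(parts))


result = emphasize('<a href="_link_">label</a> and _text_')
print(result)
```

The href value is left untouched because it sits inside a tag, while _text_ outside the tag is still converted.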

auto_link=True breaks link syntax

Given you have the following textile text:

"Superhuman":https://en.wikipedia.org/wiki/Superhuman
https://en.wikipedia.org/wiki/Superhuman
www.wikipedia.org
foo en.wikipedia.org/wiki/Superhuman bar

Rendered with auto_link=True, the output is:

<p>
&#8220;Superhuman&#8221;:<a href="https://en.wikipedia.org/wiki/Superhuman" rel="nofollow">https://en.wikipedia.org/wiki/Superhuman</a><br />
<a href="https://en.wikipedia.org/wiki/Superhuman" rel="nofollow">https://en.wikipedia.org/wiki/Superhuman</a><br />
<a href="www.wikipedia.org" rel="nofollow">www.wikipedia.org</a><br />
foo <a href="en.wikipedia.org/wiki/Superhuman" rel="nofollow">en.wikipedia.org/wiki/Superhuman</a> bar
</p>

Obviously autoLink() breaks the textile link syntax ("a link":http://example.org/). I'd rather expect the following output, with both working:

<p>
<a href="https://en.wikipedia.org/wiki/Superhuman" rel="nofollow">Superhuman</a><br />
<a href="https://en.wikipedia.org/wiki/Superhuman" rel="nofollow">https://en.wikipedia.org/wiki/Superhuman</a><br />
<a href="www.wikipedia.org" rel="nofollow">www.wikipedia.org</a><br />
foo <a href="en.wikipedia.org/wiki/Superhuman" rel="nofollow">en.wikipedia.org/wiki/Superhuman</a> bar
</p>

It probably has something to do with the leading \b in the autoLink() pattern, which matches at every word boundary and therefore also between the ":" of the link syntax and the URL that follows it:

pattern = re.compile(r"""\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""", re.U | re.I)
        return pattern.sub(r'"\1":\1', text)
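
The effect of the leading \b can be reproduced with a much smaller pattern. The sketch below is an illustration under assumptions, not the library's code: NAIVE is a hypothetical, simplified URL matcher, and GUARDED adds a negative lookbehind for ": as one possible way to leave the textile link syntax alone.

```python
import re

text = ('"Superhuman":https://en.wikipedia.org/wiki/Superhuman\n'
        'https://en.wikipedia.org/wiki/Superhuman')

# \b matches between ":" (non-word) and "h" (word), so the URL inside the
# textile link syntax is picked up as well.
NAIVE = re.compile(r'\b(https?://[^\s<>]+)')
# Hypothetical guard: skip URLs that directly follow the closing ": of a link.
GUARDED = re.compile(r'(?<!":)\b(https?://[^\s<>]+)')

naive_count = len(NAIVE.findall(text))
guarded_count = len(GUARDED.findall(text))

print(naive_count)    # 2
print(guarded_count)  # 1
```

Running autoLink() after the regular link handling, rather than before it, might be another option; which of the two fits the code base better is not something this sketch can answer.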

If there is no blank line between tables, textile combines them into one table

|6.12.27global| 10h00m39s239ms  074 | 38.46 | 1040 |some comments|
##title##
 || date ||version    ||data         ||data	||data	||comments||
|..|...

There are 2 tables, but textile always combines them into one table.
I have to add blank lines like this:

|6.12.27global| 10h00m39s239ms  074 | 38.46 | 1040 |some comments|

##title##

 || date ||version    ||data         ||data	||data	||comments||
|..|...

EDIT 2017-02-06T21:04: Fix formatting

URLs containing ^ or * are not identified

If a URL contains either of the above two characters, it is not correctly identified by the URL parser. The text will show up un-parsed. Test cases:

"Marketing Operations Managers":http://www.linkedin.com/groups?home=&gid=138990&goback=%2Enmp_*1_*1_*1_*1_*1_*1_*1_*1_*1&trk=grp-name 

"US Government bonds":http://finance.yahoo.com/echarts?s=^TYX+Interactive#chart2:symbol=^tyx;range=6m;indicator=volume;charttype=line;crosshair=on;ohlcvalues=0;logscale=on;source=undefined

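The likely cause can be shown with the URL character class quoted in the "Unicode support for links" report, assuming that class is the one in use here. Neither ^ nor * is part of it, so matching stops in front of them. URL_CHARS reproduces only that fragment of the pattern; EXTENDED is a hypothetical variant that adds the two characters.

```python
import re

# URL fragment of the link pattern (assumption: same class as quoted in the
# "Unicode support for links" report; the other groups are omitted).
URL_CHARS = re.compile(
    r'https?://[-A-Za-z0-9+&@#/?=~_()|!:,.;%]*[-A-Za-z0-9+&@#/=~_()|]')
# Hypothetical variant that also accepts "^" and "*".
EXTENDED = re.compile(
    r'https?://[-A-Za-z0-9+&@#/?=~_()|!:,.;%^*]*[-A-Za-z0-9+&@#/=~_()|^*]')

url = 'http://finance.yahoo.com/echarts?s=^TYX+Interactive'

matched = URL_CHARS.match(url).group(0)
matched_extended = EXTENDED.match(url).group(0)

print(matched)           # http://finance.yahoo.com/echarts?s=
print(matched_extended)  # http://finance.yahoo.com/echarts?s=^TYX+Interactive
```

In the full pattern the remaining text after the truncated URL does not satisfy the trailing lookahead, which would explain why the whole link is left un-parsed rather than cut short.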