gjtorikian / html-pipeline
HTML processing filters and utilities
License: MIT License
Add the table section elements to the whitelist.
Table sections (thead, tbody, tfoot) are important table elements that control how a table gets rendered. If handled with the same restrictions as the table element (they can only contain tr, th and td elements), allowing them does not impose any security risk.
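The restriction can be sketched as an ancestor check: a row, cell, or (with this change) table section is kept only when it sits somewhere inside a table element. This is a minimal standalone sketch, assuming each element is represented by its tag name plus the tag names of its ancestors; TABLE_ITEMS, TABLE_SECTIONS and keep_table_element? are hypothetical names, not SanitizationFilter's internals.

```ruby
# Hypothetical names for this sketch, not SanitizationFilter's internals.
TABLE_ITEMS    = %w(tr td th).freeze
TABLE_SECTIONS = %w(thead tbody tfoot).freeze

# Decide whether to keep an element, given its tag name and the tag names
# of its ancestors (nearest parent first). Rows, cells, and sections are
# kept only when some ancestor is a <table>; other tags are not affected
# by this check.
def keep_table_element?(name, ancestors)
  return true unless (TABLE_ITEMS + TABLE_SECTIONS).include?(name)
  ancestors.include?("table")
end

keep_table_element?("tbody", %w(table div))  # => true
keep_table_element?("thead", %w(div body))   # => false
```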
Hi there,
I have been trying to work out how to stop newlines from being inserted into a (GitHub-flavored) Markdown blockquote.
If I have a Markdown file like this:
> this is a start of a quote
> this is a continuation of a quote
according to the docs, GitHub Markdown does not put a <br> tag in there.
I have been using your excellent pipeline in a small gem I created for using Markdown with the excellent vimwiki plugin, and I keep getting <br> tags inside my generated HTML. I'm happy to create a test case if it'll help, but I'm wondering if you can tell me what (if any) other filters I should be using. Currently it just uses your sample ones:
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]
Any help most appreciated!
Right now HTML::Pipeline::MentionFilter.new("test @benbalter test").call returns the input string unchanged, while HTML::Pipeline::MentionFilter.new("<p>test @benbalter test</p>").call returns the expected @mentioned string.
I believe this is due to the doc.search('text()') pattern. It would be awesome if html-pipeline could support arbitrary strings; right now I believe the input must be HTML, or the first filter must be the Markdown filter, for the expected behavior to occur.
At the very least, documentation could help clear things up for new users.
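Until plain strings are supported, one workaround, assuming the filter only finds text nodes that sit inside an element (which is what the doc.search('text()') observation suggests), is to escape the string and wrap it in a paragraph before it enters the pipeline. wrap_plain_text is a hypothetical helper, not part of html-pipeline; CGI.escapeHTML is Ruby's standard library.

```ruby
require "cgi"

# Hypothetical pre-processing step: turn arbitrary plain text into a
# minimal HTML fragment so text-node searches have an element to descend
# into. Escaping keeps user text from being parsed as markup.
def wrap_plain_text(text)
  "<p>#{CGI.escapeHTML(text)}</p>"
end

wrap_plain_text("test @benbalter test")  # => "<p>test @benbalter test</p>"
wrap_plain_text("a < b")                 # => "<p>a &lt; b</p>"
```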
I'm a little surprised that the HTML5 "summary" tag is whitelisted, but the "details" tag (which it is used with) is not:
http://html5doctor.com/the-details-and-summary-elements/
Might it be possible to include the "details" tag in the whitelist? I think this could be a really useful feature.
The builds are failing because ActiveSupport 4.x requires Ruby 1.9.3 or later:
Installing activesupport (4.0.0)
Gem::InstallError: activesupport requires Ruby version >= 1.9.3.
An error occurred while installing activesupport (4.0.0), and Bundler cannot
continue.
Make sure that `gem install activesupport -v '4.0.0'` succeeds before bundling.
Need to add separate gemfiles for CI to fix this.
#48 kickstarted discussion, and here is a plan for placing dependency management on Filters.
charlock_holmes is a hassle to deploy on Heroku (brianmario/charlock_holmes#4). Could github/linguist (which depends on charlock_holmes) be an optional dependency? I'm guessing that quite a few sites that use html-pipeline won't need syntax highlighting.
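Making a dependency such as linguist optional usually comes down to requiring it lazily and checking whether the require succeeds. A minimal sketch; dependency_available? is a hypothetical helper, and a filter could branch on it instead of failing at load time.

```ruby
# Returns true if the library can be required, false if it is missing.
# LoadError is a ScriptError, not a StandardError, so it is rescued
# explicitly.
def dependency_available?(lib)
  require lib
  true
rescue LoadError
  false
end

# Hypothetical use: fall back to plain output when linguist is absent.
highlighting = dependency_available?("linguist") ? :linguist : :plain
```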
Any interest in an executable that people can use to preview the output of an html-pipeline run easily? I just wrote one, but I could submit it as a pull: https://gist.github.com/indirect/5096633
$ echo "foo" | html-pipeline
<p>foo</p>
We should bump a release. I want to get the Digest deprecation taken care of in some projects upstream.
/cc @jch
What do you think about including this filter?
https://gist.github.com/r38y/7663375
If this filter is applied and they want straight quotes, they can escape them with a backslash (\"). It will also turn -- into – and --- into —.
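The behavior described above can be sketched on plain text as follows. This is a hypothetical implementation, not the code in the gist: it assumes quotes simply alternate between opening and closing, and it would mangle attribute quotes if run over raw HTML, so a real filter would apply it to text nodes only.

```ruby
# Hypothetical smart-punctuation pass over plain text.
# Order matters: "---" must be replaced before "--".
def smarten(text)
  out = text.gsub("---", "\u2014").gsub("--", "\u2013")
  opening = true
  out.gsub(/\\"|"/) do |match|
    if match == '\\"'
      '"'                                  # escaped quote stays straight
    else
      quote = opening ? "\u201C" : "\u201D"
      opening = !opening                   # naive open/close alternation
      quote
    end
  end
end

smarten('She said "hi" -- then left --- fast')
```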
Using something like <h1>日本語</h1> results in an anchor with a blank name.
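A likely cause, assuming the anchor name is built by deleting every character that does not match Ruby's \w: in Ruby that class is ASCII-only ([a-zA-Z0-9_]), so a heading written entirely in Japanese is reduced to an empty string. The Unicode-aware \p{Word} property keeps such characters. Both helpers below are hypothetical, not the filter's actual code.

```ruby
# ASCII-only: \w matches [a-zA-Z0-9_], so CJK text is stripped entirely.
def ascii_anchor(text)
  text.downcase.gsub(/[^\w\- ]/, "").tr(" ", "-")
end

# Unicode-aware: \p{Word} also matches letters and digits outside ASCII.
def unicode_anchor(text)
  text.downcase.gsub(/[^\p{Word}\- ]/, "").tr(" ", "-")
end

ascii_anchor("日本語")          # => ""
unicode_anchor("日本語")        # => "日本語"
unicode_anchor("Hello World!")  # => "hello-world"
```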
It seems too complicated to make a repository. What help can you give when the code to paste within the page's body doesn't click?
I'm not sure if this is a good idea or if this is actually the place to suggest it, but it'd be cool if you could put something like :cameronmcefee: in any GFM field and have the person's avatar appear, probably linked to their profile and maybe tooltipped with their name.
Am I correct in thinking this is used to parse the replies on GitHub? If so, what do you think about adding a way to strip the garbage from this:
I'm happy to do it but I wanted to make sure this filter was the correct place to do it.
I think the non-code solution is for that dude to delete the garbage from his email but that is sort of "you're holding it wrong".
Hi,
In my code I have:
context = {
  asset_root: 'https://a248.e.akamai.net/assets.github.com/images/icons/',
  link_attr: 'target="_blank"',
  gfm: true
}
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::EmojiFilter,
  HTML::Pipeline::AutolinkFilter
], context
pipeline.call(text)[:output].to_s
and
%p= raw format(answer.body)
to invoke it.
However, the generated link doesn't get the target="_blank" attribute.
Any idea?
Thanks,
Roy
When I pass this string...
"I can do this.\r\n:scream: Juice 3: Whoa, that's a LOT of cayenne!"
...to a pipeline containing EmojiFilter, it does not replace the emoji-cheat-sheet code with the Emoji as expected.
I tracked the problem down to here:
irb(main):204:0> doc.search('text()')
=> []
What does happen is that the DocumentFragment in doc contains one child Nokogiri::XML::Text node, and doc.text contains the same text that html contains. So...
Armed with that knowledge, I made the following changes:
 def call
-  doc.search('text()').each do |node|
+  nodes(doc).each do |node|
     content = node.to_html
     next if !content.include?(':')
     next if has_ancestor?(node, %w(pre code))
     html = emoji_image_filter(content)
     next if html == content
     node.replace(html)
   end
   doc
 end
+# Look for text nodes in the DocumentFragment
+#
+# If doc's text is the same as original string,
+# just nab its children to get the proper nodes.
+# Otherwise do a search for text nodes.
+def nodes(doc)
+  doc.text == html ? doc.children : doc.search('text()')
+end
... and that fixed it for me.
Anyone see any problems with that fix? If not, I'll work up a PR as soon as I can.
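For comparison, the replacement step itself works fine on a plain string, and the second colon-delimited span in the sample (": Juice 3:") is safe as long as only known emoji names are matched. A minimal sketch under that assumption; EMOJI_NAMES is a tiny hypothetical list and the image markup is illustrative, not EmojiFilter's exact output.

```ruby
# Hypothetical, tiny name list; a real filter would load the full set.
EMOJI_NAMES = %w(scream smile +1).freeze
# Escape names (e.g. "+1") and match only known names between colons.
EMOJI_PATTERN = /:(#{EMOJI_NAMES.map { |n| Regexp.escape(n) }.join("|")}):/

def replace_emoji(text, asset_root)
  text.gsub(EMOJI_PATTERN) do
    name = Regexp.last_match(1)
    %(<img class="emoji" title=":#{name}:" alt=":#{name}:" src="#{asset_root}/emoji/#{name}.png">)
  end
end

replace_emoji("I can do this.\r\n:scream: Juice 3: Whoa", "/img")
```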
It seems that MentionFilter has to be used together with MarkdownFilter, but I don't want to offer Markdown support.
Considering that GitHub Markdown tends to lack documentation on how to configure it (or Google is failing me), and it does a lot of things that aren't necessarily nice for user content that you want to restrict (such as autolinking), it would be nice if the dependence on github-markdown were loosened so that people who wish to use Redcarpet can do so.
Maybe a wiki page, like the one for Jekyll plugins?
Hi. I am using MentionFilter, and my user lives at www.lvh.me:3000/~jch.
HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::MentionFilter
], context.merge(gfm: true, base_url: '/~')
If I specify base_url: '~' or '/~', it gives me www.lvh.me:3000/~/jch instead of www.lvh.me:3000/~jch.
How can I achieve this behavior with MentionFilter?
Currently I do the replacement myself:
text.gsub!(/@([a-z0-9][a-z0-9-]*)/i) do |match|
  %Q(<a href="/~#{$1}">#{match}</a>)
end
Thanks!
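The desired URL can be produced by adding a "/" separator only when base_url does not already end in "/" or "~". A minimal sketch of that rule; mention_url and link_mentions are hypothetical helpers, and link_mentions simply reuses the mention pattern from the workaround above.

```ruby
# Hypothetical helper: join base_url and login, adding "/" only when
# base_url does not already end in "/" or "~", so "/~" yields "/~jch".
def mention_url(base_url, login)
  separator = base_url.end_with?("/", "~") ? "" : "/"
  "#{base_url}#{separator}#{login}"
end

# Link @login mentions in plain text using the same pattern as above.
def link_mentions(text, base_url)
  text.gsub(/@([a-z0-9][a-z0-9-]*)/i) do |match|
    %(<a href="#{mention_url(base_url, Regexp.last_match(1))}">#{match}</a>)
  end
end

mention_url("/~", "jch")     # => "/~jch"
mention_url("/users", "jch") # => "/users/jch"
```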
I think this repo is ready for 🚢ing. #6 extracted this project from .com, and removed GitHub specific references in the gem. Here's a list of remaining things I'd like to do before I share the ❤️ with the world:
Is there anything I'm missing?
hey @jch, can you grant me rights to push the gem to rubygems? Would be useful for getting releases out the door.
My account is rsanheim - email is [email protected]
ActiveSupport v4.1.0 depends upon 'minitest', '~> 5.1'. This dependency breaks the html-pipeline build. Here is a more detailed explanation.
#123 found a temporary solution by disallowing ActiveSupport 4.1.0 or greater. A more permanent solution must be found.
We were upgrading from 0.0.14 to 0.2.0, but got blocked by the gemspec requirement on activesupport 3 or earlier.
Bundler could not find compatible versions for gem "activesupport":
  In Gemfile:
    html-pipeline (~> 0.1.0) ruby depends on
      activesupport (< 4, >= 2) ruby
    rails (~> 4.0) ruby depends on
      activesupport (4.0.0)
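The conflict can be reproduced with RubyGems' own requirement objects: the ">= 2, < 4" range from the error message rejects 4.0.0, so Bundler cannot pick one activesupport version that satisfies both html-pipeline and Rails 4. A minimal sketch using the standard Gem::Requirement and Gem::Version classes:

```ruby
require "rubygems"

# Constraint html-pipeline (~> 0.1.0) places on activesupport, per the error.
pipeline_req = Gem::Requirement.new(">= 2", "< 4")
# Version Rails 4.0 resolves to, per the same error.
rails_version = Gem::Version.new("4.0.0")

pipeline_req.satisfied_by?(rails_version)              # => false
pipeline_req.satisfied_by?(Gem::Version.new("3.2.13")) # => true
```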
Chalk this up to RTFM, but with a simple pipeline like this
HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]
I kept crashing the Help Rails app:
SystemExit in Help/articles#show
Showing /Users/garentorikian/github/help/app/views/help/articles/_article.html.erb where line #22 raised:
exit
Extracted source (around line #22):
Finally, after looking at the logs, I found: "You need to install linguist before using the SyntaxHighlightFilter. See README.md for details."
Not sure if this error can be raised in the browser itself, but it'd be nice. Also not sure if this'll be fixed by #28 anyway.
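The crash comes from abort, which raises SystemExit and so bypasses a web app's normal error handling. One alternative, sketched with a hypothetical MissingDependencyError class and require_filter_dependency helper (neither exists in html-pipeline), is to raise an ordinary exception carrying the same message so the framework can display it.

```ruby
# Hypothetical error class: a StandardError subclass, so a Rails rescue
# handler or error page can catch it, unlike the SystemExit from `abort`.
class MissingDependencyError < StandardError; end

def require_filter_dependency(lib, filter_name)
  require lib
rescue LoadError
  raise MissingDependencyError,
        "Missing dependency '#{lib}' for #{filter_name}. See README.md for details."
end

begin
  require_filter_dependency("linguist", "SyntaxHighlightFilter")
rescue MissingDependencyError => e
  warn e.message # the app keeps running and can show this message
end
```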
Let's add 'em.
cc @jch
CONTRIBUTING.md is a cool feature; we should add it to html-pipeline! 😄
When a user submits a New Issue or sends a Pull Request, they are linked to the project's CONTRIBUTING.md.
Since CONTRIBUTING.md is linked from both places, we could split it into two pieces of documentation. At the top of the document, we could have navigation to both pieces. Here is a rough draft for review. Thoughts?
Please include:
nokogiri -v
How to run the tests:
bundle exec rake
Title says it all: can this work with Rouge instead of Pygments? https://github.com/jneen/rouge
I prefer to stick with an all-Ruby solution :)
This bug was previously reported in the markup repo.
A section header in Markdown is rendered as h3 > a[name].
To anchor an element in a URL, the id attribute must be used. The name attribute is reserved for use in form elements. Its availability as an id is an inheritance from the Netscape days and should not have been used here.
Alas, as I read in your README.md, « Note that the id attribute is not whitelisted. »
So how can I patch this?
HTML5 allows style tags inline in a page. style should be added to the list of parents that are excluded, since otherwise @media queries get turned into mentions. :(
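The proposal amounts to adding style to the set of ancestors under which mentions are not linked. A minimal sketch, assuming the filter checks each text node's ancestor tag names against an ignore set; IGNORED_PARENTS and skip_mentions? are hypothetical names.

```ruby
require "set"

# Hypothetical ignore set; "style" is the proposed addition so CSS such
# as "@media" inside <style> is not treated as a mention.
IGNORED_PARENTS = Set.new(%w(pre code a style)).freeze

# ancestor_names: tag names from the text node's parent up to the root.
def skip_mentions?(ancestor_names)
  ancestor_names.any? { |name| IGNORED_PARENTS.include?(name.downcase) }
end

skip_mentions?(%w(style head html)) # => true
skip_mentions?(%w(p div body))      # => false
```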
During some testing this morning I started using the disable_asset_proxy option. It seems that when you pass it in, the CamoFilter just returns nil instead of the doc, causing the rest of the filter chain to break.
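In a pipeline each filter's return value becomes the next filter's input, so an early return that yields nil breaks every later filter. A minimal sketch of the guard, assuming a simplified filter whose document is just an array of image URLs; AssetProxySketch and the proxy host are hypothetical.

```ruby
# Hypothetical stand-in for an asset-proxy filter; doc is a simplified
# array of image URLs instead of a Nokogiri document.
class AssetProxySketch
  def initialize(doc, context = {})
    @doc = doc
    @context = context
  end

  def call
    # A bare `return if ...` would hand nil to the next filter;
    # returning the untouched doc keeps the chain intact.
    return @doc if @context[:disable_asset_proxy]
    @doc.map { |src| "https://proxy.example/?url=#{src}" }
  end
end

AssetProxySketch.new(["a.png"], disable_asset_proxy: true).call # => ["a.png"]
```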
We don't specify versions for external dependencies, and we raise runtime errors when a dependency is missing (#80). For example, HTML::Pipeline::AutolinkFilter depends on rinku:
begin
  require "rinku"
rescue LoadError => _
  abort "Missing dependency 'rinku' for AutolinkFilter. See README.md for details."
end
This approach is simple, but couples html-pipeline's versioning to the versions of its external dependencies. For example, to update from gemoji ~> 1 to ~> 2, we would need to increase the major version for html-pipeline (#159).
Here are a few ideas I came up with:
This requires the fewest changes. We would raise html-pipeline's major version whenever one of its dependencies made breaking changes. There are 8 external dependencies for 8 filters. They are all pretty stable gems and unlikely to change frequently.
I experimented with this in the separate-gems branch. This is similar to how rails/rails is composed of separate gems (actionpack, actionmailer, activesupport) that all live in the same repository for an easy development workflow. The problem I ran into is that Bundler does not like having multiple projects within the same folder. If you poke around rails/rails, you can see they've added a good number of helper methods to the Rakefile and their own set of conventions for bumping versions to make it work well. This feels a bit overkill to me, but maybe I'm missing something obvious.
We recommend that third-party filters be written this way. We could do the same thing with the existing filters and package them as their own separate gems in separate repositories. The trade-off here is that we'd have to jump between 9 projects (html-pipeline and 8 filter gems). We could add an html-pipeline organization to help with this, but it is more overhead and would make the project harder to discover and harder to contribute to. This is also how the bkeepers/qu gem handles swapping different backend stores.
@simeonwillbanks @JuanitoFatas @rsanheim @bkeepers What do you think? Are there other factors I haven't covered? Another possible way?
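One more variant to weigh: each filter could declare its own version requirement and check it at load time, so the gemspec no longer pins every optional gem and a breaking gemoji release only affects EmojiFilter users. A minimal sketch using RubyGems' Gem::Requirement; FilterDependencies, declare and problems are hypothetical names, and installed versions are passed in as a plain hash (in an app they could be derived from Gem.loaded_specs).

```ruby
require "rubygems"

# Hypothetical registry: each filter declares the gem it needs and the
# version range it supports.
module FilterDependencies
  REQUIREMENTS = {}

  def self.declare(filter_name, gem_name, *requirement)
    REQUIREMENTS[filter_name] = [gem_name, Gem::Requirement.new(*requirement)]
  end

  # installed: Hash of gem name => version string.
  # Returns an empty array when the filter's dependency is satisfied.
  def self.problems(filter_name, installed)
    gem_name, requirement = REQUIREMENTS.fetch(filter_name)
    version = installed[gem_name]
    return ["missing gem '#{gem_name}'"] if version.nil?
    return [] if requirement.satisfied_by?(Gem::Version.new(version))
    ["#{gem_name} #{version} does not satisfy #{requirement}"]
  end
end

FilterDependencies.declare("EmojiFilter", "gemoji", "~> 2.0")
FilterDependencies.problems("EmojiFilter", "gemoji" => "2.1.0") # => []
```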
In the GitHub app, the emoji icons are frozen to public/images, and URLs to the images are built relative to the value of :asset_root. It'd be preferable to detect the availability of the asset pipeline and use asset_path when it's available.
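A minimal sketch of that fallback, assuming the filter is handed an optional helpers object (for example a Rails view context) that may respond to asset_path; emoji_url and the emoji/NAME.png layout are hypothetical.

```ruby
# Hypothetical URL builder: prefer the asset pipeline's asset_path when
# the helpers object provides it, otherwise join onto :asset_root.
def emoji_url(name, asset_root:, helpers: nil)
  path = "emoji/#{name}.png"
  if helpers.respond_to?(:asset_path)
    helpers.asset_path(path)
  else
    File.join(asset_root, path)
  end
end

emoji_url("smile", asset_root: "/images/icons") # => "/images/icons/emoji/smile.png"
```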
The README has tons of information (usage, dependencies, examples, etc). However, new users would benefit from a Getting Started Guide; factory_girl's guide is a good example. The Getting Started Guide could detail common implementations such as integrating with Rails or Sinatra. Thoughts?
It'd be cool to retain the original history when extracting libraries like this. Would you guys mind if I push a branch with the full history from the github/github repo? We'd need to rebase everything that's happened here on top and force push unfortunately. Sorry, I would have chimed in here earlier but had no idea this was going on.
I might be missing some dependency, but the EmailReplyFilter references an EmailReplyParser constant which is not defined in the gem at all :)
Can't remember if this is something that was there in github/github or maybe github/html-pipeline? But it should proooobably be here. Or maybe it's EmailReplyFilter that shouldn't be :P
Copying from this issue from github/markup:
Currently, you can syntax-highlight code blocks. For example,
main :: IO ()
main = putStrLn "Hello, World!"
renders as
main :: IO ()
main = putStrLn "Hello, World!"
However, you cannot do the same with inline code such as
main :: IO ()
or
main :: IO ()
both of which get rendered as main :: IO ()
(without syntax highlighting) when used inline. It would be nice to have something like
haskell main :: IO ()
that gives you inline syntax highlighting (right now, that would render as haskell main :: IO ()).
As gjtorikian suggested on the other issue, this could conceivably be fixed by changing this line to match on code tags as well as pre.
Using
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]
produces code that has 10 spaces prepended to every line after the first, including an extra line with 10 spaces at the end.
This
```css
@media (max-width: 992px) {
#contact_email{ display: none; }
}
```
produces
@media (max-width: 992px) {
#contact_email{ display: none; }
}
// 10 spaces at end
When using HTML::Pipeline::MarkdownFilter on a string containing a "Right Double Quotation Mark" (U+201D) around an email address, the output HTML includes an invalid byte sequence when the address is autolinked as a mailto: link.
I'm only having this issue on OSX. I'm running 10.10.2.
To reproduce:
renderer = HTML::Pipeline.new([HTML::Pipeline::MarkdownFilter]).freeze
renderer.to_html("This is an “[email protected]” example").split
This is really a bug within github-markdown, but I'm submitting it here because github-markdown doesn't seem to have a GitHub repository. I've also tried using RedCloth and it fails as well.
ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
# Nokogiri (1.6.5)
---
warnings: []
nokogiri: 1.6.5
ruby:
version: 2.1.5
platform: x86_64-darwin14.0
description: ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
engine: ruby
libxml:
binding: extension
source: packaged
libxml2_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxml2/2.9.2"
libxslt_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxslt/1.1.28"
libxml2_patches:
- 0001-Revert-Missing-initialization-for-the-catalog-module.patch
- 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
libxslt_patches:
- 0001-Adding-doc-update-related-to-1.1.28.patch
- 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
- 0003-Initialize-pseudo-random-number-generator-with-curre.patch
- 0004-EXSLT-function-str-replace-is-broken-as-is.patch
- 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
- 0007-Separate-function-for-predicate-matching-in-patterns.patch
- 0008-Fix-direct-pattern-matching.patch
- 0009-Fix-certain-patterns-with-predicates.patch
- 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
- 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
- 0014-Fix-for-bug-436589.patch
- 0015-Fix-mkdir-for-mingw.patch
compiled: 2.9.2
loaded: 2.9.2
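A quick way to confirm the symptom independently of the renderer is String#valid_encoding?. One way such bytes can arise is a cut in the middle of a multibyte character; the sketch below simulates that with byteslice on the three-byte UTF-8 encoding of U+201D. This only illustrates the failure mode and is not a diagnosis of github-markdown; String#scrub (Ruby 2.1+) can replace the broken bytes as a stopgap.

```ruby
text = "an \u201Cuser@example.com\u201D example"

# U+201D is three bytes in UTF-8 (E2 80 9D). Cut one byte into it, as a
# byte-oriented scanner might when deciding where an autolink ends.
cut = text.index("\u201D")
broken = text.byteslice(0, text[0...cut].bytesize + 1)

text.valid_encoding?   # => true
broken.valid_encoding? # => false
broken.scrub("?")      # => "an \u201Cuser@example.com?"
```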
I'll be renaming this repository to html-pipeline in 3 days. Giving a heads up to let people change any references.
Due to the fact that HTML::Pipeline is a class, not a module, there is risk that an add-on filter will prematurely define this class before it's extended in the core library, which causes the notorious "superclass mismatch" exception.
Here's an example of where this happens. While creating a new gem for the BarFilter, we define a version file:
lib/html/pipeline/bar_filter/version.rb
module HTML
  class Pipeline
    class BarFilter
      VERSION = '1.0.0'
    end
  end
end
If we load this at the top of a gemspec file, for instance, then if we attempt to load 'html/pipeline', it goes 💥.
Normally the way these things are defined (as far as I understand it), the top-level type in a gem is a module, not a class. One way to accomplish this without breaking the current API (much) is to define the class method new on the module that instantiates the concrete class. Something like:
module HTML
  module Pipeline
    def self.new filters, default_context = {}, result_class = nil
      Engine.new filters, default_context, result_class
    end
    class Engine
      # relocate Pipeline class definition here
    end
  end
end
The other solution, which I used in html-pipeline-asciidoc_filter, is to put the filter class in a different module for the purpose of holding the VERSION constant.
module HTML_Pipeline
  class BarFilter
    VERSION = '1.0.0'
  end
end
Either way, I think this is an important issue to address to minimize the challenges of creating an add-on filter.
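Both pieces can be checked in plain Ruby: the first part shows the "superclass mismatch" error in isolation, and the second shows that a module with a factory new keeps the familiar Pipeline.new call working. The HTMLSketch namespace and MismatchDemo class are hypothetical names chosen so the sketch doesn't touch the real HTML::Pipeline.

```ruby
# The error in isolation: a class first defined with the implicit
# superclass Object cannot be reopened with a different superclass.
class MismatchDemo; end
mismatch_message =
  begin
    class MismatchDemo < StandardError; end
    nil
  rescue TypeError => e
    e.message
  end
# mismatch_message == "superclass mismatch for class MismatchDemo"

# The first proposal: Pipeline becomes a module whose `new` builds an
# Engine, so add-ons can open the namespace without fixing a superclass.
module HTMLSketch
  module Pipeline
    def self.new(filters, default_context = {}, result_class = nil)
      Engine.new(filters, default_context, result_class)
    end

    class Engine
      attr_reader :filters, :default_context, :result_class

      def initialize(filters, default_context = {}, result_class = nil)
        @filters = filters
        @default_context = default_context
        @result_class = result_class || Hash
      end
    end
  end
end

engine = HTMLSketch::Pipeline.new([:markdown], gfm: true)
```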
There have been some necessary changes to the pipeline over in github/github...just want to create as a TODO to make sure they get merged onto here at some point.
Using the example in the README:
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]
result = pipeline.call input
result[:output].to_s
produces the requisite <span>s with classes, but there are no styles or stylesheets to colorize the output.
Is there something I need to add to application.css?
If the @mention or :emoji: is in a code block, do not transform it. This applies to HTML code blocks rather than Markdown code blocks.
Related issues:
Thanks!
Pretty sure this is the only filter that still includes GitHub refs:
https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/https_filter.rb
How about piggybacking on the :base_url option instead?
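A minimal sketch of the :base_url idea: only links pointing at the configured host are upgraded from http to https. https_href is a hypothetical helper that works on a single href string rather than on a parsed document; matching on the full "http://host" prefix plus "/" avoids upgrading look-alike hosts.

```ruby
require "uri"

# Hypothetical helper: upgrade http links to the host named in base_url.
def https_href(href, base_url)
  host = URI.parse(base_url).host
  prefix = "http://#{host}"
  return href unless href == prefix || href.start_with?("#{prefix}/")
  href.sub(/\Ahttp:/, "https:")
end

https_href("http://example.com/foo", "https://example.com") # => "https://example.com/foo"
https_href("http://other.com/x", "https://example.com")     # => "http://other.com/x"
```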
Remove as many gem dependencies as possible, because not everyone uses every single filter. The responsibility of checking for dependencies will be on the filter. This is similar to what Faraday does for its adapters. I don't want the current filters to be split up into a bunch of mini-gems (html-pipeline-emoji, html-pipeline-markdown), because that's just dicing things too thin.
I would love it if, rather than raising a generic error that means nothing to the user (in some cases) and could be confusing, html-pipeline detected ordering issues when there is a clear processing order, or EmojiFilter handled the DocumentFragment conversion itself. What I mean is:
[
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::EmojiFilter
]
works, but
[
  HTML::Pipeline::EmojiFilter,
  HTML::Pipeline::MarkdownFilter
]
fails. However, the library gives people a broad message that doesn't even hint at what the problem might be; it only points to https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/text_filter.rb#L7, which can confuse users who are doing something as simple as:
class HTMLPipeline < Filter
  FILTERS = [
    HTML::Pipeline::EmojiFilter,
    HTML::Pipeline::MarkdownFilter
  ]
  def run(content, opts = {})
    opts = { gfm: true, asset_root: "/assets/img" }.merge(opts)
    HTML::Pipeline.new(FILTERS, opts).to_html(content)
  end
end
This might be a problem with Emoji on Ruby 2.0.0-p0 though.
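A minimal sketch of the requested check, assuming filters can be classified as text filters (which expect a String) or HTML filters (which produce a parsed document): validate the list up front and name the misplaced filter. The classes below are hypothetical stand-ins, not html-pipeline's TextFilter hierarchy.

```ruby
# Hypothetical stand-ins for the two kinds of filters.
class TextFilterSketch; end
class HtmlFilterSketch; end
class MarkdownSketch < TextFilterSketch; end
class EmojiSketch < HtmlFilterSketch; end

# Raise a descriptive error if a text filter is scheduled after an HTML
# filter, since by then its input is no longer a String.
def validate_filter_order!(filters)
  first_html = nil
  filters.each do |filter|
    if filter.ancestors.include?(HtmlFilterSketch)
      first_html ||= filter
    elsif filter.ancestors.include?(TextFilterSketch) && first_html
      raise ArgumentError,
            "#{filter} expects a String but comes after #{first_html}; " \
            "move text filters such as #{filter} to the front of the list"
    end
  end
  filters
end

validate_filter_order!([MarkdownSketch, EmojiSketch]) # passes
```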
@mentions.)
gets left alone instead of turning into @mentions.
How do I keep finding these?
It'd be handy if you could also use SSH protocol links like [test server](ssh://[email protected]). Is there any chance of adding that to the protocol whitelist in SanitizationFilter? I don't think there should be any security implications, but I may be missing something.
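A minimal sketch of what adding ssh to a link-protocol whitelist means for a single href. ALLOWED_LINK_PROTOCOLS and allowed_href? are hypothetical and not SanitizationFilter's own configuration; relative links (no scheme) are allowed here by assumption, and unparseable hrefs are rejected.

```ruby
require "uri"

# Hypothetical whitelist; "ssh" is the proposed addition.
ALLOWED_LINK_PROTOCOLS = %w(http https mailto ssh).freeze

def allowed_href?(href)
  scheme = URI.parse(href).scheme
  scheme.nil? || ALLOWED_LINK_PROTOCOLS.include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

allowed_href?("ssh://git@example.com") # => true
allowed_href?("javascript:alert(1)")   # => false
```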
Implement an AsciiDoc filter based on Asciidoctor.
Adding this filter will allow AsciiDoc output to be syntax highlighted. The filter should invoke Asciidoctor using attributes that make the HTML produced reasonably consistent with the HTML generated from Markdown (notitle! idprefix idseparator=-).