
html-pipeline's Issues

Whitelist table sections (thead, tbody, tfoot)

Add the table section elements to the whitelist.

Table sections (thead, tbody, tfoot) are important table elements that control how a table gets rendered. If handled with the same restrictions as the table element (they can only contain tr, th and td elements), allowing them does not impose any security risk.
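
The containment rule described above could be sketched as a plain lookup table. This is only an illustration in plain Ruby: the constant and method names are hypothetical, and it is not html-pipeline's actual whitelist.

```ruby
# Hypothetical containment table; NOT html-pipeline's real whitelist.
TABLE_SECTIONS = %w[thead tbody tfoot].freeze

ALLOWED_CHILDREN = {
  "table" => (TABLE_SECTIONS + %w[tr]).freeze,
  "thead" => %w[tr th td].freeze,
  "tbody" => %w[tr th td].freeze,
  "tfoot" => %w[tr th td].freeze,
  "tr"    => %w[th td].freeze
}.freeze

# True when `child` may appear directly inside `parent`.
# Unknown parents allow nothing.
def allowed_child?(parent, child)
  ALLOWED_CHILDREN.fetch(parent, []).include?(child)
end
```

With a table like this, a section element is only accepted inside a table, and nothing other than row and cell elements is accepted inside a section.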

Question about github markdown filter (low priority!)

Hi there,

I have been trying to work out how to stop newlines being inserted into a (github flavour) markdown blockquote.

If I have a markdown file like this:

> this is a start of a quote
> this is a continuation of a quote

according to the docs, github markdown does not put a <br> tag in there.

I have been using your excellent pipeline in a small gem I created for using markdown with the excellent vimwiki plugin, and I keep getting <br> tags inside my generated html. I'm happy to create a test case if it'll help, but I'm wondering if you can tell me what (if any) other filters I should be using. Currently it just uses your sample ones:

pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]

Any help most appreciated!

Passed content must be valid XML to be filtered

Right now, a filter created with HTML::Pipeline::MentionFilter.new "test @benbalter test" returns the input string unchanged, while one created with HTML::Pipeline::MentionFilter.new "<p>test @benbalter test</p>" returns the expected @mentioned string.

I believe this is due to the doc.search('text()') pattern. Would be awesome if html-pipeline could support arbitrary strings, as right now I believe the input must be HTML, or the first filter must be the markdown filter for the expected behavior to occur.

At the very least, documentation could help clear things up for new users.
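
Until arbitrary strings are supported, one possible workaround is to escape the plain text and wrap it in an element before handing it to a node filter, since the `<p>`-wrapped input above is reported to work. The helper below is a hypothetical sketch in plain Ruby, not part of html-pipeline.

```ruby
# Hypothetical helper, not part of html-pipeline.
HTML_ESCAPES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;"
}.freeze

# Escape HTML-significant characters, then wrap the text in a paragraph
# so that a node filter finds a text node inside an element.
def wrap_plain_text(text)
  escaped = text.gsub(/[&<>"]/, HTML_ESCAPES)
  "<p>#{escaped}</p>"
end

wrap_plain_text("test @benbalter test")
# => "<p>test @benbalter test</p>"
```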

Fix travis-ci build

The builds are failing because ActiveSupport 4.x requires Ruby 1.9:

Installing activesupport (4.0.0) 
Gem::InstallError: activesupport requires Ruby version >= 1.9.3.
An error occurred while installing activesupport (4.0.0), and Bundler cannot
continue.
Make sure that `gem install activesupport -v '4.0.0'` succeeds before bundling.

Need to add separate gemfiles for CI to fix this.

Place Dependency Management On Filters

#48 kickstarted discussion, and here is a plan for placing dependency management on Filters.

  1. Add dependency management tests
  2. Add dependency management to Filter with descriptive exception
    message
  3. Refactor Filters to use new dependency management logic
  4. For CI, move gem dependencies from gemspec to Gemfile :test block
  5. Add gem post install message alerting users to new dependency
    management
  6. Update README to detail each Filters dependencies e.g. FaradayMiddleware README
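
Step 2 of the plan could look roughly like the following. This is a sketch under assumptions: the error class and the `assert_dependency` helper are hypothetical names chosen for illustration, not the API that was eventually implemented.

```ruby
# Hypothetical error class and helper; names are illustrative only.
class MissingDependencyError < RuntimeError; end

# Try to load `library`; on failure raise an error that names the filter
# and points the user at the README.
def assert_dependency(library, filter_name)
  require library
rescue LoadError
  raise MissingDependencyError,
        "Missing dependency '#{library}' for #{filter_name}. See README.md for details."
end

# A filter file would call this once at load time, for example:
assert_dependency("set", "ExampleFilter") # "set" ships with Ruby, so this succeeds
```

Raising a regular exception, rather than exiting the process, lets the host application decide how to report the problem.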

cut a 1.6.0 release

We should bump a release. I want to get the Digest deprecation taken care of in some projects upstream.

/cc @jch

Medico

It seems too complicated to make a repository. What help can you give when the code to paste within the page's body doesn't click?

Emoji syntax gravatars

I'm not sure if this is a good idea or if this is actually the place to suggest it, but it'd be cool if you could put something like :cameronmcefee: in any gfm field and have the person's avatar appear, probably linked to their profile and maybe tool-tipped with their name.

Tweaks to the email reply filter

Am I correct in thinking this is used to parse the replies on GitHub? If so, what do you think about adding a way to strip the garbage from this:

remove_redundant_data_tidy_up_the_code_indentation_and_add_a_new_menu_i _by_dylanbarwick__pull_request_125__bauerpubtwinit_20130909_113910

I'm happy to do it but I wanted to make sure this filter was the correct place to do it.

I think the non-code solution is for that dude to delete the garbage from his email but that is sort of "you're holding it wrong".

AutolinkFilter link_attr doesn't seem to work

Hi,
In my code I have:

context = {
  asset_root: 'https://a248.e.akamai.net/assets.github.com/images/icons/',
  link_attr: 'target="_blank"',
  gfm: true
}

pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::EmojiFilter,
  HTML::Pipeline::AutolinkFilter
], context

pipeline.call(text)[:output].to_s

and
%p= raw format(answer.body) to invoke it.

However, the generated link doesn't include the target="_blank" attribute.
Any ideas?

Thanks,
Roy

EmojiFilter doesn't work on strings that don't contain HTML

When I pass this string...

"I can do this.\r\n:scream: Juice 3: Whoa, that's a LOT of cayenne!"

...to a pipeline containing EmojiFilter, it does not replace the emoji-cheat-sheet code with the Emoji as expected.

I tracked the problem down to here:

irb(main):204:0> doc.search('text()')
=> []

What does happen is that the DocumentFragment in doc contains one child Nokogiri::XML::Text node, and doc.text contains the same text that html contains. So....

Armed with that knowledge, I made the following changes:

def call
- doc.search('text()').each do |node|
+ nodes(doc).each do |node|
    content = node.to_html
    next if !content.include?(':')
    next if has_ancestor?(node, %w(pre code))
    html = emoji_image_filter(content)
    next if html == content
    node.replace(html)
  end
  doc
end

# Look for text nodes in the DocumentFragment
# 
# If doc's text is the same as original string,
# just nab its children to get the proper nodes.
# Otherwise do a search for text nodes.
+ def nodes(doc)
+   doc.text == html ? doc.children : doc.search('text()')
+ end

... and that fixed it for me.

Anyone see any problems with that fix? If not, I'll work up a PR as soon as I can.

Loosen Markdown Dependency.

Considering that GitHub Markdown tends to lack documentation on how to configure it (that, or Google is failing me), and that it does a lot of things that aren't necessarily nice for user content you want to restrict (such as autolinking), it would be nice if the dependence on github-markdown was loosened so that people who wish to use redcarpet can.

MentionFilter base_url config question

Hi. I am using MentionFilter, and my user lives in www.lvh.me:3000/~jch.

HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::MentionFilter
], context.merge(gfm: true, base_url: '/~')

If I specify base_url: '~' or '/~', it gives me

www.lvh.me:3000/~/jch

instead of

www.lvh.me:3000/~jch.

How can I achieve the behaviour described above with MentionFilter?

Currently I replace it by myself:

text.gsub!(/@([a-z0-9][a-z0-9-]*)/i) do |match|
  %Q(<a href="/~#{$1}">#{match}</a>)
end

Thanks!
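
For reference, the manual replacement above can be packaged as a small self-contained helper. This is a sketch of the workaround only, not of MentionFilter itself; the method name and the `prefix` parameter are hypothetical.

```ruby
MENTION_PATTERN = /@([a-z0-9][a-z0-9-]*)/i

# Link each @login to "#{prefix}#{login}" without inserting a slash
# between the prefix and the login.
def link_mentions(text, prefix = "/~")
  text.gsub(MENTION_PATTERN) do |match|
    %(<a href="#{prefix}#{Regexp.last_match(1)}">#{match}</a>)
  end
end

link_mentions("ping @jch")
# => "ping <a href=\"/~jch\">@jch</a>"
```

Note that this simple pattern also matches the part after the @ in an email address, so it is only suitable for text that is known not to contain one.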

Open source, transferring repo ownership

I think this repo is ready for 🚢ing. #6 extracted this project from .com, and removed GitHub specific references in the gem. Here's a list of remaining things I'd like to do before I share the ❤️ with the world:

  • update the readme
  • write a blog post with some examples
  • add travis
  • transfer ownership to jch (per @rtomayko, having a maintainer rather than putting it under the org)

Is there anything I'm missing?

Support for ActiveSupport 4

We were upgrading from 0.0.14 to 0.2.0, but got blocked by the gemspec requirement on activesupport 3 or earlier.

Bundler could not find compatible versions for gem "activesupport":
  In Gemfile:
    html-pipeline (~> 0.1.0) ruby depends on
      activesupport (< 4, >= 2) ruby

    rails (~> 4.0) ruby depends on
      activesupport (4.0.0)
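
The conflict Bundler reports can be reproduced with RubyGems' own version classes. The sketch below only checks the two constraints quoted in the error output; the variable names are illustrative.

```ruby
require "rubygems"

# Constraint from the html-pipeline gemspec, as quoted in the error above.
gemspec_constraint = Gem::Requirement.new("< 4", ">= 2")

# Version that rails (~> 4.0) pulls in.
rails4_activesupport = Gem::Version.new("4.0.0")

conflict = !gemspec_constraint.satisfied_by?(rails4_activesupport)
puts "conflict: #{conflict}" # prints "conflict: true"
```

Because 4.0.0 is not below 4, no version can satisfy both requirements until the upper bound in the gemspec is relaxed.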

Better error notification on missing linguist dependency?

Chalk this up to RTFM, but with a simple filter like this

HTML::Pipeline.new [
          HTML::Pipeline::MarkdownFilter,
          HTML::Pipeline::SyntaxHighlightFilter
        ]

the help Rails app kept crashing:

SystemExit in Help/articles#show

Showing /Users/garentorikian/github/help/app/views/help/articles/_article.html.erb where line #22 raised:

exit
Extracted source (around line #22):

Finally, after looking at the logs, I found: You need to install linguist before using the SyntaxHighlightFilter. See README.md for details.

Not sure if this error can be raised in the browser itself, but it'd be nice. Also not sure if this'll be fixed by #28 anyway.
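
The SystemExit above is what Kernel#abort produces, and because SystemExit is not a StandardError, an ordinary rescue in the application does not see it. The sketch below contrasts the two approaches with hypothetical stand-in methods; it is not the filter's actual code.

```ruby
# Hypothetical stand-ins for a filter's dependency check.
class LinguistMissingError < StandardError; end

MESSAGE = "You need to install linguist before using the SyntaxHighlightFilter."

def check_with_abort
  abort MESSAGE # prints to stderr and raises SystemExit
end

def check_with_raise
  raise LinguistMissingError, MESSAGE
end

# Mimics an application-level handler that rescues StandardError only.
def rescued_message
  yield
  nil
rescue StandardError => e
  e.message
end

puts rescued_message { check_with_raise }
```

Calling `rescued_message { check_with_abort }` would instead let SystemExit propagate past the handler, which matches the crash described above.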

Contributing Guidelines

CONTRIBUTING.md is a cool feature; we should add it to html-pipeline! 😄

When a user submits a New Issue or sends a Pull Request, they are linked to the project's CONTRIBUTING.md.

New Issue: (screenshot)

Pull Request: (screenshot)

Since CONTRIBUTING.md is linked from both places, we could split it into two pieces of documentation. At the top of the document, we could have navigation to both pieces. Here is a rough draft for review. Thoughts?


Submitting New Issue

Please include:

  1. Example code
  2. Result output
  3. nokogiri -v

Sending Pull Request

How to run the tests:

bundle exec rake

Camo Filter doesn't return doc when disabled

During some testing this morning I started using the disable_asset_proxy option. It seems that when you pass that in, the CamoFilter just returns nil instead of the doc, causing the rest of the filter chain to break.
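
A toy illustration of why the return value matters: each filter's result is fed to the next one, so a disabled filter has to return the document untouched. The classes below are hypothetical stand-ins (the "doc" is just an array of image URLs), not the real CamoFilter.

```ruby
# Toy filter, NOT the real CamoFilter: `doc` is an array of image URLs.
class ToyAssetProxyFilter
  def initialize(doc, context = {})
    @doc = doc
    @context = context
  end

  def call
    # When disabled, hand the document back unchanged instead of nil.
    return @doc if @context[:disable_asset_proxy]

    @doc.map { |src| "https://proxy.example/#{src.sub(%r{\Ahttps?://}, '')}" }
  end
end

# A second toy filter that would raise NoMethodError if handed nil.
class ToyReverseFilter
  def initialize(doc, _context = {})
    @doc = doc
  end

  def call
    @doc.reverse
  end
end

# Feed each filter's result into the next one.
def run_chain(filters, doc, context)
  filters.inject(doc) { |current, filter| filter.new(current, context).call }
end
```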

Separate gems for versioning external dependencies

We don't specify versions for external dependencies and raise runtime errors when a dependency is missing (#80). For example, HTML::Pipeline::AutolinkFilter depends on rinku:

begin
  require "rinku"
rescue LoadError => _
  abort "Missing dependency 'rinku' for AutolinkFilter. See README.md for details."
end

This approach is simple, but couples html-pipeline's versioning to the versions of its external dependencies. For example, to update from gemoji ~> 1 to ~> 2, we would need to increase the major version for html-pipeline #159.

Here are a few ideas I came up with:

Keep things the same

This requires the fewest changes. We would raise html-pipeline's major version whenever one of its dependencies made breaking changes. There are 8 external dependencies for 8 filters. They are all pretty stable gems and unlikely to change frequently.

Separate gems, same repository

I experimented with this in the separate-gems branch. This is similar to how rails/rails is composed of separate gems (actionpack, actionmailer, activesupport), but all live in the same repository for an easy development workflow. The problem I ran into with this is bundler does not like having multiple projects within the same folder. If you poke around rails/rails, you can see they've added a good number of helper methods to Rakefile and their own set of conventions to bumping versions to make it work well. This feels a bit overkill to me, but maybe I'm missing something obvious.

Separate gems, separate repositories

We recommend 3rd party filters to be written this way. We could do the same thing with the existing filters and package them as their own separate gems in separate repositories. The trade off here is we'd have to jump between 9 projects (html-pipeline, and 8 filter gems). We could add a html-pipeline organization to help with this, but it is more overhead and would make the project harder to discover, and harder to contribute to. This is also how the bkeepers/qu gem handles swapping different backend stores.

@simeonwillbanks @JuanitoFatas @rsanheim @bkeepers What do you think? Are there other factors I haven't covered? Another possible way?

Detect asset pipeline availability

In the github app, the emoji icons are frozen to public/images, and urls to images are coded relative to the value of :asset_root. It'd be preferable to detect the availability of the asset-pipeline and use asset_path when it's available.
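
One way the detection could work is to check whether a view helper responding to asset_path is available and fall back to :asset_root otherwise. This is a sketch under assumptions: the method name, the `helper:` parameter, and the "emoji/NAME.png" layout are hypothetical.

```ruby
# Hypothetical helper; the "emoji/NAME.png" layout is an assumption.
def emoji_url(name, asset_root:, helper: nil)
  file = "emoji/#{name}.png"

  # Prefer the asset pipeline when a helper that knows about it is given.
  return helper.asset_path(file) if helper.respond_to?(:asset_path)

  # Otherwise build the URL relative to the configured asset root.
  File.join(asset_root, file)
end

emoji_url("shipit", asset_root: "/images")
# => "/images/emoji/shipit.png"
```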

Getting Started Guide

The README has tons of information (usage, dependencies, examples, etc). However, new users would benefit from a Getting Started Guide; factory_girl's guide is a good example. The Getting Started Guide could detail common implementations such as integrating with Rails or Sinatra. Thoughts?

History

It'd be cool to retain the original history when extracting libraries like this. Would you guys mind if I push a branch with the full history from the github/github repo? We'd need to rebase everything that's happened here on top and force push unfortunately. Sorry, I would have chimed in here earlier but had no idea this was going on.

EmailReplyParser is undefined

I might be missing some dependency, but the EmailReplyFilter references an EmailReplyParser constant which is not defined in the gem, at all :)

Can't remember if this is something that was there in github/github or maybe github/html-pipeline? But it should proooobably be here. Or maybe it's EmailReplyFilter that shouldn't be :P

Enable syntax highlighting for inline code

Copying from this issue from github/markup:

Currently, you can syntax-highlight code blocks. For example,

main :: IO ()
main = putStrLn "Hello, World!"

renders as

main :: IO ()
main = putStrLn "Hello, World!"

However, you cannot do the same with inline code such as

main :: IO ()

or

main :: IO ()

both of which get rendered as main :: IO () (without syntax highlighting) when used inline. It would be nice to have something like

haskell main :: IO ()

that gives you inline syntax-highlighting (right now, that would render as haskell main :: IO ()).

As gjtorikian suggested on the other issue, this could conceivably be fixed by changing this line to match on code tags, as well as pre.

Spaces inserted into code

Using

    pipeline = HTML::Pipeline.new [
      HTML::Pipeline::MarkdownFilter,
      HTML::Pipeline::SyntaxHighlightFilter
    ]

produces code that has 10 spaces prepended to every line after the first, including an extra line with 10 spaces at the end.

This

```css
@media (max-width: 992px) {
    #contact_email{ display: none; }
}
```

produces

@media (max-width: 992px) {
              #contact_email{ display: none; }
          }
          // 10 spaces at end

OSX HTML::Pipeline::MarkdownFilter Fails on Right Double Quotation Mark around email address

When using the HTML::Pipeline::MarkdownFilter on a string containing a "Right Double Quotation Mark" (U+201D) around an email address the output html will include an invalid byte sequence when trying to autolink it as a mailto:

I'm only having this issue on OSX. I'm running 10.10.2.

To reproduce:

renderer = HTML::Pipeline.new([HTML::Pipeline::MarkdownFilter]).freeze
renderer.to_html("This is  an “[email protected]” example").split

This is really a bug within github-markdown, but I'm submitting it here as github-markdown doesn't seem to have a Github repository. I've also tried using Redcloth and it fails as well.

ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
# Nokogiri (1.6.5)
    ---
    warnings: []
    nokogiri: 1.6.5
    ruby:
      version: 2.1.5
      platform: x86_64-darwin14.0
      description: ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxml2/2.9.2"
      libxslt_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxslt/1.1.28"
      libxml2_patches:
      - 0001-Revert-Missing-initialization-for-the-catalog-module.patch
      - 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
      libxslt_patches:
      - 0001-Adding-doc-update-related-to-1.1.28.patch
      - 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
      - 0003-Initialize-pseudo-random-number-generator-with-curre.patch
      - 0004-EXSLT-function-str-replace-is-broken-as-is.patch
      - 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
      - 0007-Separate-function-for-predicate-matching-in-patterns.patch
      - 0008-Fix-direct-pattern-matching.patch
      - 0009-Fix-certain-patterns-with-predicates.patch
      - 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
      - 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
      - 0014-Fix-for-bug-436589.patch
      - 0015-Fix-mkdir-for-mingw.patch
      compiled: 2.9.2
      loaded: 2.9.2

Potential class loading conflict with add-on filters

Due to the fact that HTML::Pipeline is a class, not a module, there is risk that an add-on filter will prematurely define this class before it's extended in the core library, which causes the notorious "superclass mismatch" exception.

Here's an example of where this happens. While creating a new gem for the BarFilter, we define a version file:

lib/html/pipeline/bar_filter/version.rb

module HTML
  class Pipeline
    class BarFilter
      VERSION = '1.0.0'
    end
  end
end

If we load this at the top of a gemspec file, for instance, then if we attempt to load 'html/pipeline', it goes 💥.
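
One way this kind of error can arise is when a class is first defined without a superclass and later reopened with one. The reproduction below uses a hypothetical Demo namespace and may not be the exact path hit in html-pipeline.

```ruby
module Demo
  class Filter; end

  # Loaded first, e.g. from a version file that omits the superclass.
  class BarFilter
    VERSION = "1.0.0"
  end
end

mismatch = begin
  module Demo
    # Loaded later with the real superclass: Ruby refuses to reopen the class.
    class BarFilter < Filter; end
  end
  nil
rescue TypeError => e
  e.message
end

puts mismatch
```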

Normally the way these things are defined (as far as I understand it), the top-level type in a gem is a module, not a class. One way to accomplish this without breaking the current API (much), is to define the class method new on the module that instantiates the concrete class. Something like:

module HTML
  module Pipeline
    def self.new filters, default_context = {}, result_class = nil
      Engine.new filters, default_context, result_class
    end

    class Engine
      # relocate Pipeline class definition here
    end
  end
end

The other solution, which I used in html-pipeline-asciidoc_filter, is to put the filter class in a different module for the purpose of holding the VERSION constant.

module HTML_Pipeline
  class BarFilter
    VERSION = '1.0.0'
  end
end

Either way, I think this is an important issue to address to minimize the challenges of creating an add-on filter.

No stylesheets for SyntaxHighlightFilter

Using the example in the README:

    pipeline = HTML::Pipeline.new [
      HTML::Pipeline::MarkdownFilter,
      HTML::Pipeline::SyntaxHighlightFilter
    ]
    result = pipeline.call input
    result[:output].to_s

produces the requisite <span>s with classes, but there are no styles / stylesheets to colorize the output.

Is there something I need to add to application.css?

Decrease number of dependencies

Remove as many gem dependencies as possible, because not everyone uses every single filter. The responsibility of checking for dependencies will be on the filter. This is similar to what faraday does for its adapters. I don't want the current filters to be split up into a bunch of mini-gems (html-pipeline-emoji, html-pipeline-markdown) because that's just dicing things too thin.

Warn if "pipelines" are out of order.

Rather than sending a generic error that can mean nothing to the user and be confusing, I would love it if html-pipeline detected order issues when there is a clear processing order, or if the emoji filter converted the DocumentFragment. What I mean is:

[
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::EmojiFilter
]

Works, but

[
  HTML::Pipeline::EmojiFilter,
  HTML::Pipeline::MarkdownFilter
]

Fails. However, the library gives people a broad message that doesn't even hint at what the problem might be. It only sends https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/text_filter.rb#L7, which can confuse users who are simply doing the simplest things, like:

class HTMLPipeline < Filter
  FILTERS =
    [
      HTML::Pipeline::EmojiFilter,
      HTML::Pipeline::MarkdownFilter
    ]

  def run(content, opts = {})
    opts = { gfm: true, asset_root: "/assets/img" }.merge(opts)
    HTML::Pipeline.new(FILTERS, opts).to_html(content)
  end
end

This might be a problem with Emoji on Ruby 2.0.0-p0 though.
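
An order check along the lines requested could be sketched as follows. Everything here is hypothetical: the toy filter classes, the `consumes`/`produces` declarations, and `validate_order!` are not part of html-pipeline.

```ruby
# Toy filters that only declare what they accept and what they return.
class ToyMarkdownFilter
  def self.consumes; [:text]; end
  def self.produces; :html; end
end

class ToyEmojiFilter
  def self.consumes; [:text, :html]; end
  def self.produces; :html; end
end

# Walk the filter list, tracking the kind of value each step would receive,
# and raise a message that names the offending filter.
def validate_order!(filters, input: :text)
  filters.inject(input) do |current, filter|
    unless filter.consumes.include?(current)
      raise ArgumentError,
            "#{filter} expects #{filter.consumes.join(' or ')} input but would " \
            "receive #{current}; check the filter order"
    end
    filter.produces
  end
  true
end

validate_order!([ToyMarkdownFilter, ToyEmojiFilter]) # passes
```

With the filters reversed, the check raises an ArgumentError that names ToyMarkdownFilter, which is the kind of hint asked for above.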

Allow SSH protocol links

It'd be handy if you could also use SSH protocol links like [test server](ssh://[email protected]). Is there any chance of adding that to the protocol whitelist in SanitizationFilter? I don't think there should be any security implications, but I may be missing something.
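
The idea of a protocol whitelist can be illustrated with Ruby's standard URI library. This sketch does not use SanitizationFilter or its configuration; the scheme list and the method name are assumptions made for the example.

```ruby
require "uri"

# Assumed scheme list for illustration; not SanitizationFilter's real one.
ALLOWED_SCHEMES = %w[http https mailto ssh].freeze

# Accept relative links (no scheme) and links whose scheme is whitelisted.
# Anything that cannot be parsed is rejected.
def allowed_href?(href)
  scheme = URI.parse(href).scheme
  scheme.nil? || ALLOWED_SCHEMES.include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

allowed_href?("ssh://deploy@test.example.com") # => true
allowed_href?("javascript:alert(1)")           # => false
```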

Implement an AsciiDoc filter based on Asciidoctor

Implement an AsciiDoc filter based on Asciidoctor.

Adding this filter will allow AsciiDoc output to be syntax highlighted. The filter should invoke Asciidoctor using attributes that make the HTML produced reasonably consistent with the HTML generated from Markdown (notitle! idprefix idseparator=-)
