hgmnz / truncate_html

truncates html so you don't have to

Home Page: http://practiceovertheory.com

License: MIT License

truncate_html's Introduction

TruncateHtml

truncate_html cuts off a string of HTML and takes care of closing any lingering open tags. There are many ways to solve this. This library does not have any dependencies, and parses HTML using regular expressions.

It can be used with or without Rails.

Example

some_html = '<ul><li><a href="http://whatever">This is a link</a></li></ul>'
truncate_html(some_html, length: 15, omission: '...(continued)')
  => <ul><li><a href="http://whatever">This...(continued)</a></li></ul>

A few notes:

By default, it will truncate on a word boundary. To truncate the HTML string strictly at the specified length, pass in the word_boundary: false option.
If the input HTML is nil, it will return an empty string.
The omission text's length does count toward the resulting string's length.
<script> tags will pass right through - they will not count toward the resulting string's length, or be truncated.
The default options are:

length: 100
omission: '...'
word_boundary: /\S/
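
The notes above can be made concrete with a small sketch. This is an illustration only, not the gem's implementation: sketch_truncate, VOID_TAGS and TOKEN_PATTERN are hypothetical names, and the stop rule (check the remaining budget before appending a word, so the result can overshoot by one word) is an assumption chosen so that the sketch reproduces the example above. It ignores <script> handling and custom word boundaries.

```ruby
# Hypothetical sketch: regexp tokenizer plus a stack of open tags.
VOID_TAGS = %w[br hr img input meta link].freeze
TOKEN_PATTERN = /<\/?[^>]+>|\s+|[^<\s]+|</

def sketch_truncate(html, length: 100, omission: '...')
  return '' if html.nil?

  out = []
  open_tags = []
  remaining = length - omission.length # the omission counts toward length
  truncated = false

  html.scan(TOKEN_PATTERN).each do |token|
    if token.start_with?('</')
      open_tags.pop                    # closing tag: free, pops the stack
      out << token
    elsif token.match?(/\A<[a-zA-Z]/)
      name = token[/\A<([a-zA-Z0-9]+)/, 1].downcase
      open_tags.push(name) unless VOID_TAGS.include?(name) || token.end_with?('/>')
      out << token                     # opening tag: free, pushed on the stack
    elsif token.start_with?('<')
      out << token                     # comments and stray '<' pass through
    elsif remaining <= 0
      truncated = true                 # budget used up: stop at this text token
      break
    else
      out << token
      remaining -= token.length
    end
  end

  return html unless truncated

  out.pop while out.last&.match?(/\A\s+\z/) # drop trailing whitespace
  out << omission
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out.join
end

some_html = '<ul><li><a href="http://whatever">This is a link</a></li></ul>'
puts sketch_truncate(some_html, length: 15, omission: '...(continued)')
```

Only text tokens consume the budget; tags are copied through and tracked so they can be closed in reverse order after the omission.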

You may also set global configuration options. For example, place the following in a file that runs at application boot, such as config/initializers/truncate_html.rb:

TruncateHtml.configure do |config|
  config.length        = 50
  config.omission      = '...(continued)'
end

If you really want, you can even set a custom word boundary regexp. For example, to truncate at the end of the nearest sentence:

TruncateHtml.configure do |config|
  config.word_boundary = /\S[\.\?\!]/
end
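
To see what this pattern selects, here is a minimal plain-text sketch. It is not how the gem applies word_boundary internally; cut_at_sentence and SENTENCE_END are hypothetical names, and the helper ignores HTML entirely. It only shows that the pattern matches a non-space character followed by sentence-ending punctuation.

```ruby
# Hypothetical helper: cut plain text at the last sentence end within length.
SENTENCE_END = /\S[\.\?\!]/

def cut_at_sentence(text, length)
  head = text[0, length]
  start = head.rindex(SENTENCE_END)   # start of the last "x." style match
  start ? head[0, start + 2] : head   # keep the character and its punctuation
end

puts cut_at_sentence('First sentence. Second one goes on', 25)
```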

You can also truncate the HTML at a specific point based on content rather than length. To do that, place the :break_token in your source. This allows the truncation to be data driven, for example breaking after a leading paragraph or sentence. If the :break_token appears in your content before the specified :length, :length is ignored and the content is truncated at :break_token. If the :break_token appears after the specified :length, :break_token is ignored and the content is truncated at :length.

TruncateHtml.configure do |config|
  config.break_token = '<!-- truncate -->'
end
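
The precedence rule between :break_token and :length can be sketched as follows. This is a simplification under stated assumptions: cut_with_break_token is a hypothetical helper, and it counts raw characters, whereas the gem counts text and keeps the markup well-formed.

```ruby
# Hypothetical helper illustrating which limit wins.
BREAK_TOKEN = '<!-- truncate -->'

def cut_with_break_token(text, length:, break_token: BREAK_TOKEN)
  token_at = text.index(break_token)
  # The token wins only when it appears before the length limit.
  stop = token_at && token_at < length ? token_at : length
  text[0, stop]
end

text = 'Lead paragraph.<!-- truncate -->More body text here.'
puts cut_with_break_token(text, length: 30) # token comes first
puts cut_with_break_token(text, length: 10) # length comes first
```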

Installation

The latest gem version for the Rails 2.x series is 0.3.2. To use truncate_html on a Rails 2 app, please install the 0.3.2 version:

gem install truncate_html -v 0.3.2

For Rails 3, use the latest truncate_html:

gem install truncate_html

Issues or Suggestions

Found an issue or have a suggestion? Please report it on GitHub's issue tracker.

Testing

bundle
rake

All green? Go hack.

Thanks to all the contributors!

truncate_html's Issues

can not truncate continuous words

I tried to truncate the word "testtesttesttesttesttest", but it is not truncated. This is my code:
<%= truncate_html("testtesttesttesttesttest", :length => 11, :omission => '..') %>

String is truncated even when shorter than length

If I try to truncate a string using an omission, the string may get truncated even when it is shorter than the length. This happens when the combined length of the string and the omission is greater than the :length option. For instance, if you add the following test to html_truncator_spec.rb ...

it 'does not truncate a string shorter than length' do
  truncate('some string', length: 12, omission: '...').should == 'some string'
end

... it will fail with the following error:

1) TruncateHtml::HtmlTruncator does not truncate a string shorter than length
   Failure/Error: truncate('some string', length: 12, omission: '...').should == 'some string'
     expected: "some string"
          got: "some..." (using ==)
   # ./spec/truncate_html/html_truncator_spec.rb:62:in `block (2 levels) in <top (required)>'

This happens even though "some string" is shorter than 12 characters. Is this expected behavior? It differs from the behavior of truncate from Active Support, where "some string".truncate(12, omission: '...') == "some string".
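
For comparison, the behavior the reporter expects can be sketched for plain text. This is a hypothetical helper (truncate_plain), not code from the gem or from Active Support: it returns the input untouched whenever it already fits, and only then reserves room for the omission.

```ruby
# Hypothetical plain-text helper: never touch input that already fits.
def truncate_plain(text, length:, omission: '...')
  return text if text.length <= length

  stop = [length - omission.length, 0].max # room left after the omission
  text[0, stop] + omission
end

puts truncate_plain('some string', length: 12)
puts truncate_plain('some string', length: 8)
```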

truncate_html removes space between words

Hello guys,
I am trying to use truncate_html on strings such as
<p>T h a n k s</p>
The issue is that it removes the spaces that were added on purpose.
So when I call truncate_html("<p>T h a n k s</p>", length: 15000)
the output is =>
<p>Thanks</p>
Is there any configuration to preserve such characters?
Any help will be appreciated.
Thanks

NoMethodError in Rails view

I have tried installing this gem with both bundle install and gem install, and my Rails app is unable to find the truncate_html method. I keep getting a NoMethodError.

Can not truncate on a sentence boundary if the omission has a '.' character in it.

It looks like the omission string is applied before the word_boundary regex is applied, so if you have the default omission '...' or any other omission with a '.' character in it, the result is word boundary truncation instead of truncation by the regex param passed in.

:word_boundary can no longer be explicitly true

Looking at the code, :word_boundary seems to have changed from a true/false flag to a regular expression. At least, that's the default configuration. Now we can no longer explicitly pass in true to :word_boundary, since it errors out with the statement: "NoMethodError: undefined method `source' for true:TrueClass". I'm using Refinery, which does that in its code.

Could you add a check to see if :word_boundary is true and then switch it to the default regex if it is?

Thanks,
Justin
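
A minimal sketch of the requested check, assuming the default boundary is the /\S/ listed in the README. normalize_word_boundary and DEFAULT_WORD_BOUNDARY are hypothetical names, not part of the gem.

```ruby
# Hypothetical normalizer: accept true, false, or a Regexp for :word_boundary.
DEFAULT_WORD_BOUNDARY = /\S/

def normalize_word_boundary(value)
  case value
  when Regexp then value                 # custom pattern: use as given
  when true   then DEFAULT_WORD_BOUNDARY # legacy flag: fall back to the default
  when false  then false                 # strict truncation, no boundary
  else raise ArgumentError, "unsupported word_boundary: #{value.inspect}"
  end
end

puts normalize_word_boundary(true).source
```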

truncate_html length does not include omission

Unlike the Rails truncate method, length passed to truncate_html does not include the omission text. So for example:

truncate("a b c", 4, "...")
#=> "a..."

truncate_html("a b c", :length => 4, :omission => "...")
#=> "a b ..."

At the very least, this should be documented somewhere.
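
The two conventions differ only in how much room is left for text. The following sketch works on plain text with hypothetical helpers (text_budget, cut) and reproduces the two results quoted in this issue; it says nothing about which convention a given gem version uses.

```ruby
# Hypothetical helpers comparing the two ways of counting the omission.
def text_budget(length, omission, omission_counts:)
  omission_counts ? [length - omission.length, 0].max : length
end

def cut(text, length, omission, omission_counts:)
  return text if text.length <= length

  text[0, text_budget(length, omission, omission_counts: omission_counts)] + omission
end

puts cut('a b c', 4, '...', omission_counts: true)  # omission is part of length
puts cut('a b c', 4, '...', omission_counts: false) # omission is added on top
```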

How is length calculated?

Is length calculated based on the text outside of the html tags? Or does it include the HTML tags?

If it does include the HTML tags, then it would be very useful to be able to set the length of the text outside of HTML tags.

Thanks for an awesome gem!

truncate_html does not respect Unicode

Hi @hgmnz,

A client is running some content with Unicode characters (namely, an up arrow) through truncate_html and noticing that those characters are disappearing.

I've narrowed it down to the scan in TruncateHtml::HtmlString. However, that's a hell of a regex to read, so I was wondering if you wouldn't mind walking me through it.

You can paste this code into an .rb file and run it to see what I mean:

# encoding: utf-8
unicode_string = "Up Arrow (↑) points up."

# From TruncateHtml::HtmlString
# 
def regex
  /(?:<script.*>.*<\/script>)+|<\/?[^>]+>|[[[:alpha:]]\w\|`~!@#\$%^&*\(\)\-_\+=\[\]{}:;'",\.\/?]+|\s+|[[:punct:]]/
end

# scan normally respects unicode.
puts unicode_string.scan(/.*/).join

# but this regex does not.
puts unicode_string.scan(regex).join

The result at the command line is

Up Arrow (↑) points up.
Up Arrow () points up.

Thanks!
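
One way to avoid losing characters is a token pattern in which every character falls into some alternative, so that joining the tokens always reproduces the input. The sketch below is an assumption about how such a pattern could look, not the gem's regex; LOSSLESS_TOKEN and lossless_tokens are hypothetical names, and <script> blocks are not treated specially.

```ruby
# encoding: utf-8
# Hypothetical lossless tokenizer: tag, whitespace run, text run, or stray '<'.
LOSSLESS_TOKEN = /<\/?[^>]+>|\s+|[^<\s]+|</

def lossless_tokens(html)
  html.scan(LOSSLESS_TOKEN)
end

unicode_string = "Up Arrow (↑) points up."
puts lossless_tokens(unicode_string).join == unicode_string
```

Text runs are defined negatively (anything that is not whitespace or "<"), so symbols, Cyrillic and CJK characters need no explicit listing.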

remove test suite warning

by adding a backslash. See https://github.com/hgimenez/truncate_html/commit/36c2edfa3acd40782600ac11e52a18ce30f19a1f#commitcomment-1246688

Incorrect Example in README

The README gives config.word_boundary = /\S[\.\?\!]/ as an example regex for truncating at sentence boundaries.

This regex fails, however, when an HTML tag is at the end of the string. Testing on the string '<strong>Here is a string.</strong>', I get:

Minitest::Assertion: --- expected
+++ actual
@@ -1 +1 @@
-"<strong>Here is a string.</strong>"
+"<strong>Here is a string."

The regex should read /\S([\.\?\!]|\z)/. This will also match against the end of the string.
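
The difference between the two patterns shows up on individual tokens. This snippet only compares the regexes from this issue; the constant names are made up for the comparison.

```ruby
# Compare the README pattern with the proposed one on sample tokens.
README_PATTERN   = /\S[\.\?\!]/
PROPOSED_PATTERN = /\S([\.\?\!]|\z)/

['string.', 'string', '</strong>'].each do |token|
  readme   = token.match?(README_PATTERN)
  proposed = token.match?(PROPOSED_PATTERN)
  puts "#{token.inspect}: readme=#{readme} proposed=#{proposed}"
end
```

Only the proposed pattern matches a token that ends without punctuation, such as a trailing closing tag.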

Helper method to get the rest of given html

Sometimes there is a requirement to toggle the rest of the given HTML using JavaScript (show more/show less functionality). To implement this without duplicating the truncated HTML, I need the rest of it.
I tried to get the rest of the HTML as follows:

truncated_content = truncate_html(item.content, length: 300, omission: '')
trunc_length = truncated_content.length
rest = item.content[trunc_length, item.content.length]

But when there is extra consecutive whitespace in the given HTML, the code above won't produce the correct output, because TruncateHtml::HtmlString#html_tokens reduces consecutive whitespace to a single space.

Therefore I propose a new helper, "slice_html", that returns both the truncated HTML and
the rest, as follows:

truncated_content, rest = slice_html(item.content, length: 300, omission: '')

Cyrillic characters missing

All the Cyrillic characters from my truncated html are missing.

Sentence boundary destroys href.

I'm realizing as I write this that this could perhaps be fixed with the regex provided in the documentation for sentence boundaries, but as it stands the regex can cut off the resulting string mid href. As such, you end up with an open quotation that is not closed and many resulting formatting problems on the page. I got something like this as a result.

<p>Sample text with link to <a href=\"http://www.example.</a></p>

No issue

Invalid ticket, sorry.

Spaces disappearing

In this series, we’re interviewing NYCDA graduates to talk about their program...
gets truncated to
In this series, we’re interviewingNYCDA graduates to talk about their program...

Please update plugin on rubygems.org

Hi Harold,

I found you thanks to GitHub's search feature, as your gem on rubygems.org points the source code to your old address at: http://github.com/hgimenez/truncate_html. I don't really know how they handle it, but I think if you lose your access to rubygems, you could contact them and try to reclaim ownership of that specific gem.

Here is the address to help you with it:
http://rubygems.org/gems/truncate_html

Thanks for your time

bad interaction with script tags

s = "This is bad <script type=text/javascript>document.write('lum dee dum');</script>"
truncate_html(s, :length => 20, :omission => "... <a href='foo'>read more</a>")

=>

"This is bad <script type=text/javascript>document.write('lum... <a href='foo'>read more</a></script>"

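One way to keep the omission out of the script is to tokenize a whole <script> block as a single unit that is copied through untouched. The pattern below is a sketch under that assumption, not the gem's regex; SCRIPT_AWARE_TOKEN is a hypothetical name.

```ruby
# Hypothetical tokenizer that keeps each <script>...</script> block atomic.
SCRIPT_AWARE_TOKEN = /<script\b[^>]*>.*?<\/script>|<\/?[^>]+>|\s+|[^<\s]+|</mi

s = "This is bad <script type=text/javascript>document.write('lum dee dum');</script>"
tokens = s.scan(SCRIPT_AWARE_TOKEN)
puts tokens.last
```

Because the script block is one token, a truncator can append it without ever splitting its contents or counting them.
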
Multibyte bug

I found that text with multibyte characters is truncated incorrectly.

NoMethodError: You have a nil object when you didn't expect it! The error occurred while evaluating nil.rstrip

>> truncate_html("blah", :length => 2, :omission => "longer")
NoMethodError: You have a nil object when you didn't expect it!
The error occurred while evaluating nil.rstrip
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/truncate_html/html_truncator.rb:30:in `truncate'
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/truncate_html/html_truncator.rb:17:in `each'
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/truncate_html/html_truncator.rb:17:in `truncate'
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/app/helpers/truncate_html_helper.rb:4:in `truncate_html'

invalid truncation with :word_boundary => false

it will truncate in the middle of a href="" string and create invalid html...

The error occurred while evaluating nil.word_boundary

You have a nil object when you didn't expect it!
The error occurred while evaluating nil.word_boundary

truncate_html(name, :length => 40, :omission => "...")

/usr/local/lib/ruby/gems/1.8/gems/truncate_html-0.3.1/lib/truncate_html/html_truncator.rb:11:in `truncate'
/usr/local/lib/ruby/gems/1.8/gems/truncate_html-0.3.1/lib/app/helpers/truncate_html_helper.rb:6:in `truncate_html'

regression from truncate_html 0.2.2

Running on Rails 2.2.2

Example for using this without rails?

Could you include in the README how to use this independently of rails?

Missing "omission" text when truncating to length - 1 characters

The following test fails with truncate_html 0.5.4:

truncate_html("One two three", :length => 12)

should return

One two t...

but instead it returns

One two t

Changing the value of :word_boundary makes no difference.

Truncating Chinese characters

This plugin is truncating Chinese characters

Omission is missing when word_boundary or break_token options are used

The behavior I expect is that omission is used any time the text is truncated, regardless of how the end point of the truncation is determined.
#40 had a fix for the omission missing when break_token is used, but was closed with no discussion for some reason.

html comment tags being closed / doubled unexpectedly

haven't had the free time to dig into why this is happening yet, but wanted to note this here.

1.9.3-p194 :008 > helper.truncate_html('hello and goodbye', length: 15)
=> "hello and ..."

undefined method `error' for true:TrueClass on 0.9.2

When I upgraded the gem to 0.9.1 or 0.9.2, it threw an exception, so I had to go back to 0.5.5.

Support for nil strings

Right now this fails. I've added support:

http://github.com/bcardarella/truncate_html/commit/77119b6de3eb6d096f23ec7dac14159b0ff49fe0

truncate_html removes newline escape sequences

It's common to see HTML code like this:

<p>Hello
World</p>

Which, for example, when scraped, results in the following Ruby string:

"<p>Hello\nWorld</p>"

If we render this code in a view, the browser will display it as Hello World.
However, truncate_html strips the \n, so the string will be rendered incorrectly as HelloWorld.
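
A browser collapses a run of whitespace, including newlines, into a single space. A tokenizer can mimic that by normalizing instead of dropping. The helper below (normalize_whitespace, a hypothetical name) is a sketch of that idea only; it would not be appropriate inside <pre> blocks, where whitespace is significant.

```ruby
# Hypothetical helper: collapse whitespace runs the way a browser renders them.
def normalize_whitespace(html)
  html.gsub(/\s+/, ' ')
end

puts normalize_whitespace("<p>Hello\nWorld</p>")
```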

truncate_html helper inside mailer views with rails 3

I'm using helpers inside mailer views and this works fine:

class UserMailer < Devise::Mailer

  helper :home
  helper :post
  .
  .
  .
end

However, I need to use the truncate_html gem inside mailer views.

How can I add this helper to mailer views?

Thanks

Is there any way to call truncate_html inside a controller?

Hi,

We make some Ajax calls to a controller, and the controller returns JSON to the front end.

Is there any way to call truncate_html inside a controller?

Thanks

New lines should be interpreted.

I know there was this issue: #43

But should it not just interpret the new line?

Take my website for example: http://www.dchapman.io/ (here the code examples look wonky because the new line chars are getting replaced with spaces).

Here's what the post should look like: http://www.dchapman.io/posts/changing-column-type-in-postgresql-rails

Here's what the index looks like where truncate_html is being used: https://github.com/dchapman1988/dchapman.io/blob/master/app/views/posts/index.html.slim#L10

https://rubygems.org/gems/truncate_html link to homepage broken

Option to exclude special tags from character count

It would be awesome to be able to exclude special tags (HTML or not) from the character count. Example:

truncate_html("Please take care of truncating this <script type='text/javascript'>function call_me_function() { console.debug('whatsup'); call_me_function();} </script> big and long portion of text", :length=>40)

instead of outputting:

Please take care of truncating this <script type='text/javascript'>function call_me_function() { console.debug('whatsup'); call_me_function();} </script>...

it would output:

Please take care of truncating this <script type='text/javascript'>function call_me_function() { console.debug('whatsup'); call_me_function();} </script> big and long portion of text

because I set in some truncate_html settings that tags like <script> should not be counted.

This could apply to other tags that are already invisible to the users:

<script>, , etc.

Feature Request: Allow breaking at XXX tag if found

It is common in blogs to place a page break in the content. Maybe it is after the first paragraph, maybe it is after the first line, but the point is it could be anywhere.

I suggest a feature where you could look for a configurable tag, maybe <break />, and if it is found, treat the remaining character count as 0 and truncate the HTML, cleaning up just as you do when the character limit is reached.

It would probably be an alteration to this if block looking something like

    if @chars_remaining <= 0 || token == TruncateHtml.configuration.break_token
      close_open_tags
      break
    else
      process_token(token)
    end

This allows the gem to be data-driven for the people who want it as well as truncated at a pre-set value for existing users of the gem

truncate_html removes non-breaking spaces

e.g. helper.truncate_html('non breaking spaces') will return 'nonbreakingspaces'

Feat request: a means to know if the string was actually truncated

This might be a very simple thing, or it might be possible already, but I would like to have an easy way to know if a given string was actually truncated by truncate_html.
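
As a rough approximation, a caller could check up front whether the visible text exceeds the limit. The helpers below (visible_text, would_truncate?) are hypothetical and are not part of the gem; the check ignores word boundaries, the omission and break tokens, so it may disagree with the gem in edge cases.

```ruby
# Hypothetical pre-check: does the visible text exceed the length limit?
def visible_text(html)
  html.to_s.gsub(/<[^>]+>/, '') # strip tags, keep text
end

def would_truncate?(html, length)
  visible_text(html).length > length
end

puts would_truncate?('<p>hello world</p>', 5)
puts would_truncate?('<p>hello world</p>', 100)
```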