
truncate_html's Introduction

TruncateHtml


truncate_html cuts off a string of HTML and takes care of closing any lingering open tags. There are many ways to solve this. This library does not have any dependencies, and parses HTML using regular expressions.

It can be used with or without Rails.

Example

some_html = '<ul><li><a href="http://whatever">This is a link</a></li></ul>'
truncate_html(some_html, length: 15, omission: '...(continued)')
  => <ul><li><a href="http://whatever">This...(continued)</a></li></ul>

A few notes:

  • By default, it will truncate on word boundary. To truncate the HTML string strictly at the specified length, pass in the word_boundary: false option.

  • If the input HTML is nil, it will return an empty string.

  • The omission text's length does count toward the resulting string's length.

  • <script> tags will pass right through - they will not count toward the resulting string's length, or be truncated.

  • The default options are:

length: 100
omission: '...'
word_boundary: /\S/

You may also set global configuration options. For example, place the following in a file that runs on application boot, such as config/initializers/truncate_html.rb:

TruncateHtml.configure do |config|
  config.length        = 50
  config.omission      = '...(continued)'
end

If you really want, you can even set a custom word boundary regexp. For example, to truncate at the end of the nearest sentence:

TruncateHtml.configure do |config|
  config.word_boundary = /\S[\.\?\!]/
end

You can also truncate the HTML at a specific point not based on length but content. To do that, place the :break_token in your source. This allows the truncation to be data driven, breaking after a leading paragraph or sentence. If the :break_token is in your content before the specified :length, :length will be ignored and the content truncated at :break_token. If the :break_token is in your content after the specified :length, :break_token will be ignored and the content truncated at :length.

TruncateHtml.configure do |config|
  config.break_token = '<!-- truncate -->'
end
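The precedence between :break_token and :length described above can be sketched on plain text. This is only an illustration of the rule, not the gem's implementation: cut_at_break_or_length is a hypothetical helper, and it ignores HTML tags, word boundaries, and the omission text.

```ruby
# Hypothetical helper illustrating the precedence rule only.
# It works on plain text and ignores tags, word boundaries and omission.
def cut_at_break_or_length(text, length:, break_token:)
  token_index = text.index(break_token)
  # The token wins only when it appears before the length limit.
  cut = token_index && token_index < length ? token_index : length
  text[0, cut]
end

text = 'First paragraph.<!-- truncate -->Second paragraph.'

cut_at_break_or_length(text, length: 30, break_token: '<!-- truncate -->')
# => "First paragraph."
cut_at_break_or_length(text, length: 5, break_token: '<!-- truncate -->')
# => "First"
```

In the first call the token sits at index 16, before the limit of 30, so the text is cut at the token. In the second call the limit of 5 is reached first, so the token is ignored.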

Installation

The latest gem version for the Rails 2.x series is 0.3.2. To use truncate_html on a Rails 2 app, please install the 0.3.2 version:

gem install truncate_html -v 0.3.2

For Rails 3, use the latest truncate_html:

gem install truncate_html

Issues or Suggestions

Found an issue or have a suggestion? Please report it on GitHub's issue tracker.

Testing

bundle
rake

All green? Go hack.

Copyright (c) 2009 - 2010 Harold A. Giménez, released under the MIT license

Thanks to all the contributors!

truncate_html's People

Contributors

agis, arturdryomov, bcardarella, coneybeare, csquared, danielevans, dmfrancisco, dmitry, dougjohnston, ghazel, halida, hewo, hgmnz, jbirdjavi, olivierlacan, parndt, tinygrasshopper, torbjon, vanderhoorn, zapnap


truncate_html's Issues

can not truncate continuous words

I tried to truncate the word "testtesttesttesttesttest", but it is not truncated. This is my code:
<%= truncate_html("testtesttesttesttesttest", :length => 11, :omission => '..') %>

String is truncated even when shorter than length

If I try to truncate a string using an omission, the string may get truncated even when it is shorter than the length. This happens when the combined length of the string and the omission is greater than the :length option. For instance, if you add the following test to html_truncator_spec.rb ...

it 'does not truncate a string shorter than length' do
  truncate('some string', length: 12, omission: '...').should == 'some string'
end

... it will fail with the following error:

1) TruncateHtml::HtmlTruncator does not truncate a string shorter than length
   Failure/Error: truncate('some string', length: 12, omission: '...').should == 'some string'
     expected: "some string"
          got: "some..." (using ==)
   # ./spec/truncate_html/html_truncator_spec.rb:62:in `block (2 levels) in <top (required)>'

This happens even though "some string" is shorter than 12 characters. Is this expected behavior? It differs from the behavior of truncate from Active Support, where "some string".truncate(12, omission: '...') == "some string".
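The behavior the reporter expects, which matches Active Support's String#truncate, can be sketched for plain strings. truncate_plain is a hypothetical helper, not part of truncate_html; it ignores HTML and word boundaries entirely.

```ruby
# Hypothetical plain-text helper; not part of truncate_html.
# Leaves the text alone when it already fits, and only reserves room
# for the omission when something actually has to be cut.
def truncate_plain(text, length:, omission: '...')
  return text if text.length <= length

  keep = [length - omission.length, 0].max
  text[0, keep] + omission
end

truncate_plain('some string', length: 12)        # => "some string"
truncate_plain('some longer string', length: 12) # => "some long..."
```

The early return is the important part: the omission length is subtracted only after it is known that the text exceeds the limit.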

truncate_html removes space between words

Hello guys,
I am trying to use truncate_html on strings such as
<p>T h a n k s</p>
The issue is that it removes the spaces that were added on purpose.
So when I call truncate_html("<p>T h a n k s</p>", length: 15000)
the output is =>
<p>Thanks</p>
Is there any configuration to escape these kinds of characters?
Any help will be appreciated.
Thanks

NoMethodError in Rails view

I have tried installing this gem with both bundle install and gem install, and my Rails app is unable to find the truncate_html method. I keep getting a NoMethodError.

:word_boundary can no longer be explicitly true

As I'm looking at the code, :word_boundary seems to have changed from a true/false value to a regular expression. At least, that's the default configuration. Now we can no longer explicitly pass in true to :word_boundary, since it errors out with the statement: "NoMethodError: undefined method `source' for true:TrueClass". I'm using Refinery, which does that in its code.

Could you add a check to see if :word_boundary is true and then switch it to the default regex if it is?

Thanks,
Justin
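The check requested here could look roughly like the following. This is a sketch, not the gem's code: normalize_word_boundary and DEFAULT_WORD_BOUNDARY are hypothetical names, and the default regexp is taken from the README's list of default options.

```ruby
# Default taken from the README's option list.
DEFAULT_WORD_BOUNDARY = /\S/

# Hypothetical normalizer: true falls back to the default regexp,
# a Regexp passes through, false/nil disables word-boundary truncation.
def normalize_word_boundary(value)
  case value
  when true then DEFAULT_WORD_BOUNDARY
  when Regexp then value
  when false, nil then false
  else raise ArgumentError, "unsupported word_boundary: #{value.inspect}"
  end
end

normalize_word_boundary(true)          # => /\S/
normalize_word_boundary(/\S[\.\?\!]/)  # => /\S[\.\?\!]/
normalize_word_boundary(false)         # => false
```

With a step like this in front of the truncator, callers such as Refinery that still pass true would get the default behavior without a NoMethodError.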

truncate_html length does not include omission

Unlike the Rails truncate method, length passed to truncate_html does not include the omission text. So for example:

truncate("a b c", 4, "...")
#=> "a..."

truncate_html("a b c", :length => 4, :omission => "...")
#=> "a b ..."

At the very least, this should be documented somewhere.

How is length calculated?

Is length calculated based on the text outside of the html tags? Or does it include the HTML tags?

If it does include the HTML tags, then it would be very useful to be able to set the length of the text outside of HTML tags.

Thanks for an awesome gem!

truncate_html does not respect Unicode

Hi @hgmnz,

A client is running some content with Unicode characters (namely, an up arrow) through truncate_html and noticing that those characters are disappearing.

I've narrowed it down to the scan in TruncateHtml::HtmlString. However, that's a hell of a regex to read, so I was wondering if you wouldn't mind walking me through it.

You can paste this code into an .rb file and run it to see what I mean:

# encoding: utf-8
unicode_string = "Up Arrow (↑) points up."

# From TruncateHtml::HtmlString
# 
def regex
  /(?:<script.*>.*<\/script>)+|<\/?[^>]+>|[[[:alpha:]]\w\|`~!@#\$%^&*\(\)\-_\+=\[\]{}:;'",\.\/?]+|\s+|[[:punct:]]/
end

# scan normally respects unicode.
puts unicode_string.scan(/.*/).join

# but this regex does not.
puts unicode_string.scan(regex).join

The result at the command line is

Up Arrow (↑) points up.
Up Arrow () points up.

Thanks!
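One way for a scan-based tokenizer to avoid silently dropping characters is to end the alternation with a catch-all branch. The regexp below is a simplified stand-in written for this illustration, not the one from TruncateHtml::HtmlString, and tokenize is a hypothetical helper name.

```ruby
# Simplified stand-in for a tokenizing regexp (not the gem's own).
# The final `.` branch catches any character the earlier branches miss,
# so scan never skips input silently.
TOKEN_REGEX = /<\/?[^>]+>|[[:alnum:]]+|\s+|./

def tokenize(html)
  html.scan(TOKEN_REGEX)
end

unicode_string = "Up Arrow (↑) points up."
puts tokenize(unicode_string).join
# prints: Up Arrow (↑) points up.
```

Because every character is matched by some branch (newlines by \s+, everything else by the last branch at the latest), joining the tokens reproduces the input exactly.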

Incorrect Example in README

In the README, an example regex for splitting on sentences is config.word_boundary = /\S[\.\?\!]/.

This regex fails, however, when an HTML tag is at the end of the string. Testing on the string '<strong>Here is a string.</strong>', I get:

Minitest::Assertion: --- expected
+++ actual
@@ -1 +1 @@
-"<strong>Here is a string.</strong>"
+"<strong>Here is a string."

The regex should read /\S([\.\?\!]|\z)/. This will also match against the end of the string.
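The difference between the two regexes can be reproduced in isolation. The sketch assumes the truncator keeps the longest prefix that ends at a boundary match, by matching ^.* followed by the boundary source; that is an assumption about the mechanism, and prefix_up_to_boundary is a hypothetical helper.

```ruby
# Assumption: the truncator keeps the longest prefix that ends at a
# boundary match, i.e. it matches /^.*<boundary>/ against the output.
def prefix_up_to_boundary(text, boundary)
  match = text.match(Regexp.new("^.*#{boundary.source}"))
  match ? match[0] : text
end

html = '<strong>Here is a string.</strong>'

prefix_up_to_boundary(html, /\S[\.\?\!]/)
# => "<strong>Here is a string."
prefix_up_to_boundary(html, /\S([\.\?\!]|\z)/)
# => "<strong>Here is a string.</strong>"
```

With the README regex the last sentence terminator is the period before the closing tag, so the closing tag is cut off. Adding \z lets the match extend to the end of the string.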

Helper method to get the rest of given html

Sometimes there is a requirement to toggle the rest of the given HTML text using JavaScript (show more/show less functionality). To implement this without duplicating the truncated HTML, I need the rest of it.
I tried to get the rest of the HTML as follows:

truncated_content = truncate_html(item.content, length: 300, omission: '')
trunc_length = truncated_content.length
rest = item.content[trunc_length, item.content.length]

But when there is extra consecutive whitespace in the given HTML, the code above won't produce correct output. This is because consecutive whitespace is reduced to a single space in the TruncateHtml::HtmlString#html_tokens method.

Therefore I propose creating a new helper, "slice_html", that returns both the truncated HTML and the rest, as follows:

truncated_content, rest = slice_html(item.content, length: 300, omission: '')

Sentence boundary destroys href.

I'm realizing as I write this that this could perhaps be fixed with the regex provided in the documentation for sentence boundaries, but as it stands the regex can cut off the resulting string mid href. As such, you end up with an open quotation that is not closed and many resulting formatting problems on the page. I got something like this as a result.

<p>Sample text with link to <a href=\"http://www.example.</a></p>

Spaces disappearing

In this series, we’re interviewing NYCDA graduates to talk about their program...
gets truncated to
In this series, we’re interviewingNYCDA graduates to talk about their program...

bad interaction with script tags

s = "This is bad <script type=text/javascript>document.write('lum dee dum');</script>"
truncate_html(s, :length => 20, :omission => "... <a href='foo'>read more</a>")

=>

"This is bad <script type=text/javascript>document.write('lum... <a href='foo'>read more</a></script>"

Multibyte bug

I found that text containing multibyte characters is truncated incorrectly.

NoMethodError: You have a nil object when you didn't expect it! The error occurred while evaluating nil.rstrip

>> truncate_html("blah", :length => 2, :omission => "longer")
NoMethodError: You have a nil object when you didn't expect it!
The error occurred while evaluating nil.rstrip
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/truncate_html/html_truncator.rb:30:in `truncate'
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/truncate_html/html_truncator.rb:17:in `each'
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/truncate_html/html_truncator.rb:17:in `truncate'
        from C:/Ruby/lib/ruby/gems/1.8/gems/truncate_html-0.2.1/lib/app/helpers/truncate_html_helper.rb:4:in `truncate_html'

The error occurred while evaluating nil.word_boundary

You have a nil object when you didn't expect it!
The error occurred while evaluating nil.word_boundary

truncate_html(name, :length => 40, :omission => "...")

/usr/local/lib/ruby/gems/1.8/gems/truncate_html-0.3.1/lib/truncate_html/html_truncator.rb:11:in `truncate'
/usr/local/lib/ruby/gems/1.8/gems/truncate_html-0.3.1/lib/app/helpers/truncate_html_helper.rb:6:in `truncate_html'

regression from truncate_html 0.2.2

Running on Rails 2.2.2

truncate_html removes newline escape sequences

It's common to see HTML code like this:

<p>Hello
World</p>

Which, for example, when scraped, results in the following Ruby string:

"<p>Hello\nWorld</p>"

If we render this code in a view, the browser will display it as Hello World.
However, truncate_html strips the \n, so the string will be rendered incorrectly as HelloWorld.
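The expected handling, where a newline between words behaves like a space in the rendered output, can be sketched as a whitespace-collapsing step. collapse_whitespace is a hypothetical helper, not part of truncate_html, and it would also flatten whitespace inside <pre> blocks, which may not be desirable.

```ruby
# Hypothetical whitespace normalizer: collapse each whitespace run
# (including "\n") into one space rather than dropping it.
def collapse_whitespace(html)
  html.gsub(/\s+/, ' ')
end

collapse_whitespace("<p>Hello\nWorld</p>") # => "<p>Hello World</p>"
```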

New lines should be interpreted.

I know there was this issue: #43

But should it not just interpret the new line?

Take my website for example: http://www.dchapman.io/ (here the code examples look wonky because the new line chars are getting replaced with spaces).

Here's what the post should look like: http://www.dchapman.io/posts/changing-column-type-in-postgresql-rails

Here's what the index looks like where truncate_html is being used: https://github.com/dchapman1988/dchapman.io/blob/master/app/views/posts/index.html.slim#L10

Option to exclude special tags from character count

It would be awesome to be able to exclude special tags (HTML or not) from the character count. Example:

truncate_html("Please take care of truncating this <script type='text/javascript'>function call_me_function() { console.debug('whatsup'); call_me_function();} </script> big and long portion of text", :length=>40)

instead of outputting:

Please take care of truncating this <script type='text/javascript'>function call_me_function() { console.debug('whatsup'); call_me_function();} </script>...

it would output:

Please take care of truncating this <script type='text/javascript'>function call_me_function() { console.debug('whatsup'); call_me_function();} </script> big and long portion of text

because I set in some truncate_html settings that tags like <script> should not be counted.

This could apply to other tags that are already invisible to the users:

<script>, , etc.

Feature Request: Allow breaking at XXX tag if found

It is common in blogs to place a page break in the content. Maybe it is after the first paragraph, maybe it is after the first line, but the point is it could be anywhere.

I suggest a feature where you could look for a configurable tag, maybe <break />, and if it is found, treat the remaining character count as 0 and truncate the HTML, cleaning up just as you do when the character limit is reached.

It would probably be an alteration to this if block looking something like

    if @chars_remaining <= 0 || token == TruncateHtml.configuration.break_token
       close_open_tags
       break
    else
      process_token(token)
    end

This allows the gem to be data-driven for the people who want it, while still truncating at a preset length for existing users of the gem.
