Code Monkey home page Code Monkey logo

word-wrap's Introduction

word-wrap


Maven Central
codecov

Java library for wrapping text.

Status: published to Maven Central

Maven site reports are here including javadoc.

Features

  • Designed for use with normally formatted English (spaces after commas, periods for example)
  • Can specify custom string width function (for example FontMetrics.stringWidth)
  • Treats special characters appropriately (don't wrap a comma to the next line for example)
  • Conserves leading whitespace on lines
  • Optionally insert hyphens
  • Easy to use and read builder

Motivation

@davidmoten needed to render text for display in a PDF using PDFBox but PDFBox didn't offer word wrapping. He searched for libraries to do it and found Apache commons-text and commons-lang WordUtils but it didn't conserve leading spaces on lines and didn't allow for a customizable string width function. With a bit of luck this library will help those who are on a similar search!

Getting started

Add this to your pom:

<dependency>
  <groupId>com.github.davidmoten</groupId>
  <artifactId>word-wrap</artifactId>
  <version>VERSION_HERE</version>
</dependency>

Usage

String text = "hi there how are you going?";
System.out.println(
  WordWrap.from(text).maxWidth(10).wrap());

Output:

hi there
how are
you going?

More options:

Reader in = ...
Writer out = ...
String newLine = "\r\n";
WordWrap
  .from(in)
  .maxWidth(4.5)
  .newLine("\r\n")
  .includeExtraWordChars("~")
  .excludeExtraWordChars("_")
  .insertHyphens(true)
  .breakWords(true)
  .stringWidth(s -> fontMetrics.stringWidth(s.toString()))
  .wrap(out);

Note that the WordWrap builder used above is quite flexible and allows you to take input from a Reader, InputStream, classpath resource, File, String and has similar options for output.

Breaking numbers

The default is to be able to break sequences of digits even if .breakWords(false) is set. If you don't want sequences of digits broken then set .extraWordChars("0123456789").

Rules

Rules are here, generated by WordWrapTests.java.

Rewrapping classic fiction

A good test is to rewrap novels downloaded from Gutenberg (copied from the html form). When you run mvn test the output of The Importance of Being Earnest, The Black Gang, and Treasure Island are rewrapped into the directory target with line lengths of 20, 20 and 80 respectively. Part of the testing process for this library in addition to lots of unit tests is inspecting the wrapped output from these novels.

From The Black Gang wrapped at 20 chars:

The wind howled
dismally round a
house standing by
itself almost on the
shores of Barking
Creek. It was the
grey dusk of an
early autumn day,
and the occasional
harsh cry of a sea-
gull rising
discordantly above
the wind alone broke
the silence of the
flat, desolate
waste.

The house seemed
deserted. Every
window was
shuttered; the
garden was uncared
for and a mass of
weeds; the gate
leading on to the
road, apparently
feeling the need of
a deficient top
hinge, propped
itself drunkenly on
what once had been a
flower-bed. A few
gloomy trees swaying
dismally in the wind
surrounded the house
and completed the
picture—one that
would have caused
even the least
imaginative of men
to draw his coat a
little tighter round
him, and feel
thankful that it was
not his fate to live
in such a place.

Performance

For version 0.1.2 JMH benchmarks were added and numerous performance improvements made that gave 10x throughput increase and reduced memory allocation rate to 1/30x.

Benchmark throughput for repeatedly wrapping the Treasure Island fragment at a word length of 80 chars is ~21MB/s on my i5 laptop.

To run benchmarks

mvn clean install -P benchmark

Build

Use maven:

maven clean install

word-wrap's People

Contributors

davidmoten avatar dependabot[bot] avatar mfg92 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.