copy-down's Introduction

Copy Down

Convert HTML into Markdown with Java.

Installation

Gradle:

dependencies {
    // 'compile' was removed in Gradle 7; 'implementation' is its replacement
    implementation 'io.github.furstenheim:copy_down:1.1'
}

Maven:

<dependencies>
    <dependency>
        <groupId>io.github.furstenheim</groupId>
        <artifactId>copy_down</artifactId>
        <version>1.1</version>
    </dependency>
</dependencies>

JSoup Compatibility

This library relies heavily on JSoup, and using it with a different JSoup version can lead to unexpected behavior. Unlike Node.js, Java does not allow several versions of the same library to coexist, so if your project already uses JSoup, that version will take priority.

Supported versions are:

This Library  JSoup
1.0           1.13
1.1           1.15
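
Because a mismatched JSoup version is easy to miss, a start-up check can help. The following is a minimal sketch and not part of copy-down: the class `JsoupVersionCheck` and its methods are hypothetical names, the expected versions are copied from the table above, and it assumes the JSoup jar declares an `Implementation-Version` in its manifest (if it does not, the version is reported as unknown).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of copy-down.
final class JsoupVersionCheck {
    // copy-down version -> supported JSoup version, taken from the table above.
    private static final Map<String, String> SUPPORTED = new HashMap<>();
    static {
        SUPPORTED.put("1.0", "1.13");
        SUPPORTED.put("1.1", "1.15");
    }

    // Reads Implementation-Version from the manifest of the jar providing className.
    // Returns empty if the class is missing or the manifest has no such entry.
    static Optional<String> detectVersion(String className) {
        try {
            Package pkg = Class.forName(className).getPackage();
            return Optional.ofNullable(pkg == null ? null : pkg.getImplementationVersion());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    // True if jsoupVersion equals the supported version or is a patch release of it.
    static boolean isCompatible(String copyDownVersion, String jsoupVersion) {
        String expected = SUPPORTED.get(copyDownVersion);
        return expected != null
                && (jsoupVersion.equals(expected) || jsoupVersion.startsWith(expected + "."));
    }

    public static void main(String[] args) {
        Optional<String> loaded = detectVersion("org.jsoup.Jsoup");
        if (!loaded.isPresent()) {
            System.out.println("JSoup version unknown (class or manifest entry missing)");
        } else if (!isCompatible("1.1", loaded.get())) {
            System.out.println("Warning: copy-down 1.1 expects JSoup 1.15, found " + loaded.get());
        } else {
            System.out.println("JSoup " + loaded.get() + " matches copy-down 1.1");
        }
    }
}
```

Running the check once at application start-up surfaces a version conflict before any conversion produces surprising output.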

Usage

import io.github.furstenheim.CopyDown;
public class Main {
    public static void main (String[] args) {
        CopyDown converter = new CopyDown();
        String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
        String markdown = converter.convert(myHtml);
        System.out.println(markdown);
        // Some title\n==========\n\nSome html\n\nAnother paragraph\n
    }
}

Options

Options can be passed to the converter to customize the generated Markdown:

import io.github.furstenheim.CopyDown;
import io.github.furstenheim.Options;
import io.github.furstenheim.OptionsBuilder;

public class Main {
   public static void main (String[] args) {
       OptionsBuilder optionsBuilder = OptionsBuilder.anOptions();
       Options options = optionsBuilder
               .withBr("-")
               // more options
               .build();
       CopyDown converter = new CopyDown(options);
       String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
       String markdown = converter.convert(myHtml);
       System.out.println(markdown);
   }
}

Option              Valid values                  Default
headingStyle        SETEXT or ATX                 SETEXT
hr                  Any thematic break            * * *
bulletListMarker    -, +, or *                    *
codeBlockStyle      INDENTED or FENCED            INDENTED
fence               ``` or ~~~                    ```
emDelimiter         _ or *                        _
strongDelimiter     ** or __                      **
linkStyle           INLINED or REFERENCED         INLINED
linkReferenceStyle  FULL, COLLAPSED, or SHORTCUT  FULL

Acknowledgment

This library is a Java port of the wonderful Turndown.js library. It passes the same test suite as the original to ensure the same behavior.

copy-down's People

Contributors

furstenheim, semkeijsper, tijder

copy-down's Issues

java.lang.StringIndexOutOfBoundsException: String index out of range

The call "value.charAt(0)" in io.github.furstenheim.WhitespaceCollapser throws this error:

java.lang.StringIndexOutOfBoundsException: String index out of range: 0

My HTML is not particularly long, though.

My HTML:

<div id="content_views" class="htmledit_views"> 
 <p>如果您对文章有提议, 建议或者任何想表达的, 欢迎在下方评论区留言! 不断交流才是进步的捷径!</p> 
 <p>&nbsp;</p> 
 <p>仅四篇文章, 手把手教您制作一个自己的桌面邮件客户端</p> 
 <p>&nbsp;</p> 
 <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;众所周知, <strong>Windows</strong>的桌面程序大多数都是<strong>C#</strong>, .<strong>Net</strong>等等语言. 但当我们Java程序员也想做一个桌面应用, 却不想花较多时间在其他语言上, 怎么办呢?</p> 
 <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; <strong>Java</strong>这种语言本身就十分强大, 只要安装了环境, 一次编码, 到处使用. Java也可以写出相当完整, 好看的桌面应用. (最爽的是, 就算换个操作系统, 也不影响使用!)</p> 
 <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; 本系列(<strong>我的第一个JAVA桌面应用 - 邮件客户端</strong>) 将会提供一个完整的Java桌面应用编写的教学, 如果完整的学会了这个, 做其他桌面应用都不成问题, 您完全可以发挥自己的想象力, 做出任何您想完成, 实现的功能, 甚至可能给您带来无限商机.</p> 
 <p>&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;</p> 
 <p><strong>所需环境/软件:</strong></p> 
 <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; <u>Java编译器: IDEA(推荐)或Eclipse/MyEclipse</u></p> 
 <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; <u>JDK1.8以上</u></p> 
 <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; <u>JavaFX SceneBuilder 软件</u> <a href="http://www.oracle.com/technetwork/java/javase/downloads/javafxscenebuilder-1x-archive-2199384.html">官方下载地址</a>&nbsp; &nbsp;<a href="https://pan.baidu.com/s/1HNr6Ts7fDVrw7dDpX6l7bA">百度云</a>&nbsp;密码ym93</p> 
 <p>&nbsp;</p> 
 <blockquote> 
  <p>&nbsp; &nbsp; <strong>&nbsp; &nbsp; GIT源码在最后一章哦</strong></p> 
 </blockquote> 
 <p>&nbsp;</p> 
 <p>&nbsp;&nbsp;<strong>&nbsp;&nbsp;&nbsp; &nbsp; 注: 本系列教程是作者在空余时间编写, 难免有疏漏之处, 如果发现了其中的问题, 欢迎指出, 十分乐意并虚心的接受您的任何与代码相关的意见和建议!</strong></p> 
 <p><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(Java本身并非最适合作为桌面应用开发的语言, 本系列教程只用于学习交流之用!)</strong></p> 
 <p>&nbsp;</p> 
 <p>下一章将会用不到十分钟的时间, 教会您制作自己的第一个窗口!</p> 
 <p><strong><a href="https://blog.csdn.net/yongshiaoteman/article/details/80983071">下一章--&gt;</a></strong></p> 
 <p>&nbsp;</p> 
 <p>第一个java桌面应用, javafx, 邮件客户端, java开发邮件客户端</p> 
</div>
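
The index in the message is 0, which means `charAt(0)` was called on an empty string, so the length of the document is not what triggers it. A minimal sketch of the failure mode and of a defensive check follows. `WhitespaceGuard` and its methods are hypothetical names used for illustration and do not reproduce the actual code of `WhitespaceCollapser`.

```java
// Hypothetical illustration; not the library's actual code.
final class WhitespaceGuard {
    // Unsafe: throws StringIndexOutOfBoundsException when value is empty.
    static boolean startsWithSpaceUnsafe(String value) {
        return value.charAt(0) == ' ';
    }

    // Safe: checks for an empty string before reading the first character.
    static boolean startsWithSpace(String value) {
        return !value.isEmpty() && value.charAt(0) == ' ';
    }

    public static void main(String[] args) {
        System.out.println(startsWithSpace(""));      // false
        System.out.println(startsWithSpace(" text")); // true
        try {
            startsWithSpaceUnsafe("");
        } catch (StringIndexOutOfBoundsException e) {
            // Same exception type as in the report above.
            System.out.println("empty input: " + e.getClass().getSimpleName());
        }
    }
}
```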

Table conversion

IMO, the capability to convert HTML tables to Markdown tables should be added.
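
To make the request concrete, here is a rough sketch of the kind of output such a feature could produce. It is not part of copy-down: `MarkdownTable.render` is a hypothetical helper that takes cell text that has already been extracted (first row as the header) and emits a pipe table in the GitHub Flavored Markdown style. It does not handle column alignment or escape `|` inside cells.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the requested feature; not part of copy-down.
final class MarkdownTable {
    // Renders rows as a pipe table; the first row is treated as the header.
    static String render(List<List<String>> rows) {
        if (rows.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        appendRow(sb, rows.get(0));
        // Delimiter row separating the header from the body.
        sb.append('|');
        for (int i = 0; i < rows.get(0).size(); i++) {
            sb.append(" --- |");
        }
        sb.append('\n');
        for (List<String> row : rows.subList(1, rows.size())) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, List<String> cells) {
        sb.append('|');
        for (String cell : cells) {
            sb.append(' ').append(cell.trim()).append(" |");
        }
        sb.append('\n');
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("This Library", "Jsoup"),
                Arrays.asList("1.0", "1.13"),
                Arrays.asList("1.1", "1.15"));
        System.out.print(render(rows));
    }
}
```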

Expose addRule?

Hi, just wondering why addRule is made private? It's available in Turndown as part of the API, and I have a use case where I need to add some custom rules. If you don't have any strong opinions on making it public, I can raise a PR to expose it.

UL and OL list conversions fail with AdoptOpenJDK

$ java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
Eclipse OpenJ9 VM (build openj9-0.18.1, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20200122_511 (JIT enabled, AOT enabled)
OpenJ9 - 51a5857d2
OMR - 7a1b0239a
JCL - 8cf8a30581 based on jdk8u242-b08)

<dependency>
  <groupId>io.github.furstenheim</groupId>
  <artifactId>copy_down</artifactId>
  <version>1.0</version>
</dependency>

String to convert:

<ol>
  <li>Coffee</li>
  <li>Tea</li>
  <li>Milk</li>
</ol>

Result:

Coffee

Tea

Milk

Problem with nested <pre> and <code> tags

I am using this library to convert Stack Overflow questions to Markdown (to display them in Discord).
For code blocks, Stack Overflow uses the following HTML structure:

<pre><code>
Source code (raw; no <br> tags)
</code></pre>

Unfortunately, converting this to Markdown results in the code not being shown properly. I tried replacing <pre><code> with <code>, <code><pre>, etc., but the formatting still isn't right. I will try to fix this and report back if I get it to work.
