htmlcxx - html and css APIs for C++

---------------------------------------------


	Description
	===========

htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
Although several other HTML parsers are available, htmlcxx has some
characteristics that make it unique:

- STL-like navigation of the DOM tree, using the excellent tree.hh library
  by Kasper Peeters
- It is possible to reproduce exactly, character by character, the
  original document from the parse tree
- Bundled CSS parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the
  nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic the behavior of
Mozilla Firefox (http://www.mozilla.org), so you should expect parse
trees similar to those created by Firefox. Unlike Firefox, however,
htmlcxx does not insert elements that are missing from your HTML. Therefore,
serializing the DOM tree gives exactly the same bytes contained in the
original HTML document.
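
To make the offset and round-trip idea concrete, here is a minimal,
self-contained sketch. It does not use htmlcxx at all: the Token struct and
the tokenize() and rebuild() functions are hypothetical names invented for
this illustration, and the tokenizer is far simpler than the real parser.
It only shows why storing an offset and a length for every piece, and never
inventing missing pieces, lets you rebuild the input byte for byte.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token type for illustration only. htmlcxx keeps similar
// information (offset and length) in its DOM nodes, but this is not its API.
struct Token
{
  std::string::size_type offset; // where the token starts in the source
  std::string::size_type length; // how many bytes it covers
  bool isTag;                    // true for <...>, false for text
};

// Split a document into tag and text tokens without validating or
// repairing anything: every input byte belongs to exactly one token.
std::vector<Token> tokenize(const std::string &html)
{
  std::vector<Token> tokens;
  std::string::size_type pos = 0;
  while (pos < html.size())
  {
    Token t;
    t.offset = pos;
    t.isTag = (html[pos] == '<');
    std::string::size_type stop;
    if (t.isTag)
    {
      // An unterminated tag simply runs to the end of the input.
      std::string::size_type close = html.find('>', pos);
      stop = (close == std::string::npos) ? html.size() : close + 1;
    }
    else
    {
      // Text runs up to the next '<' or to the end of the input.
      std::string::size_type open = html.find('<', pos);
      stop = (open == std::string::npos) ? html.size() : open;
    }
    t.length = stop - pos;
    tokens.push_back(t);
    pos = stop;
  }
  return tokens;
}

// Concatenate the source slices named by the tokens, in order.
std::string rebuild(const std::string &html, const std::vector<Token> &tokens)
{
  std::string out;
  for (std::vector<Token>::size_type i = 0; i < tokens.size(); ++i)
  {
    assert(tokens[i].offset + tokens[i].length <= html.size());
    out.append(html, tokens[i].offset, tokens[i].length);
  }
  return out;
}
```

Because no token is ever added or dropped, rebuild(doc, tokenize(doc))
returns doc unchanged even for malformed input such as "<p>unclosed <b".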


	News for version 0.85
	=====================

Fixed gcc 4.3 compiler errors, several minor bug fixes, improved distribution
of the css library.


	News for version 0.7.3
	======================

Added utility code to escape/decode URLs as defined by RFC 2396.
Added a new SAX interface. The API changed in a slightly
backward-incompatible way to support the new SAX interface :-(.
Added Visual Studio 2003 projects for the WIN32 port.
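
For readers unfamiliar with RFC 2396 escaping, the following is a rough,
self-contained sketch of the idea. The function names (isUnreserved,
escapeUrlComponent, decodeUrlComponent) are hypothetical and are not the
names used by the htmlcxx utility code; the sketch escapes every byte that
is not an RFC 2396 "unreserved" character, which is a simplifying
assumption suitable for a single URL component rather than a whole URL.

```cpp
#include <cassert> // assert(), used by the check in the note below
#include <string>

// Hypothetical helpers for illustration; NOT htmlcxx's own functions.

// True for the "unreserved" characters of RFC 2396, section 2.3:
// ASCII letters, digits, and the marks - _ . ! ~ * ' ( )
bool isUnreserved(unsigned char c)
{
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
    return true;
  return std::string("-_.!~*'()").find(static_cast<char>(c)) != std::string::npos;
}

// Replace every byte that is not unreserved with %XX (uppercase hex).
std::string escapeUrlComponent(const std::string &in)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (isUnreserved(c))
    {
      out += static_cast<char>(c);
    }
    else
    {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
int hexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Turn each well-formed %XX back into one byte; anything else is copied as-is.
std::string decodeUrlComponent(const std::string &in)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    if (in[i] == '%' && i + 2 < in.size())
    {
      int hi = hexValue(in[i + 1]);
      int lo = hexValue(in[i + 2]);
      if (hi >= 0 && lo >= 0)
      {
        out += static_cast<char>(hi * 16 + lo);
        i += 2;
        continue;
      }
    }
    out += in[i];
  }
  return out;
}
```

With these definitions, assert(decodeUrlComponent(escapeUrlComponent(s)) == s)
should hold for any byte string s, since escaping never emits a bare '%'.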


	Examples
	========

Using htmlcxx is quite simple. The example below parses a small document,
prints the DOM tree, lists every link, and dumps the document text.

-----------------------------------------------------------------------

  #include <htmlcxx/html/ParserDom.h>
  #include <strings.h>   // strcasecmp (POSIX)
  #include <iostream>
  #include <string>

  using namespace std;
  using namespace htmlcxx;

  int main()
  {
    //Parse some html code
    string html = "<html><body>hey</body></html>";
    HTML::ParserDom parser;
    tree<HTML::Node> dom = parser.parseTree(html);

    //Print whole DOM tree
    cout << dom << endl;

    //Dump all links in the tree
    tree<HTML::Node>::iterator it = dom.begin();
    tree<HTML::Node>::iterator end = dom.end();
    for (; it != end; ++it)
    {
      //Only tag nodes can be links; skip text and comments
      if (it->isTag() && strcasecmp(it->tagName().c_str(), "A") == 0)
      {
        //Attributes are parsed only on request
        it->parseAttributes();
        //attribute() returns (found, value); print only existing hrefs
        if (it->attribute("href").first)
          cout << it->attribute("href").second << endl;
      }
    }

    //Dump all text of the document
    for (it = dom.begin(); it != end; ++it)
    {
      if ((!it->isTag()) && (!it->isComment()))
      {
        cout << it->text();
      }
    }
    cout << endl;

    return 0;
  }

-------------------------------------------------
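
The example above compares tag names with strcasecmp, which is a POSIX
function and may not be available under that name on every compiler. As a
hedged alternative, here is a small portable helper. The name
equalsIgnoreCase is hypothetical and is not part of htmlcxx; it only
handles ASCII case folding, which is enough for HTML tag names.

```cpp
#include <cassert> // assert(), used by the check in the note below
#include <cctype>
#include <string>

// Hypothetical helper, not part of htmlcxx: case-insensitive equality
// for strings, using the C locale rules of std::tolower.
bool equalsIgnoreCase(const std::string &a, const std::string &b)
{
  if (a.size() != b.size()) return false;
  for (std::string::size_type i = 0; i < a.size(); ++i)
  {
    // Cast to unsigned char first: passing a negative char value to
    // std::tolower is undefined behavior.
    int ca = std::tolower(static_cast<unsigned char>(a[i]));
    int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb) return false;
  }
  return true;
}
```

For instance, assert(equalsIgnoreCase("a", "A")) holds, so a test such as
equalsIgnoreCase(it->tagName(), "a") could stand in for the strcasecmp call.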


	The htmlcxx application
	=======================

htmlcxx is the name of both the library and the utility
application that comes with this package. Although htmlcxx
(the application) is mostly useless for programming, you can use it
to see easily how htmlcxx (the library) would parse your HTML code.
Just install it and try htmlcxx -h.


	Downloads
	=========

Use the project page at SourceForge: http://sf.net/projects/htmlcxx


	License Stuff
	=============

The code is now under the LGPL. This was our initial intention, and it is
now possible thanks to the author of tree.hh, who allowed us to use it
under the LGPL only for HTML::Node template instances. Check
http://www.fsf.org or the COPYING file in the distribution for details
about the LGPL. The URI parsing code is a derivative work of the
Apache web server URI parsing routines. Check
www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the
distribution for details.

----------------------------------------

Enjoy!

Davi de Castro Reis - <davi (a) users sf net>

Robson Braga Araújo - <braga (a) users sf net>

Last Updated: Thu Mar 24 00:56:05 2005




