rexpat's Introduction

Rexpat: a libexpat compatible Rust crate

This project is a work-in-progress conversion of libexpat into safe, idiomatic Rust code. It starts from unsafe Rust transpiled directly from the libexpat C sources; the initial transpilation and refactoring were done using C2Rust.

Do not use this in production (yet!). Help with refactoring and rewriting is always welcome.

Building

Requirements: Linux host with rustup installed. To run tests, you'll also need to install the requirements of libexpat (autoconf 2.58 or newer, make, and a recent C toolchain).

$ git clone --recurse-submodules https://github.com/immunant/rexpat
$ cd rexpat && cargo build

Testing

Unit testing:

$ cargo test

For additional testing, download the W3C XML test suite to /tmp/libexpat/xml-test-suite and run:

$ ./test_w2c.sh

Benchmarking

NOTE: Requires that you build $REXPAT_ROOT/upstream/expat/tests/benchmark/benchmark first. See steps and requirements here. You must also have python3 (Python 3.6 or later) in your path.

$ ./src/tests/bench_c_vs_rust.py

Goals

  • Provide an ABI-compatible drop-in replacement for libexpat.
  • Avoid memory-corruption vulnerabilities via Rust's safety guarantees.
  • Perform on par with libexpat in time and space.

License

rexpat is free software licensed similarly to libexpat. You may copy, distribute, and modify it under the terms of the License contained in the file COPYING distributed with this package. This license is the same as the MIT/X Consortium license.

rexpat's People

Contributors

ahomescu, rinon, thedan64, thedataking

rexpat's Issues

W3C test script does not work on macOS

Tests fail with output like this:

ibm01v01.xml: not a regular file
ibm02v01.xml: not a regular file
ibm03v01.xml: not a regular file
ibm09v01.xml: not a regular file
ibm09v02.xml: not a regular file
ibm09v03.xml: not a regular file
ibm09v04.xml: not a regular file
ibm09v05.xml: not a regular file
ibm10v01.xml: not a regular file
ibm10v02.xml: not a regular file

The files do exist though. The first file is located at (matching the Linux path):

/tmp/libexpat/xml-test-suite-2013/xmlconf/ibm/valid/P01/ibm01v01.xml

So maybe there is a problem with the file status checking logic.

Add cargo test support

runtests.rs could be a cargo test compatible file instead of a standalone binary. We could then just run cargo test. Also make cargo bench do something useful.
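
One possible shape for this, as a rough sketch only: keep a table of translated cases and wrap it in a single #[test] function so that cargo test can discover it. The names TestFn, CASES, smoke, and run_all are hypothetical and do not come from the existing runtests.rs.

```rust
/// One translated test case: Ok(()) on success, Err(message) on failure.
/// (Hypothetical signature; the real runtests.rs may structure cases differently.)
pub type TestFn = fn() -> Result<(), String>;

/// Placeholder case standing in for a real translated libexpat test.
fn smoke() -> Result<(), String> {
    Ok(())
}

/// Hypothetical case table, as a standalone runner might keep it.
pub const CASES: &[(&str, TestFn)] = &[("smoke", smoke as TestFn)];

/// Runs every case and returns one "name: message" line per failure.
pub fn run_all(cases: &[(&str, TestFn)]) -> Vec<String> {
    cases
        .iter()
        .filter_map(|(name, case)| case().err().map(|msg| format!("{}: {}", name, msg)))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // `cargo test` discovers this function; no custom main() is needed.
    #[test]
    fn translated_suite_passes() {
        let failures = run_all(CASES);
        assert!(failures.is_empty(), "failures: {:?}", failures);
    }
}
```

Splitting the table into one #[test] per case would give finer-grained reporting, at the cost of more boilerplate.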

Use Rust Enums

C2Rust translates C enums to a series of consts. We need to oxidize these into proper Rust enums:

  • ByteType enum in xmltok_impl.h
  • XML_Status in expat.h
  • XML_Error in expat.h
  • XML_Content_Type in expat.h
  • XML_Content_Quant in expat.h
  • XML_Parsing in expat.h
  • XML_ParamEntityParsing in expat.h
  • XML_FeatureEnum in expat.h
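
As an illustration of the kind of change involved, here is a minimal sketch for XML_Status. The three numeric values match expat.h; the u32 representation of the constants, the XmlStatus name, and the from_raw helper are assumptions made for this example rather than the crate's actual code.

```rust
#![allow(non_upper_case_globals, dead_code)]

// Roughly the shape of transpiled output: loose constants of an integer type.
// (The exact integer type used by the transpiled code is an assumption here.)
pub const XML_STATUS_ERROR: u32 = 0;
pub const XML_STATUS_OK: u32 = 1;
pub const XML_STATUS_SUSPENDED: u32 = 2;

/// Hand-written replacement. `repr(C)` keeps the layout of a C enum so the
/// type can still cross the FFI boundary.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum XmlStatus {
    Error = 0,
    Ok = 1,
    Suspended = 2,
}

impl XmlStatus {
    /// Checked conversion from a raw value received from C callers.
    /// Returns None for values that do not name a variant.
    pub fn from_raw(raw: u32) -> Option<Self> {
        match raw {
            XML_STATUS_ERROR => Some(XmlStatus::Error),
            XML_STATUS_OK => Some(XmlStatus::Ok),
            XML_STATUS_SUSPENDED => Some(XmlStatus::Suspended),
            _ => None,
        }
    }
}
```

Going through a checked conversion avoids transmuting an arbitrary integer into the enum, which would be undefined behavior for out-of-range values.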

Set up CI

Steps:

  • Run cargo build, cargo test, and the libexpat test binary (runtests).
  • Test projects that use libexpat?

Try to re-translate xmltok, preserving macro invocations

xmltok and xmltok_impl consist mostly of macros. One idea is to force the translator to emit every expression that was expanded from a macro invocation as (initially invalid) Rust that invokes a non-existent macro. We can then add definitions for those macros by hand as needed.
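
To make the idea concrete, here is a sketch of what two hand-written definitions might look like. The macro names are modeled loosely on helpers in xmltok_impl.c, but the Encoding struct, its min_bytes_per_char field, and the slice-plus-offset arguments are assumptions chosen for this example; the translator's real output would use whatever types the surrounding code has.

```rust
#![allow(unused_macros, dead_code)]

/// Hypothetical stand-in for the encoding object the C macros take.
pub struct Encoding {
    pub min_bytes_per_char: usize,
}

// Hand-written definition for a macro the translator left as an invocation.
macro_rules! MINBPC {
    ($enc:expr) => {
        $enc.min_bytes_per_char
    };
}

// True if at least `$count` minimum-width characters remain in `$buf`
// starting at byte offset `$pos`. saturating_sub avoids underflow when
// `$pos` is past the end of the buffer.
macro_rules! HAS_CHARS {
    ($enc:expr, $buf:expr, $pos:expr, $count:expr) => {
        $buf.len().saturating_sub($pos) >= $count * MINBPC!($enc)
    };
}
```

Once the behavior is pinned down by tests, each macro could be turned into an ordinary function or method.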

Change types in XML_ParserStruct

We'll start by planning out changes here, then incrementally make those changes through the codebase.

Raw pointer fields that need replacing:

  • pub m_userData: *mut c_void
  • pub m_handlerArg: *mut c_void
  • pub m_buffer: *mut c_char
  • pub m_mem: crate::expat_h::XML_Memory_Handling_Suite
  • pub m_bufferPtr: *const c_char
  • pub m_bufferEnd: *mut c_char
  • pub m_bufferLim: *const c_char
  • pub m_parseEndPtr: *const c_char
  • pub m_dataBuf: *mut crate::expat_external_h::XML_Char
  • pub m_dataBufEnd: *mut crate::expat_external_h::XML_Char
  • pub m_startElementHandler: crate::expat_h::XML_StartElementHandler
  • pub m_endElementHandler: crate::expat_h::XML_EndElementHandler
  • pub m_characterDataHandler: crate::expat_h::XML_CharacterDataHandler
  • pub m_processingInstructionHandler: crate::expat_h::XML_ProcessingInstructionHandler
  • pub m_commentHandler: crate::expat_h::XML_CommentHandler
  • pub m_startCdataSectionHandler: crate::expat_h::XML_StartCdataSectionHandler
  • pub m_endCdataSectionHandler: crate::expat_h::XML_EndCdataSectionHandler
  • pub m_defaultHandler: crate::expat_h::XML_DefaultHandler
  • pub m_startDoctypeDeclHandler: crate::expat_h::XML_StartDoctypeDeclHandler
  • pub m_endDoctypeDeclHandler: crate::expat_h::XML_EndDoctypeDeclHandler
  • pub m_unparsedEntityDeclHandler: crate::expat_h::XML_UnparsedEntityDeclHandler
  • pub m_notationDeclHandler: crate::expat_h::XML_NotationDeclHandler
  • pub m_startNamespaceDeclHandler: crate::expat_h::XML_StartNamespaceDeclHandler
  • pub m_endNamespaceDeclHandler: crate::expat_h::XML_EndNamespaceDeclHandler
  • pub m_notStandaloneHandler: crate::expat_h::XML_NotStandaloneHandler
  • pub m_externalEntityRefHandler: crate::expat_h::XML_ExternalEntityRefHandler
  • pub m_externalEntityRefHandlerArg: crate::expat_h::XML_Parser
  • pub m_skippedEntityHandler: crate::expat_h::XML_SkippedEntityHandler
  • pub m_unknownEncodingHandler: crate::expat_h::XML_UnknownEncodingHandler
  • pub m_elementDeclHandler: crate::expat_h::XML_ElementDeclHandler
  • pub m_attlistDeclHandler: crate::expat_h::XML_AttlistDeclHandler
  • pub m_entityDeclHandler: crate::expat_h::XML_EntityDeclHandler
  • pub m_xmlDeclHandler: crate::expat_h::XML_XmlDeclHandler
  • pub m_encoding: *const crate::src::lib::xmltok::ENCODING
  • pub m_initEncoding: crate::src::lib::xmltok::INIT_ENCODING
  • pub m_internalEncoding: *const crate::src::lib::xmltok::ENCODING
  • pub m_protocolEncodingName: *const crate::expat_external_h::XML_Char
  • pub m_unknownEncodingMem: *mut c_void
  • pub m_unknownEncodingData: *mut c_void
  • pub m_unknownEncodingHandlerData: *mut c_void
  • pub m_unknownEncodingRelease: Option<unsafe extern "C" fn(_: *mut c_void) -> ()>
  • pub m_processor: Option<Processor>
  • pub m_eventPtr: *const c_char
  • pub m_eventEndPtr: *const c_char
  • pub m_positionPtr: *const c_char
  • pub m_openInternalEntities: *mut OPEN_INTERNAL_ENTITY
  • pub m_freeInternalEntities: *mut OPEN_INTERNAL_ENTITY
  • pub m_declEntity: *mut ENTITY
  • pub m_doctypeName: *const crate::expat_external_h::XML_Char
  • pub m_doctypeSysid: *const crate::expat_external_h::XML_Char
  • pub m_doctypePubid: *const crate::expat_external_h::XML_Char
  • pub m_declAttributeType: *const crate::expat_external_h::XML_Char
  • pub m_declNotationName: *const crate::expat_external_h::XML_Char
  • pub m_declNotationPublicId: *const crate::expat_external_h::XML_Char
  • pub m_declElementType: *mut ELEMENT_TYPE
  • pub m_declAttributeId: *mut ATTRIBUTE_ID
  • pub m_dtd: *mut DTD
  • pub m_curBase: *const crate::expat_external_h::XML_Char
  • pub m_tagStack: *mut TAG
  • pub m_freeTagList: *mut TAG
  • pub m_inheritedBindings: *mut BINDING
  • pub m_freeBindingList: *mut BINDING
  • pub m_atts: *mut crate::src::lib::xmltok::ATTRIBUTE
  • pub m_nsAtts: *mut NS_ATT
  • pub m_tempPool: STRING_POOL
  • pub m_temp2Pool: STRING_POOL
  • pub m_groupConnector: *mut c_char
  • pub m_parentParser: crate::expat_h::XML_Parser

Trait for handler callbacks

We can replace handlers with a Rust trait, and an instance of the trait holding the unsafe C function pointers. This will let us implement a native Rust handler struct in the future.
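
A minimal sketch of that split, limited to the end-element callback to keep it short. The function pointer type mirrors XML_EndElementHandler from expat.h, assuming XML_Char is c_char; the Handlers trait, CHandlers, and ClosedTags are hypothetical names for illustration.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_void};

/// Hypothetical trait the parser would call instead of raw function pointers.
pub trait Handlers {
    /// Called when an element closes. The default does nothing, like a NULL handler.
    fn end_element(&mut self, _name: &CStr) {}
}

/// Adapter that holds the C callbacks registered through the libexpat API.
pub struct CHandlers {
    pub user_data: *mut c_void,
    pub end_element: Option<unsafe extern "C" fn(*mut c_void, *const c_char)>,
}

impl Handlers for CHandlers {
    fn end_element(&mut self, name: &CStr) {
        if let Some(callback) = self.end_element {
            // Safety: the caller who registered the callback is responsible for
            // `user_data` being valid for it; `name` is a valid C string.
            unsafe { callback(self.user_data, name.as_ptr()) }
        }
    }
}

/// Example of a native Rust handler with no unsafe code.
#[derive(Default)]
pub struct ClosedTags {
    pub names: Vec<String>,
}

impl Handlers for ClosedTags {
    fn end_element(&mut self, name: &CStr) {
        self.names.push(name.to_string_lossy().into_owned());
    }
}
```

The remaining handlers would follow the same pattern, with the unsafe calls confined to the CHandlers implementation.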

Move to stable rustc

We are currently using the following nightly-only features:

Unsafe pointer manipulation:

  • #![feature(const_raw_ptr_to_usize_cast)]
  • #![feature(const_transmute)]
  • #![feature(ptr_wrapping_offset_from)]

C ABI:

  • #![feature(extern_types)]

Convenience, can be removed:

  • #![feature(label_break_value)]
  • #![feature(main)]
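
For ptr_wrapping_offset_from, a stable stand-in can be written with plain integer arithmetic on the pointer addresses. The helper below is a sketch under that assumption; wrapping_offset_from here is a free function defined for this example, not a standard library method.

```rust
use std::mem::size_of;

/// Distance from `origin` to `ptr` in units of T, computed on the raw
/// addresses with wrapping arithmetic. Unlike `offset_from`, this does not
/// require both pointers to belong to the same allocation.
pub fn wrapping_offset_from<T>(ptr: *const T, origin: *const T) -> isize {
    let size = size_of::<T>();
    assert!(size != 0, "element size must be non-zero");
    let byte_diff = (ptr as isize).wrapping_sub(origin as isize);
    byte_diff.wrapping_div(size as isize)
}
```

Where both pointers are known to come from the same buffer, switching to slice indices would remove the need for this helper altogether.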
