duckx's Introduction

DuckX

Create, read and write Microsoft Office Word docx files. More information is available in the documentation.

DuckX was created when I was looking for a C++ library that could properly parse MS Word .docx files but couldn't find any.

Status

  • Documents (docx) [Word]
    • Read/Write/Edit
    • Change document properties

Quick Start

Here's an example of how to use DuckX to read a docx file. It opens a docx file named file.docx and iterates over its paragraphs and runs, printing the text of each run:

#include <iostream>
#include <duckx/duckx.hpp>

int main() {
    duckx::Document doc("file.docx");

    doc.open();

    // Walk every paragraph, then every run inside it, printing the run text.
    for (auto p : doc.paragraphs())
        for (auto r : p.runs())
            std::cout << r.get_text() << std::endl;
}

And compile your file like this:
g++ sample1.cpp -lduckx

Install

Easy as pie!

Compiling

The preferred way is to build in a separate build folder:

git clone https://github.com/amiremohamadi/DuckX.git
cd DuckX
mkdir build
cd build
cmake ..
cmake --build .

Requirements

Donation

Please consider donating to sustain our activities.

BITCOIN: bc1qex0wdwp22alnmvncxs3gyj5q5jaucsvpkp4d6z

Licensing

This library is available to anybody free of charge, under the terms of the MIT License (see LICENSE.md).

duckx's People

Contributors

amiremohamadi, angguss, cihansari, davidloftus, diegomagdaleno, ehsan-mohammadi, kingkili, mohammadekhosravi, qwerity, superwig

duckx's Issues

Save function doesn't work correctly

Describe the bug
When you modify a Word document and save it, the old document.xml must be replaced.
Currently, however, a second file with the same name is added to the archive.

Example
In this case, after calling save function we have two document.xml:
Archive:  my_test.docx
  Length      Date    Time    Name
---------  ---------- -----   ----
      573  02-03-2019 17:16   _rels/.rels
      513  02-03-2019 17:16   docProps/app.xml
      732  02-03-2019 17:16   docProps/core.xml
      531  02-03-2019 17:16   word/_rels/document.xml.rels
      280  02-03-2019 17:16   word/settings.xml
      853  02-03-2019 17:16   word/fontTable.xml
     1480  02-03-2019 17:16   word/document.xml
     2585  02-03-2019 17:16   word/styles.xml
     1118  02-03-2019 17:16   [Content_Types].xml
     1602  07-03-2019 23:14   word/document.xml
---------                     -------
    10267                     10 files

Expected behavior
Remove the old document.xml and replace it with the new one.

Additional context
This may be due to the zip library.

Can you give me an example of how to write a docx file?

Clarification on minimum supported C++ version

Many of the recent Pull Requests have experienced issues with build failures due to the old GCC 5 on Travis CI (this might be a bug in the travis config as the intended version seems to be 6). Many of the modern C++ features (11 and above) don't appear to be fully supported.

I couldn't find anything explicitly stating that this library should support C++98 compilers. Could you clarify the minimum version we should target, and then update the CMake config to reflect it?

Personally, I don't see any reason to keep support for C++98: this is a new library, so there is no legacy code that depends on it compiling as C++98. Additionally, many features we know and love (smart pointers, lambdas, auto, move semantics, etc.) are not available in C++98.

Can I create a table?

For example, I have a task: make a 20x3 table filled with number sequences. According to the examples, I can create the table manually, read it with DuckX, and display it in the console. But can I create and fill it programmatically?
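As a rough illustration of what "creating it automatically" involves at the file-format level, the sketch below builds the WordprocessingML markup for a rows x cols table whose cells are numbered 1, 2, 3, and so on. It does not use DuckX at all: `make_number_table` is a hypothetical helper, and whether DuckX exposes an API for inserting such a fragment into a document is not assumed here. The fragment is deliberately minimal (no borders or column widths), and it has not been checked against Word itself.

```cpp
#include <string>

// Hypothetical helper, not part of DuckX: builds a minimal <w:tbl> fragment
// with `rows` x `cols` cells, numbered 1, 2, 3, ... row by row.
std::string make_number_table(int rows, int cols) {
    std::string xml = "<w:tbl><w:tblPr/><w:tblGrid>";
    for (int c = 0; c < cols; ++c) xml += "<w:gridCol/>";  // one grid column per cell column
    xml += "</w:tblGrid>";

    int n = 1;
    for (int r = 0; r < rows; ++r) {
        xml += "<w:tr>";  // table row
        for (int c = 0; c < cols; ++c) {
            // Each cell holds one paragraph with one run containing the number.
            xml += "<w:tc><w:p><w:r><w:t>" + std::to_string(n++) +
                   "</w:t></w:r></w:p></w:tc>";
        }
        xml += "</w:tr>";
    }
    xml += "</w:tbl>";
    return xml;
}
```

For the task described above, `make_number_table(20, 3)` would produce 20 rows of 3 cells each.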

Image support

I would be interested in the ability to work with images in a Word .docx file.

Examples:

  • Insert a new PNG or JPG (or other) image in the document
  • Access image data in a document as an array of bytes
  • Methods for identifying the image format or file type

This feature would be useful when generating reports from C++ software, where charts, maps or other kinds of figures are required.

To do this currently, it's necessary to jump through a lot of hoops involving writing temporary files in an intermediate format (e.g. HTML) and then invoking external executables to handle the conversion to docx. This is a massive pain. It would be much better to be able to do this entirely within C++ code.

Optimize the Save function

DuckX uses a zip library that only supports appending or writing new files; it cannot update a file in place.
So the Save function currently works as follows:

  • make a new file
  • write any new files
  • copy the old files
  • delete old docx
  • rename new file to old file

This process performs terribly on large files!
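The last two steps of that sequence (delete the old file, rename the new one) can be sketched with the standard library alone. The example below is only a sketch under stated assumptions: it uses C++17 `std::filesystem`, it writes a plain file rather than a zip archive, and `replace_file` is a hypothetical helper rather than DuckX code. It does not remove the cost of copying the unchanged entries; it only shows that a single rename can stand in for the separate delete and rename steps.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not part of DuckX: writes `contents` to a sibling
// temporary file, then renames it over `target`. This stands in for
// "build the new archive, then swap it into place".
void replace_file(const fs::path& target, const std::string& contents) {
    fs::path tmp = target;
    tmp += ".tmp";  // e.g. report.docx.tmp, in the same directory as the target

    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;
    }  // the stream is flushed and closed here, before the swap

    // std::filesystem::rename replaces an existing regular file at `target`,
    // so no separate delete step is needed.
    fs::rename(tmp, target);
}
```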

Chinese document output is garbled

I tested sample1.cpp, and it works fine.

But when I change the code to open a Chinese docx file, the output is unreadable.
It seems to be related to character encoding.

Help!

I can't use DuckX

Describe the bug
I followed the installation steps, but when I try to build my project I get an error ("cannot open source file "duckx.hpp"").
I'm working in Visual Studio 2017.
Can you help me?
Or could you make the installation instructions more specific? Maybe I missed a step.

Thank you :) !

doc.open crashes if doc is not a word document

Describe the bug
Whenever .open is called on a Document object whose path does not point to a Word document, the application crashes.

To Reproduce
Steps to reproduce the behavior:
create a project and add this code:
duckx::Document doc("test.docx");
doc.open();

make sure test.docx does not exist

run the code

Expected behavior
A new docx file named test should be created, but instead the application crashes.
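Until the library handles this case itself, a caller could guard the `open()` call with a pre-check. The sketch below is one possible guard, not DuckX behavior: `looks_like_zip` is a hypothetical helper that relies only on the fact that a docx file is a zip archive, so a non-empty one starts with the local file header signature `PK\x03\x04`. It assumes C++17 for `std::filesystem`.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical pre-check, not part of DuckX: true when `path` is a regular
// file that starts with the ZIP local-file-header signature "PK\x03\x04".
bool looks_like_zip(const fs::path& path) {
    std::error_code ec;
    if (!fs::is_regular_file(path, ec)) return false;  // missing file or a directory

    std::ifstream in(path, std::ios::binary);
    char magic[4] = {};
    in.read(magic, sizeof magic);
    return in.gcount() == 4 && magic[0] == 'P' && magic[1] == 'K' &&
           magic[2] == 3 && magic[3] == 4;
}
```

A caller would then only run `doc.open()` when `looks_like_zip("test.docx")` returns true, and report an error otherwise. The check says nothing about whether the archive actually contains word/document.xml.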

Desktop (please complete the following information):

  • OS: windows 10

Paragraph::has_next "lies" and Paragraph::next returns one non-existing paragraph

Paragraph::has_next returns true if the current paragraph exists in the document, even if there is no next paragraph. next() then returns an empty paragraph which does not exist.

Since that last paragraph is empty, this issue is unnoticeable when reading a document. The problem occurs when the user tries inserting a paragraph after the last paragraph:

Paragraph last = doc.paragraphs();
while(last.has_next()) last = last.next();
last.insert_paragraph_after("This should appear after the last paragraph!");

Since last is actually a non-existing paragraph referring to a pugi::xml_node without a root, the new paragraph is not written to the file.
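The difference between the two meanings of "has next" can be shown on a toy singly linked list. This is only a model of the behavior described above: `Node`, `has_next_buggy`, `has_next_strict` and `find_last` are hypothetical names, and none of this is DuckX or pugixml code.

```cpp
#include <string>

// Toy stand-in for a paragraph node; not DuckX's Paragraph class.
struct Node {
    std::string text;
    Node* next = nullptr;
};

// The reported behavior: true whenever the current node exists,
// even if nothing follows it.
bool has_next_buggy(const Node* current) { return current != nullptr; }

// The stricter meaning: true only when a following node exists.
bool has_next_strict(const Node* current) {
    return current != nullptr && current->next != nullptr;
}

// Walks to the last existing node; returns nullptr for an empty list.
Node* find_last(Node* first) {
    Node* last = first;
    while (has_next_strict(last)) last = last->next;
    return last;
}
```

With the buggy check, the same loop steps one position past the final node and ends on a null pointer, which mirrors the "non-existing paragraph" in the report. With the strict check, the loop stops on the real last node, so an insertion after it has something to attach to.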

I intend to fix this issue myself. I also want to add methods append_paragraph and append_run to Document. This would allow generating a long document without using a while loop to obtain the last paragraph as in the example above, and then working on a new Paragraph object for every new paragraph; instead, one would only use the Document object the entire time.

Unresolved External Symbol

When creating an instance of Document without any file path, the code builds and runs with no errors.
However, when the instance is created with a file path, the build fails with an error ("1 unresolved externals").

Support for document properties

It would be useful to have the ability to access & modify document properties.

This would allow creating a template docx file, with fields set up to display custom document properties. Using DuckX, the property values could be populated with output data from C++ software. When loaded into MS Word, the final docx file would display the completed report, and the user would still be able to edit it.

To work around the lack of document property support at the moment, I am experimenting with searching all paragraphs for a particular string (e.g. [REPORTBODY]) and replacing with the corresponding value.
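The string-replacement part of that workaround can be sketched independently of DuckX. `replace_all` below is a hypothetical helper operating on a plain `std::string`; in practice it would be applied to the text of each run (read with `get_text()` and written back with `set_text()`).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of DuckX: replaces every occurrence of
// `placeholder` in `text` with `value` and returns the number of replacements.
std::size_t replace_all(std::string& text, const std::string& placeholder,
                        const std::string& value) {
    if (placeholder.empty()) return 0;  // avoid an endless loop on empty patterns

    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(placeholder, pos)) != std::string::npos) {
        text.replace(pos, placeholder.size(), value);
        pos += value.size();  // continue after the inserted value
        ++count;
    }
    return count;
}
```

One caveat with this approach: Word may split a visible string such as [REPORTBODY] across several runs, in which case a per-run search will not find it.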

Working with headers and footers

I changed the source code (in Visual Studio) so that I can work with documents that have headers and footers. Where can I put the changed code so that somebody can review and test it?
One question I have: is there a way to duplicate a document within DuckX, so that I could produce the basis for serial letters? I want to get rid of the MS macros in Word; there are too many restrictions.

UTF-8 Support

The function bool duckx::Run::set_text(const char *text) writes text into document.xml with no issues for English characters.
When I try to write Russian letters (wide char), the write itself succeeds, but MS Word cannot open the resulting docx file and shows an error.

What I tried (stages):

  1. Create a docx with one Russian letter (e.g. Ж)
  2. Replace the Russian letter with an English one (e.g. J) using the set_text function.
  3. Save document - DOCX OPENS WITH NO ERRORS
  4. Replace English Letter with Russian (J -> Ж)
  5. Save document - DOCX OPENS WITH AN ERROR
  6. Replace Russian Letter with the English one (Ж -> J)
  7. Save document - NO ERRORS

I compared the original file (stage 1, the manually created file with a Russian letter) with the one containing an English letter (stage 7 or 3). The only file that changed is document.xml.

Stage 1 - document.xml has UTF-8 encoding and the header <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Stage 2-7 - document.xml now has ANSI encoding and no longer has the header from stage 1. The letter is written correctly inside <w:t>Ж</w:t>, but because of the ANSI encoding and the removed header, the file cannot be opened if it contains wide characters. There are no issues with ordinary characters, though.

Is there something I am missing in the library configuration, or is it a bug?
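One thing worth ruling out in a report like this is the encoding of the bytes passed to set_text: if the source file is saved in an ANSI code page, the string literal "Ж" is a single byte that is not valid UTF-8. The sketch below sidesteps the source-file encoding by building the UTF-8 bytes from the code point. `encode_utf8` is a hypothetical helper, not part of DuckX, and it does not address the missing XML header described above.

```cpp
#include <string>

// Hypothetical helper, not part of DuckX: encodes one Unicode code point as
// UTF-8. Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);  // 1 byte: plain ASCII
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));  // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are invalid
        out += static_cast<char>(0xE0 | (cp >> 12));  // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));  // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For Ж (U+0416) this yields the two bytes 0xD0 0x96, so `encode_utf8(0x0416).c_str()` could be passed to set_text regardless of how the source file is saved.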

Create docx files

Does DuckX support creating new docx files?
The README says that it does, but I only get damaged files after calling the Save function.
Maybe I made some mistakes. Could you give me a sample that creates a new docx file? Thanks a lot.

How to build for Android

I would like to use this library in my Android project. Can you tell me how I can compile it for Android?

How to create a table

The project only provides an API, and I don't know how to create a 3x3 table. I would appreciate it if someone could provide a demo.

Install failing on Ubuntu Linux

Installation is failing on Ubuntu Linux with the following output:

Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/libduckx.a
-- Installing: /usr/local/lib/cmake/duckx/duckxConfig.cmake
-- Installing: /usr/local/lib/cmake/duckx/duckxConfig-release.cmake
-- Installing: /usr/local/include/duckx/duckx.hpp
-- Installing: /usr/local/include/duckx/constants.hpp
-- Installing: /usr/local/include/duckx/duckxiterator.hpp
CMake Error at cmake_install.cmake:69 (file):
  file INSTALL cannot find
  "/home/chris/Projects/DuckXSOURCE_DIR}/thirdparty/pugixml/pugixml.hpp": No
  such file or directory.


gmake: *** [Makefile:148: install] Error 1

I believe this could be due to an error on line 23 of CMakeLists.txt.

Image Support

Does DuckX support inserting images into a docx file?
The website says that it supports adding images to Word DOCX files, but I cannot find any hints in the samples.
Could you clarify this? Thanks a lot.

Leading/trailing spaces between runs are not preserved

To Reproduce
Steps to reproduce the behavior:
p.add_run("A ");
p.add_run("run");
p.add_run(" test");

Expected behavior
"A run test" is expected; "Aruntest" is obtained. The leading and trailing spaces are there in the XML, but they are not preserved unless xml:space="preserve" is added to the "w:t" tag.

I plan to fix this myself; I'm opening an issue as requested in Contributing.md.
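The rule described above (add xml:space="preserve" when a run's text has leading or trailing whitespace) can be sketched as plain string building. The helpers `needs_space_preserve`, `escape_xml` and `make_text_element` are hypothetical and are not how DuckX or pugixml construct nodes; the sketch only shows when the attribute is needed and what the resulting element looks like.

```cpp
#include <string>

// True when the text starts or ends with whitespace that would otherwise
// be dropped when the document is displayed.
bool needs_space_preserve(const std::string& text) {
    if (text.empty()) return false;
    auto is_ws = [](char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; };
    return is_ws(text.front()) || is_ws(text.back());
}

// Escapes the characters that are not allowed verbatim in XML text content.
std::string escape_xml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical helper, not part of DuckX: builds a <w:t> element and adds
// xml:space="preserve" only when the text needs it.
std::string make_text_element(const std::string& text) {
    const std::string open =
        needs_space_preserve(text) ? "<w:t xml:space=\"preserve\">" : "<w:t>";
    return open + escape_xml(text) + "</w:t>";
}
```

For the three runs in the report, "A " and " test" would get the attribute, while "run" would not.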

Support for tables

First of all thanks so much for creating this library!
Very clean and easy to use interface with few dependencies - great!
I tried libopc and while it seems to be very comprehensive, it has a ton of dependencies which makes it rather hard to add to a real app.

I'm about to start a new project which requires parsing docx (among others).
I thought there would be several libraries for reading docx files with C++ but had to find out there are actually not many.

The documents I need to parse typically contain tables and while using the DuckX sample code I'm ending up with almost no output. Do you think it's possible to add support for tables in the near future?

Thanks and keep up the great work!

Add paragraph at the end of the file

I'm trying to add a few paragraphs at the end of my file, but it seems impossible. I tried

while (p.has_next())
    p = p.next();

but it doesn't write anything.

I tried other things that I don't remember. Do you have this issue too?

Thanks
