


docxBox

CLI tool for Word DOCX templating and analysis.


Commands

Unzip DOCX: All files, or only media files, format XML

Unzip all files: docxbox uz foo.docx

Unzip only media files:
docxbox uzm foo.docx
or docxbox uz foo.docx -m
or docxbox uz foo.docx --media

Unzip all files and indent XML files:
docxbox uzi foo.docx
or docxbox uz foo.docx -i
or docxbox uz foo.docx --indent

Zip files into DOCX

docxbox zp path/to/directory out.docx

Compress XML, then zip files into DOCX:

When the XML has been indented for manual editing (e.g. via the uzi command), the zpc command compresses (= unindents) all XML files before zipping them into a new DOCX:

docxbox zpc path/to/directory out.docx
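To make the compression step concrete, here is a minimal sketch of unindenting serialized XML, assuming that indentation always consists of whitespace runs containing a line break between a closing > and the next <. UnindentXml is a hypothetical name; this is not docxBox's implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes indentation between tags. Only whitespace runs
// that contain a line break are removed, so a single space inside a text
// element such as <w:t xml:space="preserve"> </w:t> is kept.
std::string UnindentXml(const std::string &xml) {
  static const std::regex kIndentation(R"(>[ \t]*\r?\n\s*<)");
  return std::regex_replace(xml, kIndentation, "><");
}
```

A text node consisting only of a line break would still be collapsed by this pattern, so a more robust variant would re-serialize the document with an XML library instead of relying on a regular expression.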

Output DOCX contents

List files: All, filtered, images only

Lists files contained within a given DOCX, and their attributes:

docxbox ls foo.docx


To output as JSON:

docxbox lsj foo.docx
or docxbox ls foo.docx -j
or docxbox ls foo.docx --json

Filter by wildcard

docxbox ls foo.docx *.xml Lists only files ending w/ .xml

List all files containing string or regular expression

docxbox lsl foo.docx foo Lists all files containing the string foo
or docxbox ls foo.docx -l foo
or docxbox ls foo.docx --locate foo

This command is a shorthand for the grep tool (which must be installed on your system when using this command).
The search string can therefore also be given as a regular expression:

docxbox lsl foo.docx '/[0-9A-Z]\{8\}/'
Lists all files containing 8-character IDs, e.g. Word revision save IDs (rsids, see ISO/IEC 29500-1).
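The following sketch shows which substrings the pattern matches, using std::regex with ECMAScript syntax, where the POSIX basic interval \{8\} is written {8}. FindEightCharIds is a hypothetical helper; docxBox itself delegates the search to grep, which reports matching files rather than individual IDs.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every run of 8 characters from [0-9A-Z], i.e. the substrings the
// grep pattern [0-9A-Z]\{8\} matches, written here in ECMAScript syntax.
std::vector<std::string> FindEightCharIds(const std::string &xml) {
  static const std::regex kId("[0-9A-Z]{8}");
  std::vector<std::string> ids;
  for (std::sregex_iterator it(xml.begin(), xml.end(), kId), end; it != end; ++it) {
    ids.push_back(it->str());
  }
  return ids;
}
```

Applied to <w:p w:rsidR="00A1B2C3" w:rsidRDefault="00D4E5F6">, it returns the two rsid values 00A1B2C3 and 00D4E5F6.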

List all files containing string or regular expression as JSON

docxbox lslj foo.docx foo
or docxbox lsl foo.docx -j foo
or docxbox ls foo.docx -lj foo
or docxbox lsl foo.docx --json foo
or docxbox ls foo.docx --locate -j foo
or docxbox ls foo.docx --locate --json foo

List image files

Output list of contained images and their media attributes (like width, height, encoding, compression, etc.)

docxbox lsi foo.docx
or docxbox ls foo.docx -i
or docxbox ls foo.docx --images

To output as JSON:

docxbox lsij foo.docx
or docxbox lsi foo.docx -j
or docxbox ls foo.docx -ij
or docxbox lsi foo.docx --json
or docxbox ls foo.docx --images --json

Note: Media attributes are read using the file command, which must be installed on your system (it usually is by default) when using docxBox's lsi command.

List meta data

docxBox displays only the attributes contained within the current DOCX file (these can vary by DOCX version and by the word processor used to create the file), even if they are empty.

Output meta data of given DOCX:

docxbox lsm foo.docx
or docxbox ls foo.docx -m
or docxbox ls foo.docx --meta


To output as JSON:

docxbox lsmj foo.docx
or docxbox lsm foo.docx -j
or docxbox ls foo.docx -mj
or docxbox lsm foo.docx --json
or docxbox ls foo.docx --meta --json

Reference: Recognized meta attributes

  • Authors: Creator, lastModifiedBy (<dc:creator> and <cp:lastModifiedBy> of docProps/core.xml)
  • Dates (ISO 8601): Creation-, modification- and print-date
    (<dcterms:created>, <dcterms:modified> and <cp:lastPrinted> of docProps/core.xml)
  • Descriptions: Description, Keywords, Subject, Title
    (<dc:description>, <cp:keywords>, <dc:subject>, <dc:title> of docProps/core.xml)
  • Language (<dc:language> of docProps/core.xml)
  • Revision (<cp:revision> of docProps/core.xml)
  • Application created with and its version, name of used template, company, XML schema of document (<Application>, <AppVersion>, <Template>, <Company> and the xmlns attribute of <Properties> in docProps/app.xml)
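When scripting against docProps/core.xml directly, the mapping between the attribute keys used by docxBox's mm command and the core-properties elements can be expressed as a lookup table. This is a hypothetical sketch, not docxBox's source; the element names follow the core-properties schema of ISO/IEC 29500-2.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup: attribute key as accepted by `docxbox mm` -> element in
// docProps/core.xml. Returns std::nullopt for unknown keys.
std::optional<std::string> CoreXmlElementFor(const std::string &attribute) {
  static const std::map<std::string, std::string> kElements = {
      {"created", "dcterms:created"},
      {"creator", "dc:creator"},
      {"description", "dc:description"},
      {"keywords", "cp:keywords"},
      {"language", "dc:language"},
      {"lastModifiedBy", "cp:lastModifiedBy"},
      {"lastPrinted", "cp:lastPrinted"},
      {"modified", "dcterms:modified"},
      {"revision", "cp:revision"},
      {"subject", "dc:subject"},
      {"title", "dc:title"},
  };
  const auto it = kElements.find(attribute);
  if (it == kElements.end()) return std::nullopt;
  return it->second;
}
```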

List referenced fonts

docxbox lsf foo.docx
or docxbox ls foo.docx -f
or docxbox ls foo.docx --fonts


To output as JSON:

docxbox lsfj foo.docx
or docxbox lsf foo.docx -j
or docxbox ls foo.docx -fj
or docxbox lsf foo.docx --json
or docxbox ls foo.docx --fonts --json

List fields

docxbox lsd foo.docx
or docxbox ls foo.docx -d
or docxbox ls foo.docx --fields

To output as JSON:

docxbox lsdj foo.docx
or docxbox ls foo.docx -dj
or docxbox lsd foo.docx --json
or docxbox ls foo.docx --fields --json

Output XML

docxbox cat foo.docx word/_rels/document.xml.rels
outputs the given file's XML, indented for better readability.

Hint: For viewing or editing complex XML, e.g. with syntax highlighting, you can use your favorite text editor via the cmd command.

Output document as plaintext

docxbox txt foo.docx outputs the given document's plaintext (ATM: w/o header and footer)

Output plaintext segments:
docxbox txt foo.docx -s
or docxbox txt foo.docx --segments

Outputs the plaintext from document, with markup sections separated by newlines. This can be helpful to identify "segmented" sentences: Texts which visually appear as a unit, but are declared within multiple separate XML elements (due to formatting or change-tracking purposes).
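A minimal sketch of what segmentation means at the XML level, assuming the text of each run is stored in <w:t> elements of word/document.xml: each element becomes one segment, and concatenating the segments yields the plaintext. TextSegments is a hypothetical helper. A regular expression is only adequate for small, well-formed snippets; a real implementation would walk the tree with an XML parser such as tinyxml2 (which docxBox lists among its dependencies).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch: returns the text of every <w:t> element in document
// order. <w:tab/>, <w:tbl> etc. are not matched, because "<w:t" must be
// followed by whitespace (attributes) or ">".
std::vector<std::string> TextSegments(const std::string &paragraph_xml) {
  static const std::regex kText(R"(<w:t(?:\s[^>]*)?>([^<]*)</w:t>)");
  std::vector<std::string> segments;
  for (std::sregex_iterator it(paragraph_xml.begin(), paragraph_xml.end(), kText), end;
       it != end; ++it) {
    segments.push_back((*it)[1].str());
  }
  return segments;
}
```

For example, a paragraph whose runs hold "Hello ", "wor" (bold) and "ld" yields three segments, although it renders as "Hello world".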

Compare DOCX documents

docxBox helps trace the changes that word processor applications make to the files contained within a DOCX archive when a document is edited.

When given two DOCX files, the ls command lists all files of both DOCX documents side-by-side. docxBox compares all files and highlights files w/ different attributes or (identical attributes but) different content.

docxbox ls foo_v1.docx foo_v2.docx

Note: Comparisons are always output as plaintext, JSON output is not supported.
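The side-by-side classification can be sketched as follows, assuming each listed file is described by its size, a modification timestamp and its content. FileEntry, Change and CompareListings are hypothetical names; the attributes docxBox actually compares may differ.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical description of one file inside a DOCX archive.
struct FileEntry {
  std::uintmax_t size;
  std::string modified;  // timestamp as listed
  std::string content;
};

enum class Change { kIdentical, kAttributesDiffer, kContentDiffers, kOnlyInFirst, kOnlyInSecond };

// Classifies every file of both listings: differing attributes are reported
// first; identical attributes with differing content are reported separately.
std::map<std::string, Change> CompareListings(
    const std::map<std::string, FileEntry> &first,
    const std::map<std::string, FileEntry> &second) {
  std::map<std::string, Change> result;
  for (const auto &[name, entry] : first) {
    const auto other = second.find(name);
    if (other == second.end()) {
      result[name] = Change::kOnlyInFirst;
    } else if (entry.size != other->second.size || entry.modified != other->second.modified) {
      result[name] = Change::kAttributesDiffer;
    } else if (entry.content != other->second.content) {
      result[name] = Change::kContentDiffers;
    } else {
      result[name] = Change::kIdentical;
    }
  }
  for (const auto &item : second) {
    if (first.count(item.first) == 0) result[item.first] = Change::kOnlyInSecond;
  }
  return result;
}
```

Files present in only one archive are reported separately, so added or removed parts stand out as well.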


Compare specific file from two DOCX archives

Files that have changed between versions of a document can be inspected using the diff tool (which must be installed on your system).

Display side-by-side comparison of the formatted XML of given file (word/settings.xml), with differences indicated:
docxbox diff foo_v1.docx foo_v2.docx word/settings.xml

Display unified diff: docxbox diff foo_v1.docx foo_v2.docx word/settings.xml -u
or: docxbox diff foo_v1.docx foo_v2.docx word/settings.xml --unified


Modify document

Modify meta data

docxBox can modify existing attributes, and adds attributes that are not yet present.

  • Set creation-date: docxbox mm foo.docx created "2020-01-29T09:21:00Z"
  • Set creator attribute: docxbox mm foo.docx creator "docxBox v0.0.1"
  • Set description attribute: docxbox mm foo.docx description "Foo bar baz"
  • Set keywords attribute: docxbox mm foo.docx keywords "Foo bar baz"
  • Set language attribute: docxbox mm foo.docx language "en-US"
  • Set lastModifiedBy attribute: docxbox mm foo.docx lastModifiedBy "docxBox v0.0.1"
  • Set lastPrinted attribute: docxbox mm foo.docx lastPrinted "2020-01-10T10:31:00Z"
  • Set modification-date: docxbox mm foo.docx modified "2020-01-29T09:21:00Z"
  • Set revision attribute: docxbox mm foo.docx revision 2
  • Set subject attribute: docxbox mm foo.docx subject "Foo bar"
  • Set title attribute: docxbox mm foo.docx title "Foo bar, baz"

Notes:

  1. Altering meta data does NOT automatically update preview texts of generic fields, which display respective meta data.
    For updating field values, use the sfv command.
  2. All modifications automatically update the modification-date attribute to the current timestamp, unless a different one is set explicitly.
  3. During Batch Templating the modification-date is not updated automatically.
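The date attributes use ISO 8601 UTC timestamps such as 2020-01-29T09:21:00Z. A minimal sketch of producing such a timestamp, e.g. for the current time via ToIso8601Utc(std::time(nullptr)); the function name is hypothetical, and docxBox's own way of generating timestamps may differ.

```cpp
#include <ctime>
#include <string>

// Formats a point in time as "YYYY-MM-DDThh:mm:ssZ" (UTC).
// Returns an empty string if the time cannot be converted or formatted.
std::string ToIso8601Utc(std::time_t time) {
  const std::tm *utc = std::gmtime(&time);  // not thread-safe, see note below
  if (utc == nullptr) return {};
  char buffer[32];
  if (std::strftime(buffer, sizeof buffer, "%Y-%m-%dT%H:%M:%SZ", utc) == 0) return {};
  return buffer;
}
```

std::gmtime returns a pointer to shared static storage and is therefore not thread-safe; POSIX gmtime_r or Windows gmtime_s avoid that.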

To alter/insert an attribute and save the modified document to a new file:
docxbox mm foo.docx <attribute> <value> new.docx

To update multiple meta attributes with one mm command, tuples of attribute-keys and -values can be given as JSON: docxbox mm foo.docx "{\"<attribute>\":\"<value>\",\"<attribute>\":\"<value>\", ...}" new.docx
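When this JSON argument is generated by a script, keys and values must be JSON-escaped before the string is quoted for the shell. A minimal sketch using only the standard library; JsonEscape and MetaJson are hypothetical helpers (with nlohmann/json, which docxBox already depends on, json::dump() would handle the escaping).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Escapes a string for use as a JSON string value: quotes, backslashes and
// control characters must not appear unescaped.
std::string JsonEscape(const std::string &text) {
  std::string out;
  for (const unsigned char c : text) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          char escaped[7];  // "\u00XX" plus terminating null
          std::snprintf(escaped, sizeof escaped, "\\u%04x", static_cast<unsigned>(c));
          out += escaped;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Builds the {"<attribute>":"<value>", ...} object for a single mm call.
std::string MetaJson(const std::vector<std::pair<std::string, std::string>> &attributes) {
  std::string json = "{";
  for (std::size_t i = 0; i < attributes.size(); ++i) {
    if (i > 0) json += ',';
    json += '"' + JsonEscape(attributes[i].first) + "\":\"" + JsonEscape(attributes[i].second) + '"';
  }
  return json + '}';
}
```

MetaJson({{"title", "Foo bar, baz"}, {"subject", "Foo"}}) yields {"title":"Foo bar, baz","subject":"Foo"}, which can then be passed as one single-quoted shell argument, e.g. docxbox mm foo.docx '{"title":"Foo bar, baz","subject":"Foo"}' new.docx.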

Replace image

docxbox rpi foo.docx image1.jpeg /home/replacement.jpeg overwrites the DOCX w/ the modified document.

Note: The original and replacement image must be of the same format (bmp, gif, jpg, etc.).

docxbox rpi foo.docx image1.jpeg /home/replacement.jpeg new.docx
This creates a new file: new.docx

Replace text

Replace all (case-sensitive) occurrences of given string in DOCX text:

docxbox rpt foo.docx old new updates foo.docx
docxbox rpt foo.docx old new new.docx creates a new file new.docx

Insert text into existing table

stv inserts values (and cells if needed) into an existing table, starting at the first cell of the given row. If the row has fewer columns than values are given, further rows are added after it.

This is useful for maintaining a specific table style (borders, coloring, font, etc.) when rendering dynamic documents from DOCX templates.

Example: Fill/Insert four cells starting w/ second row of first table in document:
docxbox stv foo.docx "{\"table\":1,\"row\":2,\"values\":[\"foo\",\"bar\",\"baz\",\"qux\"]}"

Note: Table and rows are indexed starting w/ 1 (not 0).

The table and row to start inserting data into can also be identified by text (distinct within the document) contained within a cell of that table and row:

docxbox stv foo.docx "{\"cell\":\"insert-data-here\",\"values\":[\"foo\",\"bar\",\"baz\",\"qux\"]}"
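One reading of stv's overflow rule (values fill the given row, and further rows are added once its columns are used up) can be sketched as chunking the value list by the row's column count. DistributeOverRows is a hypothetical helper; how docxBox treats rows whose cell count differs from the rest of the table is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `values` into consecutive rows of at most `columns` cells each:
// the first chunk fills the start row, every further chunk becomes a new row.
std::vector<std::vector<std::string>> DistributeOverRows(
    const std::vector<std::string> &values, std::size_t columns) {
  assert(columns > 0);  // a table row always has at least one cell
  std::vector<std::vector<std::string>> rows;
  for (std::size_t first = 0; first < values.size(); first += columns) {
    const std::size_t last = std::min(values.size(), first + columns);
    rows.emplace_back(values.begin() + static_cast<std::ptrdiff_t>(first),
                      values.begin() + static_cast<std::ptrdiff_t>(last));
  }
  return rows;
}
```

With three columns, the four values from the examples above end up as foo, bar, baz in the given row and qux in one newly added row.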

Replace by markup

Besides replacing text and fields, docxBox supports rendering and inserting the following Office Open XML elements:

Markup specification for such elements must be given as JSON, following these rules:

  • JSON must be wrapped within {...}
  • The first item must be a type identifier (h1, h2, h3, img / image, link, ol, p, table, text, ul)
  • All attributes are given as key/value pairs (a JSON object nested under the type identifier)
  • The order of attributes within the config of the type is arbitrary

Insert heading

Example: Replace string search by a Heading 1 with the text Foo:
docxbox rpt foo.docx search "{\"h1\":{\"text\":\"Foo\"}}"

docxBox supports rendering of Heading 1, 2 and 3 (h1, h2, h3).

Insert text

Example: Replace string search (by a new run) with the text Foo:
docxbox rpt foo.docx search "{\"text\":{\"text\":\"Foo\"}}"

Insert paragraph containing text

Example: Replace string search (by a new paragraph containing a run) with the text Foo:
docxbox rpt foo.docx search "{\"p\":{\"text\":\"Foo\"}}"

Insert hyperlink

Example: Replace string search by a hyperlink:
docxbox rpt foo.docx search "{\"link\":{\"text\":\"docxBox\",\"url\":\"https://github.com/gyselroth/docxbox\"}}"

Insert list

Replace string search by an unordered list:
docxbox rpt foo.docx search "{\"ul\":{\"items\":[\"item-1\",\"item-2\",\"item-3\"]}}"

Insert image

Image markup specification example:

{
    "img":{
        "name":"example.jpg",
        "offset":[0,0],
        "size":[2438400,1828800]
    }
}

Specification rules:

  • The name parameter is optional
  • The offset argument is optional
  • Image size is by default expected to be given in EMUs (= English Metric Units, being: pixels * 9525), but can also be specified in pixels like: "size":["256px","192px"]
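The conversion is a plain multiplication: 9525 EMUs per pixel corresponds to 96 pixels per inch, since one inch equals 914400 EMUs. A sketch that accepts both notations; ParseSize is a hypothetical helper, not docxBox's parser.

```cpp
#include <cstdint>
#include <string>

// EMUs per pixel: 914400 EMUs per inch / 96 pixels per inch = 9525.
constexpr std::int64_t kEmuPerPixel = 9525;

constexpr std::int64_t PixelsToEmu(std::int64_t pixels) { return pixels * kEmuPerPixel; }

// Hypothetical parser for one "size" entry: "256px" is read as pixels,
// a plain number such as "2438400" as EMUs.
std::int64_t ParseSize(const std::string &value) {
  const std::string suffix = "px";
  const bool in_pixels = value.size() > suffix.size() &&
      value.compare(value.size() - suffix.size(), suffix.size(), suffix) == 0;
  if (in_pixels) return PixelsToEmu(std::stoll(value.substr(0, value.size() - suffix.size())));
  return std::stoll(value);
}
```

ParseSize("256px") returns 2438400 and ParseSize("192px") returns 1828800, i.e. the same size as [2438400,1828800] in the examples.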

When inserting a new image file, it must be given as an additional argument:
docxbox rpt foo.docx search "{\"image\":{\"size\":[2438400,1828800]}}" images/ex1.jpg

Insert new table

To replace text by a newly rendered table like:

A B C
a1 b1 c1
a2 b2 c2
a3 b3 c3

the table specification as JSON looks like:

{
    "table":{
        "columns":3,
        "rows":3,
        "header":["A","B","C"],
        "content":[
            ["a1","b1","c1"],
            ["a2","b2","c2"],
            ["a3","b3","c3"]
        ]
    }
}
Specification rules:
  • header is optional, when given: columns is optional
  • content is optional, when given: rows is optional
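The two rules can be sketched as follows: a given header determines the number of columns, given content determines the number of rows, and the explicit columns and rows values are only needed otherwise. TableSpec, TableSize and ResolveSize are hypothetical names; in particular, which value wins when both header and columns are given is an assumption of this sketch.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of the "table" markup specification.
struct TableSpec {
  std::optional<std::size_t> columns;
  std::optional<std::size_t> rows;
  std::vector<std::string> header;
  std::vector<std::vector<std::string>> content;
};

struct TableSize {
  std::size_t columns;
  std::size_t rows;  // content rows, not counting the header row
};

// header given  -> columns = number of header cells
// content given -> rows = number of content rows
TableSize ResolveSize(const TableSpec &spec) {
  const std::size_t columns = !spec.header.empty() ? spec.header.size() : spec.columns.value_or(0);
  const std::size_t rows = !spec.content.empty() ? spec.content.size() : spec.rows.value_or(0);
  if (columns == 0 || rows == 0) {
    throw std::invalid_argument("table needs header or columns, and content or rows");
  }
  return {columns, rows};
}
```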

Replace search by table:
docxbox rpt foo.docx search "{\"table\":{\"header\":[\"A\",\"B\",\"C\"],\"content\":[[\"a1\",\"b1\",\"c1\"],[\"a2\",\"b2\",\"c2\"],[\"a3\",\"b3\",\"c3\"]]}}"

Remove content between text

Remove content between (and including) given strings (left and right):

docxbox rmt foo.docx left right updates foo.docx
docxbox rmt foo.docx left right new.docx creates a new file new.docx

Set field value: Merge fields, generic fields

Merge fields

When setting the value (text) of a merge field, the merge field is reduced to its textual component (maintaining its visual style).

Note: A particular merge field can NOT be merged repeatedly: merging turns the former field into a text element (the field subsequently does not exist as such any more).

docxbox sfv foo.docx "MERGEFIELD foo" bar (Updates foo.docx)
Changes all merge fields whose identifier begins with foo into the text bar.

docxbox sfv foo.docx "MERGEFIELD foo" bar new.docx Saves the resulting DOCX to a new DOCX file: new.docx
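Prefix matching on field identifiers can be sketched like this, assuming the comparison is made against the field's instruction text (e.g. MERGEFIELD foo_bar \* MERGEFORMAT) after its leading spaces. FieldMatches is a hypothetical helper, not docxBox's implementation.

```cpp
#include <string>

// True if the field instruction, after leading spaces, begins with `prefix`.
bool FieldMatches(const std::string &instruction, const std::string &prefix) {
  const std::size_t start = instruction.find_first_not_of(' ');
  if (start == std::string::npos) return false;
  return instruction.compare(start, prefix.size(), prefix) == 0;
}
```

Under this rule, "MERGEFIELD foo" also matches MERGEFIELD foobar, so a longer, more specific identifier is needed if only one of two such fields should change.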

Hint: To find out field identifiers use docxBox's lsd command.

Generic fields

Setting field values also updates the preview texts of generic fields, which some word processing applications otherwise require to be updated explicitly.

docxbox sfv foo.docx "PRINTDATE" "10.01.2020"
Updates the shown text of all print-date fields to 10.01.2020.

Randomize document text

Replace all text of an existing document by similarly structured random "Lorem Ipsum" dummy text, helpful for generating DOCX documents for testing purposes:

docxbox lorem foo.docx updates foo.docx
docxbox lorem foo.docx new.docx creates a new file new.docx

Batch Templating

docxBox's batch templating mode performs an arbitrary sequence of operations (supporting all docxBox commands for document manipulation) upon a given DOCX. It thereby offers a wider range of templating options than the individual commands (= without batch templating).

Example: docxBox does not directly support replacing merge fields by anything other than plain text. Via batch templating, a merge field can be turned into text in one step of a sequence; in a later step, that text can be replaced, completely or in part, by generic content such as a table, which can then be filled with further content.

Replacement Pre/Post Markers

Batch templating can make use of "markers": optional text elements containing a distinct identifier string. Markers can temporarily be inserted and can subsequently be replaced again at a later step of the batch sequence by other generic content.

Rules:

  • Markers can be added before (key: pre) and after (key: post) the actual generic replacement content
  • Markers can be of type text, or of type paragraph (or p) to insert surrounding line breaks
  • Markers contain a textual identifier, which can use any text (but should be distinct within the document)

Batch sequence JSON

Sequences of templating steps to be batch-processed must be given like:

{
 "<STEP_ID>": {"<COMMAND>": [("<ARGUMENT_1>",)(,"<ARGUMENT_2>",...)]},
 "<STEP_ID>": {"<COMMAND>": [("<ARGUMENT_1>",)(,"<ARGUMENT_2>",...)]},
 ...
}

Example:

{
 "1": {"mm": ["description", "foo"]},
 "2": {"rpt": ["bar", "baz"]},
 "3": {"rpt": [
    "qux", 
    {"h1": {"text": "Quux"}}
 ]}
}

Rules:

  • Every step must be given as a tuple of step-ID and -parameters
  • <STEP_ID> is an arbitrary string, must be distinct within the sequence
  • Parameters must be given as a tuple of a command and its respective arguments
  • <COMMAND> accepts any of docxBox's commands for DOCX manipulation (rmt, rpi, rpt, lorem, mm and sfv)
  • <ARGUMENT>: Argument(s) for respective command, same as in non-batch mode
  • When a command has no arguments (e.g. lorem), an empty array must still be given (e.g.: {"lorem":[]})
  • Arguments for markup-configuration of generic document elements can be given as nested JSON
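Before handing a generated sequence to docxbox batch, a script could check the rules above. A hypothetical sketch (not part of docxBox) that takes the steps as (step-ID, command) pairs in source order:

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns an error message, or an empty string if all step-IDs are distinct
// and every command is one of docxBox's manipulation commands.
std::string ValidateSequence(const std::vector<std::pair<std::string, std::string>> &steps) {
  static const std::set<std::string> kManipulationCommands = {
      "rmt", "rpi", "rpt", "lorem", "mm", "sfv"};
  std::set<std::string> seen_ids;
  for (const auto &[id, command] : steps) {
    if (!seen_ids.insert(id).second) return "Duplicate step-ID: " + id;
    if (kManipulationCommands.count(command) == 0) {
      return "Unsupported command in step " + id + ": " + command;
    }
  }
  return "";
}
```

It validates the raw pairs rather than an already parsed JSON object, because many JSON parsers silently keep only one of two duplicate keys, which would hide a repeated step-ID.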
Example: Replace string by heading-1 followed by table containing images

Templating sequence:

  1. Step "1": Replace string foo by heading-1 with the text: Foobar (followed by a temporary marker my-marker-1)
  2. Step "2": Replace the marker my-marker-1 by table containing 2x2 cells
  3. Steps "3" to "6": Replace (the placeholder texts within the) table cells by images
  4. Add new image files into docx document

Batch config:

{
 "1": {"rpt": [
    "foo",
    {
     "h1": {
        "text": "Foobar",
        "post": {"text": "my-marker-1"}
     }
    }
 ]},
 "2": {"rpt": [
   "my-marker-1",
   {
     "table": {
         "columns": 2,
         "rows": 2,
         "header": ["A","B"],
         "content": [
             ["img-a1", "img-b1"],
             ["img-a2", "img-b2"]
         ]
     }
   }  
 ]},
 "3": {"rpt": [
    "img-a1",
    {
      "img": {
          "name": "blue.png",
          "size": [2438400, 1828800]
      }
    }  
 ]},
 "4": {"rpt": [
    "img-b1",
    {
      "img": {
          "name": "green.png",
          "size": [2438400, 1828800]
      }
    }
 ]},
 "5": {
    "rpt": [
      "img-a2",
      {
        "img":{
          "name": "orange.png",
          "size": [2438400, 1828800]
        }
      }
 ]},
 "6": {"rpt": [
    "img-b2",
    {
      "img": {
          "name": "red.png",
          "size": [2438400,1828800]
      }
    }
 ]}  
}

The full batch command:

Note: As when inserting new images in non-batch mode (via rpt or rpi), image files to be added into the document during batch templating must be given as trailing arguments.

docxbox batch foo.docx "{\"1\":{\"rpt\":[\"foo\",{\"h1\":{\"text\":\"Foobar\",\"post\":{\"text\":\"my-marker-1\"}}}]},\"2\":{\"rpt\":[\"my-marker-1\",{\"table\":{\"columns\":2,\"rows\":2,\"header\":[\"A\",\"B\"],\"content\":[[\"img-a1\",\"img-b1\"],[\"img-a2\",\"img-b2\"]]}}]},\"3\":{\"rpt\":[\"img-a1\",{\"img\":{\"name\":\"blue.png\",\"size\":[2438400,1828800]}}]},\"4\":{\"rpt\":[\"img-b1\",{\"img\":{\"name\":\"green.png\",\"size\":[2438400,1828800]}}]},\"5\":{\"rpt\":[\"img-a2\",{\"img\":{\"name\":\"orange.png\",\"size\":[2438400,1828800]}}]},\"6\":{\"rpt\":[\"img-b2\",{\"img\":{\"name\":\"red.png\",\"size\":[2438400,1828800]}}]}}" blue.png green.png orange.png red.png

Save batch processed document to new file

To save the resulting document of batch processed manipulations to a new file, instead of overwriting the source document, the destination filename can optionally be given as the very last argument (also trailing other optional arguments like image files):

docxbox batch foo.docx "{\"1\":{\"mm\":[\"description\",\"foo\"]},\"2\":{\"rpt\":[\"bar\",\"baz\"]},\"3\":{\"rpt\":[\"qux\",{\"h1\":{\"text\":\"Quux\"}}]}}" new.docx

Arbitrary manual and scripted analysis / modification

docxBox makes it easy to conduct arbitrary modifications, manual or scripted, on files contained within a DOCX.
docxBox automates all steps besides the actual modification, which is inserted as a user-defined command.

Example - Edit XML file manually:

docxbox cmd foo.docx "nano *DOCX*/word/document.xml"

docxBox in the above example does:

  1. Unzip foo.docx
  2. Indent all extracted XML files
  3. Render (= replace *DOCX* w/ the respective extraction path)
    and execute the command: nano *DOCX*/word/document.xml, thereby opening document.xml for editing in nano and halting docxBox until the editor is exited.
  4. Unindent all extracted XML files
  5. Zip the extracted files back into foo.docx
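Step 3 amounts to a placeholder substitution. A minimal sketch; RenderCommand is a hypothetical name and the extraction path used below is made up.

```cpp
#include <string>

// Replaces every *DOCX* placeholder in `command` with `extraction_path`.
// Searching continues after the inserted path, so a path that itself
// contains "*DOCX*" is not substituted again.
std::string RenderCommand(std::string command, const std::string &extraction_path) {
  const std::string placeholder = "*DOCX*";
  for (std::size_t pos = command.find(placeholder); pos != std::string::npos;
       pos = command.find(placeholder, pos + extraction_path.size())) {
    command.replace(pos, placeholder.size(), extraction_path);
  }
  return command;
}
```

RenderCommand("nano *DOCX*/word/document.xml", "/tmp/foo.docx-1") returns "nano /tmp/foo.docx-1/word/document.xml". Because the result is executed by a shell, a path containing spaces would additionally need quoting.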

Output docxBox help or version number

docxbox
or docxbox h
Outputs docxBox's help text.

docxbox h <command> Outputs more help on a given command.

docxbox v Outputs the installed docxBox's version number.

Configuration

docxBox can optionally be configured using the following environment variables:

Option: docxBox_notify (default: stdout)
  • stdout = Output notifications to stdout only
  • log = Log all notifications to file only
  • both = Output notifications to stdout and log file
  • off = Do not output any notifications

Option: docxBox_log_path (default: empty)
  • empty = Log is written to out.log in the current working directory
  • arbitrary_path/filename.out = Log file is written to the given path

Option: docxBox_clear_log_on_start (default: 0)
  • 0 = docxBox appends notifications to the log file
  • 1 = docxBox resets the log file on startup

Option: docxBox_verbose (default: 0)
  • 0 = Only the most relevant notifications are output to stdout (unless notifications are disabled)
  • 1 = If notifications are enabled, all modification notifications are output to stdout

Example:
Export variable to the environment docxBox runs in: export docxBox_verbose=1
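Reading these options boils down to environment lookups with the defaults listed above. A sketch using std::getenv; GetOption and IsVerbose are hypothetical names, not docxBox's API.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of environment variable `name`, or `fallback` if it is unset.
std::string GetOption(const char *name, const std::string &fallback) {
  const char *value = std::getenv(name);
  return value != nullptr ? std::string(value) : fallback;
}

// docxBox_verbose: "1" enables verbose output; the default is "0".
bool IsVerbose() { return GetOption("docxBox_verbose", "0") == "1"; }
```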

Build Instructions

cmake CMakeLists.txt; make

Running tests

In order to run functional tests, Bats must be installed.

Run all tests: ./test.sh

Run specific test suite:
./test.sh <suite>
E.g.: ./test.sh ls - Filenames in test/functional/ correspond to test suite names.

Check all tests for memory-leaks via Valgrind:
./test.sh valgrind
In order to check for memory-leaks, Valgrind must be installed on your computer.

Code Convention

The source code of docxBox follows the Google C++ Style Guide.
The source code of functional tests follows the Google Shell Style Guide.

Changelog

See Changelog

Roadmap

  • v1.0.0: Ensure all templating options work and output is Microsoft Word compatible
  • v1.0.0: Add HTTP/s server mode (make usable as local web service)
  • v1.1.0: LibreOffice compatible appending of two DOCX files into a single one (by XML appending, instead of adding sub-documents)

Bug Reporting and Feature Requests

If you find a bug or have an enhancement request, please file an issue on the GitHub repository.

Third Party References

Microsoft Office and Word are registered trademarks of Microsoft Corporation.

docxBox was built using the following third party libraries and tools:

Libraries:
  • nlohmann/json: JSON for Modern C++ (MIT License)
  • tfussel/miniz-cpp: Cross-platform header-only C++14 library for reading and writing ZIP files (MIT License)
  • leethomason/tinyxml2: A simple, small, efficient C++ XML parser (zlib License)

Tools:
  • Bats: Bash Automated Testing System (MIT License)
  • Clang: A C language family frontend for LLVM (Apache License)
  • CMake: Family of tools designed to build, test and package software (New BSD License)
  • Cppcheck: Static analysis tool for C/C++ code (GNU General Public License version 3)
  • cpplint: Static code checker for C++ (BSD-3 Clause)
  • GCC: The GNU Compiler Collection (GNU General Public License version 3)
  • Travis CI: Hosted Continuous Integration Service (MIT License)
  • Valgrind: System for debugging and profiling Linux programs (GNU General Public License version 2)

Thanks a lot!

License

docxBox is licensed under The MIT License (MIT)

Contributors

kstenschke, lucasbornhauser


docxbox's Issues

Word 2016 cannot open bio_assay.docx from the test files

Describe the bug
After replacing text with markup, the DOCX file cannot be opened by Word 2016.

To Reproduce
Steps to reproduce the behavior:

  1. Command ' ./docxbox rpt /mnt/hgfs/vmware-share/docxbox-master/docxbox-master/test/files/docx/mergefields.docx sunt "{"h1":{"text":"Foo"}}" '

Environment:

  • docxBox Version: newest
  • DOCX Processor, if involved (Microsoft Word or other application)

Replacing string with table corrupts DOCX

Describe the bug
Replacing a string with a table makes DOCX invalid

To Reproduce
Steps to reproduce the behavior:

  1. Command:
  • docxbox rpt table_unordered_list_images.docx Officia "{"table":{"header":["A","B","C"],"content":[["a1","a2","a3"],["b1","b2","b3"],["c1","c2","c3"]]}}"
  2. Resources: table_unordered_list_images.docx

Expected behavior

  • Given string is replaced by table (works)
  • DOCX is valid (actual: DOCX is invalid)

Screenshots: Auswahl_059, word_errormessage

Environment:

  • docxBox Version: 0.0.5
  • DOCX Processor: Word 2019

Additional information
Screenshot only shows relevant lines, the actual diff has more lines

error in "lslj" command

Describe the bug
When running
"./docxbox lsl filename.docx -j searchString"
or
"./docxbox lsl filename.docx --json searchString"
an error is thrown:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'

To Reproduce
Steps to reproduce the behavior:

  1. Command:
  • ./docxbox lsl bio_assay.docx -j fonts
  • ./docxbox lsl bio_assay.docx --json fonts
  2. Resources: valid .docx file

Expected behavior
List of files containing given search string as JSON

Environment:

  • docxBox Version: 0.0.4

command "lslj" does not work

Describe the bug
The "./docxbox h lslj" command, like any other "lslj" command, doesn't work.
The output is: "Unknown command: lslj."

To Reproduce

  1. Command: "./docxbox h lslj"

Expected behavior
Help displayed for lslj command

Environment:

  • docxBox Version: 0.0.4

"ls wrong-file-type" throws an error

Describe the bug
When trying to list files in a non-DOCX file, docxbox throws an error:

terminate called after throwing an instance of 'std::runtime_error'
what(): bad zip

To Reproduce
Steps to reproduce the behavior:

  1. Command: docxbox ls wrong_file_type
  2. Resources: any non-DOCX file

Expected behavior
An error message stating a wrong file type was provided.

Environment:

  • docxBox Version: 0.0.4

setting meta attribute via batch processing does not work

Describe the bug
Trying to set a meta attribute through batch processing throws an error:
docxBox Error - Invalid argument: Unknown or unsupported attribute: "title"
docxBox Error - Initialization for meta modification failed.

To Reproduce
Steps to reproduce the behavior:

  1. Command:
  • ./docxbox batch table_unordered_list_images.docx "{"1":{"mm":["title","foo"]}}"
  2. Resources: table_unordered_list_images.docx

Expected behavior
Meta attribute gets set, batch processing continues

Environment:

  • docxBox Version: 0.0.5

command "rpi" doesn't replace image

Describe the bug
When trying to replace an image in a docx an error is thrown:
File not found: cp_table_unordered_list_images.docx-1589441700/word/media/image1.jpeg
The image doesn't get replaced.

To Reproduce
Steps to reproduce the behavior:

  1. Command:
  • docxbox rpi validDocxWithImage image1.jpeg test/files/images/2100x400.jpeg
  2. Resources:
  • table_unordered_list_images.docx
  • 2100x400.jpeg

Expected behavior
The image gets replaced

Environment:

  • docxBox Version: 0.0.4

"rpt" doesn't replace text

Describe the bug
When trying to replace a string by another string, it doesn't get replaced and the process isn't terminated. No error is shown.

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox rpt valid_docx Lorem xxx
  2. Resources: any valid docx file

Expected behavior
All occurrences of given string are replaced

Environment:

  • docxBox Version: 0.0.4

Replacing a string with a heading is not working

Describe the bug
When trying to replace a given string by a heading docxBox throws an error:

docxBox Error - DOCX creation failed

To Reproduce
Steps to reproduce the behavior:

  1. Command: docxbox rpt table_unordered_list_images.docx "searchString" "{\"h1\":{\"text\":\"Foo\"}}"
  2. Resources: table_unordered_list_images.docx

Expected behavior
The given string gets replaced by the given heading

Environment:

  • docxBox Version: 0.0.5

rpt ol renders ul instead

Describe the bug
rpt w/ ordered-list renders an unordered list instead.

Additional information
Speculation: abstractNumId of numbering.xml and document.xml might point to the wrong markup.

error when running longhand command of lslj

Describe the bug
When running the longhand command of lslj (--lj) all files are listed (not just those containing given search string) and are not in JSON-Format

To Reproduce

  1. Command: ./docxbox ls filename.docx --lj "searchString"
  2. Resources: any valid .docx file

Expected behavior
A list of files containing given search string as JSON

Screenshots: expected output (Auswahl_053), actual output (Auswahl_052)

Environment:

  • docxBox Version: 0.0.4

incomplete "lsd" command

Describe the bug
When listing fields with the "lsd" command, fields in the footer are not listed.

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox lsd mergefields.docx
  2. Resources: mergefields.docx

Expected behavior
Command lists all fields in docx

Screenshots: Auswahl_062, Auswahl_063

Environment:

  • docxBox Version: 0.0.5

refine error message of rpi command without a docx file

Describe the bug
The error message of the rpi command without providing a docx file is misleading.
Error message now: "Missing argument: Filename of image to be replaced"
Expected Error message: "Missing argument: Filename of DOCX to be extracted"

To Reproduce

  1. Command './docxbox rpi '

Expected behavior
An error message stating the docx file is missing. The correct message was provided, but possibly got lost.

Environment:

  • docxBox Version: 0.0.1

fields in header and footer can't be replaced

Describe the bug
Trying to replace a merge field which is located either in the footer or the header with a string doesn't work. No error message is given.

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox sfv mergefields.docx "MERGEFIELD Mergefield_Header" FooBar
  2. Resources: mergefields.docx

Expected behavior
The provided field is replaced by the given string

Environment:

  • docxBox Version: 0.0.5

Replacing text with an image throws error

Describe the bug
Replacing text with an image throws an error when using EMUs:

  • terminate called after throwing an instance of 'nlohmann::detail::type_error'
    what(): [json.exception.type_error.302] type must be string, but is number
    Aborted

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox rpt table_unordered_list_images.docx "{"image":{"size":[2438400,1828800]}}" test/files/images/2100x400.jpeg
  2. Resources:
  • table_unordered_list_images.docx
  • 2100x400.jpeg

Expected behavior
The provided text gets replaced by the given image

Environment:

  • docxBox Version: 0.0.5

error in "mm" command

Describe the bug
Trying to change the meta attribute "subject" throws an error:
"terminate called after throwing an instance of 'char const*'"

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox mm file.docx subject "replacementString"
  2. Resources: any valid docx file containing the meta attribute "subject"

Expected behavior
Meta attribute gets changed

Environment:

  • docxBox Version: 0.0.5

"uz -i" and "uz --indent" don't indent XML files

Describe the bug
When using the longhand option of the uzi command (uz -i or uz --indent) the XML files are not indented.
Additionally, there is a spelling error when displaying help (ident instead of indent)

To Reproduce

  1. Command:
  • docxbox uz file.docx -i
  • docxbox uz file.docx --indent
  2. Resources: any valid docx file

Expected behavior
XML files get indented

Screenshots: Auswahl_054

Environment:

  • docxBox Version: 0.0.4

remove unused methods

cppcheck reports:

[src/docxbox/helper/helper_image.cc:9]: (style) The function 'GetDimension' is never used.
[src/docxbox/helper/helper_string.cc:102]: (style) The function 'GetSubStrAfter' is never used.
[src/docxbox/helper/helper_string.cc:28]: (style) The function 'GetSubStrCount' is never used.
[src/vendor/tinyxml2/tinyxml2.cpp:1979]: (style) The function 'InsertNewChildElement' is never used.
[src/vendor/tinyxml2/tinyxml2.cpp:1985]: (style) The function 'InsertNewComment' is never used.
[src/vendor/tinyxml2/tinyxml2.cpp:1997]: (style) The function 'InsertNewDeclaration' is never used.
[src/vendor/tinyxml2/tinyxml2.cpp:1991]: (style) The function 'InsertNewText' is never used.
[src/vendor/tinyxml2/tinyxml2.cpp:2003]: (style) The function 'InsertNewUnknown' is never used.
[src/docxbox/helper/helper_file.cc:6]: (style) The function 'IsFile' is never used.
[src/docxbox/helper/helper_string.cc:123]: (style) The function 'IsNumeric' is never used.
[src/docxbox/helper/helper_string.cc:60]: (style) The function 'ReplaceAll' is never used.
[src/docxbox/helper/helper_string.cc:38]: (style) The function 'ReplaceFirstOccurrence' is never used.
[src/docxbox/helper/helper_string.cc:155]: (style) The function 'ToLower' is never used.
[src/docxbox/helper/helper_string.cc:163]: (style) The function 'ToUpper' is never used.
[src/docxbox/helper/helper_string.cc:168]: (style) The function 'UcFirst' is never used.

Overall return code of functional tests is inconclusive

Describe the missing feature
test.sh does not return any conclusive status code: its return code always signals successful execution, independent of the results of the individual test suites.

TODO
When one or more of the invoked test suites return a code other than 0 (= success), test.sh should conclude with return code 1 (= general error).
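The TODO could be implemented along these lines. This is a minimal sketch that assumes each suite is started as a separate bats command line. The function name `run_suites` and the suite paths are hypothetical. Every suite still runs after a failure, and the function returns 1 if any of them failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper for test.sh: run every given suite command, keep going
# after failures so that all suites run, and return 1 (= general error)
# if at least one suite returned a code other than 0 (= success).
run_suites() {
  local overall=0
  local suite
  for suite in "$@"; do
    # Each argument is one command line, e.g. "bats test/functional/foo.bats".
    if ! eval "$suite"; then
      echo "Test suite failed: $suite" >&2
      overall=1
    fi
  done
  return "$overall"
}
```

test.sh would then end with something like `run_suites "bats test/functional/a.bats" "bats test/functional/b.bats"; exit $?` (paths hypothetical). Because the return code propagates, a CI job running test.sh then fails whenever any single suite fails.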

Optional out-filename not recognized during cmd command

TODO(kay): optional out-filename not recognized during cmd command:

docxbox cmd foo.docx "nano DOCX/word/document.xml" foo2.docx

=> Hit [Enter] when done.
mv: missing destination file operand after 'tmp.zip'
Try 'mv --help' for more information.

command "lorem" replaces file

Describe the bug
When replacing the text of a file with dummy text and saving the result to a new file, the original file is overwritten by the new one.

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox lorem docx_v1 new_docx.docx
  2. Resources: valid docx file

Expected behavior
The text is replaced and the result is saved to the new file, leaving the original file unchanged.

Environment:

  • docxBox Version: 0.0.4

command "lorem" doesn't replace text and throws an error

Describe the bug
When trying to randomize text in a docx file by running "docxbox lorem .docx" an error is thrown:

terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid

To Reproduce
Steps to reproduce the behavior:

  1. Command: docxbox lorem
  2. Resources: any valid docx

Expected behavior
Text in provided docx file gets replaced by random text

Environment:

  • docxBox Version: 0.0.4

error when setting field values with "sfv" command

Describe the bug
Setting field values with "sfv" fails with an error:
terminate called after throwing an instance of 'std::logic_error' what(): basic_string::_M_construct null not valid.
Additionally, a directory containing the unzipped files is created in the project root.

To Reproduce
Steps to reproduce the behavior:

  1. Command:
     • ./docxbox sfv test/files/docx/file_with_mergefields.docx "MERGEFIELD Schueler_Anrede" TEST
  2. Resources:
     • file_with_mergefields.docx

Expected behavior
Setting new value to given field.

Screenshots
[Screenshot: Auswahl_051]

Environment:

  • docxBox Version: 0.0.4

Command to list fields is displayed incorrectly in help

Describe the bug
The command to list fields in a DOCX is ./docxbox lsd filename.docx, but the output of "./docxbox h" mixes up lsg and lsd (see screenshot).

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox h

Expected behavior
correct output of commands

Screenshots
[Screenshot: Auswahl_046]

Environment:

  • docxBox Version: 0.0.1

functional tests linting issues

Please see the issues in the ShellCheck report of the functional tests.

  • see: ShellCheck
  • note 1: We might have to mark the unrecognized bats shebang line to be ignored by ShellCheck
  • note 2: I have disabled ShellCheck linting in the Travis CI config until these issues are fixed; please run ShellCheck locally (at least) in the meantime

command "rmt" removes too much

Describe the bug
When running the "rem" command, more than the expected strings are removed.

To Reproduce

  1. Command: ./docxbox rem test/files/docx/cp_table_unordered_list_images.docx Dolore incididunt
  2. Resources: copy of table_unordered_list_images.docx

Expected behavior
Only the given strings and the text between them are removed from the given .docx.

Screenshots
Original docx:
[Screenshot: Auswahl_048]

Manipulated docx:
[Screenshot: Auswahl_047]

Environment:

  • docxBox Version: 0.0.2

command "lsl" throws an error

Describe the bug
When running
"./docxbox lsl filename.docx" (search string missing)
or
"./docxbox lsl" (filename missing),
an error is thrown:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'

To Reproduce
Steps to reproduce the behavior:

  1. Command:
     • ./docxbox lsl table_unordered_list_images.docx
       OR
     • ./docxbox lsl
  2. Resources: table_unordered_list_images.docx

Expected behavior
An error message stating that the required search string is missing or that the required file is missing, respectively.

Environment:

  • docxBox Version: 0.0.4

consistency: rename "rem" to "rmt"

For consistent naming of existing and future commands, the "rem" command should be renamed to "rmt".
The name will then correspond to "replace text" being abbreviated "rpt". It will also keep sibling commands consistent, deducible and easier to remember (e.g. "remove between fields": "rmf").

Zipping files requires installed zip

Technical Debt
Currently, zipping extracted files into a DOCX invokes the zip application, which therefore must be installed. The unzip operations use MinizCpp instead. The existing method for zipping via MinizCpp creates a DOCX that MS Word reports as corrupt (other word processors accept it).

Error when setting meta attribute "created" or "modified"

Describe the bug
Setting the meta attribute "created" resets "modified" and vice versa

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox mm bio_assay.docx created "2020-10-11T15:02:22Z"
  2. Resources: bio_assay.docx

Expected behavior
Given meta attribute gets changed without changing the other attribute

Environment:

  • docxBox Version: 0.0.5
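To verify the expected behavior, compare the value of "modified" before and after setting "created". This is a minimal sketch with several assumptions:

  • the core properties are stored as dcterms:created / dcterms:modified elements in docProps/core.xml, the usual location in Word-generated DOCX files;
  • each element sits on a single line;
  • unzip is installed;
  • mm modifies the given file in place.

The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of the dcterms:<name> element (e.g. created, modified)
# found in core.xml content read from stdin. Assumes one element per line.
get_core_date() {
  local name="$1"
  sed -n "s|.*<dcterms:${name}[^>]*>\([^<]*\)</dcterms:${name}>.*|\1|p"
}

# Hypothetical check: succeeds if setting "created" leaves "modified" untouched.
# Assumes ./docxbox is the binary and that mm edits the DOCX in place.
check_created_keeps_modified() {
  local docx="$1" before after
  before=$(unzip -p "$docx" docProps/core.xml | get_core_date modified)
  ./docxbox mm "$docx" created "2020-10-11T15:02:22Z" || return 1
  after=$(unzip -p "$docx" docProps/core.xml | get_core_date modified)
  [ "$before" = "$after" ]
}
```

Running `check_created_keeps_modified bio_assay.docx` on a copy of the test file should succeed once the bug is fixed. Swapping "created" and "modified" in the function covers the "vice versa" case.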

Multiple errors when setting meta attributes

Describe the bug
Setting or changing meta attributes throws errors:

  • description:
    docxBox Error - iled render opening tag. Unknown attribute:
    docxBox Error - iled render opening tag. Unknown attribute:
    docxBox Error - iled render closing tag. Unknown attribute:
    docxBox Error - Update/Insert meta attribute failed.

  • Application, AppVersion, Company, xmlSchema, Template:
    docxBox Error - Invalid argument: Unknown or unsupported attribute: {Meta-Attribute}
    docxBox Error - Initialization for meta modification failed.

To Reproduce
Steps to reproduce the behavior:

  1. Command: ./docxbox mm bio_assay.docx {Meta-Attribute} "Replacement"
  2. Resources: bio_assay.docx

Expected behavior
Provided Meta-Attribute is set/changed

Environment:

  • docxBox Version: 0.0.5

Feature: Add valgrind memory-leak test

  • allow test.sh to receive optional argument: "valgrind"
  • when running in valgrind mode, prepare bats for executing the test suites with valgrind: copy the functional suites, and in all copies replace the docxbox binary path with the same path prefixed by a valgrind invocation
  • test leak detection and reporting: provoke a leak and ensure the correct error message and return code
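The bats-preparation step from the second bullet could look roughly like this. It is a minimal sketch, assuming the suites are *.bats files in one directory and call the binary by a fixed path (e.g. ./bin/linux/docxbox, as used elsewhere in these issues). The function name is hypothetical. `--leak-check=full` and `--error-exitcode=1` are standard valgrind options. In a full leak search, definite and possible leaks count as errors by default, so valgrind then exits with 1.

```shell
#!/usr/bin/env bash
# Hypothetical preparation step: copy all bats suites from src_dir to dst_dir
# and prefix every occurrence of the docxbox binary path with valgrind.
prepare_valgrind_suites() {
  local src_dir="$1" dst_dir="$2" binary="$3"
  local wrapped="valgrind --leak-check=full --error-exitcode=1 ${binary}"
  local suite
  mkdir -p "$dst_dir"
  for suite in "$src_dir"/*.bats; do
    [ -e "$suite" ] || continue  # no .bats files: the glob stays unexpanded
    # The binary path is used as a sed regex ("." matches any character);
    # paths containing "|" or "&" would need escaping.
    sed "s|${binary}|${wrapped}|g" "$suite" > "$dst_dir/$(basename "$suite")"
  done
}
```

The originals stay untouched, so test.sh can run either the plain or the valgrind-prefixed copies depending on its optional "valgrind" argument.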

Replacing an image without providing a replacement image creates a folder in project root

Describe the bug
When trying to replace an image without giving the name and path of the replacement image, a folder is created in the project root. It is named like the .docx file and contains the files extracted from it.
An error message is shown, stating that the file name of the replacement image is missing.

To Reproduce
Command:

  • $ ./bin/linux/docxbox rpi test/files/table_unordered_list_images.docx image1.jpeg

Files:

  • table_unordered_list_images.docx

Expected behavior
An error message.

Environment:

  • docxBox Version: 0.0.1

Possible std::bad_alloc during rpt

TODO(kay): fix possible crash during rpt:
docxbox rpt foo.docx "22.02.2016" "11.01.2020"
=> terminate called after throwing an instance of 'std::bad_alloc'
