docx's People

Contributors

philgale92, saigyoujiyuyuko233

docx's Issues

Images are not in their original position

This library works great since it preserves the spacing, images and text.

But the problem is that images are not in their original position. Each image appears on the line after its actual position. For instance, I have four images on one single line. That line is output as a <p> tag without any images, and all four images are then printed after the closing </p> tag.

Please fix this issue. I'm in the middle of something very important.

Table parsing issues

When using tables, text sometimes gets duplicated, and parsing appears to break when one of the table cells has multiple rows inside.

Unreliable table parsing

Due to how tables were originally parsed, occasionally the paragraphs immediately after the table ended could be removed or placed inside the table, depending on the table structure.

The rewrite handles table parsing with a much saner method.

Table Rewrite

Add support for reliable colspan & vertical cell merging.
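
As a rough illustration of what colspan and vertical merge handling involves, here is a minimal Python sketch (not this library's PHP code). It assumes each parsed cell carries the value of WordprocessingML's `w:gridSpan` and the state of `w:vMerge`; the `Cell` class and `table_to_html` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Optional


@dataclass
class Cell:
    text: str
    grid_span: int = 1             # value of w:gridSpan (columns covered)
    v_merge: Optional[str] = None  # "restart", "continue", or None


def table_to_html(rows: List[List[Cell]]) -> str:
    # Grid column index -> output cell that started the open vertical merge.
    open_merges: Dict[int, dict] = {}
    out_rows = []
    for row in rows:
        out_row = []
        col = 0
        for cell in row:
            if cell.v_merge == "continue" and col in open_merges:
                # Continuation cell: extend the cell above, emit nothing.
                open_merges[col]["rowspan"] += 1
            else:
                out = {"text": cell.text, "colspan": cell.grid_span, "rowspan": 1}
                out_row.append(out)
                if cell.v_merge == "restart":
                    open_merges[col] = out
                else:
                    open_merges.pop(col, None)
            col += cell.grid_span
        out_rows.append(out_row)

    parts = ["<table>"]
    for out_row in out_rows:
        parts.append("<tr>")
        for c in out_row:
            attrs = ""
            if c["colspan"] > 1:
                attrs += f' colspan="{c["colspan"]}"'
            if c["rowspan"] > 1:
                attrs += f' rowspan="{c["rowspan"]}"'
            parts.append(f"<td{attrs}>{escape(c['text'], quote=False)}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


example = table_to_html([
    [Cell("A", grid_span=2), Cell("B", v_merge="restart")],
    [Cell("c"), Cell("d"), Cell("", v_merge="continue")],
])
```

Tracking merges by grid column rather than by cell index keeps the vertical merge aligned even when a row above contains a cell that spans several columns.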

Images within tables

Currently, if an image is inside a table within a Word document, the image is generally moved outside of the table altogether.

Migrating base features

Re-writing the previous stages to work with version 4.

  • Image support
  • Lists
  • Word styles
  • Text indentation
  • Hyperlinks

Html Entities

Refactored the htmlEntities handling and how the resulting HTML is generated for each text run, removing a huge amount of unnecessary complexity in escaping text.
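
One way to keep escaping simple is to escape each text run exactly once, at the point where it is emitted. The following Python sketch only illustrates that idea; it is not the library's PHP implementation, and `render_runs` is a hypothetical name.

```python
from html import escape
from typing import Iterable


def render_runs(runs: Iterable[str]) -> str:
    # Escape each raw run once, then concatenate. Nothing downstream
    # re-escapes, so "&" never turns into "&amp;amp;".
    return "".join(escape(text, quote=False) for text in runs)


rendered = render_runs(["a < b", " & c"])
```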

Cannot parse large docx files

The library works perfectly when I upload small files, but it fails to parse larger ones.

For example, assume that the docx file has only text/characters and tables (black and white, no colour). It works if the file has around 40 such pages (size approx. 50 KB), but fails to parse files with more pages and a greater size than that.

Can this project do Word file paging?

Hello Phil, thanks for your work. Everything works fine when using your package, but if my docx file has N pages, how can I read it page by page?

I tried to solve this problem, but it is difficult for me. Can you help me check this question?

Thanks again.

Hyperlink parsing

Currently hyperlinked text is ignored.

A link parser is now in progress. Because a link can make up a single word within a line of text, the linked word(s) have to be wrapped directly with the link HTML (depending on $this->convertPlaceholders, they will be wrapped either with an <a> tag or with PLACEHOLDER_* brackets).
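
The wrapping step could look roughly like the Python sketch below. It is an illustration only: the `wrap_link` function, the assumption that a true flag selects the `<a>` tag, and the `[LINK:...]` bracket format are all hypothetical stand-ins, since the real PLACEHOLDER_* format is not shown here.

```python
from html import escape


def wrap_link(text: str, url: str, convert_placeholders: bool = True) -> str:
    if convert_placeholders:
        # Emit real HTML; escape the URL for use inside an attribute.
        return f'<a href="{escape(url, quote=True)}">{escape(text, quote=False)}</a>'
    # Hypothetical bracket placeholder, to be converted at a later stage.
    return f"[LINK:{url}]{text}[/LINK]"


# Only the linked word is wrapped; the rest of the line is untouched.
line = "See the " + wrap_link("docs", "http://example.com/?a=1&b=2") + " page"
```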

Corrupt images

Sometimes corrupt images are encoded and saved into the parsed data. I expect this comes from Word art or some other form of docx proprietary data.

This bug does not affect images inside the document added by users.

Incorrect output for docx

Hey Phil,

I've tried to convert a docx file (test111.docx, attached).

But the output has extra <li> elements at the start :( Here's what I got:

    string(6865) "

    • Neuraxial Anesthesia Reduces the Need for Transfusions after THA
    • A study from Rush University Medical Center – reported at the MAOA annual meeting – also showed a reduction in the rate of complications with neuraxial versus general anesthesia.
    • The largest study of its kind has shown that patients who receive neuraxial anesthesia for total hip arthroplasty (THA) are less likely to need a blood transfusion after the procedure than patients who receive general anesthesia.
    • They were also less likely to experience complications such as deep infection and pneumonia.
    • For the study – reported last week at the annual meeting of the Mid-America Orthopaedic Association (MAOA) – researchers from Rush University Medical Center, Chicago, Illinois, evaluated data from the National Surgical Quality Improvement (NSQIP) Database, which includes prospectively collected data on perioperative laboratory, co-morbidity, and postoperative complications.
    • The objective was to compare neuraxial versus general anesthesia with regard to risk for blood transfusion and complications in more than 29,000 patients who underwent primary THAs from 2005 to 2012.
    “A growing body of literature has demonstrated an increased rate of complications with blood transfusions following primary total hip arthroplasty,” said Brett R. Levine, MD, senior author of the study. “While previous studies have demonstrated a decreased rate of complications with neuraxial anesthesia, these studies have included small numbers of patients undergoing heterogeneous orthopaedic procedures.”Large Numbers – and a SurpriseThat’s why the researchers turned to the NSQIP database. “The NSQIP database gives us access to a large volume of procedures and has been useful in identifying independent risk factors for several variables during a total joint arthroplasty,” Dr. Levine said.“In this case, we were able to review the data from almost 30,000 total hip replacements, including over 11,000 utilizing neuraxial anesthesia. The ability to evaluate such large quantities of procedures lends a greater validity to the final conclusions of our study.”

    In total, data were retrieved for 29,452 procedures, 18,032 under general anesthesia and 11,420 under neuraxial anesthesia. The large number of general anesthesia cases was an unexpected finding, but did not affect the data analysis.

    “We were surprised to discover that nearly 60% of providers in the NSQIP database utilized general anesthesia for primary total hip arthroplasty,” said Brian D. Haughom, MD, who presented the team’s data at the MAOA Annual Meeting. “Despite a trend toward increasing use of regional anesthesia, it is still not used in the majority of hip replacement cases.”

    Dr. Levine speculated that this was the choice of the anesthesiologists on these cases, as surgeons typically prefer neuraxial anesthesia. “I do think spinal anesthesia is becoming more popular, and I bet the numbers will be reversed in the next 5 years,” he said. He also thinks the numbers would be different if they were evaluating anesthesia for total knee arthroplasty, in which regional anesthesia is more commonly used. 

    Favorable Results for Neuraxial Anesthesia

    The researchers used univariate analysis to evaluate postoperative complications between the two types of anesthesia. Multivariate analysis determined independent risk factors for blood transfusion following THA.

    Surgical times were found to be shorter for patients who received neuraxial anesthesia (88.2 vs. 101.4 minutes; p<0.001), as was the length of hospital stay (3.3 vs. 3.5 days; p=0.03).

    The researchers also found that with neuraxial anesthesia, patients had lower rates of:

    Overall complications (4.1% vs. 4.8%; p=0.006)

    Medical complications (2.7 vs. 3.5%; p<0.001)

    Deep infection (0.23% vs. 0.37%; p=0.04)

    Pneumonia (0.23% vs. 0.37%; p=0.04)

    Unplanned intubation (0.16% vs. 0.29%; p=0.015)

    Ventilation over 48 hours (0.04% vs. 0.13%; p=0.03)

    Stroke (0.08% vs. 0.20%; p=0.013)Death (0.12% vs. 0.24%; p=0.025)

    What about risk factors for postoperative blood transfusion? Overall, patients who received neuraxial anesthesia had a decreased risk of postoperative transfusion (OR=0.79; CI:0.69-0.91) than patients who received general anesthesia.

    The researchers identified 3 independent risk factors for transfusion:

    Female gender (OR=1.90; CI:1.66-2.18)

    Operative time (OR=1.23 per 30 minutes; CI:1.18-1.29 )

    History of hypertension (OR=1.33; CI:1.16-1.51)

    “Our results demonstrate that neuraxial anesthesia decreased the rate of overall complications following primary total hip arthroplasty, and furthermore portended an 18% reduction in the rate of post operative blood transfusions,” said Dr. Haughom.

    “While we suspected there may be advantages to neuraxial anesthesia, it was surprising to find such a large reduction in transfusion rates based upon anesthetic type as an independent risk factor.”

    Recommendation for Practice

    Dr. Levine believes these results suggest neuraxial anesthesia should have a larger role in THA.

    “While the exact mechanism by which neuraxial anesthesia decreases the rate of complications and postoperative transfusions remains elusive, we feel that neuraxial anesthesia should be utilized whenever possible in the setting of a primary total hip arthroplasty,” he said. “By maximizing means to reduce complications and blood transfusions, it may be possible to reduce healthcare costs and readmissions in the future.“While this study highlights the impact of neuraxial anesthesia in regard to primary total hip arthroplasty, further studies are needed to elucidate the exact mechanisms leading to these findings.”

    Source

    • Poster 098: Does Neuraxial Anesthesia Decrease the Rate of Postoperative Complications and Blood Transfusions? An Analysis of 29,452 Primary Total Hip Arthroplasty Cases; presented by Brian D. Haughom, MD; William W. Schairer, MD; Michael D. Hellman, MD; Benedict U. Nwachukwu, MD; and Brett R. Levine, MD, at the Mid-Atlantic Orthopaedic Association Annual Meeting, April 22-25, 2015, Hilton Head Island, South Carolina."

Could you please check what is wrong? Thanks.

Inline lists - Refactor needed

Currently this parser detects inline lists by exploding on "•" and converting each item into an <li>. This has caused issues with how the styles are wrapped around each item, as the BOLD_PLACEHOLDER may be wrapped around a •, which can corrupt the HTML for that list item.

I have applied a temporary fix which treats the symptoms by removing invalid <b> / BOLD_PLACEHOLDER tags, but before I can add further inline style support the underlying issue needs to be hammered out.
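
One possible direction, sketched in Python purely for illustration (this is not the library's PHP code, and `inline_list_to_html` is a hypothetical name), is to split on "•" while still working with individual text runs, and to apply bold per fragment. A bold tag can then never span a bullet boundary.

```python
from html import escape
from typing import List, Tuple


def inline_list_to_html(runs: List[Tuple[str, bool]]) -> str:
    # runs: (text, is_bold) pairs in document order.
    items: List[List[str]] = [[]]
    for text, bold in runs:
        for i, piece in enumerate(text.split("•")):
            if i > 0:
                items.append([])  # every bullet starts a new list item
            piece = piece.strip()
            if piece:
                frag = escape(piece, quote=False)
                # Style is applied per fragment, inside a single item.
                items[-1].append(f"<b>{frag}</b>" if bold else frag)
    # Simplification: fragments are joined with a space, so a run that
    # ends mid-word would gain a space here.
    lis = ["<li>" + " ".join(frags) + "</li>" for frags in items if frags]
    return "<ul>" + "".join(lis) + "</ul>"


html_list = inline_list_to_html([
    ("• First ", False),
    ("item • Second", True),  # bold run that crosses a bullet
    (" item", False),
])
```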

Save inline text metadata

Further text styling from the document is to be saved by the parser: bold / emphasized wording, and potentially different colours & highlighting effects.

As this styling can be applied per character in Word, the styling tags will have to be inserted directly into the output rather than simply added on within the rendering stage.
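
A minimal Python sketch of per-run styling follows, under the assumption that each run records its own bold and italic flags. It is not the library's PHP code; the `Run` class and the choice of `<b>` and `<i>` tags are made up for illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Run:
    text: str
    bold: bool = False
    italic: bool = False


def render_paragraph(runs: List[Run]) -> str:
    parts = []
    for run in runs:
        fragment = escape(run.text, quote=False)
        # Tags are written around the run itself, so styling can
        # start and stop in the middle of a word.
        if run.italic:
            fragment = f"<i>{fragment}</i>"
        if run.bold:
            fragment = f"<b>{fragment}</b>"
        parts.append(fragment)
    return "<p>" + "".join(parts) + "</p>"


paragraph = render_paragraph([
    Run("Hello "),
    Run("W", bold=True),
    Run("orld & co", italic=True),
])
```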

Text indentation

Row indentation is not always picked up by the parser. I've now added basic support for this in commit 63bfb4e.

Note: Currently it only works with indents one level deep.

Code refactor

A code refactor / general cleanup has been made to make it easier to add future changes and to deal with the more deep-rooted bugs & enhancements.

This will be tested for 1-2 days before being pushed into stable.

Word-art arrows & textboxes

Currently, if word-art arrows & textboxes are used to create a diagram, the parser simply outputs all of the text straight into the page as a normal paragraph and duplicates its content.

I may rewrite the parser in the future to fully support textboxes. For now, however, I will work towards removing the duplication issue first, as that is an outright bug.

Coloured text

At present this parser does not attempt to save any inline text font sizes or colours.

Link handling issues

Hi Phil Gale, thanks for your work. I really like it, and I use your docx package in one of my projects, but I have now found an issue.

If a paragraph has a link in its text, then after processing the link text is sometimes moved to the front of the paragraph, and sometimes the string is truncated.

I tried to solve this problem, but it is a bit difficult for me.
test.docx

Can you help me check this question?

Thanks again.
