philgale92 / docx
PHP Based Docx Parser
License: MIT License
This library works great since it preserves the spacing, images and text.
But the problem is that images are not in their original position. Each image appears on the line after its actual position. For instance, I have four images on a single line. That line is output inside a <p> tag without any images, and all four images are then printed after the closing </p> tag.
Please fix this issue. I'm in the middle of something very important.
When using tables sometimes text gets duplicated & it appears to break when one of the table cells has multiple rows inside.
Due to how tables were originally parsed, occasionally the paragraphs immediately after the table ended could be removed or placed inside the table, depending on the table structure.
The rewrite deals with table parsing with a much more sane method.
Hello Phil,
Great work you have here, it was really interesting. What I wanted to know is: can we fetch the font family and the font size, and extend this project's use of the known styles?
I'm unable to parse maths equations or text with subscript/superscript. It just returns normal characters without any formatting.
Add support for reliable colspan & vertical cell merging.
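For reference, WordprocessingML records horizontal spans as <w:gridSpan> and vertical merges as <w:vMerge> inside a cell's <w:tcPr>. The following is only a minimal sketch of reading those two values from one cell; the function name cellMergeInfo and the returned array shape are hypothetical and not part of this library.

```php
<?php
// Hypothetical helper (not the library's API): read merge info from one <w:tc>.
function cellMergeInfo(DOMElement $cell): array
{
    // WordprocessingML main namespace.
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    $xpath = new DOMXPath($cell->ownerDocument);
    $xpath->registerNamespace('w', $ns);

    $span  = $xpath->query('w:tcPr/w:gridSpan', $cell)->item(0);
    $merge = $xpath->query('w:tcPr/w:vMerge', $cell)->item(0);

    return [
        // No <w:gridSpan> means the cell covers a single grid column.
        'colspan' => $span ? (int) $span->getAttributeNS($ns, 'val') : 1,
        // A <w:vMerge/> without w:val continues the merge started in a row above.
        'vmerge'  => $merge ? ($merge->getAttributeNS($ns, 'val') ?: 'continue') : null,
    ];
}
```

A renderer could then emit colspan directly and turn each run of "restart" followed by "continue" cells in the same column into a rowspan.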
Currently, if an image is inside a table within a Word document, the image is generally moved outside of the table altogether.
I cloned the repository earlier today, pointed MAMP at the right directory, and tried to load http://localhost/docx. Chrome tells me "The localhost page isn't working."
MAMP's php_error.log says:
PHP Fatal error: 'continue' not in the 'loop' or 'switch' context in /Users/#####/docx/Docx/Node.class.php on line 553
Node.class.php has the following code on line 553:
if (!isset($prevElement->nodeName)) continue;
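For context: since PHP 7.0, a continue outside a loop or switch is caught at compile time, so the whole file fails to load. A minimal sketch of the usual fix follows, assuming the statement sits directly in a function body and the intent was to bail out early. The function name and the surrounding logic are hypothetical stand-ins, not the actual code from Node.class.php.

```php
<?php
// Hypothetical stand-in for the code around line 553 of Node.class.php.
// Assumption: the check lives in a function body, not inside a loop.
function previousNodeName($prevElement): ?string
{
    // `continue` is only valid inside a loop or switch;
    // `return` is the way to leave a function early.
    if (!isset($prevElement->nodeName)) {
        return null;
    }

    return $prevElement->nodeName;
}
```

If the check does belong inside a loop in the original code, the fix would instead be to restore the enclosing loop rather than switch to return.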
Re-writing the previous stages to work with version 4.
Hello,
We came across this script and, after an audit, have confirmed that it is vulnerable to XXE (XML External Entity) processing (https://www.owasp.org/index.php/XML_External_Entity_%28XXE%29_Processing).
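As an illustration of the kind of hardening this usually calls for, here is a minimal sketch of loading an XML part without resolving external entities. It assumes the XML has already been read into a string, and that .docx parts are not expected to carry a DOCTYPE. The function name loadXmlSafely is hypothetical and not part of this library.

```php
<?php
// Hypothetical helper (not the library's API): parse XML with XXE defences.
function loadXmlSafely(string $xml): DOMDocument
{
    // Before PHP 8.0 the entity loader can be switched off explicitly;
    // the function is deprecated from 8.0 on, so it is only called on older versions.
    if (PHP_VERSION_ID < 80000) {
        libxml_disable_entity_loader(true);
    }

    $doc = new DOMDocument();

    // LIBXML_NONET blocks network access. LIBXML_NOENT and LIBXML_DTDLOAD are
    // deliberately not passed, so entities are not substituted and no DTD is loaded.
    if (!$doc->loadXML($xml, LIBXML_NONET)) {
        throw new RuntimeException('Could not parse XML');
    }

    // Assumption: a legitimate .docx part has no DOCTYPE, so reject any that do.
    if ($doc->doctype !== null) {
        throw new RuntimeException('DOCTYPE declarations are not allowed');
    }

    return $doc;
}
```

Rejecting the DOCTYPE outright is stricter than needed to stop XXE, but it keeps the rule simple to audit.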
Refactored the htmlEntities handling and how the resulting HTML is generated for each text run, removing a large amount of unnecessary complexity in escaping text.
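One common way to keep per-run escaping simple is to escape the raw text exactly once and only then wrap it in markup. The sketch below illustrates that idea; the function name renderRun and its parameters are hypothetical and do not describe the library's actual implementation.

```php
<?php
// Hypothetical helper (not the library's API): render one text run as HTML.
function renderRun(string $text, bool $bold = false, bool $italic = false): string
{
    // Escape first, so the tags added below are never escaped themselves.
    $html = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    if ($italic) {
        $html = '<i>' . $html . '</i>';
    }
    if ($bold) {
        $html = '<b>' . $html . '</b>';
    }

    return $html;
}
```

For example, renderRun('p<0.001', true) keeps the "<" in the text from being read as the start of a tag.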
Please use this test file: http://www.haiwaitou.cn/test/a.docx
The library works perfectly when I upload small files, but it fails to parse larger files.
For example, assume that the docx file has only text/characters and tables (black and white, no colour). It works if the file has around 40 such pages (size approx. 50 KB), but fails to parse files with more pages and a greater size than that.
Hello Phil, thanks for your work. Everything works fine with your package, but if my docx file has N pages, how can I read it one page at a time?
I tried to solve this problem, but it is difficult for me. Can you help me look into this?
Thanks again.
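Some background on the page question: a .docx file does not store fixed pages, because Word lays them out at render time. The XML may contain explicit page breaks (<w:br w:type="page"/>) and, if Word last saved the file, <w:lastRenderedPageBreak/> hints. The following is a rough sketch of grouping paragraph text by those markers. The function name is hypothetical and not part of this library, and it assumes word/document.xml has already been loaded into a DOMDocument.

```php
<?php
// Hypothetical helper (not the library's API): group paragraph text by page markers.
function splitParagraphsByPage(DOMDocument $doc): array
{
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', $ns);

    $pages = [[]]; // start with one empty page

    foreach ($xpath->query('//w:body/w:p') as $p) {
        $breaks = $xpath->query(
            './/w:br[@w:type="page"] | .//w:lastRenderedPageBreak',
            $p
        )->length;

        // Simplification: a paragraph containing a break goes wholly onto the next page.
        if ($breaks > 0) {
            $pages[] = [];
        }

        $pages[count($pages) - 1][] = $p->textContent;
    }

    return $pages;
}
```

Documents saved by other tools may carry no rendered-break hints at all, in which case only explicit page breaks can be detected this way.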
Currently hyperlinked text is ignored.
A link parser is now in progress. Because a link can cover a single word within a line of text, the linked word(s) have to be wrapped directly with the link HTML (depending on $this->convertPlaceholders, either an <a> tag or PLACEHOLDER_* brackets).
Sometimes corrupt images are encoded and saved into the parsed data. I expect this is from word art or some other form of docx proprietary data.
This bug does not affect images inside the document added by users.
Hey Phil,
I tried to convert a docx file.
test111.docx file attached
But it outputs with extra
string(6865) "
In total, data were retrieved for 29,452 procedures, 18,032 under general anesthesia and 11,420 under neuraxial anesthesia. The large number of general anesthesia cases was an unexpected finding, but did not affect the data analysis.
“We were surprised to discover that nearly 60% of providers in the NSQIP database utilized general anesthesia for primary total hip arthroplasty,” said Brian D. Haughom, MD, who presented the team’s data at the MAOA Annual Meeting. “Despite a trend toward increasing use of regional anesthesia, it is still not used in the majority of hip replacement cases.”
Dr. Levine speculated that this was the choice of the anesthesiologists on these cases, as surgeons typically prefer neuraxial anesthesia. “I do think spinal anesthesia is becoming more popular, and I bet the numbers will be reversed in the next 5 years,” he said. He also thinks the numbers would be different if they were evaluating anesthesia for total knee arthroplasty, in which regional anesthesia is more commonly used.
Favorable Results for Neuraxial Anesthesia
The researchers used univariate analysis to evaluate postoperative complications between the two types of anesthesia. Multivariate analysis determined independent risk factors for blood transfusion following THA.
Surgical times were found to be shorter for patients who received neuraxial anesthesia (88.2 vs. 101.4 minutes; p<0.001), as was the length of hospital stay (3.3 vs. 3.5 days; p=0.03).
The researchers also found that with neuraxial anesthesia, patients had lower rates of:
Overall complications (4.1% vs. 4.8%; p=0.006)
Medical complications (2.7 vs. 3.5%; p<0.001)
Deep infection (0.23% vs. 0.37%; p=0.04)
Pneumonia (0.23% vs. 0.37%; p=0.04)
Unplanned intubation (0.16% vs. 0.29%; p=0.015)
Ventilation over 48 hours (0.04% vs. 0.13%; p=0.03)
Stroke (0.08% vs. 0.20%; p=0.013)Death (0.12% vs. 0.24%; p=0.025)
What about risk factors for postoperative blood transfusion? Overall, patients who received neuraxial anesthesia had a decreased risk of postoperative transfusion (OR=0.79; CI:0.69-0.91) than patients who received general anesthesia.
The researchers identified 3 independent risk factors for transfusion:
Female gender (OR=1.90; CI:1.66-2.18)
Operative time (OR=1.23 per 30 minutes; CI:1.18-1.29 )
History of hypertension (OR=1.33; CI:1.16-1.51)
“Our results demonstrate that neuraxial anesthesia decreased the rate of overall complications following primary total hip arthroplasty, and furthermore portended an 18% reduction in the rate of post operative blood transfusions,” said Dr. Haughom.
“While we suspected there may be advantages to neuraxial anesthesia, it was surprising to find such a large reduction in transfusion rates based upon anesthetic type as an independent risk factor.”
Recommendation for Practice
Dr. Levine believes these results suggest neuraxial anesthesia should have a larger role in THA.
“While the exact mechanism by which neuraxial anesthesia decreases the rate of complications and postoperative transfusions remains elusive, we feel that neuraxial anesthesia should be utilized whenever possible in the setting of a primary total hip arthroplasty,” he said. “By maximizing means to reduce complications and blood transfusions, it may be possible to reduce healthcare costs and readmissions in the future.“While this study highlights the impact of neuraxial anesthesia in regard to primary total hip arthroplasty, further studies are needed to elucidate the exact mechanisms leading to these findings.”Source
Could you please check what is wrong? Thanks.
Currently this parser detects inline styles by exploding out "•" and converting each item into an <li>. This has caused issues with how the styles are wrapped around each item, as the BOLD_PLACEHOLDER may be wrapped around a •, which can corrupt the html for that list item.
I have applied a temp-fix which fixes the symptoms of the html by removing invalid <b> / BOLD_PLACEHOLDER tags, but before I can add further inline style support the underlying issue needs to be hammered out.
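To illustrate the splitting approach described above, here is a minimal sketch that splits on the bullet character first and only then builds markup per item, so that no tag can end up spanning a bullet. It is not the library's code: the function name bulletsToList is hypothetical, and it handles plain text only, with no style placeholders.

```php
<?php
// Hypothetical helper (not the library's API): turn "• a • b" into a <ul>.
function bulletsToList(string $text): string
{
    // Split on the bullet, trim each piece, and drop empty pieces
    // (such as the empty string before the first bullet).
    $items = array_filter(array_map('trim', explode('•', $text)), 'strlen');

    $html = '<ul>';
    foreach ($items as $item) {
        // Each item is escaped and wrapped on its own.
        $html .= '<li>' . htmlspecialchars($item, ENT_QUOTES, 'UTF-8') . '</li>';
    }

    return $html . '</ul>';
}
```

Inline styles would then be applied inside each <li>, after the split, rather than around the original unsplit string.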
Further text styling to be saved into the parser from the document - bold / emphasized wording and potentially different colours & highlighting effects.
As this can be applied per character in Word, the styling tags will have to be inserted directly into the output rather than simply added on within the rendering stage.
Row indentation is not always picked up on by the parser. I've now added basic support for this in commit 63bfb4e.
Note: Currently it only works with one level deep indents.
A code refactor / general cleanup has been made to make it easier to add future changes, and deal with the more deep rooted bugs & enhancements.
This will be tested for 1-2 days before being pushed into stable.
Occasionally the text style name is not saved alongside the text, due to the ::_getArray(); method breaking.
List parsing is breaking the html due to missing li / ul tags or the tags being in the wrong positions
Currently if word-art arrows & textboxes are used to create a diagram, the parser simply outputs all of the text straight into the page as a normal paragraph, and duplicates its content.
I may rewrite the parser in the future to fully support textboxes. For now, however, I will work towards removing the duplication issue first, as that is an outright bug.
At present this parser does not attempt to save any inline text font sizes or colours.
Hi Phil Gale, thanks for your work. I really like it and use your docx package in one of my projects, but I have now found an issue.
If a paragraph has a link in its text, then after processing the link text is sometimes moved to the front, and sometimes the string is truncated.
I tried to solve this problem, but it is a bit difficult for me.
test.docx
Can you help me look into this?
Thanks again.