Comments (5)
Yes, I think so. I certainly don't know of an XML parser written in Fortran that supports character references to Unicode characters.
from fox.
Could you post the Fortran code you are using along with the (X)HTML document (here or elsewhere)?
from fox.
program xml_mini
  use FoX_dom
  use FoX_sax
  implicit none
  integer :: i
  type(Node), pointer :: doc => null()
  type(Node), pointer :: p1 => null()
  type(Node), pointer :: p2 => null()
  type(NodeList), pointer :: pointList => null()
  character(len=100) :: name

  ! Parse the whole document into a DOM tree.
  doc => parseFile("file.xml")
  if (.not. associated(doc)) stop "error doc"

  ! Take the first (and only) Students element.
  p1 => item(getElementsByTagName(doc, "Students"), 0)
  if (.not. associated(p1)) stop "error p1"
  write(*,*) getNodeName(p1)

  ! Collect every Student element below it.
  pointList => getElementsByTagname(p1, "Student")
  write(*,*) getLength(pointList), "Student elements"

  ! DOM node lists are indexed from zero.
  do i = 0, getLength(pointList) - 1
    p2 => item(pointList, i)
    call extractDataAttribute(p2, "Name", name)
    write(*,*) "number ", i, " name = ", name
  enddo

  call destroy(doc)
end program xml_mini
file.xml
<Students>
  <Student Name="April" Gender="F" DateOfBirth="1989-01-02" />
  <Student Name="Bob" Gender="M" DateOfBirth="1990-03-04" />
  <Student Name="Chad" Gender="M" DateOfBirth="1991-05-06" />
  <Student Name="Dave" Gender="M" DateOfBirth="1992-07-08">
    <Pet Type="dog" Name="Rover" />
  </Student>
  <Student DateOfBirth="1993-09-10" Gender="F" Name="&#x00C9;mily" />
</Students>
output
./xml_mini.x
Students
5 Student elements
number 0 name = April
number 1 name = Bob
number 2 name = Chad
number 3 name = Dave
number 4 name = &mily
from fox.
I've now had a chance to take a proper look at this. I'm afraid the way FoX is set up (and, in particular, the way the SAX parser works) makes it impossible to 'smuggle' a non-ASCII character in and out of the DOM as a character reference. The main problem is that tokenisation of the document involves converting character references into their ASCII representation and putting the result into an array of Fortran characters.
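To make the limitation concrete, here is a minimal sketch of decoding one hexadecimal character reference and checking whether the result fits in ASCII. It is not FoX's tokeniser; the program name and the fixed four-digit reference are assumptions made only for illustration.

```fortran
program charref_sketch
  implicit none
  ! Hypothetical input: the reference used in the test file above.
  character(len=*), parameter :: ref = "&#x00C9;"
  integer :: code, ios

  ! Skip the leading "&#x" and the trailing ";", then read the four
  ! remaining digits as a hexadecimal integer.
  read(ref(4:len(ref)-1), '(z4)', iostat=ios) code
  if (ios /= 0) stop "not a hexadecimal character reference"

  if (code <= 127) then
    ! Representable as a default (ASCII) Fortran character.
    write(*,'(a,a)') "fits in ASCII: ", achar(code)
  else
    ! No ASCII character exists for this code point.
    write(*,'(a,i0)') "outside ASCII, code point ", code
  end if
end program charref_sketch
```

For this input the second branch is taken, since U+00C9 is code point 201 and has no ASCII character to be stored as.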
If &#x00C9; is included in text (between element tags) the SAX parser gives an error apologising that it "cannot digest" the character reference. This is the intended behaviour. When using the DOM you just end up with a "parsing failed" error, but this is ultimately the same error. I think it's a bug that you don't see this error when the character reference is part of an attribute value. This should probably be fixed...
To properly fix this would involve finally making the upgrade to allow FoX to handle Unicode. Those arrays of Fortran characters would need replacing with integer arrays of Unicode code points, and the reading and writing sorted out (Toby White once figured out this bit; it is possible in modern Fortran).
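As a rough illustration of what "possible in modern Fortran" can look like, the sketch below holds a name as an integer array of code points, converts it to the ISO 10646 character kind, and writes it out as UTF-8. It assumes a compiler that supports the `ISO_10646` kind and the `encoding='UTF-8'` specifier (gfortran does); it is not taken from FoX or from Toby White's work, and the program and variable names are made up for this example.

```fortran
program ucs4_sketch
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
  ! Character kind for ISO 10646 (UCS-4). selected_char_kind returns -1 when
  ! the kind is unsupported, in which case the declaration of name below
  ! will not compile.
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  ! Code points for the five characters of the name; 201 is U+00C9.
  integer, parameter :: code_points(5) = [201, 109, 105, 108, 121]
  character(kind=ucs4, len=5) :: name
  integer :: i

  ! Convert each integer code point to a UCS-4 character.
  do i = 1, size(code_points)
    name(i:i) = char(code_points(i), kind=ucs4)
  end do

  ! Ask the runtime to encode standard output as UTF-8, then write the name.
  open(output_unit, encoding='UTF-8')
  write(output_unit, '(a)') name
end program ucs4_sketch
```

Whether the terminal shows the accented letter correctly depends on the terminal's own encoding, which is outside the program's control.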
I think any quick fix to try to avoid the problem by storing the character reference is going to be very messy and involve surgery to the SAX parser and, I think, modifications to the DOM code. I really wouldn't want to go down that road.
from fox.
Dear Andrew,
Thank you for the answer. I am aware of the difficulties with Unicode characters in Fortran, but the ability to handle (or ignore) extended XHTML special characters is a basic and expected feature of any modern XML parser. This example comes from http://rosettacode.org/wiki/XML/Input#C, and most of the parsers used there support such character conversions. So, without changing the input file, is the only option for Fortran right now to write an interface to LIBXML2?
from fox.