Comments (5)

jmdavis commented on June 25, 2024

Memory management of what's being parsed is left up to the range being parsed and is not the concern of dxml. The parser will operate on any forward range of char, wchar, or dchar. I thought that the documentation was clear about that. As such, if you have a forward range of characters over a file which does not read in the entire file at once, you can parse the file without reading it all into memory (though obviously, any parts you keep around will then stay in memory).

However, if you're going to use parseDOM, then any portion of the document that it parses is going to result in memory allocations in order to build the DOM regardless of the underlying range being parsed. That's going to be true of any DOM parser since the whole point of a DOM parser is to build the document tree in memory.
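
To make the contrast concrete, here is a minimal sketch, assuming the dxml package is available to the build. The sample XML and the helper name `countElements` are invented for illustration. The first half walks the entities with parseXML and keeps nothing; the second half builds the full tree with parseDOM.

```d
// Sketch only: needs the dxml package; countElements and the XML text
// are hypothetical and exist just for this example.
import dxml.dom : parseDOM;
import dxml.parser : EntityType, parseXML;

/// Counts start tags and empty-element tags by streaming over the
/// entities. No tree is built and nothing is retained.
size_t countElements(R)(R xml)
{
    size_t count;
    foreach (entity; parseXML(xml))
    {
        if (entity.type == EntityType.elementStart ||
            entity.type == EntityType.elementEmpty)
            ++count;
    }
    return count;
}

unittest
{
    auto xml = "<root><item>one</item><item>two</item><empty/></root>";

    // Streaming: memory use is governed by the range being parsed.
    assert(countElements(xml) == 4);

    // DOM: the tree for everything that gets parsed is allocated.
    auto dom = parseDOM(xml);
    auto root = dom.children[0];
    assert(root.name == "root");
    assert(root.children.length == 3);
}
```

Here the input is a string, so both halves happen to work on data that is already in memory; the difference that matters is that only the parseDOM half allocates a node for each element it sees.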

from dxml.

FreeSlave commented on June 25, 2024

I see, but do you have a preferred solution? Phobos does not seem to provide any means of representing a file as a forward range without loading the whole file.
The DOM of course needs to allocate some structs that represent the tree, but it still uses slices of the original range to hold the stored data. The point is to minimize allocations, not eliminate them.

Update: I've found that MmFile can remap portions of the file on demand when the window argument is given. It still needs a little trickery to make it a forward range, but it might work.

jmdavis commented on June 25, 2024

As dxml uses the standard range mechanism, it bypasses the issue in the sense that it doesn't provide the range that reads in the file efficiently; it assumes that such a range already exists. Unfortunately, reading from a file efficiently is kind of the Achilles' heel of ranges: every time you call save, the resulting range needs to remain valid for as long as it exists, so in order to buffer file access, you could need an arbitrarily large number of buffers, and it gets complicated. It's something that needs to be solved, but Phobos has largely ignored the issue (probably because it's complicated and no one really wants to take the time to write it). It does have stuff like std.stdio's byLine and byChunk to read lines or chunks efficiently, but that translates to a range of bytes or characters only awkwardly, because what you're really getting is a range of lines or chunks (and since those ranges reuse their buffers, it gets even more complicated). Properly buffering chunks of a file and reference-counting them behind the range API in order to support save gets complicated fast, and in a lot of cases, simply reading the file in pieces rather than as a forward range, or reading it in all at once, avoids the whole issue (though that obviously isn't always an option).
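
The save requirement described above can be seen with Phobos alone. The following is a small sketch, not anything taken from dxml; the XML literal is made up. It checks at compile time that byChunk and byLine are input ranges without save, and shows what save promises for a plain string.

```d
import std.range.primitives : front, isForwardRange, isInputRange,
                              popFront, save;
import std.stdio : File;

// An array of characters is a forward range: save just copies the slice.
static assert(isForwardRange!string);

// byChunk and byLine are input ranges only, so they offer no save.
static assert(isInputRange!(typeof(File.init.byChunk(4096))));
static assert(!isForwardRange!(typeof(File.init.byChunk(4096))));
static assert(!isForwardRange!(typeof(File.init.byLine())));

unittest
{
    string xml = "<a/>";
    auto checkpoint = xml.save; // a parser may do this at any point
    xml.popFront();
    xml.popFront();

    // The saved copy must still see data the original has moved past.
    // A buffering file range would have to keep that data alive for
    // every outstanding copy.
    assert(checkpoint.front == '<');
    assert(xml.front == '/');
}
```

With a string this costs nothing because all of the data is already in memory; the difficulty only appears once the data behind `checkpoint` would otherwise have been discarded from a buffer.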

Personally, if I had to read a file in as a forward range and couldn't read it all in at once, I'd probably just use std.mmfile rather than trying to deal with buffering everything, since that gets really complicated. There's always Steven's https://github.com/schveiguy/iopipe, but it's still a work in progress, and as I understand it, he's had to work around the range API on some level precisely because it's so poorly suited to reading in a file efficiently, so I don't know exactly how that's going to work with the range API. I'm aware of iopipe but still need to spend time studying it.
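
As a rough illustration of the std.mmfile route, here is a sketch that maps a whole file read-only and views it as a slice of characters. It does not use the window argument mentioned earlier in the thread, and it assumes the file is UTF-8. The helper name `mappedText` and the temporary file name are invented for the example.

```d
import std.file : remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;
import std.range.primitives : isForwardRange;

/// Views the mapped file as UTF-8 text (an assumption about the file).
/// The slice is only valid while mmf is alive and mapped.
const(char)[] mappedText(MmFile mmf)
{
    return cast(const(char)[]) mmf[];
}

// A slice of characters is already a forward range with a cheap save.
static assert(isForwardRange!(const(char)[]));

unittest
{
    // Hypothetical scratch file, created only for this example.
    auto path = buildPath(tempDir, "dxml_mmfile_sketch.xml");
    write(path, "<root><item/></root>");
    scope (exit) remove(path);

    auto mmf = new MmFile(path); // opens the file read-only
    scope (exit) destroy(mmf);   // runs first, so the file is unmapped before removal

    auto text = mappedText(mmf);
    assert(text == "<root><item/></root>");
}
```

Because the result is an ordinary slice, it satisfies the forward range requirement without any buffering code, and the operating system decides which pages are actually resident. Mapping the whole file still needs enough address space for it, which is where a windowed mapping would come in.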

I wrote dxml the way I did so that it could work with a range that reads over a file without reading it all in, but without trying to actually solve that problem itself. By just operating on ranges, it pushes that entire problem off to ranges, which doesn't entirely solve the problem, but it does mean that as long as the problem is solved for ranges in general, it's solved for dxml.

JesseKPhillips commented on June 25, 2024

Yeah, and byLine/splitLines aren't good for parsers because they lose vital line-ending information. CDATA sections are most likely where that becomes a problem for XML.
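
A quick sketch of the default behavior with std.string.splitLines, using a made-up input. By default the terminators are dropped, so the result no longer says whether a line ended in "\r\n" or "\n"; passing Yes.keepTerminator keeps them, at the cost of the caller handling them.

```d
import std.string : splitLines;
import std.typecons : Yes;

unittest
{
    auto text = "<a>\r\n</a>\n";

    // Default: terminators are stripped, so "\r\n" and "\n" look alike.
    assert(text.splitLines == ["<a>", "</a>"]);

    // Keeping terminators preserves which line ending was used.
    assert(text.splitLines(Yes.keepTerminator) == ["<a>\r\n", "</a>\n"]);
}
```

Either way the result is a range of lines rather than a range of characters, so it still has to be stitched back together before a character-level parser can consume it.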

I did a range over mmap files a while back.
https://github.com/JesseKPhillips/libosm/blob/master/source/util/filerange.d
