Comments (9)

bbandlamudi commented on August 18, 2024

We need to create a new URILoader that combines the functionality of QueryURILoader and FileURILoader: it should run a query against MarkLogic, write the URIs to a file as it reads them, and then load them back the way the file loader does. We also need to change the Manager class so that it does not load all the URIs into memory; if it does, that defeats the purpose of reducing the memory footprint.
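As a rough illustration of the "query, spool to a file, read back" idea, here is a minimal Java sketch. `SpooledUris` is a hypothetical name and not an existing CoRB class; the query results are stood in for by a plain `Iterator<String>`, and the sketch assumes one URI per line with no embedded newlines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper (not a CoRB class): spools URIs to a temp file, then streams them back. */
final class SpooledUris implements Iterator<String>, AutoCloseable {
    private final Path spoolFile;
    private final BufferedReader reader;
    private String next;

    /** Writes each URI to disk as it is read, so the full result set is never held in memory. */
    SpooledUris(Iterator<String> queryResults, Path tempDir) throws IOException {
        spoolFile = Files.createTempFile(tempDir, "uris-", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(spoolFile, StandardCharsets.UTF_8)) {
            while (queryResults.hasNext()) {
                writer.write(queryResults.next());
                writer.newLine();
            }
        }
        // Read the file back lazily, one line at a time.
        reader = Files.newBufferedReader(spoolFile, StandardCharsets.UTF_8);
        next = reader.readLine();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        try {
            next = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    /** Closes the reader and removes the spool file. */
    @Override
    public void close() throws IOException {
        reader.close();
        Files.deleteIfExists(spoolFile);
    }
}
```

A consumer would iterate with `hasNext()`/`next()` inside a try-with-resources block, so that only one URI at a time lives on the heap and the spool file is cleaned up afterwards.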

from corb2.

Hemalatha-S commented on August 18, 2024

I have set DISK-QUEUE-MAX-IN-MEMORY-SIZE=100 and pointed URIS-QUEUE-TEMP-DIR to a local directory.
In the log file, I see that a backing store tmp file was created in the /tmp directory. But I can't find any .tmp file in /tmp, so I could not verify whether the DISK-QUEUE option is working for me.

Is there any way to check this functionality?

bbandlamudi commented on August 18, 2024

The disk queue is supported by the version of CoRB you are using (e.g., 2.3.1+) and is only used for QueryUrisLoader (XQuery selector). You should see a corresponding message in the log when the disk queue is used, and you will see a temp file when the number of URIs is larger than the max. To really test it, keep your Java heap low and have the selector return enough values to trigger an out-of-memory error, then turn on the disk queue and check that the problem goes away.

bbandlamudi commented on August 18, 2024

Hi Hemalatha-S,
Please register/use the email reflector http://developer.marklogic.com/mailman/listinfo/corb2 for all Corb2 related questions.

Thanks,
Bhagat

hansenmc commented on August 18, 2024

@Hemalatha-S did you also set the option DISK-QUEUE=true? If not, those other DISK-QUEUE options will not take effect and the URIs will use the default in-memory queue.

DISK-QUEUE: Boolean value indicating whether the CoRB job should spill to disk when a maximum number of URIs have been loaded in memory, in order to control memory consumption and avoid Out of Memory exceptions for extremely large sets of URIs.

When the disk-queue is enabled, you should see a file created in the DISK-QUEUE-TEMP-DIR with the file pattern DiskQueue-backingstore-{random_number}.tmp and it will contain the sequence of URIs that have spilled to disk.
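One way to check for that file from outside the job is to list the temp directory with a glob for the file pattern above. The sketch below is only an illustration: `BackingStoreFinder` is a hypothetical helper, not part of CoRB, and it has to be run while the job is still in progress.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CoRB): lists disk-queue backing store files in a directory. */
final class BackingStoreFinder {

    /** Returns the files in tempDir whose names match DiskQueue-backingstore-*.tmp. */
    static List<Path> find(Path tempDir) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(tempDir, "DiskQueue-backingstore-*.tmp")) {
            for (Path file : stream) {
                matches.add(file);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Pass the DISK-QUEUE-TEMP-DIR value as the first argument;
        // falls back to the JVM temp directory when none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        for (Path file : find(dir)) {
            System.out.println(file + " (" + Files.size(file) + " bytes)");
        }
    }
}
```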

Hemalatha-S commented on August 18, 2024

Hi,
I have set the max heap size to 100 MB and enabled DISK-QUEUE=true
DISK-QUEUE-MAX-IN-MEMORY-SIZE=10
DISK-QUEUE-TEMP-DIR=/xx/xx
In the log I can see "created backing store /xx/xx/DiskQueue-backingstore*", but I could not find the .tmp file under that directory. Am I missing something?

bbandlamudi commented on August 18, 2024

Hi Hemalatha,
Since you are seeing the above message, the code is using the disk queue. The .tmp file is deleted after all the URIs have been read from it, so it will not be kept for long.

Thanks,
Bhagat

Hemalatha-S commented on August 18, 2024

Hi Bhagat,

What is the default number of URIs loaded in memory if I don't specify DISK-QUEUE-MAX-IN-MEMORY-SIZE?

If DISK-QUEUE-MAX-IN-MEMORY-SIZE=100, would it hold 100 URIs or 100 MB of URIs in memory?

Thanks,
Hema

hansenmc commented on August 18, 2024

@Hemalatha-S If not specified, the default in-memory queue size is 1,000. That is the maximum number of URIs that will be held in memory before spilling over to disk, not the memory footprint. If you specify 100, then it will only have up to 100 URIs in memory while processing. As the in-memory queue is depleted, it will fetch additional URIs from the disk queue.

Depending on the size of your URI values and the amount of memory available, you might want to adjust accordingly. A smaller value for DISK-QUEUE-MAX-IN-MEMORY-SIZE will be safer, but at the expense of performance, since there will be more overhead reading chunks of URIs from the disk queue temp file.
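To make the spill-and-refill behavior concrete, here is a simplified Java sketch of a FIFO queue that keeps at most `maxInMemory` items on the heap and writes the rest to a temp file. It is an illustration only and not CoRB's actual disk queue code: `SpillQueue`, its method names, and the record format (`writeUTF`/`readUTF` on a `RandomAccessFile`) are all assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical FIFO queue (not CoRB's implementation) that spills to disk past maxInMemory items. */
final class SpillQueue implements AutoCloseable {
    private final int maxInMemory;
    private final Deque<String> memory = new ArrayDeque<>();
    private final File backingStore;
    private final RandomAccessFile disk;
    private long readPos = 0;   // file offset of the next unread record
    private long writePos = 0;  // file offset where the next record is appended
    private long onDisk = 0;    // records written to disk but not yet read back

    SpillQueue(int maxInMemory, File tempDir) throws IOException {
        if (maxInMemory < 1) {
            throw new IllegalArgumentException("maxInMemory must be >= 1");
        }
        this.maxInMemory = maxInMemory;
        this.backingStore = File.createTempFile("SpillQueue-", ".tmp", tempDir);
        this.disk = new RandomAccessFile(backingStore, "rw");
    }

    void add(String uri) throws IOException {
        // Once anything is on disk, later items must also go to disk to preserve FIFO order.
        if (onDisk == 0 && memory.size() < maxInMemory) {
            memory.addLast(uri);
        } else {
            disk.seek(writePos);
            disk.writeUTF(uri);
            writePos = disk.getFilePointer();
            onDisk++;
        }
    }

    /** Returns the next URI in insertion order, or null when the queue is empty. */
    String poll() throws IOException {
        if (memory.isEmpty() && onDisk > 0) {
            refill();
        }
        return memory.pollFirst();
    }

    /** Reads up to maxInMemory records back from disk into the in-memory queue. */
    private void refill() throws IOException {
        disk.seek(readPos);
        while (memory.size() < maxInMemory && onDisk > 0) {
            memory.addLast(disk.readUTF());
            onDisk--;
        }
        readPos = disk.getFilePointer();
    }

    int inMemory() {
        return memory.size();
    }

    long spilled() {
        return onDisk;
    }

    /** Closes the file handle and deletes the backing store. */
    @Override
    public void close() throws IOException {
        disk.close();
        backingStore.delete();
    }
}
```

With `maxInMemory` set to 2 and five URIs added, two stay in memory and three go to the temp file; each refill is one more round of disk reads, which is the overhead a smaller in-memory size trades for a lower heap footprint.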
