Comments (4)

Pramodnagarajarao commented on July 30, 2024

I get the same error when I try to read the entire file at once. If I read it in chunks (say, a few rows at a time), I can read the data. Any pointers on how to resolve this?
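Since chunked reads work, one way to get through the whole file is to loop over fixed-size chunks. Here is a minimal sketch. It assumes a reader function that takes a start and stop index and returns the records in that range. For nutchpy, this is assumed to be `nutchpy.sequence_reader.slice(start, stop, path)`, so verify that against your installed version. The helper name `iter_chunks` and the default `chunk_size` are illustrative only.

```python
from typing import Any, Callable, Iterator, List

def iter_chunks(slice_fn: Callable[[int, int], List[Any]],
                chunk_size: int = 1000) -> Iterator[List[Any]]:
    """Yield successive chunks by calling slice_fn(start, stop).

    slice_fn is a hypothetical stand-in for any range reader
    (assumed: nutchpy.sequence_reader.slice(start, stop, path)).
    Stops on an empty chunk or on a chunk shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while True:
        chunk = slice_fn(start, start + chunk_size)
        if not chunk:
            break  # nothing left to read
        yield chunk
        if len(chunk) < chunk_size:
            break  # short chunk: treat as end of data
        start += chunk_size
```

For example, `for chunk in iter_chunks(lambda a, b: nutchpy.sequence_reader.slice(a, b, path), 500): ...` would process 500 records at a time, with `path` pointing at your data file. Only one chunk is held in Python at once. Whether the JVM side also stays small depends on how the underlying slice call is implemented. If it rescans the file from the beginning on every call, total run time grows with the number of chunks.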

The error as shown in the log window (I have Miniconda3 installed with Python 3.4):

py4j.protocol.Py4JJavaError: An error occurred while calling z:com.continuumio.seqreaderapp.SequenceReader.head.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
at java.lang.StringBuilder.<init>(StringBuilder.java:89)
at org.apache.nutch.crawl.CrawlDatum.toString(CrawlDatum.java:408)
at com.continuumio.seqreaderapp.SequenceReader.head(SequenceReader.java:73)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)

karanjeets commented on July 30, 2024

Hi @Pramodnagarajarao
As a workaround, maybe you can write a script that reads over the segments iteratively (instead of reading one crawldb data file).
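Here is a rough sketch of that workaround. It assumes the usual Hadoop MapFile layout, in which each part directory (e.g. `part-00000`) holds a `data` file. It also assumes `read_fn` is whatever single-file reader you already use. Both helper names are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Iterator, List, Tuple, Union

PathLike = Union[str, Path]

def find_data_files(root: PathLike, name: str = "data") -> List[Path]:
    """Return every regular file called `name` below `root`, in sorted order."""
    return sorted(p for p in Path(root).rglob(name) if p.is_file())

def read_each(root: PathLike,
              read_fn: Callable[[str], Any]) -> Iterator[Tuple[Path, Any]]:
    """Read the data files one at a time, yielding (path, result) pairs.

    read_fn is a hypothetical single-file reader. Because this is a
    generator, the result for one file can be processed and dropped
    before the next file is read.
    """
    for path in find_data_files(root):
        yield path, read_fn(str(path))
```

With nutchpy, `read_fn` might be `nutchpy.sequence_reader.read`, assuming it takes just the path. This only bounds memory per file. A single very large part file can still exhaust the heap, and that file would still need to be read in chunks.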

Pramodnagarajarao commented on July 30, 2024

Thanks @karanjeets.
Do you mean that we should read the data file in the segment directory instead of the crawldb data file, keeping the sequence_reader.read() calls as they are? If so, I have tried that too, without success.

thammegowda commented on July 30, 2024

@karanjeets @Pramodnagarajarao

There is a better way, using a stream/iterator reader: example
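The linked example is not reproduced here. As a rough illustration of the iterator idea only, the sketch below shows records being consumed one at a time and written straight to disk instead of being collected into one large result. Where the records come from is left open: `records` can be any lazy iterator of (key, value) pairs. The name `dump_jsonl` and the JSON Lines output format are choices made for this sketch, not part of nutchpy.

```python
import json
from typing import Any, Iterable, TextIO, Tuple

def dump_jsonl(records: Iterable[Tuple[Any, Any]], out: TextIO) -> int:
    """Write each (key, value) record as one JSON line; return the count.

    Only the current record is held in memory, provided `records`
    is itself lazy (e.g. a generator). json.dumps escapes newlines,
    so multi-line values still occupy a single output line.
    """
    count = 0
    for key, value in records:
        # default=str falls back to str() for values JSON cannot encode
        out.write(json.dumps({"key": key, "value": value}, default=str))
        out.write("\n")
        count += 1
    return count
```

Writing one line per record keeps memory flat regardless of file size. The output can also be re-read line by line later without loading it all.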