Comments (10)

wfang commented on May 27, 2024

Hi sbeamer,

Sorry for the delayed reply.
I revisited the issue and found that it was caused by a corrupted download. I downloaded the file again following michaelsutton's suggestion, and the new file works fine.

Both files have the same size: 6475352982 bytes.
The md5sum for the new file is:
c31b4c2d6f3ae325e516e78b499c46f8 twitter_rv.tar.gz
The md5sum for the old file I downloaded before is:
88a1991d2a24ad50c99062864880f89f ./twitter_rv.tar.gz

I think this is unrelated to the software itself.
Thanks a lot.

from gapbs.

wfang commented on May 27, 2024

I just realised that the error is due to my changing the type size to 64 bit in the benchmark.h file:
typedef int64_t NodeID;
typedef int64_t WeightT;
If I change it back to 32 bit, the error goes away.
typedef int32_t NodeID;
typedef int32_t WeightT;

Not sure why it doesn't work for 64 bit though.
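
A possible explanation, offered as an assumption rather than something confirmed from the gapbs source: if the edge list is read with formatted stream extraction (`in >> value`), a vertex ID too large for int32_t makes the extraction fail, whereas the same ID parses cleanly as int64_t and can then inflate the computed node count. The sketch below uses a hypothetical helper, `ParsesAs`, to show that difference for an out-of-range ID such as 4099117456045389565, the value reported elsewhere in this thread.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper, not part of gapbs: reports whether `text` can be
// extracted from a stream as an integer of type T.
template <typename T>
bool ParsesAs(const std::string& text) {
  std::istringstream in(text);
  T value;
  // Since C++11, an out-of-range value sets failbit on the stream,
  // so the extraction expression evaluates to false.
  return static_cast<bool>(in >> value);
}
```

Under that assumption, `ParsesAs<int64_t>("4099117456045389565")` succeeds while `ParsesAs<int32_t>("4099117456045389565")` fails, so a 32-bit build would stop at the bad line instead of carrying the huge ID forward.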

sbeamer commented on May 27, 2024

Thank you for reporting this!

It would be great to improve support for 64-bit IDs. How much memory does your system have? Running out of memory seems like the most probable cause (http://en.cppreference.com/w/cpp/memory/new/bad_array_new_length).

wfang commented on May 27, 2024

Hi sbeamer,
Thanks for the reply.
The machine has 1TB of memory. I also tested it on another machine with 3TB, and the 64-bit build failed there too.

sbeamer commented on May 27, 2024

I just built converter with NodeID set to int64_t, and I was unable to reproduce this error when converting twitter. My console output:

Read Time:           222.30987
Build Time:          13.85658
Graph has 61578415 nodes and 1468364884 directed edges for degree: 23
serialized graphs only allowed for 32b IDs

The program will (correctly) exit and not write out the .sg file since the binary encoding expects the 32-bit word width, but it will read the edge list in and build the graph before the exception.

Looking at the code more, I can't see how the error could occur within ReadInEL. Your console output prints the read time, which implies ReadInEL completed. Could you double-check with gdb where the error occurs? If you aren't running out of memory, perhaps the code is somehow asking for a pvector with a negative size, but I am not sure how expanding to int64_t could cause that.

Did you change anything other than NodeID and WeightT?
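
For reference, here is a minimal sketch of how a negative element count surfaces as std::bad_array_new_length. `AllocationRejectedAsBadLength` is a hypothetical helper, not gapbs code, and it uses a plain `new[]` rather than pvector.

```cpp
#include <cstdint>
#include <new>

// Hypothetical helper, not part of gapbs: tries to allocate `count` 64-bit
// elements and reports whether the request was rejected as an invalid
// array length (as opposed to succeeding or running out of memory).
bool AllocationRejectedAsBadLength(int64_t count) {
  try {
    int64_t* buffer = new int64_t[count];
    if (count > 0) {
      // Touch the memory through a volatile pointer so the compiler
      // does not optimize the allocation away.
      volatile int64_t* sink = buffer;
      sink[0] = 0;
    }
    delete[] buffer;
    return false;
  } catch (const std::bad_array_new_length&) {
    return true;   // the length itself was invalid (e.g. negative)
  } catch (const std::bad_alloc&) {
    return false;  // valid length, but not enough memory
  }
}
```

Because std::bad_array_new_length derives from std::bad_alloc, its handler has to come first. The point of the sketch is that this exception says the requested length was invalid, which is a different condition from a valid request that the system cannot satisfy.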

wfang commented on May 27, 2024

Nothing was changed except NodeID and WeightT.

I think I found the reason for the error: my file contains a graph vertex with the ID 4099117456045389565.

grep "4099117456045389565" benchmark/graphs/raw/twitter_rv.net 
12819	4099117456045389565

This causes a problem in FindMaxNodeID in builder.h: the max_seen value becomes 4099117456045389565, which leads to my error.
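
To illustrate why one stray ID can be fatal in the 64-bit build: assuming per-vertex arrays are sized as max_seen + 1 elements of 8 bytes each (an assumption about the builder, not verified against builder.h), the requested byte count no longer fits in a 64-bit size_t, which is one way to end up with bad_array_new_length. The helper names below are hypothetical and are not the gapbs functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>

// Hypothetical helper (not the gapbs FindMaxNodeID): returns the largest
// vertex ID in a whitespace-separated "u v" edge list, or -1 if it is empty.
int64_t MaxIdInEdgeList(std::istream& in) {
  int64_t max_seen = -1;
  int64_t u, v;
  while (in >> u >> v) {
    max_seen = std::max(max_seen, std::max(u, v));
  }
  return max_seen;
}

// True when an array of `num_elements` items of `element_size` bytes cannot
// even be expressed as a std::size_t byte count.
bool ByteCountOverflows(uint64_t num_elements, std::size_t element_size) {
  return element_size != 0 &&
         num_elements > std::numeric_limits<std::size_t>::max() / element_size;
}
```

With the ID above, (4099117456045389565 + 1) * 8 bytes exceeds 2^64 - 1, while the 61578415 nodes reported for the clean file are nowhere near that limit. A check like this before allocating would flag a corrupted input early.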

sbeamer commented on May 27, 2024

Thank you for debugging this further!

I don't seem to have that line in the twitter_rv.net I downloaded from KAIST. I re-downloaded it last week trying to re-create this issue, and that file too appears to not contain it. Can you compare the MD5 hashes for your files with the ones they provide? I would check both the .tar.gz and the .net. My files do match the MD5 hashes.

If somehow your hashes do match, can you re-run grep with -n and let me know the line number for that crazy line? Thanks!

sbeamer commented on May 27, 2024

@wfang Did you get a chance to check on this?

michaelsutton commented on May 27, 2024

I don't know if it's related, but I had trouble downloading the twitter graph correctly as well.
I ended up using curl instead of wget, and that fixed all the issues. The download also became blazing fast (minutes rather than hours).
I used the following command to download:
curl 'https://an.kaist.ac.kr/~haewoon/release/twitter_social_graph/twitter_rv.tar.gz' -H 'Connection: keep-alive' --compressed -o twitter_rv.tar.gz

sbeamer commented on May 27, 2024

Thanks for the suggestion @michaelsutton! On my system, curl (with your arguments) still took hours. I'd be happy to merge a PR for speeding up the Twitter download, since the slow speed is a common complaint.
