Code Monkey home page Code Monkey logo

ligra's People

Contributors

grtkachenko avatar jshun avatar ldhulipala avatar markcjeffrey avatar ogreen avatar shivankgarg98 avatar wenluhu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ligra's Issues

Does BFS cannot traverse all vertices?

Does BFS cannot traverse all vertices?

while(!Frontier.isEmpty()){ //loop until frontier is empty
vertexSubset output = edgeMap(GA, Frontier, BFS_F(Parents));
Frontier.del();
Frontier = output; //set new frontier
}

Adding the `inline` qualifier to the functions in header files

Lack of header file/ implementation file is bad enough (compilation takes a lot longer, not to mention that C++ wasn't designed that way), but lack of inline qualifiers in the header files, means that there would be redefinitions of all functions as soon as there is need to link two files that include ligra.h.

memory used issue

I think your work great and I am very interested in it.
I get the code and when I try to run some algorithm in a graph larger than the naive example, I find it use much more memory than the origin graph expression,which is simplely each line. I think it is strange.
As I get into the source code, a struct named Word catch my eye. through the /proc/pid/status I find that the form of this struct take more than half memory of the program. And I find some comment in ligra/IO.h#220 (sorry I dont know how to give the link) that cancel the destruct function.
I am confused about these:
Why it takes so much memory to compute in ligra even in symmetric graph?
why the destruct func commented ?
Thanks.

char* getOptionValue(string option) called instead of bool getOption(string option)

Going through the code, I wondered in the ligra.h file why the lines 475 - 478 call the function char* getOptionValue(string option) and not bool getOption(string option) and why this works. After a while I undestood this works because the last argument is always the file, thus after every option is always another argument. I don't think this is wanted, because the function bool getOption(string option) exists, too.

Are the implementations in `apps/` multicore?

I tried to run a couple of them (BFS, CC, PR etc.) on large graphs and I see only one core at 100% with htop. Are those implementations in apps/ multicore? If so, how do I make it run on multiple cores?

SNAPtoAdj converter issue

I am using the wikipedia-ru graph, which is in the SNAP format to create the Ligra supported Adjacency graph format.
But, it is segfaulting for me. I am using g++ compiler with -fopenmp flag.

Please, let me know where I am going wrong.

Thanks.
-Nibedita

How is graph represented/ stored for running algorithms here?

I wanted to know how is the Graph is actually represented in memory for some algorithms written here in this framework. I wanted to run some algorithms for my school project where the input to the algorithm is a graph.

I can see the input data provided here but I can't conclude the representation of the Graph(Although, it's written 'Adjacency Graph' on the top, but is it 'Adjacency List' or what?). It seems like just vertices of the graph is provided in the input file, but how do I form edges in the Graph?

In the docs here I see that:

One of the main goals of Ligra is to abstract away implementation specific details about how the graph is represented (Is the graph compressed/uncompressed? Is it directed/undirected?)

But still, I wanted to pass my own graph input which is represented as Adjacency List? Can you please tell how is the Graph formed from the data given below? How can I form Adjacency List out of it?

AdjacencyGraph
128
708
0
8
8
17
24
28
28
33
37
.
.
.

These are the first few lines of the input file written here.

Or can you please tell how to pass our own graph (represented as Adjacency List) as the input to the algorithms written here.

Example: How do I pass the below graph in the input?

         1
        /  \
       2    3
      / \  / \
     4  5,6   7
      \ | | /
         8

If I have to represent the above graph in memory then I would write the file something like this:

(source node, destination node)
1, 2
1, 3
2, 4
2, 5
3, 6
3, 7
4, 8
5, 8
6, 8
7, 8

So how can I pass this representation(or modify this representation) to work with the algorithms written here.

Segfault using SNAPtoAdj on Twitter graph

I am trying to convert the Twitter graph to the adjacency list format but it always seems to end in a segfault.

I tried debugging with gdb and it seems to be happening in theutils/graphIO.hpp file when the words are being converted into edges.

Any help would be greatly appreciated.

undirected graph

If I have an undirected graph, but I did not put -s in the command line, will the result be affected? Based on my result and ligra.h file, I think no, but just want to double check. Many thanks!

RandLocalGraph seg faults when n > m

How to produce :

$./utils/randLocalGraph -s -m 100 -n 101 graph_file.txt
Segmentation fault (core dumped)

Does randLocalGraph not permit to generate graph where n > m ? (it works for n <= m)

( I am trying to generate graph with many components, any help on that will be great)
Thanks !

SNAPtoAdj segmentation fault

I compiled SNAPtoAdj with Cilk enabled, and the tool works well for all my graphs, except for friendster.
It ends up with segmentation fault, if I tell the tool to convert the friendster graph into a symmetric one.
The command is like

./SNAPtoAdj -s friendster.txt friendster_J

wghSNAPtoAdj has similar issue as well.

I tried different compilers, icpc and g++ 5.4.0.
When compiling, it prints out the following commands respectively.

icpc -std=c++14 -O3 -DCILKP   -o SNAPtoAdj SNAPtoAdj.C

or

g++ -std=c++14 -fcilkplus -lcilkrts -O2 -DCILK   -o SNAPtoAdj SNAPtoAdj.C

However, the executable files of both failed on friendster.

Could someone give me any idea what is going wrong?
Thanks in advance!

Some applications get a segfault when compiled with -O0

Not sure if this is a known issue - {Bfs, bfs-bitvector, and Components} get a segfault when compiled with -O0 -g. These applications don't crash when compiled with -O2. The applications seem to fail consistently for different input graphs.

System Info - debian stretch, g++-6.3, using CILK for parallelization

Update
Made the following observations on further testing:

  1. The problem persists even when compiled without CILK (or any other parallelization option)
  2. All applications in which edgeMap outputs a vertexSubset seem to be affected. The no_output edgeMap applications are not affected

Backtrace of segfault

Thread 1 "BFS-Bitvector" received signal SIGSEGV, Segmentation fault.
0x0000555555577b3c in std::_Tuple_impl<0ul, unsigned int, pbbs::empty>::operator=(std::_Tuple_impl<0ul, unsigned int, pbbs::empty>&&) (this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f0d>)
at /usr/include/c++/6/tuple:299
299 _M_head(this) = std::forward<_Head>(_M_head(__in));
(gdb) bt
#0 0x0000555555577b3c in std::_Tuple_impl<0ul, unsigned int, pbbs::empty>::operator=(std::_Tuple_impl<0ul, unsigned int, pbbs::empty>&&) (
this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f0d>)
at /usr/include/c++/6/tuple:299
#1 0x000055555557274e in std::tuple<unsigned int, pbbs::empty>::operator=(std::tuple<unsigned int, pbbs::empty>&&) (this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f23>)
at /usr/include/c++/6/tuple:1178
#2 0x000055555556eadd in auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>
)::{lambda(unsigned int, unsigned int, bool)#1}::operator()(unsigned int, unsigned int, bool) const (__closure=0x7fffffffdbb0, ngh=1, offset=0, m=true) at edgeMap_utils.h:47
#3 0x000055555557a28b in decode_uncompressed::decodeOutNghSparse<asymmetricVertex, BFS_F, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}>(asymmetricVertex, long, unsigned int, BFS_F&, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}&) (v=0x7fffffffd8e0, i=0, o=0, f=..., g=...)
at vertex.h:66
#4 0x0000555555574a2f in asymmetricVertex::decodeOutNghSparse<BFS_F, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>
)::{lambda(unsigned int, unsigned int, bool)#1}>(long, unsigned int, BFS_F&, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>*)::{lambda(unsigned int, unsigned int, bool)#1}&) (this=0x7fffffffd8e0, i=0, o=0, f=..., g=...) at vertex.h:337
#5 0x0000555555563297 in edgeMapSparse<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> () at ligra.h:125
#6 0x00007ffff7bc43b1 in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#7 0x00007ffff7bc4d4b in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#8 0x00007ffff7bc50fe in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#9 0x0000555555571485 in edgeMapSparse<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=...,
frontierVertices=0x55563c61b720, indices=..., degrees=0x55555579c6e0, m=1, f=..., fl=0) at ligra.h:122
#10 0x000055555556ca46 in edgeMapData<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=..., vs=..., f=...,
threshold=23149703, fl=@0x7fffffffdf0c: 0) at ligra.h:266
#11 0x000055555556a506 in edgeMap<asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=..., vs=..., f=..., threshold=-1,
fl=@0x7fffffffdf0c: 0) at ligra.h:276
#12 0x0000555555566d63 in Compute (GA=..., P=...) at BFS-Bitvector.C:70
#13 0x00005555555588f4 in main (argc=2, argv=0x7fffffffe448) at ligra.h:510

Could sweep cut result be returned?

could I also get all the members of returned set S? I want to know what are the index of nodes that belong to the local community of the certain input node. Many thanks!

Betweenness output

Hi, how do I view the computed betweenness values for each vertex in BC?

BFS, Components and Pagerank fail for large compressed weighted graphs.

I have a bunch of weighted/unweighted graphs in .adj format and I'm trying to run BFS, CC and Pagerank on them. Since, BFS doesn't directly run weighted graphs, I used the ./encode to compress the graphs and then run BFS on it. I'm getting segmentation faults when I run it with compressed weighted graphs, but they run fine when I run with compressed/uncompressed unweighted graphs.

To reproduce:

  1. Download CTU-46.adj from here: CTU-46 and download CTu-uw-46.adj from here: CTu-uw-46

  2. Compress the matrix:

./encoder -w CTU-46.adj out46-w.compressed
./encoder CTu-uw-46.adj out46.compressed
  1. Run BFS/Pagerank/Components

Weighted

./BFS -c out46-w.compressed          
size = 1396572
n = 41474 m = 85951 totalSpace = 369766
reading file...
inTotalSpace = 363182
creating graph...
[1]    15186 segmentation fault  ./BFS -c out46-w.compressed

Unweighted

./BFS -c out46-uw.compressed 
size = 994259
n = 41474 m = 85951 totalSpace = 164219
reading file...
inTotalSpace = 166416
creating graph...
Running time : 0.00235
Running time : 0.00234
Running time : 0.00164

getting rid of the -w flag for encoder program

The header of the generated files perfectly well describes that a specific graph is weighted or that it is unweighted. I suggest that using the encoder with the -w flag is offloading the responsibility onto the user.

However, if the user determines that a graph is weighted, by looking at the first line of the input graph, then so can the program. Otherwise, a user will only have to guess, and the program (given it exits abnormally) can again try interpreting the graph as weighted.

More Descriptive name than Adjacency Graph

Adjacency graph provides the largest amount of words with minimum information. The word that would best describe the format would be adjacency list. According to #21 the internal storage type is Compressed Sparse Row, but you'd never guess that from the readMe.

PagerankDelta correctness

It looks like you are discarding the residuals of all vertices every round, as opposed to only resetting residuals that are greater than the activation threshold. As a consequence, a vertex that takes more than one round to accumulate enough residual error to become active again, stays inactive, and the algorithm converges in less iterations than it should.

Maybe something like:

struct PR_Vertex_Reset {
  double* nghSum;
  PR_Vertex_Reset(double* _nghSum) :
    nghSum(_nghSum) {}
  inline bool operator () (uintE i) {
    if (nghSum[i] > epsilon2) nghSum[i] = 0.0; //reset only if we crossed threshold
    return 1;
  }
};

Please let me know if I am misunderstanding something here.

Error when make -j

[clong@shamrock apps]$ make -j
g++ -std=c++14 -O3 -DBYTERLE -o encoder encoder.C
g++ -std=c++14 -O3 -DBYTERLE -o decoder decoder.C
g++ -std=c++14 -O3 -DBYTERLE -o BFS BFS.C
g++ -std=c++14 -O3 -DBYTERLE -o BC BC.C
g++ -std=c++14 -O3 -DBYTERLE -o BellmanFord BellmanFord.C
g++ -std=c++14 -O3 -DBYTERLE -o Components Components.C
g++: error: unrecognized command line option ‘-std=c++14’
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [encoder] Error 1
make: *** Waiting for unfinished jobs....
make: *** [decoder] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BFS] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BC] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BellmanFord] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [Components] Error 1

-s flag issue

I downloaded the snap file soc-LiveJournal1
Converted it using SNAPtoAdj in /utils

$ ./SNAPtoAdj soc-LiveJournal1.txt lig_socLJ1

Then I tried to Run BFS on this data in the /apps folder
This was the output

$ ./BFS lig_socLJ1
Segmentation fault(core dumped)

However, when I used the -s flag

$ ./BFS -s lig_socLJ1
Running time : 0.571
Running time : 0.518
Running time : 0.511

As you can see, I did not use the SNAPtoAdj utility with the -s flag.
I was wondering, where I could be wrong. Please, help me out.

What if I delete rounds and just run Compute one time?

I noticed "long rounds = P.getOptionLongValue("-rounds",3);" in ligra.h file. And I am finding the same result is calculated four times, which seems taking more time.

I reduced rounds to 1, does that mean actually setting rounds to 1 will take even longer time in terms of parallel computing? I do not understand why we need to compute the same thing for four times.

Also , if I deleted the for loops, and replace it with below code

graph<compressedSymmetricVertex> G =
readCompressedGraph<compressedSymmetricVertex>(iFile,symmetric); //symmetric graph
Compute(G,P);
G.del();

And I got the result:

size = 13233762
n = 7164775534703567937 m = 3677866877581936505 totalSpace = 3904913592795935540
reading file...
Segmentation fault: 11

I am thinking maybe it is because my input data file is too large, and some memory problem occur. But I am still confused, what is the best strategy here. I am running local graph clustering to get PPR vector for each node of a graph of 343299 nodes. So I also want to make sure that I choose an approach that saves time. Thanks!

[edgeMap] segfault in sparse mode when graph has less than 20 edges

This is on master branch. Steps to replicate:

  1. Change BellmanFord.C to use no_dense instead of dense_forward, but don't change the initial threshold of G.m/20 here:

vertexSubset output = edgeMap(GA, Frontier, BF_F(ShortestPathLen,Visited), GA.m/20, dense_forward);

  1. Use a small graph with less than 20 edges as input, get segfault.

Why:
Step 1 passes a threshold of 0 to edgeMap. This is not a problem when in dense mode, but when in sparse mode, it leads to activating a code path that skips computation of out degrees here:

if(threshold > 0) { //compute sum of out-degrees if threshold > 0

Later edgeMap tries to deference a nullptr by attempting to run a prefix-sum on the offsets array, which is pointing to degrees (now null because of above):

uintT* offsets = degrees;

In the example above, threshold was set to 0 by the application. However, edgeMap can also end up setting it 0 here:

if(threshold == -1) threshold = numEdges/20; //default threshold

(A threshold of 0 again is not a problem when in dense mode, but skips degrees computation when in sparse mode)

A potential fix on edgeMap (instead of on the application) would be to do this, which is what I do locally (UPDATE: changed it so that we don't accidentally overwrite user-specified threshold):

  if (threshold == -1) {
    if (numEdges > 20) {
      threshold = numEdges / 20;  // default threshold
    } else {
      // num edges is too small, use 1 as a threshold instead
      threshold = 1;
    }
  } 

Julian, let me know if you're ok with this and I'll send it over.

Feature request: Conversion to and from standard Graph Formats

Being able to convert to and from Graphviz (or any other format, for that matter) would allow for this freamework to be somewhat more useable than just for benchmarking. Right now, for use with anything but the included generator I need to write my own parser, which defeats the purpose of this being a framework.

Feature Request: Consistent comments in header of file.

Is it possible to have a header describing what each file does and which problem does it solve? For example, I only know that Bellman Ford solves the Single source shortest path, because I've implemented Bellman-Ford.

Docstrings for functions would be nice. Right now it's near impossible to tell which function does what and what its return type is. E.g. I don't know where or how to find the result of the Bellman Ford computation. For all I know it might not even compute the Shortest Path correctly

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.