jshun / ligra Goto Github PK

View Code? Open in Web Editor NEW

447.0 447.0 162.0 11.12 MB

Ligra: A Lightweight Graph Processing Framework for Shared Memory

Home Page: http://jshun.github.io/ligra

License: MIT License

C 47.90% Makefile 1.17% C++ 50.93%

ligra's People

Contributors

Stargazers

Watchers

Forkers

fruitfly1026 grtkachenko mikerainey deepsea-inria hvdieren mkmojo whqch flavioc ogreen yunmingzhang17 datachand arkadiuszsz hpc-projects yingfeizhang ldhulipala mindis codeaudit cequencer muntasirraihan maqayum tkaramp schevalier datourat graphprocessor jcarreira massemibrahim neoblizz foxjas ekiwi denizokt daghlny a419308144 faintfay alchem-lab yunzhan2014 jfuentess caomw runhuiwang amaya382 sgk98 yaobaiwei markcjeffrey rackingroll zyz915 johnpzh cuiopen yangdan4 bisheshk ytinifni-zc vin01188 maxwell-ying reiisky siegfriedgreg francktcheng efode-r2d2 appetrosyan siddharthrai speeeday hookk e-neo leodestiny utonat arithmetica desert0616 rtvt123 frankszn bambowu songhtdo tonythomascn yuhao-su abhilasj czhu95 willzma agyaglikci djuanbei tnphuong zizai wlm328cs gumi-presentation-by-dzh guben omodolapovictorb mitu626 benjaminwinger mattagostini yukewang96 geraldofojunior meng-sun tide999 akatsukis donghl17 soietre shang2014 ab-10 marufdsi biqar shivankgarg98 lusu2004 michaelyhuang23 ailiop yukuncn

ligra's Issues

Does BFS cannot traverse all vertices？

while(!Frontier.isEmpty()){ //loop until frontier is empty
vertexSubset output = edgeMap(GA, Frontier, BFS_F(Parents));
Frontier.del();
Frontier = output; //set new frontier
}

Adding the `inline` qualifier to the functions in header files

Lack of header file/ implementation file is bad enough (compilation takes a lot longer, not to mention that C++ wasn't designed that way), but lack of inline qualifiers in the header files, means that there would be redefinitions of all functions as soon as there is need to link two files that include ligra.h.

memory used issue

I think your work great and I am very interested in it.
I get the code and when I try to run some algorithm in a graph larger than the naive example, I find it use much more memory than the origin graph expression,which is simplely each line. I think it is strange.
As I get into the source code, a struct named Word catch my eye. through the /proc/pid/status I find that the form of this struct take more than half memory of the program. And I find some comment in ligra/IO.h#220 (sorry I dont know how to give the link) that cancel the destruct function.
I am confused about these:
Why it takes so much memory to compute in ligra even in symmetric graph?
why the destruct func commented ?
Thanks.

char* getOptionValue(string option) called instead of bool getOption(string option)

Going through the code, I wondered in the ligra.h file why the lines 475 - 478 call the function char* getOptionValue(string option) and not bool getOption(string option) and why this works. After a while I undestood this works because the last argument is always the file, thus after every option is always another argument. I don't think this is wanted, because the function bool getOption(string option) exists, too.

Are the implementations in `apps/` multicore?

I tried to run a couple of them (BFS, CC, PR etc.) on large graphs and I see only one core at 100% with htop. Are those implementations in apps/ multicore? If so, how do I make it run on multiple cores?

How to change the number of threads on the command line？

SNAPtoAdj converter issue

I am using the wikipedia-ru graph, which is in the SNAP format to create the Ligra supported Adjacency graph format.
But, it is segfaulting for me. I am using g++ compiler with -fopenmp flag.

Please, let me know where I am going wrong.

Thanks.
-Nibedita

How is graph represented/ stored for running algorithms here?

I wanted to know how is the Graph is actually represented in memory for some algorithms written here in this framework. I wanted to run some algorithms for my school project where the input to the algorithm is a graph.

I can see the input data provided here but I can't conclude the representation of the Graph(Although, it's written 'Adjacency Graph' on the top, but is it 'Adjacency List' or what?). It seems like just vertices of the graph is provided in the input file, but how do I form edges in the Graph?

In the docs here I see that:

One of the main goals of Ligra is to abstract away implementation specific details about how the graph is represented (Is the graph compressed/uncompressed? Is it directed/undirected?)

But still, I wanted to pass my own graph input which is represented as Adjacency List? Can you please tell how is the Graph formed from the data given below? How can I form Adjacency List out of it?

AdjacencyGraph
128
708
0
8
8
17
24
28
28
33
37
.
.
.

These are the first few lines of the input file written here.

Or can you please tell how to pass our own graph (represented as Adjacency List) as the input to the algorithms written here.

Example: How do I pass the below graph in the input?

If I have to represent the above graph in memory then I would write the file something like this:

(source node, destination node)
1, 2
1, 3
2, 4
2, 5
3, 6
3, 7
4, 8
5, 8
6, 8
7, 8

So how can I pass this representation(or modify this representation) to work with the algorithms written here.

Segfault using SNAPtoAdj on Twitter graph

I am trying to convert the Twitter graph to the adjacency list format but it always seems to end in a segfault.

I tried debugging with gdb and it seems to be happening in theutils/graphIO.hpp file when the words are being converted into edges.

Any help would be greatly appreciated.

undirected graph

If I have an undirected graph, but I did not put -s in the command line, will the result be affected? Based on my result and ligra.h file, I think no, but just want to double check. Many thanks!

Question on CC (connected components)

Below is the edgeMap function in https://github.com/jshun/ligra/blob/master/apps/Components.C

inline bool update(uintE s, uintE d){ //Update function writes min ID
  uintE origID = IDs[d];
  if(IDs[s] < origID) {
    IDs[d] = min(origID,IDs[s]);
    if(origID == prevIDs[d]) return 1;
  }
  return 0;
}

Why do we need if(origID == prevIDs[d])? Is this check always true?

How to set the threads numbers of Ligra?

RandLocalGraph seg faults when n > m

How to produce :

$./utils/randLocalGraph -s -m 100 -n 101 graph_file.txt
Segmentation fault (core dumped)

Does randLocalGraph not permit to generate graph where n > m ? (it works for n <= m)

( I am trying to generate graph with many components, any help on that will be great)
Thanks !

SNAPtoAdj segmentation fault

I compiled SNAPtoAdj with Cilk enabled, and the tool works well for all my graphs, except for friendster.
It ends up with segmentation fault, if I tell the tool to convert the friendster graph into a symmetric one.
The command is like

./SNAPtoAdj -s friendster.txt friendster_J

wghSNAPtoAdj has similar issue as well.

I tried different compilers, icpc and g++ 5.4.0.
When compiling, it prints out the following commands respectively.

icpc -std=c++14 -O3 -DCILKP   -o SNAPtoAdj SNAPtoAdj.C

g++ -std=c++14 -fcilkplus -lcilkrts -O2 -DCILK   -o SNAPtoAdj SNAPtoAdj.C

However, the executable files of both failed on friendster.

Could someone give me any idea what is going wrong?
Thanks in advance!

Using Matrix Market files in ligra

Is there any conversion code from graphs in Matrix Market format to the graph format used by libra? Thanks

Some applications get a segfault when compiled with -O0

Not sure if this is a known issue - {Bfs, bfs-bitvector, and Components} get a segfault when compiled with -O0 -g. These applications don't crash when compiled with -O2. The applications seem to fail consistently for different input graphs.

System Info - debian stretch, g++-6.3, using CILK for parallelization

Update
Made the following observations on further testing:

The problem persists even when compiled without CILK (or any other parallelization option)
All applications in which edgeMap outputs a vertexSubset seem to be affected. The no_output edgeMap applications are not affected

Backtrace of segfault

Thread 1 "BFS-Bitvector" received signal SIGSEGV, Segmentation fault.
0x0000555555577b3c in std::_Tuple_impl<0ul, unsigned int, pbbs::empty>::operator=(std::_Tuple_impl<0ul, unsigned int, pbbs::empty>&&) (this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f0d>)
at /usr/include/c++/6/tuple:299
299 _M_head(this) = std::forward<_Head>(_M_head(__in));
(gdb) bt
#0 0x0000555555577b3c in std::_Tuple_impl<0ul, unsigned int, pbbs::empty>::operator=(std::_Tuple_impl<0ul, unsigned int, pbbs::empty>&&) (
this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f0d>)
at /usr/include/c++/6/tuple:299
#1 0x000055555557274e in std::tuple<unsigned int, pbbs::empty>::operator=(std::tuple<unsigned int, pbbs::empty>&&) (this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f23>)
at /usr/include/c++/6/tuple:1178
#2 0x000055555556eadd in auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}::operator()(unsigned int, unsigned int, bool) const (__closure=0x7fffffffdbb0, ngh=1, offset=0, m=true) at edgeMap_utils.h:47
#3 0x000055555557a28b in decode_uncompressed::decodeOutNghSparse<asymmetricVertex, BFS_F, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}>(asymmetricVertex, long, unsigned int, BFS_F&, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}&) (v=0x7fffffffd8e0, i=0, o=0, f=..., g=...)
at vertex.h:66
#4 0x0000555555574a2f in asymmetricVertex::decodeOutNghSparse<BFS_F, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}>(long, unsigned int, BFS_F&, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>*)::{lambda(unsigned int, unsigned int, bool)#1}&) (this=0x7fffffffd8e0, i=0, o=0, f=..., g=...) at vertex.h:337
#5 0x0000555555563297 in edgeMapSparse<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> () at ligra.h:125
#6 0x00007ffff7bc43b1 in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#7 0x00007ffff7bc4d4b in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#8 0x00007ffff7bc50fe in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#9 0x0000555555571485 in edgeMapSparse<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=...,
frontierVertices=0x55563c61b720, indices=..., degrees=0x55555579c6e0, m=1, f=..., fl=0) at ligra.h:122
#10 0x000055555556ca46 in edgeMapData<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=..., vs=..., f=...,
threshold=23149703, fl=@0x7fffffffdf0c: 0) at ligra.h:266
#11 0x000055555556a506 in edgeMap<asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=..., vs=..., f=..., threshold=-1,
fl=@0x7fffffffdf0c: 0) at ligra.h:276
#12 0x0000555555566d63 in Compute (GA=..., P=...) at BFS-Bitvector.C:70
#13 0x00005555555588f4 in main (argc=2, argv=0x7fffffffe448) at ligra.h:510

Could sweep cut result be returned?

could I also get all the members of returned set S? I want to know what are the index of nodes that belong to the local community of the certain input node. Many thanks!

Betweenness output

Hi, how do I view the computed betweenness values for each vertex in BC?

Make a pretty README.md

Use the markdown format to make README look pretty.

BFS, Components and Pagerank fail for large compressed weighted graphs.

I have a bunch of weighted/unweighted graphs in .adj format and I'm trying to run BFS, CC and Pagerank on them. Since, BFS doesn't directly run weighted graphs, I used the ./encode to compress the graphs and then run BFS on it. I'm getting segmentation faults when I run it with compressed weighted graphs, but they run fine when I run with compressed/uncompressed unweighted graphs.

To reproduce:

Download CTU-46.adj from here: CTU-46 and download CTu-uw-46.adj from here: CTu-uw-46
Compress the matrix:

./encoder -w CTU-46.adj out46-w.compressed
./encoder CTu-uw-46.adj out46.compressed

Run BFS/Pagerank/Components

Weighted

./BFS -c out46-w.compressed          
size = 1396572
n = 41474 m = 85951 totalSpace = 369766
reading file...
inTotalSpace = 363182
creating graph...
[1]    15186 segmentation fault  ./BFS -c out46-w.compressed

Unweighted

./BFS -c out46-uw.compressed 
size = 994259
n = 41474 m = 85951 totalSpace = 164219
reading file...
inTotalSpace = 166416
creating graph...
Running time : 0.00235
Running time : 0.00234
Running time : 0.00164

getting rid of the -w flag for encoder program

The header of the generated files perfectly well describes that a specific graph is weighted or that it is unweighted. I suggest that using the encoder with the -w flag is offloading the responsibility onto the user.

However, if the user determines that a graph is weighted, by looking at the first line of the input graph, then so can the program. Otherwise, a user will only have to guess, and the program (given it exits abnormally) can again try interpreting the graph as weighted.

More Descriptive name than Adjacency Graph

Adjacency graph provides the largest amount of words with minimum information. The word that would best describe the format would be adjacency list. According to #21 the internal storage type is Compressed Sparse Row, but you'd never guess that from the readMe.

PagerankDelta correctness

It looks like you are discarding the residuals of all vertices every round, as opposed to only resetting residuals that are greater than the activation threshold. As a consequence, a vertex that takes more than one round to accumulate enough residual error to become active again, stays inactive, and the algorithm converges in less iterations than it should.

Maybe something like:

struct PR_Vertex_Reset {
  double* nghSum;
  PR_Vertex_Reset(double* _nghSum) :
    nghSum(_nghSum) {}
  inline bool operator () (uintE i) {
    if (nghSum[i] > epsilon2) nghSum[i] = 0.0; //reset only if we crossed threshold
    return 1;
  }
};

Please let me know if I am misunderstanding something here.

Are centrality measures available?

How can I use the closeness and betweenness centrality measures?

Error when make -j

[clong@shamrock apps]$ make -j
g++ -std=c++14 -O3 -DBYTERLE -o encoder encoder.C
g++ -std=c++14 -O3 -DBYTERLE -o decoder decoder.C
g++ -std=c++14 -O3 -DBYTERLE -o BFS BFS.C
g++ -std=c++14 -O3 -DBYTERLE -o BC BC.C
g++ -std=c++14 -O3 -DBYTERLE -o BellmanFord BellmanFord.C
g++ -std=c++14 -O3 -DBYTERLE -o Components Components.C
g++: error: unrecognized command line option ‘-std=c++14’
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [encoder] Error 1
make: *** Waiting for unfinished jobs....
make: *** [decoder] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BFS] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BC] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BellmanFord] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [Components] Error 1

-s flag issue

I downloaded the snap file soc-LiveJournal1
Converted it using SNAPtoAdj in /utils

$ ./SNAPtoAdj soc-LiveJournal1.txt lig_socLJ1

Then I tried to Run BFS on this data in the /apps folder
This was the output

$ ./BFS lig_socLJ1
Segmentation fault(core dumped)

However, when I used the -s flag

$ ./BFS -s lig_socLJ1
Running time : 0.571
Running time : 0.518
Running time : 0.511

As you can see, I did not use the SNAPtoAdj utility with the -s flag.
I was wondering, where I could be wrong. Please, help me out.

What if I delete rounds and just run Compute one time?

I noticed "long rounds = P.getOptionLongValue("-rounds",3);" in ligra.h file. And I am finding the same result is calculated four times, which seems taking more time.

I reduced rounds to 1, does that mean actually setting rounds to 1 will take even longer time in terms of parallel computing? I do not understand why we need to compute the same thing for four times.

Also , if I deleted the for loops, and replace it with below code

graph<compressedSymmetricVertex> G =
readCompressedGraph<compressedSymmetricVertex>(iFile,symmetric); //symmetric graph
Compute(G,P);
G.del();

And I got the result:

size = 13233762
n = 7164775534703567937 m = 3677866877581936505 totalSpace = 3904913592795935540
reading file...
Segmentation fault: 11

I am thinking maybe it is because my input data file is too large, and some memory problem occur. But I am still confused, what is the best strategy here. I am running local graph clustering to get PPR vector for each node of a graph of 343299 nodes. So I also want to make sure that I choose an approach that saves time. Thanks!

[edgeMap] segfault in sparse mode when graph has less than 20 edges

This is on master branch. Steps to replicate:

Change BellmanFord.C to use no_dense instead of dense_forward, but don't change the initial threshold of G.m/20 here:

ligra/apps/BellmanFord.C

Line 79 in 7755d95

    
           vertexSubset output = edgeMap(GA, Frontier, BF_F(ShortestPathLen,Visited), GA.m/20, dense_forward);

Use a small graph with less than 20 edges as input, get segfault.

Why:
Step 1 passes a threshold of 0 to edgeMap. This is not a problem when in dense mode, but when in sparse mode, it leads to activating a code path that skips computation of out degrees here:

ligra/ligra/ligra.h

Line 248 in 7755d95

if(threshold > 0) { //compute sum of out-degrees if threshold > 0

Later edgeMap tries to deference a nullptr by attempting to run a prefix-sum on the offsets array, which is pointing to degrees (now null because of above):

ligra/ligra/ligra.h

Line 119 in 7755d95

uintT* offsets = degrees;

In the example above, threshold was set to 0 by the application. However, edgeMap can also end up setting it 0 here:

ligra/ligra/ligra.h

Line 238 in 7755d95

if(threshold == -1) threshold = numEdges/20; //default threshold

(A threshold of 0 again is not a problem when in dense mode, but skips degrees computation when in sparse mode)

A potential fix on edgeMap (instead of on the application) would be to do this, which is what I do locally (UPDATE: changed it so that we don't accidentally overwrite user-specified threshold):

  if (threshold == -1) {
    if (numEdges > 20) {
      threshold = numEdges / 20;  // default threshold
    } else {
      // num edges is too small, use 1 as a threshold instead
      threshold = 1;
    }
  }

Julian, let me know if you're ok with this and I'll send it over.

When Running Ligra CPU Utilization is only 100%

When I run ./BFS and ./PageRank on a generated graph with 100M nodes, cpu utilization is only around 100%. Is it normal? Shouldnt ligra uses all cpus to compute?

Feature request: Conversion to and from standard Graph Formats

Being able to convert to and from Graphviz (or any other format, for that matter) would allow for this freamework to be somewhat more useable than just for benchmarking. Right now, for use with anything but the included generator I need to write my own parser, which defeats the purpose of this being a framework.

Feature Request: Consistent comments in header of file.

Is it possible to have a header describing what each file does and which problem does it solve? For example, I only know that Bellman Ford solves the Single source shortest path, because I've implemented Bellman-Ford.

Docstrings for functions would be nice. Right now it's near impossible to tell which function does what and what its return type is. E.g. I don't know where or how to find the result of the Bellman Ford computation. For all I know it might not even compute the Shortest Path correctly

jshun / ligra Goto Github PK

ligra's People

Contributors

Stargazers

Watchers

Forkers

ligra's Issues

Recommend Projects

Recommend Topics

Recommend Org