bears-r-us / arkouda-njit
Home of Arachne and other Arkouda functionality provided by researchers at NJIT
License: MIT License
After meeting with Mike, we concluded that an Arkouda method-based graph construction could be simpler and more effective. A sample script of the methods we would want to use will be attached later.
Say we have a graph G with vertex names {2, 6, 7}; the internal values would be represented as {0, 1, 2}. When we run BFS on G, the returned depth array D is indexed by the internal values {0, 1, 2}. How can we let the user index into D using the original vertex names instead of the internal values?
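One possible client-side pattern (a sketch with hypothetical names, not existing Arachne API): keep the original labels alongside the internal values and translate on lookup.

```python
# Sketch: translate user-facing vertex names to internal indices before
# indexing into the BFS depth array. All names here are hypothetical.
labels = [2, 6, 7]            # original vertex names, in internal order
depths = [0, 1, 2]            # depth array D, indexed by internal values

# Build the name -> internal-index translation once.
label_to_internal = {label: i for i, label in enumerate(labels)}

def depth_of(label):
    """Return the BFS depth of a vertex given its original name."""
    return depths[label_to_internal[label]]

print(depth_of(6))  # depth of original vertex 6
```

A production version would presumably use the pdarray of original labels already stored by the graph rather than a Python dict.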
To facilitate the implementation of various algorithms and foster research in graph theory, we need access to the a priori probabilities of node labels and edge relationships in a graph. This addition would significantly enhance the efficiency and effectiveness of graph-related work, enabling users to make informed decisions and conduct thorough analyses.
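To illustrate what "a priori probabilities" could mean here, a minimal pure-Python sketch (hypothetical data, not a proposed API) that computes empirical label probabilities from a node-label list:

```python
from collections import Counter

# Hypothetical node labels; in practice these would come from the graph.
node_labels = ["person", "person", "item", "person", "item", "store"]

# Empirical prior probability of each label = count / total.
counts = Counter(node_labels)
total = len(node_labels)
priors = {label: count / total for label, count in counts.items()}

print(priors)
```

The same counting approach applies to edge relationship types.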
Calling graph_bfs raises a NameError:

NameError: name 'RCMFlag' is not defined

Looking at the code, the problem is on line 436 of graph.py: the argument passed in is named rcm_flag, but the entry "RCMFlag": RCMFlag in the args dictionary references the undefined name RCMFlag. The entry needs to be updated to use the actual parameter name.
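The bug reduces to referencing an undefined name inside the args dictionary. A minimal reproduction of the fix, with a simplified hypothetical signature (the real function lives in graph.py):

```python
def graph_bfs(graph, root, rcm_flag=0):
    # Buggy version: args = {..., "RCMFlag": RCMFlag} raises NameError,
    # because no name RCMFlag exists; the parameter is called rcm_flag.
    args = {
        "GraphName": graph,
        "Root": root,
        "RCMFlag": rcm_flag,  # fix: use the actual parameter name
    }
    return args

print(graph_bfs("g1", 0, rcm_flag=1))
```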
ar.PropGraph() behaves as a wrapper around dataframes, where new arrays for columnar data are explicitly stored in the back end. We need support for the case where a column is an Arkouda Categorical.
Self-explanatory. We want documentation that looks like this: https://bears-r-us.github.io/arkouda/ or like this: https://networkx.org/documentation/stable/reference/index.html.
When running load_edge_attributes as below:

prop_graph.load_edge_attributes(test_edge_df, source_column="src1", destination_column="dst1", relationship_columns=["data5", "data1"])

the following chain of errors is thrown:

KeyError: "Invalid column name 'src'."
KeyError: 'duplicated attribute (column) name in relationships'

This is caused by add_edge_relationships expecting columns named "src" and "dst".
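Until add_edge_relationships stops assuming those names, one workaround is to copy the source/destination columns under the expected names before loading. A sketch with a plain dict standing in for the Arkouda DataFrame (column names taken from the report above):

```python
# Stand-in for test_edge_df; in practice this is an Arkouda DataFrame.
test_edge_df = {
    "src1": [1, 2, 3],
    "dst1": [2, 3, 4],
    "data1": ["a", "b", "c"],
    "data5": [10, 20, 30],
}

# Workaround: duplicate the columns under the hard-coded names
# "src"/"dst" that add_edge_relationships currently expects.
test_edge_df["src"] = test_edge_df["src1"]
test_edge_df["dst"] = test_edge_df["dst1"]

# prop_graph.load_edge_attributes(test_edge_df, source_column="src",
#     destination_column="dst", relationship_columns=["data5", "data1"])
```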
Hi!
I'm currently using the subgraph_isomorphism function to find isomorphic subgraphs in a host graph. However, the isos array currently returned only includes vertex IDs from the host graph, without providing a mapping to the IDs of the subgraph. This makes it challenging to reconstruct the subgraph using the host graph IDs.
Here is the code illustrating the current behavior:
G = ar.PropGraph()
src = [1, 2, 3, 2]
dst = [2, 3, 4, 7]
G.add_edges_from(ak.array(src), ak.array(dst))
subgraph = ar.PropGraph()
src = [10, 20, 30, 20]
dst = [20, 30, 40, 70]
subgraph.add_edges_from(ak.array(src), ak.array(dst))
isos = ar.subgraph_isomorphism(G, subgraph)
print(isos)
The output I get is:
[1 2 3 4 7]
With help from Oliver, I created a function that maps the subgraph IDs to the host graph IDs based on the isos output, allowing me to recreate the subgraph. Here's a concise explanation of the provided code:
The function first checks if the length of the isos array is a multiple of the number of subgraph nodes. It then iterates through each isomorphic subgraph found, creating a mapping from the subgraph node IDs to the corresponding host graph node IDs. Finally, it reconstructs the subgraph edges using the host graph node IDs.
Here's the complete code:
import arkouda as ak
import arachne as ar
ak.connect()
# Define the host graph
G = ar.PropGraph()
src_host = [1, 2, 3, 2]
dst_host = [2, 3, 4, 7]
G.add_edges_from(ak.array(src_host), ak.array(dst_host))
# Define the subgraph
subgraph = ar.PropGraph()
src_sub = [10, 20, 30, 20]
dst_sub = [20, 30, 40, 70]
subgraph.add_edges_from(ak.array(src_sub), ak.array(dst_sub))
# Find isomorphic subgraphs
isos = ar.subgraph_isomorphism(G, subgraph)
print(f"Isomorphisms found: {isos}")
isos_ndarray = isos.to_ndarray() # Convert pdarray to ndarray
# Check if the length of isomorphisms is a multiple of the number of subgraph nodes
if len(isos) % len(subgraph) != 0:
    raise ValueError("The length of isomorphisms is not a multiple of the number of subgraph nodes.")

subgraph_nodes = []
host_graph_nodes = []
node_mapping = {}

number_isos_found = len(isos) / len(subgraph)

for i in range(0, int(number_isos_found)):
    # Create a mapping from subgraph nodes to host graph nodes
    subgraph_nodes = sorted(list(set(src_sub + dst_sub)))
    host_graph_nodes = isos_ndarray[i*len(subgraph_nodes):i*len(subgraph_nodes) + len(subgraph_nodes)]
    node_mapping = dict(zip(subgraph_nodes, host_graph_nodes))
    print("Mapping:", node_mapping)

# Recreate the subgraph with host graph node IDs
mapped_src_sub = [node_mapping[node] for node in src_sub]
mapped_dst_sub = [node_mapping[node] for node in dst_sub]

# Output the mapped subgraph edges
print("Mapped subgraph edges (using host graph node IDs):")
for s, d in zip(mapped_src_sub, mapped_dst_sub):
    print(f"{s} -> {d}")
The output I get is:
Isomorphisms found: [1 2 3 4 7]
Mapping: {10: 1, 20: 2, 30: 3, 40: 4, 70: 7}
Mapped subgraph edges (using host graph node IDs):
1 -> 2
2 -> 3
3 -> 4
2 -> 7
I hope this helps to implement a mapping feature directly into the library!
Request:
To improve the depth and accuracy of graph analysis, I propose adding both local and global clustering coefficients to the framework. These coefficients play a pivotal role in characterizing the structural properties of graphs.
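For reference, the local coefficient of vertex i is C_i = 2*T_i / (k_i*(k_i - 1)), where T_i counts edges among i's neighbors and k_i is its degree; a common global measure is the average of the local values. A pure-Python sketch on a toy edge list (not the proposed Arachne API):

```python
from collections import defaultdict
from itertools import combinations

# Toy undirected graph: a triangle (0, 1, 2) with a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def local_clustering(v):
    """C_v = 2 * (edges among v's neighbors) / (k * (k - 1))."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

local = {v: local_clustering(v) for v in adj}
global_cc = sum(local.values()) / len(local)
print(local, global_cc)
```

A server-side version would work over the graph's segmented arrays rather than Python sets, but the quantities computed are the same.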
There are some graph generators in arkouda-contrib/akgraph that can be ported into Arachne. They are expected to live inside of arachne/client in a file called generators.py or similar.

Most of the return statements for the generator functions are of the format:

return standardize_edges(U,V)

This can be directly mapped to Arachne functionality by doing:

graph = ar.Graph() # or DiGraph() or PropGraph()
graph.add_edges_from(U,V)

Users should be able to dictate whether they want to build a Graph(), DiGraph(), or PropGraph(). There is no need to worry about removing multiple edges or self-loops: the add_edges_from() method handles removing multiple edges, and self-loops are allowed in the base data structure.
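To sketch the porting pattern, here is a hypothetical path-graph generator that builds the (U, V) edge lists the way the akgraph generators do, with the Arachne construction step shown as comments (the ar/ak calls need a running server, so they are not executed here):

```python
def path_graph_edges(n):
    """Edge lists for a path 0 - 1 - ... - (n-1). Hypothetical generator."""
    U = list(range(n - 1))
    V = list(range(1, n))
    return U, V

U, V = path_graph_edges(5)
print(U, V)  # [0, 1, 2, 3] [1, 2, 3, 4]

# Ported to Arachne, with the graph type dictated by the user:
# graph = ar.Graph()           # or ar.DiGraph() / ar.PropGraph()
# graph.add_edges_from(ak.array(U), ak.array(V))
```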
Using the following commands to set the Chapel environment on version 1.33:
source $CHPL_HOME/util/setchplenv.bash
export CHPL_RE2=bundled
export CHPL_LLVM=bundled
export CHPL_GMP=bundled
export CHPL_COMM=none
We get the following error when building Arachne:
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:17: In function 'fastLocalSubdomain':
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:19: error: unresolved call 'unmanaged domain(1,int(64),one).locDoms[int(64)]'
$CHPL_HOME/modules/dists/BlockDist.chpl:546: note: this candidate did not match: BlockDom.locDoms
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:19: note: because call includes 1 argument
$CHPL_HOME/modules/dists/BlockDist.chpl:546: note: but function can only accept 0 arguments
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:19: note: other candidates are:
$CHPL_HOME/modules/dists/SparseBlockDist.chpl:89: note: SparseBlockDom.locDoms
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:65: called as fastLocalSubdomain(blockArray: [domain(1,int(64),one)] int(64)) from function 'generateRanges'
/scratch/users/oaa9/arkouda-njit/arachne/server/BuildGraphMsg.chpl:98: called as generateRanges(graph: shared SegGraph, key: string, key2insert: string, array: [domain(1,int(64),one)] int(64))
note: generic instantiations are underlined in the above callstack
make: *** [Makefile:359: arkouda_server] Error 1
make: Leaving directory '/scratch/users/oaa9/arkouda'
This is caused by BlockDom.locDoms in Chapel expecting 1 argument when CHPL_COMM is set and 0 arguments when CHPL_COMM is none.
The uint64 dtype does not seem to be supported; I have cast those columns to int64. This noticeably showed up on the src and dst columns for the graphs.
To streamline graph analysis workflows and facilitate easier examination of node connectivity, I suggest incorporating sorting functionality for degrees in both ascending and descending order.
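A pure-Python sketch of the requested behavior on a toy edge list (the eventual Arachne API would presumably compute degrees and sort on the server using pdarrays):

```python
from collections import Counter

# Toy undirected edge lists.
src = [1, 2, 3, 2]
dst = [2, 3, 4, 7]

# Degree of each vertex = number of edge endpoints touching it.
degree = Counter(src) + Counter(dst)

ascending = sorted(degree.items(), key=lambda kv: kv[1])
descending = sorted(degree.items(), key=lambda kv: kv[1], reverse=True)

print(ascending)
print(descending)
```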
Currently, errors in Arachne, such as using uint64 instead of int64 during file reads, crash the server. These errors should be handled properly so that the Arkouda server doesn't need to be restarted every time an error is encountered. Most of this will be handled with Chapel throws and catches (https://chapel-lang.org/docs/language/spec/error-handling.html). The reading method is just an example; all functions will need to be checked.
Currently, we require edge attributes and node attributes to be loaded separately from different dataframes via the load_edge_attributes and load_node_attributes functions in Arachne. This issue proposes that we allow node attribute columns to be specified during edge insertion, where the data given for a vertex in an edge can be stored as a node attribute.
This will improve performance, since currently a user may have to do a large merge-join on an edge attribute dataframe to get all of the data for each vertex.
Example of desired functionality from Tom:
import arkouda as ak
import arachne as akg
from glob import glob
import pandas as pd
import numpy as np
import socket
import timeit
import os
ak.connect("xxxx")
rawfilelist = ["file1","file2","file3"]
rawfilelist = rawfilelist[0:500]
columns = ["src_ip","src_port","dst_ip","dst_port","protocol"]
rawdata = ak.readmethod(rawfilelist,datasets = columns) # substitute with method to read the appropriate file types
raw_df = ak.DataFrame(rawdata)
raw_df.columns
["src_ip",
"src_port",
"dst_ip",
"dst_port",
"protocol"]
#
# TEMP FIX FOR PROPERTY GRAPH AS CODE LOOKS for "src" and "dst"
#
filtered_df = raw_df
filtered_df["src"] = filtered_df["src_ip"]
filtered_df["dst"] = filtered_df["dst_ip"]
prop_graph = akg.PropGraph()
#
# Add new collection to indicate properties to gather from src/dst nodes when they are created.
# In my case I calculated those values to use in the load_node_attributes method like the following:
#
# prop_graph.load_node_attributes(node_df,node_column="nodes",label_columns=["ip","port","protocol"]
#
# BELOW IS THE DESIRED CODE WHICH AVOIDS HAVING TO PERFORM MERGE-JOINS.
#
# NOTE: Some items left to decide. Does load_edge_attributes calculate the node_columns, or are they
# provided in the filtered_df? If there are multiple node_column values for each vertex, are they
# stored as a collection? For instance, src has two different protocols (two edges) or two ports, etc.
#
prop_graph.load_edge_attributes(filtered_df,
                                source_column="src",
                                destination_column="dst",
                                relationship_columns=["protocol","src_port","dst_port"],
                                node_columns=["ip","port","protocol"])
For some reason, the triangle counting function in Arachne returns the node input ID rather than the number of triangles found. Here's the head of the .csv dataset I used to build the graph.
**HEADER**
int64,int64,float64
*/HEADER/*
bodyId_pre,bodyId_post,weight
294437328,295470623,1
294437328,295133902,1
294437328,448260940,1
294437328,294783423,1
294437328,5812979995,1
294437328,295474441,2
294437328,265120223,1
294437328,296139882,1
Here's the code snippet and output of my triangle function call.
tri_nodes = ak.array([-1])
count = ar.triangles(graph, vertexArray=tri_nodes)
print(count)
# [-1]
@alvaradoo thinks this might be a bug. Any idea how this could be fixed?
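For comparison, the expected semantics (as I understand them) can be sketched in pure Python: for each queried vertex, count the triangles it participates in. The graph and names here are assumptions for illustration, not Arachne internals:

```python
from itertools import combinations

# Toy undirected graph: triangle (0, 1, 2) plus edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def triangles_at(v):
    """Number of triangles containing vertex v: neighbor pairs
    that are themselves connected by an edge."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])

print([triangles_at(v) for v in (0, 2, 3)])  # [1, 1, 0]
```

With vertexArray=[-1] presumably meaning "all vertices", the returned array should hold these per-vertex counts (or a total), not echo the input IDs.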
Currently, the BFS method's performance declines as the number of locales increases. The attached branch contains an aggregated BFS method that needs to be pulled into main Arachne to handle this problem.
Tom can successfully create an edge for the property graph, but when he creates a node it consistently fails with "None Type is not iterable".
There seems to be an issue with the way read_matrix_market_file() reads in the karate.mtx file (in /arachne/data/) and instantiates an ar.DiGraph. The edges added in don't appear to be correct.