arkouda-njit's People

Contributors

ak47mrj, alvaradoo, arjreddy97, glitch, jakobtroidl, jtpatchett, mdindoost, stress-tess, zhihuidu


arkouda-njit's Issues

Mismatched argument in graph_bfs

Calling graph_bfs raises a NameError:

NameError: name 'RCMFlag' is not defined

Looking at the code, the problem is on line 436 of graph.py: the argument passed in is named rcm_flag, but the args dictionary entry is written as "RCMFlag": RCMFlag, which references the undefined name RCMFlag. The value needs to be updated to use rcm_flag instead.
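
A minimal sketch of the likely fix is below. Everything except the rcm_flag/RCMFlag change is invented for illustration, since only that one line is known from the report:

# Hypothetical reconstruction of the pattern around graph.py line 436.
# Every key except "RCMFlag" is illustrative; the actual fix is simply to
# reference the defined parameter rcm_flag instead of the undefined RCMFlag.
def graph_bfs(graph, source: int, rcm_flag: int = 0):
    args = {
        "GraphName": getattr(graph, "name", ""),  # illustrative key
        "Source": source,                         # illustrative key
        "RCMFlag": rcm_flag,                      # was: RCMFlag -> NameError
    }
    return args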

Add support for Categorical in Property Graphs

ar.PropGraph() behaves as a wrapper around dataframes, where new arrays are explicitly stored in the back end as columnar data. We need support for the case where a column is an Arkouda Categorical.
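
For concreteness, a minimal sketch of the currently unsupported case is below; the column names, values, and the choice to pass the Categorical as a relationship column are all made up for illustration:

import arkouda as ak
import arachne as ar

ak.connect()

# Edge dataframe whose attribute column is an Arkouda Categorical rather
# than a pdarray or Strings object.
edge_df = ak.DataFrame({
    "src": ak.array([0, 1, 2]),
    "dst": ak.array([1, 2, 0]),
    "kind": ak.Categorical(ak.array(["a", "b", "a"])),  # needs back-end support
})

prop_graph = ar.PropGraph()
prop_graph.load_edge_attributes(edge_df, source_column="src",
                                destination_column="dst",
                                relationship_columns=["kind"])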

Even when specifying the names of the source and destination columns, an error is thrown expecting "src" or "dst"

When running load_edge_attributes as below:

prop_graph.load_edge_attributes(test_edge_df, source_column="src1", destination_column="dst1", relationship_columns=["data5", "data1"])

The following chain of errors is thrown:

KeyError: "Invalid column name 'src'."
KeyError: 'duplicated attribute (column) name in relationships'

This has to do with add_edge_relationships expecting columns named "src" and "dst".
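
Until add_edge_relationships accepts arbitrary column names, one possible workaround (mirroring the temporary fix shown in a later issue below) is to alias the user-named columns onto the hard-coded names. The dataframe contents here are made up for illustration:

import arkouda as ak
import arachne as ar

ak.connect()

# Hypothetical dataframe with user-named source/destination columns.
test_edge_df = ak.DataFrame({
    "src1": ak.array([0, 1, 2]),
    "dst1": ak.array([1, 2, 0]),
    "data1": ak.array(["x", "y", "z"]),
    "data5": ak.array([1, 2, 3]),
})

# Alias the user columns to the "src"/"dst" names currently expected.
test_edge_df["src"] = test_edge_df["src1"]
test_edge_df["dst"] = test_edge_df["dst1"]

prop_graph = ar.PropGraph()
prop_graph.load_edge_attributes(test_edge_df,
                                source_column="src",
                                destination_column="dst",
                                relationship_columns=["data5", "data1"])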

Feature Request: Add Subgraph Node ID Mapping in subgraph_isomorphism Output

Hi!

I'm currently using the subgraph_isomorphism function to find isomorphic subgraphs in a host graph. However, the isos array it returns only includes the vertex IDs from the host graph, without providing a mapping to the subgraph's vertex IDs. This makes it challenging to reconstruct the subgraph using the host graph IDs.

Here is the code illustrating the current behavior:

import arkouda as ak
import arachne as ar

ak.connect()

G = ar.PropGraph()
src = [1, 2, 3, 2]
dst = [2, 3, 4, 7]
G.add_edges_from(ak.array(src), ak.array(dst))

subgraph = ar.PropGraph()
src = [10, 20, 30, 20]
dst = [20, 30, 40, 70]
subgraph.add_edges_from(ak.array(src), ak.array(dst))

isos = ar.subgraph_isomorphism(G, subgraph)
print(isos)

The output I get is:

[1 2 3 4 7]

With help from Oliver, I created a function that maps the subgraph IDs to the host graph IDs based on the isos output, allowing me to recreate the subgraph. Here's a concise explanation of the provided code:

The function first checks if the length of the isos array is a multiple of the number of subgraph nodes. It then iterates through each isomorphic subgraph found, creating a mapping from the subgraph node IDs to the corresponding host graph node IDs. Finally, it reconstructs the subgraph edges using the host graph node IDs.

Here's the complete code:

import arkouda as ak
import arachne as ar

ak.connect()

# Define the host graph
G = ar.PropGraph()
src_host = [1, 2, 3, 2]
dst_host = [2, 3, 4, 7]
G.add_edges_from(ak.array(src_host), ak.array(dst_host))

# Define the subgraph
subgraph = ar.PropGraph()
src_sub = [10, 20, 30, 20]
dst_sub = [20, 30, 40, 70]
subgraph.add_edges_from(ak.array(src_sub), ak.array(dst_sub))

# Find isomorphic subgraphs
isos = ar.subgraph_isomorphism(G, subgraph)
print(f"Isomorphisms found: {isos}")
isos_ndarray = isos.to_ndarray()  # Convert pdarray to ndarray

# Check if the length of isomorphisms is a multiple of the number of subgraph nodes
if len(isos) % len(subgraph) != 0:
    raise ValueError("The length of isomorphisms is not a multiple of the number of subgraph nodes.")

subgraph_nodes = []
host_graph_nodes = []
node_mapping = {}

number_isos_found = len(isos) // len(subgraph)
for i in range(number_isos_found):
    # Create a mapping from subgraph nodes to host graph nodes
    subgraph_nodes = sorted(set(src_sub + dst_sub))
    host_graph_nodes = isos_ndarray[i * len(subgraph_nodes):(i + 1) * len(subgraph_nodes)]
    node_mapping = dict(zip(subgraph_nodes, host_graph_nodes))
    print("Mapping:", node_mapping)
    # Recreate the subgraph with host graph node IDs
    mapped_src_sub = [node_mapping[node] for node in src_sub]
    mapped_dst_sub = [node_mapping[node] for node in dst_sub]

    # Output the mapped subgraph edges
    print("Mapped subgraph edges (using host graph node IDs):")
    for s, d in zip(mapped_src_sub, mapped_dst_sub):
        print(f"{s} -> {d}")

The output I get is:

Isomorphisms found: [1 2 3 4 7]
Mapping: {10: 1, 20: 2, 30: 3, 40: 4, 70: 7}
Mapped subgraph edges (using host graph node IDs):
1 -> 2
2 -> 3
3 -> 4
2 -> 7

I hope this helps to implement a mapping feature directly into the library!

Port arkouda-contrib/akgraph graph generators to arachne

There are some graph generators in arkouda-contrib/akgraph that can be ported into Arachne. They are expected to live in arachne/client, in a file called generators.py or similar.

Most of the return statements for the generator functions are of the format:

return standardize_edges(U,V)

This can be directly mapped to Arachne functionality by doing:

graph = ar.Graph() # or DiGraph() or PropGraph()
graph.add_edges_from(U,V)

Users should be able to dictate whether they want to build a Graph(), DiGraph(), or PropGraph(). There is no need to worry about removing multiple edges or self-loops: the add_edges_from() method handles removing duplicate edges, and self-loops are allowed in the base data structure. A rough sketch of the porting pattern is shown below.
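
In this sketch, the generator name, arguments, and edge construction are placeholders rather than akgraph's actual generators; only the add_edges_from() pattern comes from the proposal above:

import arkouda as ak
import arachne as ar

def random_edge_generator(n_vertices, n_edges, create_using=ar.Graph):
    # Illustrative ported generator: build edge arrays U and V (the part
    # that would come from akgraph), then load them into whichever graph
    # type the user picked instead of calling standardize_edges(U, V).
    U = ak.randint(0, n_vertices, n_edges)  # placeholder edge generation
    V = ak.randint(0, n_vertices, n_edges)
    graph = create_using()                  # Graph(), DiGraph(), or PropGraph()
    graph.add_edges_from(U, V)              # duplicate edges removed here
    return graph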

Arachne build breaks on Chapel 1.33 with `CHPL_COMM` set to `none`

Using the following commands to set the Chapel environment on version 1.33:

source $CHPL_HOME/util/setchplenv.bash
export CHPL_RE2=bundled
export CHPL_LLVM=bundled
export CHPL_GMP=bundled
export CHPL_COMM=none

We get the following error when building Arachne:

/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:17: In function 'fastLocalSubdomain':
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:19: error: unresolved call 'unmanaged domain(1,int(64),one).locDoms[int(64)]'
$CHPL_HOME/modules/dists/BlockDist.chpl:546: note: this candidate did not match: BlockDom.locDoms
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:19: note: because call includes 1 argument
$CHPL_HOME/modules/dists/BlockDist.chpl:546: note: but function can only accept 0 arguments
/scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:19: note: other candidates are:
$CHPL_HOME/modules/dists/SparseBlockDist.chpl:89: note:   SparseBlockDom.locDoms
  /scratch/users/oaa9/arkouda-njit/arachne/server/Utils.chpl:65: called as fastLocalSubdomain(blockArray: [domain(1,int(64),one)] int(64)) from function 'generateRanges'
  /scratch/users/oaa9/arkouda-njit/arachne/server/BuildGraphMsg.chpl:98: called as generateRanges(graph: shared SegGraph, key: string, key2insert: string, array: [domain(1,int(64),one)] int(64))
note: generic instantiations are underlined in the above callstack
make: *** [Makefile:359: arkouda_server] Error 1
make: Leaving directory '/scratch/users/oaa9/arkouda'

This is caused by BlockDom.locDoms in Chapel expecting one argument when CHPL_COMM is set and zero arguments when CHPL_COMM is none.

Running list of Arachne errors that kill the Arkouda server

Currently, errors in Arachne, such as using uint64 instead of int64 during file reading, crash the server. These errors should be handled properly so that the Arkouda server does not need to be restarted every time an error is encountered. Most of this will be handled with Chapel throws and catches (https://chapel-lang.org/docs/language/spec/error-handling.html). The reading method is just an example; all functions will need to be checked.
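
A sketch of the desired client-side behavior is below. The failing call is only a stand-in (the real reading method is not shown here); the point is that the server should send back an error message, which the Python client surfaces as a RuntimeError, instead of dying:

import arkouda as ak
import arachne as ar

ak.connect()

graph = ar.Graph()
try:
    # Stand-in for a failing Arachne call, e.g. feeding uint64 edge data
    # where int64 is expected. Once the server-side code wraps such failures
    # in Chapel throws/catches and returns an error response, the client
    # should land in the except branch instead of losing the server.
    src = ak.cast(ak.array([0, 1, 2]), ak.uint64)
    dst = ak.cast(ak.array([1, 2, 0]), ak.uint64)
    graph.add_edges_from(src, dst)
except RuntimeError as err:
    print(f"Arkouda server reported an error and stayed up: {err}")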

Combined edge and node attribute insertion

Currently, we require edge attributes and node attributes to be loaded separately from different dataframes via the load_edge_attributes and load_node_attributes functions in Arachne. This issue proposes allowing node attribute columns to be specified during edge insertion, so that the data given for a vertex in an edge row can be stored as a node attribute.

This will improve performance since currently a user may have to do a large merge-join on an edge attribute dataframe to get all of the data for each vertex.

Example of desired functionality from Tom:

import arkouda as ak
import arachne as akg
from glob import glob
import pandas as pd
import numpy as np
import socket
import timeit
import os

ak.connect("xxxx")

rawfilelist = ["file1","file2","file3"]
rawfilelist = rawfilelist[0:500]

columns = ["src_ip","src_port","dst_ip","dst_port","protocol"]

rawdata = ak.readmethod(rawfilelist,datasets = columns) # substitute with method to read the appropriate file types
raw_df = ak.DataFrame(rawdata)

raw_df.columns

["src_ip",
 "src_port",
 "dst_ip",
 "dst_port",
 "protocol"]

#
# TEMP FIX FOR PROPERTY GRAPH AS CODE LOOKS for "src" and "dst"
#
filtered_df = raw_df
filtered_df["src"] = filtered_df["src_ip"]
filtered_df["dst"] = filtered_df["dst_ip"]

prop_graph = akg.PropGraph()

#
# Add new collection to indicate properties to gather from src/dst nodes when they are created.
# In my case I calculated those values to use in the load_node_attributes method like the following:
#
# prop_graph.load_node_attributes(node_df,node_column="nodes",label_columns=["ip","port","protocol"]
#
# BELOW IS THE DESIRED CODE WHICH AVOIDS HAVING TO PERFORM MERGE-JOINS.
#
# NOTE: Some items left to decide. Does load_edge_attributes calculate the node_columns, or are they
#       provided in filtered_df? If there are multiple node_column values for each vertex, are they
#       stored as a collection? For instance, src has two different protocols (two edges) or two ports, etc.
#
prop_graph.load_edge_attributes(filtered_df,
                                source_column="src",
                                destination_column="dst",
                                relationship_columns=["protocol","src_port","dst_port"],
                                node_columns=["ip","port","protocol"])

Triangle counting returns node ID rather than triangle count

For some reason, the triangle counting function in Arachne returns the input node ID rather than the number of triangles found. Here's the head of the .csv dataset I used to build the graph.

**HEADER**
int64,int64,float64
*/HEADER/*
bodyId_pre,bodyId_post,weight
294437328,295470623,1
294437328,295133902,1
294437328,448260940,1
294437328,294783423,1
294437328,5812979995,1
294437328,295474441,2
294437328,265120223,1
294437328,296139882,1

Here's the code snippet and output of my triangle function call.

tri_nodes = ak.array([-1])
count = ar.triangles(graph, vertexArray=tri_nodes)
print(count)
# [-1]

@alvaradoo thinks this might be a bug. Any idea how this could be fixed?

Integrate new aggregated bfs method

Currently, the BFS method's performance declines as the number of locales increases. There is an update in the attached branch with an aggregated BFS method that needs to be merged into main Arachne to address this problem.
