
nebulagraph-yelp-frauddetection's Introduction

Dataset Intro

This data set was introduced by Dou et al. in Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.

The data and the paper's code can be found here. During data processing I took a shortcut and used DGL to convert the adjacency matrices into edge lists and to extract the nodes with their features and labels.

Schema of the data:

  • vertices: Yelp reviews, with a label (is_fruad) and 32 normalized features as properties.
  • edges: relationships between reviews, with no properties.
    • R-U-R: shares_user_with
    • R-S-R: shares_restaurant_rating_with
    • R-T-R: shares_restaurant_in_one_month_with
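
The adjacency-matrix-to-edge-list conversion mentioned above can be sketched in plain Python. This is only an illustration of the idea, not the repository's data_download.py (which relies on DGL): the function names, the toy matrix, and the two-column CSV layout are all assumptions made for this example.

```python
import csv
import io


def adjacency_to_edgelist(adj):
    """Return a (src, dst) pair for every non-zero entry of a square adjacency matrix."""
    edges = []
    for src, row in enumerate(adj):
        for dst, weight in enumerate(row):
            if weight:
                edges.append((src, dst))
    return edges


def write_edgelist_csv(edges, handle):
    """Write one 'src,dst' row per edge (hypothetical layout, not the repo's exact format)."""
    csv.writer(handle).writerows(edges)


# Toy 4-review graph standing in for a single relation such as R-U-R; not real Yelp data.
toy_rur = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

edges = adjacency_to_edgelist(toy_rur)
buffer = io.StringIO()
write_edgelist_csv(edges, buffer)
print(buffer.getvalue())
```

Each relation (R-U-R, R-S-R, R-T-R) would get its own edge list, which matches the one-CSV-per-edge-type layout used for the import. For the real dataset a sparse representation is the practical choice, since a dense nested list with tens of thousands of rows would be wasteful.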

Download and convert data into CSV

python3 -m pip install -r requirements.txt
python3 data_download.py
ls -l data/*.csv

Generated files:

$ ls data/*.csv
net_rsr.csv  net_rtr.csv  net_rur.csv  vertices.csv

Import data into NebulaGraph

This assumes a NebulaGraph cluster bootstrapped with Nebula-UP.

docker run --rm -ti \
    --network=nebula-net \
    -v ${PWD}/yelp_nebulagraph_importer.yaml:/root/importer.yaml \
    -v ${PWD}/data:/root \
    vesoft/nebula-importer:v3.1.0 \
    --config /root/importer.yaml

Once the import finishes, we can query the graph statistics:

~/.nebula-up/console.sh -e "USE yelp; SHOW STATS"

The output should look like this:

(root@nebula) [(none)]> USE yelp; SHOW STATS
+---------+---------------------------------------+---------+
| Type    | Name                                  | Count   |
+---------+---------------------------------------+---------+
| "Tag"   | "review"                              | 45954   |
| "Edge"  | "shares_restaurant_in_one_month_with" | 1147232 |
| "Edge"  | "shares_restaurant_rating_with"       | 6805486 |
| "Edge"  | "shares_user_with"                    | 98630   |
| "Space" | "vertices"                            | 45954   |
| "Space" | "edges"                               | 8051348 |
+---------+---------------------------------------+---------+
Got 6 rows (time spent 1911/4488 us)
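
As a quick sanity check on the numbers above, the three edge-type counts should add up to the space-level edge total, and the single tag count should equal the vertex total. The sketch below parses the console table as pasted text; the parse_stats helper is a hypothetical name for this example and is not part of any NebulaGraph client.

```python
SHOW_STATS_OUTPUT = """
+---------+---------------------------------------+---------+
| Type    | Name                                  | Count   |
+---------+---------------------------------------+---------+
| "Tag"   | "review"                              | 45954   |
| "Edge"  | "shares_restaurant_in_one_month_with" | 1147232 |
| "Edge"  | "shares_restaurant_rating_with"       | 6805486 |
| "Edge"  | "shares_user_with"                    | 98630   |
| "Space" | "vertices"                            | 45954   |
| "Space" | "edges"                               | 8051348 |
+---------+---------------------------------------+---------+
"""


def parse_stats(table_text):
    """Parse console SHOW STATS output into {(type, name): count}."""
    stats = {}
    for line in table_text.splitlines():
        cells = [c.strip().strip('"') for c in line.strip().strip("|").split("|")]
        if len(cells) != 3 or not cells[2].isdigit():
            continue  # skip border lines, the header row, and blank lines
        stats[(cells[0], cells[1])] = int(cells[2])
    return stats


stats = parse_stats(SHOW_STATS_OUTPUT)
edge_total = sum(count for (kind, _), count in stats.items() if kind == "Edge")

# Per-type counts must agree with the space-level totals.
assert edge_total == stats[("Space", "edges")]
assert stats[("Tag", "review")] == stats[("Space", "vertices")]
print(edge_total)
```

If either assertion fails after an import, some rows were probably rejected by the importer, or the statistics are stale and need to be recomputed.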

NebulaGraph DGL Integration

Strictly speaking this step is unnecessary, since the dataset already ships with DGL. It is only a demo of how to use NebulaGraph with DGL.

In [1]:
import yaml
from nebula_dgl import NebulaLoader

# graphd endpoints and credentials of the NebulaGraph cluster
nebula_config = {
    "graph_hosts": [
        ('graphd', 9669),
        ('graphd1', 9669),
        ('graphd2', 9669)
    ],
    "user": "root",
    "password": "nebula",
}

# Load the mapping from NebulaGraph properties to DGL features
with open('nebulagraph_yelp_dgl_mapper.yaml', 'r') as f:
    feature_mapper = yaml.safe_load(f)

nebula_loader = NebulaLoader(nebula_config, feature_mapper)

# This will take a while
g = nebula_loader.load()

In [2]: g
Out[2]:
Graph(num_nodes={'review': 45954},
      num_edges={('review', 'shares_restaurant_in_one_month_with', 'review'): 1147232, ('review', 'shares_restaurant_rating_with', 'review'): 6805486, ('review', 'shares_user_with', 'review'): 98630},
      metagraph=[('review', 'review', 'shares_restaurant_in_one_month_with'), ('review', 'review', 'shares_restaurant_rating_with'), ('review', 'review', 'shares_user_with')])

In [3]: g.canonical_etypes
Out[3]:
[('review', 'shares_restaurant_in_one_month_with', 'review'),
 ('review', 'shares_restaurant_rating_with', 'review'),
 ('review', 'shares_user_with', 'review')]

nebulagraph-yelp-frauddetection's People

Contributors

wey-gu


nebulagraph-yelp-frauddetection's Issues

failed to import data with this yaml file and nebula-importer-v3.1.0 docker container

[ERROR] clientpool.go:158: Client(1) fails to execute commands (USE yelp;), error: failed to execute: Session has been released.
When I use the same method to import another dataset (fraud-detection-datagen), it shows:

2023/08/15 01:14:11 Client(0) fails to execute commands (CREATE SPACE frauddetection (partition_num = 5, replica_factor = 1, vid_type = FIXED_STRING(32)); USE frauddetection; CREATE TAG corp (corp_name string, phone_num string, is_risky string, risk_comment string); CREATE TAG application (apply_agent_id string, apply_date date, application_uuid string, approval_status string, application_type string, rejection_reason string); CREATE TAG phone_num (phone_num string); CREATE TAG device (device_id string); CREATE TAG person (name string); CREATE TAG louvain (louvain int); CREATE TAG applicant (name string, gender string, birth date, addr string, degree string, occupation string, year_salary string, is_risky string, risk_comment string); CREATE EDGE applied_for_loan (start_time date); CREATE EDGE with_phone_num (start_time date); CREATE EDGE is_related_to (start_time date, level int); CREATE EDGE worked_for (start_time date); CREATE EDGE used_device (start_time date); CREATE TAG INDEX louvain_index on louvain(louvain);
), response error code: -1005, message: Existed!

problem about root/err

Sorry to bother you, but I ran into a problem, shown in the picture, when I run this command:

docker run --rm -ti \
    --network=nebula-net \
    -v ${PWD}/yelp_nebulagraph_importer.yaml:/root/importer.yaml \
    -v ${PWD}/data:/root \
    vesoft/nebula-importer:v3.1.0 \
    --config /root/importer.yaml

[image]

How can I solve it?
