data's Introduction

Data Repository for PyGOD

Statistics of the available datasets are listed below. #Con. is the number of contextual outliers and #Strct. is the number of structural outliers. #Outliers can be slightly smaller than the sum of the two, because a node that is both a contextual and a structural outlier is counted only once.

| Dataset | Type | #Nodes | #Edges | #Feat | Avg. Degree | #Con. | #Strct. | #Outliers | Outlier Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 'weibo' | organic | 8,405 | 407,963 | 400 | 48.5 | - | - | 868 | 10.3% |
| 'reddit' | organic | 10,984 | 168,016 | 64 | 15.3 | - | - | 366 | 3.3% |
| 'disney' | organic | 124 | 335 | 28 | 2.7 | - | - | 6 | 4.8% |
| 'books' | organic | 1,418 | 3,695 | 21 | 2.6 | - | - | 28 | 2.0% |
| 'enron' | organic | 13,533 | 176,987 | 18 | 13.1 | - | - | 5 | 0.04% |
| 'inj_cora' | injected | 2,708 | 11,060 | 1,433 | 4.1 | 70 | 70 | 138 | 5.1% |
| 'inj_amazon' | injected | 13,752 | 515,042 | 767 | 37.2 | 350 | 350 | 694 | 5.0% |
| 'inj_flickr' | injected | 89,250 | 933,804 | 500 | 10.5 | 2,240 | 2,240 | 4,414 | 4.9% |
| 'gen_time' | generated | 1,000 | 5,746 | 64 | 5.7 | 100 | 100 | 189 | 18.9% |
| 'gen_100' | generated | 100 | 618 | 64 | 6.2 | 10 | 10 | 18 | 18.0% |
| 'gen_500' | generated | 500 | 2,662 | 64 | 5.3 | 10 | 10 | 20 | 4.0% |
| 'gen_1000' | generated | 1,000 | 4,936 | 64 | 4.9 | 10 | 10 | 20 | 2.0% |
| 'gen_5000' | generated | 5,000 | 24,938 | 64 | 5.0 | 10 | 10 | 20 | 0.4% |
| 'gen_10000' | generated | 10,000 | 49,614 | 64 | 5.0 | 10 | 10 | 20 | 0.2% |
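
The relation between the three outlier columns is plain inclusion-exclusion: #Outliers = #Con. + #Strct. minus the nodes that carry both types. The sketch below checks this using only counts copied from the statistics above. The helper name `overlap_count` is illustrative and is not part of PyGOD.

```python
# Counts copied from the dataset statistics:
# (contextual outliers, structural outliers, total outliers).
TABLE = {
    "inj_cora": (70, 70, 138),
    "inj_amazon": (350, 350, 694),
    "inj_flickr": (2240, 2240, 4414),
    "gen_time": (100, 100, 189),
    "gen_100": (10, 10, 18),
    "gen_500": (10, 10, 20),
}


def overlap_count(n_contextual, n_structural, n_outliers):
    """Number of nodes that are both contextual and structural outliers."""
    return n_contextual + n_structural - n_outliers


overlaps = {name: overlap_count(*counts) for name, counts in TABLE.items()}
for name, both in overlaps.items():
    print(f"{name}: {both} node(s) carry both outlier types")
```

For the datasets where #Outliers equals the sum of the two columns (for example 'gen_500'), the overlap is zero.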

To use the datasets:

from pygod.utils import load_data
data = load_data('weibo') # in PyG format

Alternative download source on Baidu Disk (in Chinese): https://pan.baidu.com/s/1afEZaygCRUYWJPtVbzuRYw (access code: bond)

For injected/generated datasets, the label values have the following meanings:

  • 0: inlier
  • 1: contextual outlier only
  • 2: structural outlier only
  • 3: both contextual outlier and structural outlier

The labels can be converted as follows:

y = data.y.bool()      # binary labels (inlier/outlier)
yc = (data.y >> 0) & 1 # contextual outliers (bit 0)
ys = (data.y >> 1) & 1 # structural outliers (bit 1)
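
The shifts work because the label is a two-bit code: bit 0 marks a contextual outlier and bit 1 marks a structural outlier. Here is a minimal sketch of the same decoding on plain Python integers, so that it runs without PyTorch. The `decode_label` helper is hypothetical and is not part of PyGOD.

```python
def decode_label(label):
    """Split a label in {0, 1, 2, 3} into (is_outlier, contextual, structural)."""
    if label not in (0, 1, 2, 3):
        raise ValueError(f"unexpected label: {label}")
    contextual = (label >> 0) & 1  # bit 0: contextual outlier
    structural = (label >> 1) & 1  # bit 1: structural outlier
    return bool(label), bool(contextual), bool(structural)


decoded = {label: decode_label(label) for label in range(4)}
for label, flags in decoded.items():
    print(label, flags)
```

Label 3 sets both bits, which matches the "both contextual outlier and structural outlier" entry in the list above.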

data's People

Contributors

kayzliu, yingtongdou

data's Issues

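Question about several of the outlier datasets

Hello, I am working on semi-supervised classification for outlier detection and would like to use your datasets such as weibo or books. I need to set masks for the training/test/validation sets. Should the ratio of 0s to 1s in train_mask, test_mask, and so on be chosen randomly by the user, or is there a prescribed way to set it? Looking forward to your reply.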

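Question about the datasets with artificially injected outliers

Hello, I have read your BOND paper and have some questions about how the datasets were constructed:
1. inj_cora, inj_amazon, and inj_flickr are datasets with artificially injected outliers, and their labels consist of 0 and 1 for binary outlier detection. In the graph node classification task, cora has seven classes with labels 0-6. Are the outlier detection labels derived from those class labels, or were these datasets built with 0/1 labels from the start? In that case, where can the original cora dataset used for outlier detection be downloaded?
2. For the two functions def gen_contextual_outlier(data, n, k, seed=None): and def gen_structural_outlier(data, m, n, p=0, directed=False, seed=None):, where can I find the values of the parameters m, n, and k used for the three datasets above? Is there complete code for generating inj_cora from cora that I could refer to?
Looking forward to your reply.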
