
pgl's Introduction

The logo of Paddle Graph Learning (PGL)


DOC | Quick Start | 中文

Breaking News!!

One amazing paper about knowledge representation learning was accepted! (2022.05.06)

  • Simple and Effective Relation-based Embedding Propagation for Knowledge Representation Learning, to appear in IJCAI2022. Code can be found here.

PGL v2.2 2021.12.20

  • Graph4Rec: We released a universal and large-scale toolkit with graph neural networks for recommender systems. Details can be found here.

  • Graph4KG: We released a flexible framework named Graph4KG to learn embeddings of entities and relations in KGs, which supports training on massive KGs. Details can be found here.

  • GNNAutoScale: PGL now supports GNNAutoScale framework, which can scale arbitrary message-passing GNNs to large graphs. Details can be found here.

🔥 🔥 🔥 OGB-LSC KDD CUP 2021 winners announced!! (2021.06.17)

Super excited to announce that our PGL team won TWO FIRST places and ONE SECOND place across a total of three tracks in OGB-LSC KDD CUP 2021. Leaderboards can be found here.

  • First place in MAG240M-LSC track: Code and Technical Report can be found here.

  • First place in WikiKG90M-LSC track: Code and Technical Report can be found here.

  • Second place in PCQM4M-LSC track: Code and Technical Report can be found here.

Two amazing papers using PGL were accepted: (2021.06.17)

  • Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification, to appear in IJCAI2021.
  • HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps, to appear in KDD2021.

PGL Distributed Graph Engine API released!!

  • Our Distributed Graph Engine API has been released, along with a tutorial showing how to launch a graph engine and a demo for training a model with it.

Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle.

The Framework of Paddle Graph Learning (PGL)

The newly released PGL supports heterogeneous graph learning on both the walk-based paradigm and the message-passing-based paradigm, by providing MetaPath sampling and a message passing mechanism on heterogeneous graphs. Furthermore, the newly released PGL also supports distributed graph storage and several distributed training algorithms, such as distributed DeepWalk and distributed GraphSAGE. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, so the framework covers a wide range of graph-based applications.

One of the most important benefits of graph neural networks over other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is cumbersome. PGL adopts a message passing paradigm, similar to DGL, that helps users build customized graph neural networks easily: users only need to write send and recv functions to implement a simple GCN. As shown in the following figure, in the first step the send function is defined on the edges of the graph, and the user can customize it to send a message from the source node to the target node. In the second step, the recv function is responsible for aggregating the messages arriving from different sources.

The basic idea of message passing paradigm

To write a sum aggregator, users only need to write the following code.

    import pgl
    import paddle
    import numpy as np

    # Build a toy graph: 5 nodes, 3 edges, 100-dim node features under key "h".
    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    feature = np.random.randn(5, 100).astype(np.float32)

    g = pgl.Graph(num_nodes=num_nodes,
        edges=edges,
        node_feat={
            "h": feature
        })
    g.tensor()  # convert the numpy arrays inside the graph to paddle.Tensor

    # send: copy each source node's features onto its outgoing edges.
    def send_func(src_feat, dst_feat, edge_feat):
        return src_feat

    # recv: sum the incoming messages for every destination node.
    def recv_func(msg):
        return msg.reduce_sum(msg["h"])

    msg = g.send(send_func, src_feat=g.node_feat)

    ret = g.recv(recv_func, msg)

Highlight: Flexibility - Natively Support Heterogeneous Graph Learning

Graphs can conveniently represent relations between things in the real world, but the categories of things and the relations between them vary widely. Therefore, in a heterogeneous graph we need to distinguish node types and edge types. PGL models heterogeneous graphs that contain multiple node types and multiple edge types, and can describe the complex connections between them.

Support metapath walk sampling on heterogeneous graphs

The metapath sampling in heterogeneous graph

The left side of the figure above depicts a shopping social network. The nodes fall into two categories, users and products, with relations between users and users, users and products, and products and products. The right side of the figure shows a simple MetaPath sampling process: given a MetaPath such as UPU (user-product-user), sampling produces results like the following.

The metapath result

On this basis, by introducing word2vec and related methods, PGL supports heterogeneous graph representation learning algorithms such as metapath2vec.
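To make the sampling step concrete, here is a minimal, framework-free sketch of a metapath-guided random walk. It is plain Python for illustration; the adjacency data, type names, and the metapath_walk helper are assumptions of this sketch, not PGL's API:

    import random

    # Toy heterogeneous graph: adjacency lists keyed by edge type.
    # (Illustrative data; in PGL the graph object would supply neighbors.)
    neighbors = {
        ("user", "product"): {0: [10, 11], 1: [11]},
        ("product", "user"): {10: [0, 1], 11: [1]},
    }

    def metapath_walk(start, edge_types, num_rounds):
        """Sample a walk that repeats the given edge-type pattern.

        edge_types = [("user", "product"), ("product", "user")]
        realizes the metapath UPU (user-product-user).
        """
        walk, node = [start], start
        for _ in range(num_rounds):
            for etype in edge_types:
                nbrs = neighbors[etype].get(node, [])
                if not nbrs:  # dead end: stop the walk early
                    return walk
                node = random.choice(nbrs)
                walk.append(node)
        return walk

    print(metapath_walk(0, [("user", "product"), ("product", "user")], 3))

Walks sampled this way can then be fed to a skip-gram model, as in word2vec, to learn the node embeddings.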

Support message passing mechanism on heterogeneous graphs

The message passing mechanism on heterogeneous graph

Because a heterogeneous graph contains different node types, message passing differs as well. As shown on the left of the figure above, the target node has five neighbors belonging to two different node types. As shown on the right, during message passing the neighbors belonging to different types are first aggregated separately, and the per-type messages are then merged into the final message that updates the target node. On this basis, PGL supports message-passing-based heterogeneous graph algorithms such as GATNE.
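As an illustration of that two-stage scheme, the sketch below aggregates neighbor features per node type and then merges the per-type messages by averaging. It is plain numpy; the grouping data and the choice of mean as the merge function are illustrative assumptions, not a fixed PGL API:

    import numpy as np

    feat = np.random.randn(6, 8).astype(np.float32)  # 6 nodes, 8-dim features

    # Incoming neighbors of target node 0, grouped by neighbor node type.
    neighbors_by_type = {"user": [1, 2, 3], "product": [4, 5]}

    # Stage 1: aggregate messages separately within each node type.
    per_type = {t: feat[idx].sum(axis=0) for t, idx in neighbors_by_type.items()}

    # Stage 2: merge the per-type messages into one update for the target node.
    update = np.mean(list(per_type.values()), axis=0)
    print(update.shape)  # (8,)

In practice the merge step can itself be learned, for example with attention over the per-type messages, which is what GATNE-style models do.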

Large-Scale: Support distributed graph storage and distributed training algorithms

In most large-scale graph learning scenarios, distributed graph storage and distributed training are required. As shown in the following figure, PGL provides a general solution for large-scale training: we adopt PaddleFleet as the distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so a large-scale distributed training algorithm can easily be set up on MPI clusters.

The distributed frame of PGL

Model Zoo

The following graph learning models have been implemented in the framework. You can find more examples and details here.

  • ERNIESage: ERNIE SAmple aggreGatE, combining text and graph
  • GCN: Graph Convolutional Networks
  • GAT: Graph Attention Networks
  • GraphSage: Large-scale graph convolutional networks based on neighborhood sampling
  • unSup-GraphSage: Unsupervised GraphSAGE
  • LINE: Representation learning based on first-order and second-order neighbors
  • DeepWalk: Representation learning by DFS random walks
  • MetaPath2Vec: Representation learning based on metapaths
  • Node2Vec: Representation learning combining DFS and BFS random walks
  • Struct2Vec: Representation learning based on structural similarity
  • SGC: Simplified graph convolutional networks
  • GES: Graph representation learning with node features
  • DGI: Unsupervised representation learning based on graph convolutional networks
  • GATNE: Heterogeneous graph representation learning based on message passing

The models above fall into three groups: graph representation learning, graph neural networks, and heterogeneous graph learning.

System requirements

PGL requires:

  • paddlepaddle >= 2.2.0
  • cython

PGL only supports Python 3.

Installation

You can simply install it via pip.

pip install pgl

The Team

PGL is developed and maintained by the NLP and Paddle teams at Baidu.

E-mail: nlp-gnn[at]baidu.com

License

PGL uses Apache License 2.0.

pgl's People

Contributors

aurelius84, burness, cfangplus, dependabot[bot], desmonday, dongzhex, githubutilities, hemingkai, hwchen1, ivam-he, kirayummy, lemonnoel, liwb5, qianli-wu, raindrops2sea, sys1874, veyron95, wawltor, weiyuesu, wenjinw, xjqbest, yelrose, yinpeiqi, zbmain, zhui


pgl's Issues

A small bug in the GATNE example

GATNE/model.py

trans_weights = fl.create_parameter(
        shape=[
            self.edge_type_count, self.embedding_u_size,
            self.embedding_size // self.att_head
        ],
        attr=fluid.initializer.TruncatedNormalInitializer(
            loc=0.0, scale=1.0 / math.sqrt(self.embedding_size)),
        dtype='float32',
        name='trans_w')

Shouldn't attr here be default_initializer?

out of memory error

Hello,
I am having difficulties running the node2vec example on GPU.

The command python node2vec.py --use_cuda results in the error message below; it seems to work fine on CPU.

Both GPUs I use have 6GB. Am I overlooking something?

0223 18:59:36.058634 11348 operator.cc:179] lookup_table raises an exception paddle::memory::allocation::BadAlloc,

Out of memory error on GPU 0. Cannot allocate 6.933594GB memory on GPU 0, available memory is only 726.750000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please try one of the following suggestions:
    1. Decrease the batch size of your model.
    2. FLAGS_fraction_of_gpu_memory_to_use is 0.92 now, please set it to a higher value but less than 1.0.
      The command is export FLAGS_fraction_of_gpu_memory_to_use=xxx.

at (/paddle/paddle/fluid/memory/detail/system_allocator.cc:151)

The Quick Start Instructions example in the PGL manual raises an error

Using PaddlePaddle 1.7.2 on AI Studio, Python 3.7.

Following the PGL manual at https://pgl.readthedocs.io/en/latest/quick_start/instruction.html, I typed the code into a notebook, and running this statement raises an error:

output = gcn_layer(gw, gw.node_feat['feature'], gw.edge_feat['edge_feature'],
                hidden_size=8, name='gcn_layer_1', activation='relu')
output = gcn_layer(gw, output, gw.edge_feat['edge_feature'],
                hidden_size=1, name='gcn_layer_2', activation=None)

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>
----> 1 output = gcn_layer(gw, gw.node_feat['feature'], gw.edge_feat['edge_feature'],
      2                 hidden_size=8, name='gcn_layer_1', activation='relu')
      3 output = gcn_layer(gw, output, gw.edge_feat['edge_feature'],
      4                 hidden_size=1, name='gcn_layer_2', activation=None)
KeyError: 'edge_feature'

I later checked: print(gw.edge_feat) shows {},
and g.node_feat_info() shows [('feature', [None, 16], dtype('float32'))].

Error computing adj_dst_index after calling pgl.Graph.tensor()

Hello. After building a graph whose data is stored as numpy.ndarray with pgl.Graph, calling the .tensor() method to convert the ndarrays in the graph to paddle.Tensor raises the following error:

[screenshot]

Looking further, the Graph does contain the adj_dst_index attribute before the conversion to tensors:

[screenshot]

After the conversion, however, computing the degree attribute of adj_dst_index makes paddle.scatter() raise an error:

[screenshot]

I also found that if I create the Graph and call Graph.tensor() as two separate steps (for example, executing them one at a time in ipython), the conversion works fine; if I run the two steps together, computing adj_dst_index fails.

The erniesage example raises a generator error

Traceback (most recent call last):
  File "link_predict.py", line 274, in <module>
    train(config)
  File "link_predict.py", line 169, in train
    train_ds = Dataset.from_generator_func(train_iter).repeat(config.epochs)
  File "/home/alphis/ws/pei_zhi_huan_jing/ERNIE/propeller/data/functional.py", line 278, in from_generator_func
    raise ValueError('expect generator function, got %s' % repr(_gen))

Reproducing GATNE with heterogeneous nodes

The GATNE example uses homogeneous nodes with heterogeneous edges. Is there a reproduction of GATNE with heterogeneous nodes on the other three datasets at https://github.com/THUDM/GATNE/tree/master/data? I could not find one.

Problem with fleet-distributed GraphSAGE

Hi, I ran the GraphSAGE demo provided by PGL and it runs fine locally. I then converted the local program to fleet distributed mode. The network structure and hyperparameters are unchanged, and I launch one pserver and one worker, but the loss of the fleet distributed program does not decrease. What could be the problem?
Below is the log of my local GraphSAGE run:
[screenshot]
And this is the log of my fleet distributed run:
[screenshot]
Below is the distributed part of the main function; I only modified the main function.

def main(args):
    data = load_data(args.normalize, args.symmetry)
    log.info("preprocess finish")
    log.info("Train Examples: %s" % len(data["train_index"]))
    log.info("Val Examples: %s" % len(data["val_index"]))
    log.info("Test Examples: %s" % len(data["test_index"]))
    log.info("Num nodes %s" % data["graph"].num_nodes)
    log.info("Num edges %s" % data["graph"].num_edges)
    log.info("Average Degree %s" % np.mean(data["graph"].indegree()))

    place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
    train_program = fluid.default_main_program()
    startup_program = fluid.default_startup_program()
    samples = []
    if args.samples_1 > 0:
        samples.append(args.samples_1)
    if args.samples_2 > 0:
        samples.append(args.samples_2)

    with fluid.program_guard(train_program, startup_program):
        feature, feature_init = paddle_helper.constant(
            "feat",
            dtype=data['feature'].dtype,
            value=data['feature'],
            hide_batch_size=False)

        graph_wrapper = pgl.graph_wrapper.GraphWrapper(
            "sub_graph", place, node_feat=data['graph'].node_feat_info())
        model_loss, model_acc = build_graph_model(
            graph_wrapper,
            num_class=data["num_class"],
            feature=feature,
            hidden_size=args.hidden_size,
            graphsage_type=args.graphsage_type,
            k_hop=len(samples))

    test_program = train_program.clone(for_test=True)
    
    trainer_id = int(os.environ["PADDLE_TRAINER_ID"])
    trainers = int(os.environ["PADDLE_TRAINERS"])
    training_role = os.environ["PADDLE_TRAINING_ROLE"]
    log.info(training_role )
    training_role = role_maker.Role.WORKER if training_role == "TRAINER" else role_maker.Role.SERVER
    ports = os.getenv("PADDLE_PSERVER_PORTS")
    pserver_ip = os.getenv("PADDLE_PSERVER_IP", "")
    pserver_endpoints = []
    for port in ports.split(","):
        pserver_endpoints.append(':'.join([pserver_ip, port]))
    role = role_maker.UserDefinedRoleMaker(current_id=trainer_id, role=training_role, worker_num=trainers, server_endpoints=pserver_endpoints)
    config = DistributeTranspilerConfig()
    config.sync_mode = True

    fleet.init(role)
    optimizer = fluid.optimizer.SGD(learning_rate=args.lr)
    optimizer = fleet.distributed_optimizer(optimizer, config)
    optimizer.minimize(model_loss)

    exe = fluid.Executor(place)

    if fleet.is_server():
        log.info('running server')
        fleet.init_server()
        fleet.run_server()

    if fleet.is_worker():
        log.info('running worker')
        fleet.init_worker()
        exe.run(fleet.startup_program)
        feature_init(place)

Where can the sample data for erniesage be downloaded?

I followed https://github.com/PaddlePaddle/PGL/tree/master/examples/erniesage to run the example and got this error:

+ python3 ./preprocessing/dump_graph.py -i ./data.txt -o ./workdir --encoding utf8 -l 40 --vocab_file ./vocab.txt
Traceback (most recent call last):
  File "./preprocessing/dump_graph.py", line 121, in <module>
    dump_graph(args)
  File "./preprocessing/dump_graph.py", line 57, in dump_graph
    with io.open(args.inpath, encoding=args.encoding) as f:
FileNotFoundError: [Errno 2] No such file or directory: './data.txt'

This looks like missing sample data; I searched the docs but could not find it. Could you provide it?

How were the hyperparameters chosen for OGB?

You list the selected hyperparameters in the paper, but could you go into more detail about how they were chosen? For example, what ranges were you searching over?

A question: what is legacy?

Sorry to interrupt. I found some distributed versions of PGL, but they live under legacy. What is legacy, and how can I run the demos in the legacy examples?

Thanks!

How to handle nodes with different feature dimensions

Hello, I ran into trouble modeling a sensor classification problem with PGL. In a graph neural network, if some nodes have feature vectors of different dimensions, or some nodes have no features at all, is there any approach other than using a heterogeneous graph? Also, my understanding is that in a homogeneous graph all node feature dimensions must be the same; is that correct? Thanks!
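Not an official answer, but one standard workaround, independent of PGL, is to bring all node features to a shared dimension by zero-padding (or a learned projection). A minimal numpy sketch; the pad_features helper and the sample data are hypothetical:

    import numpy as np

    def pad_features(feats, dim):
        """Zero-pad (or truncate) ragged per-node features to one width.

        Nodes without features can pass an empty array and end up
        as all-zero rows.
        """
        out = np.zeros((len(feats), dim), dtype=np.float32)
        for i, f in enumerate(feats):
            f = np.asarray(f, dtype=np.float32)[:dim]
            out[i, :len(f)] = f
        return out

    feats = [np.ones(4), np.ones(7), np.array([])]  # ragged node features
    print(pad_features(feats, 8).shape)  # (3, 8)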

Problem with distribute_deepwalk: offset is too big

Hello. During the training phase of distribute_deepwalk, the run aborts with "offset is too big". How should this be fixed? Thanks! My paddle version is 1.6; below is the detailed error from the server.
terminate called after throwing an instance of 'Xbyak::Error'
what(): offset is too big
W1212 15:27:18.687328 19979 init.cc:212] *** Aborted at 1576135638 (unix time) try "date -d @1576135638" if you are using GNU date ***
W1212 15:27:18.690397 19979 init.cc:212] PC: @ 0x0 (unknown)
W1212 15:27:18.690459 19979 init.cc:212] *** SIGABRT (@0x1f400003fd1) received by PID 16337 (TID 0x7f38bbfff700) from PID 16337; stack trace: ***
W1212 15:27:18.693048 19979 init.cc:212] @ 0x7f3ac3ba5160 (unknown)
W1212 15:27:18.695641 19979 init.cc:212] @ 0x7f3ac31133f7 __GI_raise
W1212 15:27:18.698221 19979 init.cc:212] @ 0x7f3ac31147d8 __GI_abort
W1212 15:27:18.699606 19979 init.cc:212] @ 0x7f3a65bdbc65 __gnu_cxx::__verbose_terminate_handler()
W1212 15:27:18.700861 19979 init.cc:212] @ 0x7f3a65bd9e06 __cxxabiv1::__terminate()
W1212 15:27:18.702225 19979 init.cc:212] @ 0x7f3a65bd9e33 std::terminate()
W1212 15:27:18.703485 19979 init.cc:212] @ 0x7f3a65c2c935 execute_native_thread_routine
W1212 15:27:18.705993 19979 init.cc:212] @ 0x7f3ac3b9d1c3 start_thread
W1212 15:27:18.708621 19979 init.cc:212] @ 0x7f3ac31c512d __clone
W1212 15:27:18.711194 19979 init.cc:212] @ 0x0 (unknown)

In addition, one worker reports: PaddleCheckError: Expected posix_memalign(&p, alignment, size) == 0, but received posix_memalign(&p, alignment, size):12 != 0:0.

Building a graph with pgl inside Docker raises a segmentation fault

Error message:

W0720 03:52:59.207820 3586 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0720 03:52:59.207885 3586 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0720 03:52:59.207908 3586 init.cc:221] The detail failure signal is:

W0720 03:52:59.207937 3586 init.cc:224] *** Aborted at 1595217179 (unix time) try "date -d @1595217179" if you are using GNU date ***
W0720 03:52:59.209323 3586 init.cc:224] PC: @ 0x0 (unknown)
W0720 03:52:59.209501 3586 init.cc:224] *** SIGSEGV (@0x0) received by PID 3586 (TID 0x7f213102e740) from PID 0; stack trace: ***
W0720 03:52:59.210655 3586 init.cc:224] @ 0x7f2130840630 (unknown)
W0720 03:52:59.211284 3586 init.cc:224] @ 0x7f2104830b62 (unknown)
W0720 03:52:59.211845 3586 init.cc:224] @ 0x7f210483390d (unknown)
W0720 03:52:59.213013 3586 init.cc:224] @ 0x7f2130b32300 PyEval_EvalFrameEx
W0720 03:52:59.214179 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.215322 3586 init.cc:224] @ 0x7f2130abe07d (unknown)
W0720 03:52:59.216428 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.217643 3586 init.cc:224] @ 0x7f2130aa8065 (unknown)
W0720 03:52:59.218773 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.219944 3586 init.cc:224] @ 0x7f2130af0097 (unknown)
W0720 03:52:59.221132 3586 init.cc:224] @ 0x7f2130aeedaf (unknown)
W0720 03:52:59.222276 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.223424 3586 init.cc:224] @ 0x7f2130b2d846 PyEval_EvalFrameEx
W0720 03:52:59.224547 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.225683 3586 init.cc:224] @ 0x7f2130abdf88 (unknown)
W0720 03:52:59.226790 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.227934 3586 init.cc:224] @ 0x7f2130a99155 (unknown)
W0720 03:52:59.229049 3586 init.cc:224] @ 0x7f2130a9923e PyObject_CallFunction
W0720 03:52:59.230197 3586 init.cc:224] @ 0x7f2130ad5561 _PyObject_GenericGetAttrWithDict
W0720 03:52:59.231338 3586 init.cc:224] @ 0x7f2130b2f800 PyEval_EvalFrameEx
W0720 03:52:59.232473 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.233608 3586 init.cc:224] @ 0x7f2130b31b4c PyEval_EvalFrameEx
W0720 03:52:59.234735 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.235885 3586 init.cc:224] @ 0x7f2130abe07d (unknown)
W0720 03:52:59.236999 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.238148 3586 init.cc:224] @ 0x7f2130aa8065 (unknown)
W0720 03:52:59.239271 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.240423 3586 init.cc:224] @ 0x7f2130af0097 (unknown)
W0720 03:52:59.241578 3586 init.cc:224] @ 0x7f2130aeedaf (unknown)
W0720 03:52:59.242700 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.243827 3586 init.cc:224] @ 0x7f2130b2d846 PyEval_EvalFrameEx
W0720 03:52:59.244946 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
Segmentation fault

Test code:
    import pgl
    from pgl import graph  # import pgl module
    import numpy as np

    edge_list = []
    node_list = []
    filename = "./testnode.txt"
    #filename = "./edgelist.txt"
    try:
        fp = open(filename, "r")
        print("%s opened successfully" % filename)
        done = False
        while not done:
            aline = fp.readline()
            if aline != "":
                a = long(aline.strip("\n").strip("(").strip(")").split(",")[0])
                b = long(aline.strip("\n").strip("(").strip(")").split(",")[1])
                edge_list.append((a, b))
                node_list.append(a)
                node_list.append(b)
            else:
                done = True
        fp.close()
    except IOError:
        print("%s open failed" % filename)

    print(type(node_list))
    print(edge_list)
    print(node_list)
    news_nodes = []

    for id in node_list:
        if id not in news_nodes:
            news_nodes.append(id)
    num_node = len(news_nodes)
    print("%d nodes" % num_node)
    print(news_nodes)

    #num_node=359625121
    d = 8
    feature = np.random.randn(num_node, d).astype("float32")
    edge_feature = np.random.randn(len(edge_list), d).astype("float32")

    g = graph.Graph(num_nodes=num_node,
                    edges=edge_list,
                    node_feat={'feature': feature},
                    edge_feat={'edge_feature': edge_feature})

Contents of the input file:
(-4649263633069986650,-5524713781035048896)
(1271252742248293960,5257009004707542774)
(-7235151214169208912,-801785351457387666)
(3878525930642985553,-4845667399646036208)
(3787724060214927072,2498311633260070737)
(-1926501030799233262,-2531193103782375080)
(5379329040149508336,-1753726466388496271)
(-2731878471209782768,7141705257445771183)
(208737387650153426,-4842986495434924796)
(6859451388928841912,-1849683348999068048)
(-5297793359529213911,-730495328556966270)
(-8990674550612404115,5475799282437796300)
(8281941951883016219,2817935382386340348)
(-3531905366784664905,8330449695852382521)

My paddle version is:
PaddlePaddle 1.8.3, compiled with
with_avx: ON
with_gpu: OFF
with_mkl: ON
with_mkldnn: ON
with_python: ON

Python version: 2.7.5

The erniesage example

Running python ./preprocessing/dump_graph.py --conf='./config/erniesage_link_predict.yaml' raises:

Traceback (most recent call last):
  File "./preprocessing/dump_graph.py", line 221, in <module>
    dump_node_feat(config)
  File "./preprocessing/dump_graph.py", line 185, in dump_node_feat
    tokenizer = ErnieTinyTokenizer.from_pretrained(config.ernie_name)
  File "/home/alphis/ws/pei_zhi_huan_jing/ERNIE/ernie/tokenizing_ernie.py", line 223, in from_pretrained
    t = cls(vocab_dict, sp_model_path, **kwargs)
  File "/home/alphis/ws/pei_zhi_huan_jing/ERNIE/ernie/tokenizing_ernie.py", line 232, in __init__
    self.sp_model.Load(sp_model_path)
  File "/home/alphis/anaconda3/envs/nlp/lib/python3.7/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/alphis/anaconda3/envs/nlp/lib/python3.7/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

How to input a batch of graphs

I want to input a batch of graphs, say batch=32, where each graph is small and independent. How can I feed them in simultaneously, the way a batch of images is fed in? Is there an example of this?
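Not an official answer, but a common approach, independent of any particular framework, is to merge the small graphs into one disjoint union by offsetting node ids so that the whole batch becomes a single large graph. A minimal numpy sketch; the batch_graphs helper and its data layout are hypothetical:

    import numpy as np

    def batch_graphs(graphs):
        """Merge [(num_nodes, edges, feat), ...] into one big graph.

        Node ids of each graph are shifted by the number of nodes that
        precede it, so edges never cross graph boundaries; graph_ids
        records which original graph each node came from (useful for
        graph-level pooling afterwards).
        """
        offset = 0
        all_edges, all_feats, graph_ids = [], [], []
        for gid, (num_nodes, edges, feat) in enumerate(graphs):
            all_edges.append(np.asarray(edges) + offset)  # shift node ids
            all_feats.append(feat)
            graph_ids.append(np.full(num_nodes, gid))
            offset += num_nodes
        return (offset,                            # total node count
                np.concatenate(all_edges, axis=0),
                np.concatenate(all_feats, axis=0),
                np.concatenate(graph_ids))

    # Two tiny graphs with 16-dim node features.
    g1 = (3, [(0, 1), (1, 2)], np.random.randn(3, 16).astype(np.float32))
    g2 = (2, [(0, 1)], np.random.randn(2, 16).astype(np.float32))
    num_nodes, edges, feat, graph_ids = batch_graphs([g1, g2])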

Question about deploying the distributed version

On which platforms can the distributed version of PGL be deployed? Is there a deployment guide for distributed PGL? Could you also describe in detail the runtime environments for the distributed versions of deepwalk and graphsage provided in the examples?

Problem running 1-Introduction

---------------------------------------------------------------------------
EnforceNotMet                             Traceback (most recent call last)
<ipython-input> in <module>
    111 feed_dict['node_label'] = label
    112
--> 113 train_loss = exe.run(fluid.default_main_program(), feed=feed_dict, fetch_list=[loss], return_numpy=True)
    114 print('Epoch %d | Loss: %f'%(epoch, train_loss[0]))
    115 print('1')

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    649             scope=scope,
    650             return_numpy=return_numpy,
--> 651             use_program_cache=use_program_cache)
    652         else:
    653             if fetch_list and program._is_data_parallel and program._program and (

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run(self, program, exe, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    747         self._feed_data(program, feed, feed_var_name, scope)
    748         if not use_program_cache:
--> 749             exe.run(program.desc, scope, 0, True, True, fetch_var_name)
    750         else:
    751             exe.run_cached_prepared_ctx(ctx, scope, False, False, False)

EnforceNotMet: Invoke operator sigmoid_cross_entropy_with_logits error.

pgl cannot be installed on Windows 10

Windows 10, paddlepaddle 1.6.1.post107, cudnn 7.3.1, CUDA 10.0.0, Python 3.7.5.
Entering pip install pgl on the conda command line gives the following error:
[screenshot]
How can pgl be installed in the environment described above?

GES example: the program hangs and cannot exit after breaking out of training early

Environment: Python 3, latest pgl, paddle 1.6, GPU.
The only change from the GES code in example is shown below: breaking early at step == 5 makes the program hang:

def train(train_exe, exe, program, loss, node2vec_pyreader, args, train_steps):
    """ train
    """
    trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
    step = 0
    while True:
        try:
            begin_time = time.time()
            loss_val, = train_exe.run(fetch_list=[loss])
            log.info("step %s: loss %.5f speed: %.5f s/step" %
                     (step, np.mean(loss_val), time.time() - begin_time))
            step += 1
            if step == 5:
                break
        except F.core.EOFException:
            node2vec_pyreader.reset()

        if (step % args.steps_per_save == 0 or
                step == train_steps) and trainer_id == 0:

            model_save_dir = args.output_path
            model_path = os.path.join(model_save_dir, str(step))
            if not os.path.exists(model_save_dir):
                os.makedirs(model_save_dir)
            F.io.save_params(exe, model_path, program)

        if step == train_steps:
            break

Output when hung:
[screenshot]

Does pgl support paddle 2.0?

I read the introduction to paddle 2.0: the LoDTensor concept is de-emphasized, and related APIs such as sequence_pool are no longer available. After upgrading to 2.0, how should pgl be used?

Unsupervised GraphSAGE results

Running python train.py --data_path ./sample.txt --num_nodes 2000 --phase predict under PGL/examples/unsup_graphsage produces an emb.npy file containing multiple embedding rows.

How do these rows map back to the ids in sample.txt? Does the embedding in row n correspond to id=n? There is no id 1164 in the data, yet row 1164 of emb.npy still contains an embedding.

Fixing the random seed still does not reproduce results across runs

For the graph convolution examples (GCN/GAT/GIN), I fix the paddle and numpy random seeds as follows:

seed = 123
train_program.random_seed = seed
startup_program.random_seed = seed
np.random.seed(seed)
random.seed(seed)

On CPU every run gives exactly the same result, but on GPU the results differ significantly between runs. After removing pgl's graph convolution layers, the model becomes reproducible on GPU as well. Could there be some remaining source of randomness in pgl's low-level CUDA implementation?

P.S. I also tried export FLAGS_cudnn_deterministic=True, which did not seem to help.

How to implement multi-channel graph input?

In neural networks, the input can be a batch of multiple images.
In PGL, is it possible to feed multiple graphs into one model, even when the graphs have different sizes?

GAT with weighted edges

If I want to incorporate edge weight information into GAT, where should I modify the code?

Thanks!

Practical approaches to large-scale graph analysis

We currently have tens of millions of nodes and over a hundred million edges, and want to run graph analysis (node classification / link prediction). What is a good way to proceed?

  1. Does the graph data need to be exported from a graph database first?
  2. How can analysis at this scale be carried out efficiently?
  3. When nodes are added or attributes change, how can updates be applied efficiently and incrementally?

How should pgl be installed on Windows

Windows 10, paddlepaddle 1.6.1.post107, cudnn 7.3.1, CUDA 10.0.0, Python 3.7.5.
Entering pip install pgl on the conda command line gives the following error:
[screenshot]
How should pgl be installed in the environment described above?

Warning: PaddlePaddle catches a failure signal, it may not work properly

W0702 12:33:15.146848 3068 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0702 12:33:15.146860 3068 init.cc:221] The detail failure signal is:

W0702 12:33:15.146873 3068 init.cc:224] *** Aborted at 1593664395 (unix time) try "date -d @1593664395" if you are using GNU date ***
W0702 12:33:15.149258 3068 init.cc:224] PC: @ 0x0 (unknown)
W0702 12:33:15.150092 3068 init.cc:224] *** SIGSEGV (@0x558e0a54bca0) received by PID 3068 (TID 0x7f4da11da740) from PID 173325472; stack trace: ***
W0702 12:33:15.152169 3068 init.cc:224] @ 0x7f4da0dc45f0 (unknown)
W0702 12:33:15.153133 3068 init.cc:224] @ 0x7f4d5e96b892 (unknown)
W0702 12:33:15.154075 3068 init.cc:224] @ 0x7f4d5e96da6e (unknown)
W0702 12:33:15.154742 3068 init.cc:224] @ 0x558e04472c94 _PyMethodDef_RawFastCallKeywords
W0702 12:33:15.155367 3068 init.cc:224] @ 0x558e04472db1 _PyCFunction_FastCallKeywords
W0702 12:33:15.155993 3068 init.cc:224] @ 0x558e044de5be _PyEval_EvalFrameDefault
W0702 12:33:15.156579 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.157160 3068 init.cc:224] @ 0x558e04423610 _PyFunction_FastCallDict
W0702 12:33:15.157730 3068 init.cc:224] @ 0x558e04441b93 _PyObject_Call_Prepend
W0702 12:33:15.158006 3068 init.cc:224] @ 0x558e044790aa slot_tp_init
W0702 12:33:15.158625 3068 init.cc:224] @ 0x558e04479ca8 _PyObject_FastCallKeywords
W0702 12:33:15.159250 3068 init.cc:224] @ 0x558e044ded78 _PyEval_EvalFrameDefault
W0702 12:33:15.159832 3068 init.cc:224] @ 0x558e0442331b _PyFunction_FastCallDict
W0702 12:33:15.160137 3068 init.cc:224] @ 0x558e04484dc2 property_descr_get
W0702 12:33:15.160692 3068 init.cc:224] @ 0x558e044369f1 _PyObject_GenericGetAttrWithDict
W0702 12:33:15.161314 3068 init.cc:224] @ 0x558e044da0ba _PyEval_EvalFrameDefault
W0702 12:33:15.161890 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.162429 3068 init.cc:224] @ 0x558e04472435 _PyFunction_FastCallKeywords
W0702 12:33:15.163048 3068 init.cc:224] @ 0x558e044d9e70 _PyEval_EvalFrameDefault
W0702 12:33:15.163589 3068 init.cc:224] @ 0x558e0447220b _PyFunction_FastCallKeywords
W0702 12:33:15.164211 3068 init.cc:224] @ 0x558e044d9e70 _PyEval_EvalFrameDefault
W0702 12:33:15.164783 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.165321 3068 init.cc:224] @ 0x558e04472497 _PyFunction_FastCallKeywords
W0702 12:33:15.165946 3068 init.cc:224] @ 0x558e044dacba _PyEval_EvalFrameDefault
W0702 12:33:15.166486 3068 init.cc:224] @ 0x558e0447220b _PyFunction_FastCallKeywords
W0702 12:33:15.167104 3068 init.cc:224] @ 0x558e044d9be6 _PyEval_EvalFrameDefault
W0702 12:33:15.167680 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.168279 3068 init.cc:224] @ 0x558e044231d4 PyEval_EvalCodeEx
W0702 12:33:15.168853 3068 init.cc:224] @ 0x558e044231fc PyEval_EvalCode
W0702 12:33:15.169260 3068 init.cc:224] @ 0x558e04538f44 run_mod
W0702 12:33:15.169814 3068 init.cc:224] @ 0x558e045432b1 PyRun_FileExFlags
W0702 12:33:15.170437 3068 init.cc:224] @ 0x558e045434a3 PyRun_SimpleFileExFlags
Segmentation fault

GATNE

How many neighbor hops k are involved in GATNE's aggregation process, and how can k be set?

ImportError: No module named queue

python 2.7
paddle 1.7.1
pgl 1.1
This error occurs both on the k8s queue and locally.

  • python -m paddle.distributed.launch train.py --conf config/erniesage_v2_gpu.yaml

    [INFO] 2020-05-25 19:28:03,309 [mp_reader.py:   23]: ujson not install, fail back to use json instead
    Traceback (most recent call last):
      File "train.py", line 26, in <module>
        from dataset.graph_reader import GraphGenerator
      File "/root/paddlejob/workspace/env_run/dataset/graph_reader.py", line 17, in <module>
        from dataset.base_dataset import BaseDataGenerator
      File "/root/paddlejob/workspace/env_run/dataset/base_dataset.py", line 31, in <module>
        from pgl.utils import mp_reader
      File "/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/pgl/utils/mp_reader.py", line 28, in <module>
        from queue import Queue
    ImportError: No module named queue

Locally I can change it to from multiprocessing import Queue and it then runs fine, but I have no permission to modify the machines on the k8s queue. How can this be solved?
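For reference, the import shim that works on both interpreters (the stdlib module is named queue in Python 3 and Queue in Python 2) looks like the sketch below; this is what the import site in pgl's mp_reader.py would need, shown here only for clarity:

    try:
        from queue import Queue   # Python 3 stdlib name
    except ImportError:
        from Queue import Queue   # Python 2 stdlib name (capital Q)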

Multiprocessing problem with paddle 1.7

I use python 2.7.13 + paddle-gpu-1.7 + pgl 1.0.
The test code is as follows:

from multiprocessing import Pool
import pgl

def f(x):
    return x*x

p = Pool(1) 
x = [1,2,3,4,5,6]
y = p.map(f, x)
print y
p.terminate()

Running this program produces the following error:

W0507 19:54:19.298550 17687 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0507 19:54:19.298624 17687 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0507 19:54:19.298629 17687 init.cc:214] The detail failure signal is:

W0507 19:54:19.298636 17687 init.cc:217] *** Aborted at 1588852459 (unix time) try "date -d @1588852459" if you are using GNU date ***
W0507 19:54:19.300464 17687 init.cc:217] PC: @                0x0 (unknown)
W0507 19:54:19.300927 17687 init.cc:217] *** SIGTERM (@0x20000004477) received by PID 17687 (TID 0x7ff960b48700) from PID 17527; stack trace: ***
W0507 19:54:19.302687 17687 init.cc:217]     @     0x7ff9602ff160 (unknown)
W0507 19:54:19.304461 17687 init.cc:217]     @     0x7ff960589860 (unknown)
W0507 19:54:19.306123 17687 init.cc:217]     @     0x7ff96061961c PyEval_EvalFrameEx
W0507 19:54:19.307770 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.309356 17687 init.cc:217]     @     0x7ff960592eb0 function_call
W0507 19:54:19.311004 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.312568 17687 init.cc:217]     @     0x7ff96056f9cd instancemethod_call
W0507 19:54:19.314218 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.315840 17687 init.cc:217]     @     0x7ff9605cebaf slot_tp_init
W0507 19:54:19.317483 17687 init.cc:217]     @     0x7ff9605cb46f type_call
W0507 19:54:19.319144 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.320789 17687 init.cc:217]     @     0x7ff9606164a6 PyEval_EvalFrameEx
W0507 19:54:19.322435 17687 init.cc:217]     @     0x7ff960619460 PyEval_EvalFrameEx
W0507 19:54:19.324079 17687 init.cc:217]     @     0x7ff960619460 PyEval_EvalFrameEx
W0507 19:54:19.325723 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.327309 17687 init.cc:217]     @     0x7ff960592eb0 function_call
W0507 19:54:19.328956 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.330516 17687 init.cc:217]     @     0x7ff96056f9cd instancemethod_call
W0507 19:54:19.332163 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.333753 17687 init.cc:217]     @     0x7ff9605cebaf slot_tp_init
W0507 19:54:19.335361 17687 init.cc:217]     @     0x7ff9605cb46f type_call
W0507 19:54:19.337008 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.338646 17687 init.cc:217]     @     0x7ff9606164a6 PyEval_EvalFrameEx
W0507 19:54:19.340299 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.341948 17687 init.cc:217]     @     0x7ff960619345 PyEval_EvalFrameEx
W0507 19:54:19.343593 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.345240 17687 init.cc:217]     @     0x7ff96061c1f2 PyEval_EvalCode
W0507 19:54:19.346880 17687 init.cc:217]     @     0x7ff960644f42 PyRun_FileExFlags
W0507 19:54:19.348536 17687 init.cc:217]     @     0x7ff9606462d9 PyRun_SimpleFileExFlags
W0507 19:54:19.350172 17687 init.cc:217]     @     0x7ff96065c00d Py_Main
W0507 19:54:19.401824 17687 init.cc:217]     @     0x7ff95f859bd5 __libc_start_main
W0507 19:54:19.420583 17687 init.cc:217]     @           0x4007a1 (unknown)

Later I found that replacing p.terminate() with p.close() lets it run normally (a minimal version of the working shutdown pattern is sketched below).
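For reference, a minimal version of the working pattern with a cooperative shutdown (standard-library multiprocessing only, nothing paddle-specific):

    from multiprocessing import Pool

    def f(x):
        return x * x

    if __name__ == "__main__":
        p = Pool(1)
        y = p.map(f, [1, 2, 3, 4, 5, 6])
        print(y)
        p.close()  # stop accepting work; workers exit cleanly
        p.join()   # wait for workers instead of terminate()'s hard kill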

However, running the command

 python preprocessing/dump_graph.py -i data.txt -o work_dir --encoding utf-8 -l 40 --vocab_file vocab/vocab.txt

still errors out, even after replacing p.terminate() with p.close(); it only works when the number of processes is kept small (I reduced it to two processes).
How can this be solved?

pgl 2.1 CentOS support

The pgl 2.1 whl package cannot be installed on CentOS release 6.3 (Final). After looking into it, my guess is that the 2.1 whl is built with manylinux2014, whereas the earlier 2.0a, built with manylinux1, installs fine. Will future Linux releases all be packaged with manylinux2014? And if I want to install pgl 2.1 on this CentOS machine, do I need to download the source and compile it myself? Thanks!
