
pgl's Introduction

The logo of Paddle Graph Learning (PGL)


DOC | Quick Start | 中文

Breaking News!!

One amazing paper about knowledge representation learning was accepted! (2022.05.06)

  • Simple and Effective Relation-based Embedding Propagation for Knowledge Representation Learning, to appear in IJCAI2022. Code can be found here.

PGL v2.2 2021.12.20

  • Graph4Rec: We released a universal and large-scale toolkit with graph neural networks for recommender systems. Details can be found here.

  • Graph4KG: We released a flexible framework named Graph4KG to learn embeddings of entities and relations in KGs, which supports training on massive KGs. Details can be found here.

  • GNNAutoScale: PGL now supports GNNAutoScale framework, which can scale arbitrary message-passing GNNs to large graphs. Details can be found here.

🔥 🔥 🔥 OGB-LSC KDD CUP 2021 winners announced!! (2021.06.17)

Super excited to announce that our PGL team won TWO FIRST places and ONE SECOND place across a total of three tracks in OGB-LSC KDD CUP 2021. Leaderboards can be found here.

  • First place in MAG240M-LSC track: Code and Technical Report can be found here.

  • First place in WikiKG90M-LSC track: Code and Technical Report can be found here.

  • Second place in PCQM4M-LSC track: Code and Technical Report can be found here.

Two amazing papers using PGL were accepted: (2021.06.17)

  • Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification, to appear in IJCAI2021.
  • HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps, to appear in KDD2021.

PGL Distributed Graph Engine API released!!

  • Our Distributed Graph Engine API has been released, along with a tutorial showing how to launch a graph engine and a demo for training a model with it.

Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle.

The Framework of Paddle Graph Learning (PGL)

The newly released PGL supports heterogeneous graph learning on both the walk-based paradigm and the message-passing-based paradigm, by providing MetaPath sampling and a message passing mechanism on heterogeneous graphs. Furthermore, the newly released PGL also supports distributed graph storage and several distributed training algorithms, such as distributed DeepWalk and distributed GraphSAGE. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, so the framework covers a wide range of graph-based applications.

One of the most important benefits of graph neural networks over other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is cumbersome. PGL adopts a message passing paradigm, similar to DGL, that helps users build customized graph neural networks easily: users only need to write send and recv functions to implement a simple GCN. As shown in the following figure, in the first step the send function is defined on the edges of the graph, and the user can customize it to send a message from the source node to the target node. In the second step, the recv function is responsible for aggregating the messages arriving from different sources.

The basic idea of message passing paradigm

To write a sum aggregator, users only need to write the following code.

    import pgl
    import paddle
    import numpy as np

    # Build a toy graph: 5 nodes, 3 edges, 100-dim node features under key "h".
    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    feature = np.random.randn(5, 100).astype(np.float32)

    g = pgl.Graph(num_nodes=num_nodes,
        edges=edges,
        node_feat={
            "h": feature
        })
    g.tensor()  # convert the numpy arrays inside the graph to paddle.Tensor

    # send: copy each source node's features onto its outgoing edges.
    def send_func(src_feat, dst_feat, edge_feat):
        return src_feat

    # recv: sum the incoming messages for every destination node.
    def recv_func(msg):
        return msg.reduce_sum(msg["h"])

    msg = g.send(send_func, src_feat=g.node_feat)

    ret = g.recv(recv_func, msg)

Highlight: Flexibility - Natively Support Heterogeneous Graph Learning

Graphs can conveniently represent relations between things in the real world, but the categories of things and the relations between them vary widely. Therefore, in a heterogeneous graph we need to distinguish node types and edge types. PGL models heterogeneous graphs that contain multiple node types and multiple edge types, and can describe the complex connections between them.

Support metapath walk sampling on heterogeneous graphs

The metapath sampling in heterogeneous graph

The left side of the figure above depicts a shopping social network. The nodes fall into two categories, users and products, with relations between users and users, users and products, and products and products. The right side of the figure shows a simple MetaPath sampling process: given a MetaPath such as UPU (user-product-user), sampling produces results like the following.

The metapath result

On this basis, by introducing word2vec and related methods, PGL supports heterogeneous graph representation learning algorithms such as metapath2vec.
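To make the sampling step concrete, here is a minimal, framework-free sketch of a metapath-guided random walk. It is plain Python for illustration; the adjacency data, type names, and the metapath_walk helper are assumptions of this sketch, not PGL's API:

    import random

    # Toy heterogeneous graph: adjacency lists keyed by edge type.
    # (Illustrative data; in PGL the graph object would supply neighbors.)
    neighbors = {
        ("user", "product"): {0: [10, 11], 1: [11]},
        ("product", "user"): {10: [0, 1], 11: [1]},
    }

    def metapath_walk(start, edge_types, num_rounds):
        """Sample a walk that repeats the given edge-type pattern.

        edge_types = [("user", "product"), ("product", "user")]
        realizes the metapath UPU (user-product-user).
        """
        walk, node = [start], start
        for _ in range(num_rounds):
            for etype in edge_types:
                nbrs = neighbors[etype].get(node, [])
                if not nbrs:  # dead end: stop the walk early
                    return walk
                node = random.choice(nbrs)
                walk.append(node)
        return walk

    print(metapath_walk(0, [("user", "product"), ("product", "user")], 3))

Walks sampled this way can then be fed to a skip-gram model, as in word2vec, to learn the node embeddings.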

Support message passing mechanism on heterogeneous graphs

The message passing mechanism on heterogeneous graph

Because a heterogeneous graph contains different node types, message passing differs as well. As shown on the left of the figure above, the target node has five neighbors belonging to two different node types. As shown on the right, during message passing the neighbors belonging to different types are first aggregated separately, and the per-type messages are then merged into the final message that updates the target node. On this basis, PGL supports message-passing-based heterogeneous graph algorithms such as GATNE.
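As an illustration of that two-stage scheme, the sketch below aggregates neighbor features per node type and then merges the per-type messages by averaging. It is plain numpy; the grouping data and the choice of mean as the merge function are illustrative assumptions, not a fixed PGL API:

    import numpy as np

    feat = np.random.randn(6, 8).astype(np.float32)  # 6 nodes, 8-dim features

    # Incoming neighbors of target node 0, grouped by neighbor node type.
    neighbors_by_type = {"user": [1, 2, 3], "product": [4, 5]}

    # Stage 1: aggregate messages separately within each node type.
    per_type = {t: feat[idx].sum(axis=0) for t, idx in neighbors_by_type.items()}

    # Stage 2: merge the per-type messages into one update for the target node.
    update = np.mean(list(per_type.values()), axis=0)
    print(update.shape)  # (8,)

In practice the merge step can itself be learned, for example with attention over the per-type messages, which is what GATNE-style models do.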

Large-Scale: Support distributed graph storage and distributed training algorithms

In most large-scale graph learning scenarios, distributed graph storage and distributed training are required. As shown in the following figure, PGL provides a general solution for large-scale training: we adopt PaddleFleet as the distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so a large-scale distributed training algorithm can easily be set up on MPI clusters.

The distributed frame of PGL

Model Zoo

The following graph learning models have been implemented in the framework. You can find more examples and details here.

  • ERNIESage: ERNIE SAmple aggreGatE, combining text and graph
  • GCN: Graph Convolutional Networks
  • GAT: Graph Attention Networks
  • GraphSage: Large-scale graph convolutional networks based on neighborhood sampling
  • unSup-GraphSage: Unsupervised GraphSAGE
  • LINE: Representation learning based on first-order and second-order neighbors
  • DeepWalk: Representation learning by DFS random walks
  • MetaPath2Vec: Representation learning based on metapaths
  • Node2Vec: Representation learning combining DFS and BFS random walks
  • Struct2Vec: Representation learning based on structural similarity
  • SGC: Simplified graph convolutional networks
  • GES: Graph representation learning with node features
  • DGI: Unsupervised representation learning based on graph convolutional networks
  • GATNE: Heterogeneous graph representation learning based on message passing

The models above fall into three groups: graph representation learning, graph neural networks, and heterogeneous graph learning.

System requirements

PGL requires:

  • paddlepaddle >= 2.2.0
  • cython

PGL only supports Python 3.

Installation

You can simply install it via pip.

pip install pgl

The Team

PGL is developed and maintained by the NLP and Paddle teams at Baidu.

E-mail: nlp-gnn[at]baidu.com

License

PGL uses Apache License 2.0.

pgl's People

Contributors

aurelius84, burness, cfangplus, dependabot[bot], desmonday, dongzhex, githubutilities, hemingkai, hwchen1, ivam-he, kirayummy, lemonnoel, liwb5, qianli-wu, raindrops2sea, sys1874, veyron95, wawltor, weiyuesu, wenjinw, xjqbest, yelrose, yinpeiqi, zbmain, zhui


pgl's Issues

A small bug in the GATNE example

GATNE/model.py

trans_weights = fl.create_parameter(
        shape=[
            self.edge_type_count, self.embedding_u_size,
            self.embedding_size // self.att_head
        ],
        attr=fluid.initializer.TruncatedNormalInitializer(
            loc=0.0, scale=1.0 / math.sqrt(self.embedding_size)),
        dtype='float32',
        name='trans_w')

Shouldn't attr here be default_initializer?

out of memory error

Hello,
I am having difficulties running the node2vec example on GPU.

The command python node2vec.py --use_cuda results in the error message below; it seems to work fine on CPU.

Both GPUs I use have 6GB. Am I overlooking something?

0223 18:59:36.058634 11348 operator.cc:179] lookup_table raises an exception paddle::memory::allocation::BadAlloc,

Out of memory error on GPU 0. Cannot allocate 6.933594GB memory on GPU 0, available memory is only 726.750000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please try one of the following suggestions:
    1. Decrease the batch size of your model.
    2. FLAGS_fraction_of_gpu_memory_to_use is 0.92 now, please set it to a higher value but less than 1.0.
      The command is export FLAGS_fraction_of_gpu_memory_to_use=xxx.

at (/paddle/paddle/fluid/memory/detail/system_allocator.cc:151)

The Quick Start Instructions example in the PGL manual raises an error

Using PaddlePaddle 1.7.2 on AI Studio, Python 3.7.

Following the PGL manual at https://pgl.readthedocs.io/en/latest/quick_start/instruction.html, I typed the code into a notebook, and running this statement raises an error:

output = gcn_layer(gw, gw.node_feat['feature'], gw.edge_feat['edge_feature'],
                hidden_size=8, name='gcn_layer_1', activation='relu')
output = gcn_layer(gw, output, gw.edge_feat['edge_feature'],
                hidden_size=1, name='gcn_layer_2', activation=None)

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>
----> 1 output = gcn_layer(gw, gw.node_feat['feature'], gw.edge_feat['edge_feature'],
      2                 hidden_size=8, name='gcn_layer_1', activation='relu')
      3 output = gcn_layer(gw, output, gw.edge_feat['edge_feature'],
      4                 hidden_size=1, name='gcn_layer_2', activation=None)
KeyError: 'edge_feature'

I later checked: print(gw.edge_feat) shows {},
and g.node_feat_info() shows [('feature', [None, 16], dtype('float32'))].

Error computing adj_dst_index after calling pgl.Graph.tensor()

Hello. After building a graph whose data is stored as numpy.ndarray with pgl.Graph, calling the .tensor() method to convert the ndarrays in the graph to paddle.Tensor raises the following error:

[screenshot]

Looking further, the Graph does contain the adj_dst_index attribute before the conversion to tensors:

[screenshot]

After the conversion, however, computing the degree attribute of adj_dst_index makes paddle.scatter() raise an error:

[screenshot]

I also found that if I create the Graph and call Graph.tensor() as two separate steps (for example, executing them one at a time in ipython), the conversion works fine; if I run the two steps together, computing adj_dst_index fails.

The erniesage example raises a generator error

Traceback (most recent call last):
  File "link_predict.py", line 274, in <module>
    train(config)
  File "link_predict.py", line 169, in train
    train_ds = Dataset.from_generator_func(train_iter).repeat(config.epochs)
  File "/home/alphis/ws/pei_zhi_huan_jing/ERNIE/propeller/data/functional.py", line 278, in from_generator_func
    raise ValueError('expect generator function, got %s' % repr(_gen))

Reproducing GATNE with heterogeneous nodes

The GATNE example uses homogeneous nodes with heterogeneous edges. Is there a reproduction of GATNE with heterogeneous nodes on the other three datasets at https://github.com/THUDM/GATNE/tree/master/data? I could not find one.

Problem with fleet-distributed GraphSAGE

Hi, I ran the GraphSAGE demo provided by PGL and it runs fine locally. I then converted the local program to fleet distributed mode. The network structure and hyperparameters are unchanged, and I launch one pserver and one worker, but the loss of the fleet distributed program does not decrease. What could be the problem?
Below is the log of my local GraphSAGE run:
[screenshot]
And this is the log of my fleet distributed run:
[screenshot]
Below is the distributed part of the main function; I only modified the main function.

def main(args):
    data = load_data(args.normalize, args.symmetry)
    log.info("preprocess finish")
    log.info("Train Examples: %s" % len(data["train_index"]))
    log.info("Val Examples: %s" % len(data["val_index"]))
    log.info("Test Examples: %s" % len(data["test_index"]))
    log.info("Num nodes %s" % data["graph"].num_nodes)
    log.info("Num edges %s" % data["graph"].num_edges)
    log.info("Average Degree %s" % np.mean(data["graph"].indegree()))

    place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
    train_program = fluid.default_main_program()
    startup_program = fluid.default_startup_program()
    samples = []
    if args.samples_1 > 0:
        samples.append(args.samples_1)
    if args.samples_2 > 0:
        samples.append(args.samples_2)

    with fluid.program_guard(train_program, startup_program):
        feature, feature_init = paddle_helper.constant(
            "feat",
            dtype=data['feature'].dtype,
            value=data['feature'],
            hide_batch_size=False)

        graph_wrapper = pgl.graph_wrapper.GraphWrapper(
            "sub_graph", place, node_feat=data['graph'].node_feat_info())
        model_loss, model_acc = build_graph_model(
            graph_wrapper,
            num_class=data["num_class"],
            feature=feature,
            hidden_size=args.hidden_size,
            graphsage_type=args.graphsage_type,
            k_hop=len(samples))

    test_program = train_program.clone(for_test=True)
    
    trainer_id = int(os.environ["PADDLE_TRAINER_ID"])
    trainers = int(os.environ["PADDLE_TRAINERS"])
    training_role = os.environ["PADDLE_TRAINING_ROLE"]
    log.info(training_role )
    training_role = role_maker.Role.WORKER if training_role == "TRAINER" else role_maker.Role.SERVER
    ports = os.getenv("PADDLE_PSERVER_PORTS")
    pserver_ip = os.getenv("PADDLE_PSERVER_IP", "")
    pserver_endpoints = []
    for port in ports.split(","):
        pserver_endpoints.append(':'.join([pserver_ip, port]))
    role = role_maker.UserDefinedRoleMaker(current_id=trainer_id, role=training_role, worker_num=trainers, server_endpoints=pserver_endpoints)
    config = DistributeTranspilerConfig()
    config.sync_mode = True

    fleet.init(role)
    optimizer = fluid.optimizer.SGD(learning_rate=args.lr)
    optimizer = fleet.distributed_optimizer(optimizer, config)
    optimizer.minimize(model_loss)

    exe = fluid.Executor(place)

    if fleet.is_server():
        log.info('running server')
        fleet.init_server()
        fleet.run_server()

    if fleet.is_worker():
        log.info('running worker')
        fleet.init_worker()
        exe.run(fleet.startup_program)
        feature_init(place)

Where can the sample data for erniesage be downloaded?

I followed https://github.com/PaddlePaddle/PGL/tree/master/examples/erniesage to run the example and got this error:

+ python3 ./preprocessing/dump_graph.py -i ./data.txt -o ./workdir --encoding utf8 -l 40 --vocab_file ./vocab.txt
Traceback (most recent call last):
  File "./preprocessing/dump_graph.py", line 121, in <module>
    dump_graph(args)
  File "./preprocessing/dump_graph.py", line 57, in dump_graph
    with io.open(args.inpath, encoding=args.encoding) as f:
FileNotFoundError: [Errno 2] No such file or directory: './data.txt'

This looks like missing sample data; I searched the docs but could not find it. Could you provide it?

How were the hyperparameters chosen for OGB?

You list the selected hyperparameters in the paper, but could you go into more detail about how they were chosen? For example, what ranges were you searching over?

A question: what is legacy?

Sorry to interrupt. I found some distributed versions of PGL, but they live under legacy. What is legacy, and how can I run the demos in the legacy examples?

Thanks!

How to handle nodes with different feature dimensions

Hello, I ran into trouble modeling a sensor classification problem with PGL. In a graph neural network, if some nodes have feature vectors of different dimensions, or some nodes have no features at all, is there any approach other than using a heterogeneous graph? Also, my understanding is that in a homogeneous graph all node feature dimensions must be the same; is that correct? Thanks!
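Not an official answer, but one standard workaround, independent of PGL, is to bring all node features to a shared dimension by zero-padding (or a learned projection). A minimal numpy sketch; the pad_features helper and the sample data are hypothetical:

    import numpy as np

    def pad_features(feats, dim):
        """Zero-pad (or truncate) ragged per-node features to one width.

        Nodes without features can pass an empty array and end up
        as all-zero rows.
        """
        out = np.zeros((len(feats), dim), dtype=np.float32)
        for i, f in enumerate(feats):
            f = np.asarray(f, dtype=np.float32)[:dim]
            out[i, :len(f)] = f
        return out

    feats = [np.ones(4), np.ones(7), np.array([])]  # ragged node features
    print(pad_features(feats, 8).shape)  # (3, 8)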

Problem with distribute_deepwalk: offset is too big

Hello. During the training phase of distribute_deepwalk, the run aborts with "offset is too big". How should this be fixed? Thanks! My paddle version is 1.6; below is the detailed error from the server.
terminate called after throwing an instance of 'Xbyak::Error'
what(): offset is too big
W1212 15:27:18.687328 19979 init.cc:212] *** Aborted at 1576135638 (unix time) try "date -d @1576135638" if you are using GNU date ***
W1212 15:27:18.690397 19979 init.cc:212] PC: @ 0x0 (unknown)
W1212 15:27:18.690459 19979 init.cc:212] *** SIGABRT (@0x1f400003fd1) received by PID 16337 (TID 0x7f38bbfff700) from PID 16337; stack trace: ***
W1212 15:27:18.693048 19979 init.cc:212] @ 0x7f3ac3ba5160 (unknown)
W1212 15:27:18.695641 19979 init.cc:212] @ 0x7f3ac31133f7 __GI_raise
W1212 15:27:18.698221 19979 init.cc:212] @ 0x7f3ac31147d8 __GI_abort
W1212 15:27:18.699606 19979 init.cc:212] @ 0x7f3a65bdbc65 __gnu_cxx::__verbose_terminate_handler()
W1212 15:27:18.700861 19979 init.cc:212] @ 0x7f3a65bd9e06 __cxxabiv1::__terminate()
W1212 15:27:18.702225 19979 init.cc:212] @ 0x7f3a65bd9e33 std::terminate()
W1212 15:27:18.703485 19979 init.cc:212] @ 0x7f3a65c2c935 execute_native_thread_routine
W1212 15:27:18.705993 19979 init.cc:212] @ 0x7f3ac3b9d1c3 start_thread
W1212 15:27:18.708621 19979 init.cc:212] @ 0x7f3ac31c512d __clone
W1212 15:27:18.711194 19979 init.cc:212] @ 0x0 (unknown)

In addition, one worker reports: PaddleCheckError: Expected posix_memalign(&p, alignment, size) == 0, but received posix_memalign(&p, alignment, size):12 != 0:0.

Building a graph with pgl inside Docker raises a segmentation fault

Error message:

W0720 03:52:59.207820 3586 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0720 03:52:59.207885 3586 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0720 03:52:59.207908 3586 init.cc:221] The detail failure signal is:

W0720 03:52:59.207937 3586 init.cc:224] *** Aborted at 1595217179 (unix time) try "date -d @1595217179" if you are using GNU date ***
W0720 03:52:59.209323 3586 init.cc:224] PC: @ 0x0 (unknown)
W0720 03:52:59.209501 3586 init.cc:224] *** SIGSEGV (@0x0) received by PID 3586 (TID 0x7f213102e740) from PID 0; stack trace: ***
W0720 03:52:59.210655 3586 init.cc:224] @ 0x7f2130840630 (unknown)
W0720 03:52:59.211284 3586 init.cc:224] @ 0x7f2104830b62 (unknown)
W0720 03:52:59.211845 3586 init.cc:224] @ 0x7f210483390d (unknown)
W0720 03:52:59.213013 3586 init.cc:224] @ 0x7f2130b32300 PyEval_EvalFrameEx
W0720 03:52:59.214179 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.215322 3586 init.cc:224] @ 0x7f2130abe07d (unknown)
W0720 03:52:59.216428 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.217643 3586 init.cc:224] @ 0x7f2130aa8065 (unknown)
W0720 03:52:59.218773 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.219944 3586 init.cc:224] @ 0x7f2130af0097 (unknown)
W0720 03:52:59.221132 3586 init.cc:224] @ 0x7f2130aeedaf (unknown)
W0720 03:52:59.222276 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.223424 3586 init.cc:224] @ 0x7f2130b2d846 PyEval_EvalFrameEx
W0720 03:52:59.224547 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.225683 3586 init.cc:224] @ 0x7f2130abdf88 (unknown)
W0720 03:52:59.226790 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.227934 3586 init.cc:224] @ 0x7f2130a99155 (unknown)
W0720 03:52:59.229049 3586 init.cc:224] @ 0x7f2130a9923e PyObject_CallFunction
W0720 03:52:59.230197 3586 init.cc:224] @ 0x7f2130ad5561 _PyObject_GenericGetAttrWithDict
W0720 03:52:59.231338 3586 init.cc:224] @ 0x7f2130b2f800 PyEval_EvalFrameEx
W0720 03:52:59.232473 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.233608 3586 init.cc:224] @ 0x7f2130b31b4c PyEval_EvalFrameEx
W0720 03:52:59.234735 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
W0720 03:52:59.235885 3586 init.cc:224] @ 0x7f2130abe07d (unknown)
W0720 03:52:59.236999 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.238148 3586 init.cc:224] @ 0x7f2130aa8065 (unknown)
W0720 03:52:59.239271 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.240423 3586 init.cc:224] @ 0x7f2130af0097 (unknown)
W0720 03:52:59.241578 3586 init.cc:224] @ 0x7f2130aeedaf (unknown)
W0720 03:52:59.242700 3586 init.cc:224] @ 0x7f2130a99073 PyObject_Call
W0720 03:52:59.243827 3586 init.cc:224] @ 0x7f2130b2d846 PyEval_EvalFrameEx
W0720 03:52:59.244946 3586 init.cc:224] @ 0x7f2130b3464d PyEval_EvalCodeEx
Segmentation fault

Test code:
    import pgl
    from pgl import graph  # import pgl module
    import numpy as np

    edge_list = []
    node_list = []
    filename = "./testnode.txt"
    #filename = "./edgelist.txt"
    try:
        fp = open(filename, "r")
        print("%s opened successfully" % filename)
        done = False
        while not done:
            aline = fp.readline()
            if aline != "":
                a = long(aline.strip("\n").strip("(").strip(")").split(",")[0])
                b = long(aline.strip("\n").strip("(").strip(")").split(",")[1])
                edge_list.append((a, b))
                node_list.append(a)
                node_list.append(b)
            else:
                done = True
        fp.close()
    except IOError:
        print("%s open failed" % filename)

    print(type(node_list))
    print(edge_list)
    print(node_list)
    news_nodes = []

    for id in node_list:
        if id not in news_nodes:
            news_nodes.append(id)
    num_node = len(news_nodes)
    print("%d nodes" % num_node)
    print(news_nodes)

    #num_node=359625121
    d = 8
    feature = np.random.randn(num_node, d).astype("float32")
    edge_feature = np.random.randn(len(edge_list), d).astype("float32")

    g = graph.Graph(num_nodes=num_node,
                    edges=edge_list,
                    node_feat={'feature': feature},
                    edge_feat={'edge_feature': edge_feature})

Contents of the input file:
(-4649263633069986650,-5524713781035048896)
(1271252742248293960,5257009004707542774)
(-7235151214169208912,-801785351457387666)
(3878525930642985553,-4845667399646036208)
(3787724060214927072,2498311633260070737)
(-1926501030799233262,-2531193103782375080)
(5379329040149508336,-1753726466388496271)
(-2731878471209782768,7141705257445771183)
(208737387650153426,-4842986495434924796)
(6859451388928841912,-1849683348999068048)
(-5297793359529213911,-730495328556966270)
(-8990674550612404115,5475799282437796300)
(8281941951883016219,2817935382386340348)
(-3531905366784664905,8330449695852382521)

My paddle version is:
PaddlePaddle 1.8.3, compiled with
with_avx: ON
with_gpu: OFF
with_mkl: ON
with_mkldnn: ON
with_python: ON

Python version: 2.7.5

The erniesage example

Running python ./preprocessing/dump_graph.py --conf='./config/erniesage_link_predict.yaml' raises:

Traceback (most recent call last):
  File "./preprocessing/dump_graph.py", line 221, in <module>
    dump_node_feat(config)
  File "./preprocessing/dump_graph.py", line 185, in dump_node_feat
    tokenizer = ErnieTinyTokenizer.from_pretrained(config.ernie_name)
  File "/home/alphis/ws/pei_zhi_huan_jing/ERNIE/ernie/tokenizing_ernie.py", line 223, in from_pretrained
    t = cls(vocab_dict, sp_model_path, **kwargs)
  File "/home/alphis/ws/pei_zhi_huan_jing/ERNIE/ernie/tokenizing_ernie.py", line 232, in __init__
    self.sp_model.Load(sp_model_path)
  File "/home/alphis/anaconda3/envs/nlp/lib/python3.7/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/alphis/anaconda3/envs/nlp/lib/python3.7/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

How to input a batch of graphs

I want to input a batch of graphs, say batch=32, where each graph is small and independent. How can I feed them in simultaneously, the way a batch of images is fed in? Is there an example of this?
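Not an official answer, but a common approach, independent of any particular framework, is to merge the small graphs into one disjoint union by offsetting node ids so that the whole batch becomes a single large graph. A minimal numpy sketch; the batch_graphs helper and its data layout are hypothetical:

    import numpy as np

    def batch_graphs(graphs):
        """Merge [(num_nodes, edges, feat), ...] into one big graph.

        Node ids of each graph are shifted by the number of nodes that
        precede it, so edges never cross graph boundaries; graph_ids
        records which original graph each node came from (useful for
        graph-level pooling afterwards).
        """
        offset = 0
        all_edges, all_feats, graph_ids = [], [], []
        for gid, (num_nodes, edges, feat) in enumerate(graphs):
            all_edges.append(np.asarray(edges) + offset)  # shift node ids
            all_feats.append(feat)
            graph_ids.append(np.full(num_nodes, gid))
            offset += num_nodes
        return (offset,                            # total node count
                np.concatenate(all_edges, axis=0),
                np.concatenate(all_feats, axis=0),
                np.concatenate(graph_ids))

    # Two tiny graphs with 16-dim node features.
    g1 = (3, [(0, 1), (1, 2)], np.random.randn(3, 16).astype(np.float32))
    g2 = (2, [(0, 1)], np.random.randn(2, 16).astype(np.float32))
    num_nodes, edges, feat, graph_ids = batch_graphs([g1, g2])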

Question about deploying the distributed version

On which platforms can the distributed version of PGL be deployed? Is there a deployment guide for distributed PGL? Could you also describe in detail the runtime environments for the distributed versions of deepwalk and graphsage provided in the examples?

Problem running 1-Introduction

---------------------------------------------------------------------------
EnforceNotMet                             Traceback (most recent call last)
<ipython-input> in <module>
    111 feed_dict['node_label'] = label
    112
--> 113 train_loss = exe.run(fluid.default_main_program(), feed=feed_dict, fetch_list=[loss], return_numpy=True)
    114 print('Epoch %d | Loss: %f'%(epoch, train_loss[0]))
    115 print('1')

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    649             scope=scope,
    650             return_numpy=return_numpy,
--> 651             use_program_cache=use_program_cache)
    652         else:
    653             if fetch_list and program._is_data_parallel and program._program and (

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run(self, program, exe, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    747         self._feed_data(program, feed, feed_var_name, scope)
    748         if not use_program_cache:
--> 749             exe.run(program.desc, scope, 0, True, True, fetch_var_name)
    750         else:
    751             exe.run_cached_prepared_ctx(ctx, scope, False, False, False)

EnforceNotMet: Invoke operator sigmoid_cross_entropy_with_logits error.

pgl cannot be installed on Windows 10

Windows 10, paddlepaddle 1.6.1.post107, cudnn 7.3.1, CUDA 10.0.0, Python 3.7.5.
Entering pip install pgl on the conda command line gives the following error:
[screenshot]
How can pgl be installed in the environment described above?

GES example: the program hangs and cannot exit after breaking out of training early

Environment: Python 3, latest pgl, paddle 1.6, GPU.
The only change from the GES code in example is shown below: breaking early at step == 5 makes the program hang:

def train(train_exe, exe, program, loss, node2vec_pyreader, args, train_steps):
    """ train
    """
    trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
    step = 0
    while True:
        try:
            begin_time = time.time()
            loss_val, = train_exe.run(fetch_list=[loss])
            log.info("step %s: loss %.5f speed: %.5f s/step" %
                     (step, np.mean(loss_val), time.time() - begin_time))
            step += 1
            if step == 5:
                break
        except F.core.EOFException:
            node2vec_pyreader.reset()

        if (step % args.steps_per_save == 0 or
                step == train_steps) and trainer_id == 0:

            model_save_dir = args.output_path
            model_path = os.path.join(model_save_dir, str(step))
            if not os.path.exists(model_save_dir):
                os.makedirs(model_save_dir)
            F.io.save_params(exe, model_path, program)

        if step == train_steps:
            break

Output when hung:
[screenshot]

Does pgl support paddle 2.0?

I read the introduction to paddle 2.0: the LoDTensor concept is de-emphasized, and related APIs such as sequence_pool are no longer available. After upgrading to 2.0, how should pgl be used?

Unsupervised GraphSAGE results

Running python train.py --data_path ./sample.txt --num_nodes 2000 --phase predict under PGL/examples/unsup_graphsage produces an emb.npy file containing multiple embedding rows.

How do these rows map back to the ids in sample.txt? Does the embedding in row n correspond to id=n? There is no id 1164 in the data, yet row 1164 of emb.npy still contains an embedding.

Fixing the random seed still does not reproduce results across runs

For the graph convolution examples (GCN/GAT/GIN), I fix the paddle and numpy random seeds as follows:

seed = 123
train_program.random_seed = seed
startup_program.random_seed = seed
np.random.seed(seed)
random.seed(seed)

On CPU every run gives exactly the same result, but on GPU the results differ significantly between runs. After removing pgl's graph convolution layers, the model becomes reproducible on GPU as well. Could there be some remaining source of randomness in pgl's low-level CUDA implementation?

P.S. I also tried export FLAGS_cudnn_deterministic=True, which did not seem to help.

How to implement multi-channel graph input?

In neural networks, the input can be a batch of multiple images.
In PGL, is it possible to feed multiple graphs into one model, even when the graphs have different sizes?

GAT with weighted edges

If I want to incorporate edge weight information into GAT, where should I modify the code?

Thanks!

Practical approaches to large-scale graph analysis

We currently have tens of millions of nodes and over a hundred million edges, and want to run graph analysis (node classification / link prediction). What is a good way to proceed?

  1. Does the graph data need to be exported from a graph database first?
  2. How can analysis at this scale be carried out efficiently?
  3. When nodes are added or attributes change, how can updates be applied efficiently and incrementally?

How should pgl be installed on Windows

Windows 10, paddlepaddle 1.6.1.post107, cudnn 7.3.1, CUDA 10.0.0, Python 3.7.5.
Entering pip install pgl on the conda command line gives the following error:
[screenshot]
How should pgl be installed in the environment described above?

Warning: PaddlePaddle catches a failure signal, it may not work properly

W0702 12:33:15.146848 3068 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0702 12:33:15.146860 3068 init.cc:221] The detail failure signal is:

W0702 12:33:15.146873 3068 init.cc:224] *** Aborted at 1593664395 (unix time) try "date -d @1593664395" if you are using GNU date ***
W0702 12:33:15.149258 3068 init.cc:224] PC: @ 0x0 (unknown)
W0702 12:33:15.150092 3068 init.cc:224] *** SIGSEGV (@0x558e0a54bca0) received by PID 3068 (TID 0x7f4da11da740) from PID 173325472; stack trace: ***
W0702 12:33:15.152169 3068 init.cc:224] @ 0x7f4da0dc45f0 (unknown)
W0702 12:33:15.153133 3068 init.cc:224] @ 0x7f4d5e96b892 (unknown)
W0702 12:33:15.154075 3068 init.cc:224] @ 0x7f4d5e96da6e (unknown)
W0702 12:33:15.154742 3068 init.cc:224] @ 0x558e04472c94 _PyMethodDef_RawFastCallKeywords
W0702 12:33:15.155367 3068 init.cc:224] @ 0x558e04472db1 _PyCFunction_FastCallKeywords
W0702 12:33:15.155993 3068 init.cc:224] @ 0x558e044de5be _PyEval_EvalFrameDefault
W0702 12:33:15.156579 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.157160 3068 init.cc:224] @ 0x558e04423610 _PyFunction_FastCallDict
W0702 12:33:15.157730 3068 init.cc:224] @ 0x558e04441b93 _PyObject_Call_Prepend
W0702 12:33:15.158006 3068 init.cc:224] @ 0x558e044790aa slot_tp_init
W0702 12:33:15.158625 3068 init.cc:224] @ 0x558e04479ca8 _PyObject_FastCallKeywords
W0702 12:33:15.159250 3068 init.cc:224] @ 0x558e044ded78 _PyEval_EvalFrameDefault
W0702 12:33:15.159832 3068 init.cc:224] @ 0x558e0442331b _PyFunction_FastCallDict
W0702 12:33:15.160137 3068 init.cc:224] @ 0x558e04484dc2 property_descr_get
W0702 12:33:15.160692 3068 init.cc:224] @ 0x558e044369f1 _PyObject_GenericGetAttrWithDict
W0702 12:33:15.161314 3068 init.cc:224] @ 0x558e044da0ba _PyEval_EvalFrameDefault
W0702 12:33:15.161890 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.162429 3068 init.cc:224] @ 0x558e04472435 _PyFunction_FastCallKeywords
W0702 12:33:15.163048 3068 init.cc:224] @ 0x558e044d9e70 _PyEval_EvalFrameDefault
W0702 12:33:15.163589 3068 init.cc:224] @ 0x558e0447220b _PyFunction_FastCallKeywords
W0702 12:33:15.164211 3068 init.cc:224] @ 0x558e044d9e70 _PyEval_EvalFrameDefault
W0702 12:33:15.164783 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.165321 3068 init.cc:224] @ 0x558e04472497 _PyFunction_FastCallKeywords
W0702 12:33:15.165946 3068 init.cc:224] @ 0x558e044dacba _PyEval_EvalFrameDefault
W0702 12:33:15.166486 3068 init.cc:224] @ 0x558e0447220b _PyFunction_FastCallKeywords
W0702 12:33:15.167104 3068 init.cc:224] @ 0x558e044d9be6 _PyEval_EvalFrameDefault
W0702 12:33:15.167680 3068 init.cc:224] @ 0x558e044222b9 _PyEval_EvalCodeWithName
W0702 12:33:15.168279 3068 init.cc:224] @ 0x558e044231d4 PyEval_EvalCodeEx
W0702 12:33:15.168853 3068 init.cc:224] @ 0x558e044231fc PyEval_EvalCode
W0702 12:33:15.169260 3068 init.cc:224] @ 0x558e04538f44 run_mod
W0702 12:33:15.169814 3068 init.cc:224] @ 0x558e045432b1 PyRun_FileExFlags
W0702 12:33:15.170437 3068 init.cc:224] @ 0x558e045434a3 PyRun_SimpleFileExFlags
Segmentation fault

GATNE

How many neighbor hops k are involved in GATNE's aggregation process, and how can k be set?

ImportError: No module named queue

python 2.7
paddle 1.7.1
pgl 1.1
This error occurs both on the k8s queue and locally.

  • python -m paddle.distributed.launch train.py --conf config/erniesage_v2_gpu.yaml

    [INFO] 2020-05-25 19:28:03,309 [mp_reader.py:   23]: ujson not install, fail back to use json instead
    Traceback (most recent call last):
      File "train.py", line 26, in <module>
        from dataset.graph_reader import GraphGenerator
      File "/root/paddlejob/workspace/env_run/dataset/graph_reader.py", line 17, in <module>
        from dataset.base_dataset import BaseDataGenerator
      File "/root/paddlejob/workspace/env_run/dataset/base_dataset.py", line 31, in <module>
        from pgl.utils import mp_reader
      File "/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/pgl/utils/mp_reader.py", line 28, in <module>
        from queue import Queue
    ImportError: No module named queue

Locally I can change it to from multiprocessing import Queue and it then runs fine, but I have no permission to modify the machines on the k8s queue. How can this be solved?
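For reference, the import shim that works on both interpreters (the stdlib module is named queue in Python 3 and Queue in Python 2) looks like the sketch below; this is what the import site in pgl's mp_reader.py would need, shown here only for clarity:

    try:
        from queue import Queue   # Python 3 stdlib name
    except ImportError:
        from Queue import Queue   # Python 2 stdlib name (capital Q)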

Multiprocessing problem with paddle 1.7

I use python 2.7.13 + paddle-gpu-1.7 + pgl 1.0.
The test code is as follows:

from multiprocessing import Pool
import pgl

def f(x):
    return x*x

p = Pool(1) 
x = [1,2,3,4,5,6]
y = p.map(f, x)
print y
p.terminate()

Running this program produces the following error:

W0507 19:54:19.298550 17687 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0507 19:54:19.298624 17687 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0507 19:54:19.298629 17687 init.cc:214] The detail failure signal is:

W0507 19:54:19.298636 17687 init.cc:217] *** Aborted at 1588852459 (unix time) try "date -d @1588852459" if you are using GNU date ***
W0507 19:54:19.300464 17687 init.cc:217] PC: @                0x0 (unknown)
W0507 19:54:19.300927 17687 init.cc:217] *** SIGTERM (@0x20000004477) received by PID 17687 (TID 0x7ff960b48700) from PID 17527; stack trace: ***
W0507 19:54:19.302687 17687 init.cc:217]     @     0x7ff9602ff160 (unknown)
W0507 19:54:19.304461 17687 init.cc:217]     @     0x7ff960589860 (unknown)
W0507 19:54:19.306123 17687 init.cc:217]     @     0x7ff96061961c PyEval_EvalFrameEx
W0507 19:54:19.307770 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.309356 17687 init.cc:217]     @     0x7ff960592eb0 function_call
W0507 19:54:19.311004 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.312568 17687 init.cc:217]     @     0x7ff96056f9cd instancemethod_call
W0507 19:54:19.314218 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.315840 17687 init.cc:217]     @     0x7ff9605cebaf slot_tp_init
W0507 19:54:19.317483 17687 init.cc:217]     @     0x7ff9605cb46f type_call
W0507 19:54:19.319144 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.320789 17687 init.cc:217]     @     0x7ff9606164a6 PyEval_EvalFrameEx
W0507 19:54:19.322435 17687 init.cc:217]     @     0x7ff960619460 PyEval_EvalFrameEx
W0507 19:54:19.324079 17687 init.cc:217]     @     0x7ff960619460 PyEval_EvalFrameEx
W0507 19:54:19.325723 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.327309 17687 init.cc:217]     @     0x7ff960592eb0 function_call
W0507 19:54:19.328956 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.330516 17687 init.cc:217]     @     0x7ff96056f9cd instancemethod_call
W0507 19:54:19.332163 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.333753 17687 init.cc:217]     @     0x7ff9605cebaf slot_tp_init
W0507 19:54:19.335361 17687 init.cc:217]     @     0x7ff9605cb46f type_call
W0507 19:54:19.337008 17687 init.cc:217]     @     0x7ff960560df3 PyObject_Call
W0507 19:54:19.338646 17687 init.cc:217]     @     0x7ff9606164a6 PyEval_EvalFrameEx
W0507 19:54:19.340299 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.341948 17687 init.cc:217]     @     0x7ff960619345 PyEval_EvalFrameEx
W0507 19:54:19.343593 17687 init.cc:217]     @     0x7ff96061c0bd PyEval_EvalCodeEx
W0507 19:54:19.345240 17687 init.cc:217]     @     0x7ff96061c1f2 PyEval_EvalCode
W0507 19:54:19.346880 17687 init.cc:217]     @     0x7ff960644f42 PyRun_FileExFlags
W0507 19:54:19.348536 17687 init.cc:217]     @     0x7ff9606462d9 PyRun_SimpleFileExFlags
W0507 19:54:19.350172 17687 init.cc:217]     @     0x7ff96065c00d Py_Main
W0507 19:54:19.401824 17687 init.cc:217]     @     0x7ff95f859bd5 __libc_start_main
W0507 19:54:19.420583 17687 init.cc:217]     @           0x4007a1 (unknown)

Later I found that replacing p.terminate() with p.close() lets it run normally (a minimal version of the working shutdown pattern is sketched below).
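For reference, a minimal version of the working pattern with a cooperative shutdown (standard-library multiprocessing only, nothing paddle-specific):

    from multiprocessing import Pool

    def f(x):
        return x * x

    if __name__ == "__main__":
        p = Pool(1)
        y = p.map(f, [1, 2, 3, 4, 5, 6])
        print(y)
        p.close()  # stop accepting work; workers exit cleanly
        p.join()   # wait for workers instead of terminate()'s hard kill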

However, running the command

 python preprocessing/dump_graph.py -i data.txt -o work_dir --encoding utf-8 -l 40 --vocab_file vocab/vocab.txt

still errors out, even after replacing p.terminate() with p.close(); it only works when the number of processes is kept small (I reduced it to two processes).
How can this be solved?

pgl 2.1 CentOS support

The pgl 2.1 whl package cannot be installed on CentOS release 6.3 (Final). After looking into it, my guess is that the 2.1 whl is built with manylinux2014, whereas the earlier 2.0a, built with manylinux1, installs fine. Will future Linux releases all be packaged with manylinux2014? And if I want to install pgl 2.1 on this CentOS machine, do I need to download the source and compile it myself? Thanks!
