Comments (3)
Hi @jiangtanzju ,
TL;DR: it won't affect performance.
We once ran a set of experiments generating embeddings with BatchNorm kept in train mode, which computes the batch mean and variance on the fly and mitigates the statistics discrepancy between the downstream graphs and the pretraining graphs. In that setting, if the instances in the test dataset are not shuffled, a small bs causes data leakage across instances in the same batch. That is why we set bs to the dataset size in the first place. Unless you want to try the same thing, you can ignore this.
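To see why train-mode BatchNorm leaks information across an unshuffled batch, here is a minimal one-feature sketch in plain Python (no affine parameters; the `batchnorm` function and its arguments are illustrative, not GCC's actual code):

```python
import math

def batchnorm(x, running_mean, running_var, training, eps=1e-5):
    """Minimal single-feature BatchNorm sketch.

    training=True: normalize with the statistics of the current batch x,
    so each instance's output depends on its batch mates (the leak).
    training=False: normalize with the stored running statistics only.
    """
    if training:
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    else:
        mean, var = running_mean, running_var
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```

In eval mode the same instance maps to the same output no matter which batch it lands in; in train mode its output shifts with the batch composition, which is exactly the leak a small bs on an unshuffled test set would introduce.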
As for the inference time, this line doesn't make much difference. I ran the `time xx`
command twice, with and without this line; the CPU times were:
- 314.11s system
- 312.81s system
from gcc.
OK, I see.
The inference speed on my side differs by about ten times. Observing carefully via htop, I find that when bs equals the length of the dataset, only one dataloader worker is active. When bs is the length of the dataset // 2, only 2 workers are active; when bs is the length of the dataset // 4, only 4 workers are active, even though my num_worker=12.
Only when bs < length of the dataset // num_worker do all workers run normally.
You can see that when I set bs = length of the dataset // 8, only 8 workers are working while the other workers sit idle:
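This pattern follows from how a PyTorch DataLoader distributes work: each worker prepares whole batches, so at most ceil(len(dataset) / bs) workers can ever be busy at once. A minimal sketch of that bound (the `active_workers` function is mine, not a PyTorch API):

```python
import math

def active_workers(dataset_len, batch_size, num_workers):
    """Upper bound on how many DataLoader workers can have work to do.

    A worker is assigned whole batches, so the number of busy workers is
    capped by the number of batches, ceil(dataset_len / batch_size).
    """
    num_batches = math.ceil(dataset_len / batch_size)
    return min(num_workers, num_batches)
```

With num_worker=12, bs = len(dataset) gives one batch and hence one busy worker; bs = len(dataset) // 8 gives 8 batches and 8 busy workers, matching the htop observation above.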
Anyway, thanks for your detailed reply.
Hi @jiangtanzju ,
The computation in each dataloader is mainly spent in scipy.linalg.eigsh,
here: https://github.com/THUDM/GCC/blob/master/gcc/datasets/data_util.py#L251.
It seems that in your setup this function does not run in parallel, which leaves each dataloader pinned at 100% of a single CPU. With MKL LAPACK installed, the function runs in parallel; in that case, even when the other dataloaders sit idle, the one active dataloader uses all of the CPUs, so the total time doesn't change much in my setup. Anyway, this shouldn't matter, since you can simply increase the number of active loaders by decreasing batch_size.
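If your SciPy links against a multithreaded BLAS/LAPACK (MKL or OpenBLAS), the thread count is typically controlled through environment variables that must be set before the library is first imported. A hedged sketch (the variable names are the conventional ones honored by MKL, OpenBLAS, and OpenMP builds; whether they take effect depends on which backend your SciPy was built against):

```python
import os

# These must be set BEFORE numpy/scipy are first imported, because the
# BLAS backend reads them once at load time.
os.environ["MKL_NUM_THREADS"] = "12"       # MKL backend
os.environ["OPENBLAS_NUM_THREADS"] = "12"  # OpenBLAS backend
os.environ["OMP_NUM_THREADS"] = "12"       # OpenMP-built backends

# import scipy and the rest of the pipeline only after this point
```

Alternatively, as noted above, simply decreasing batch_size spreads the eigsh calls across more dataloader workers, which parallelizes at the process level regardless of the BLAS build.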