
gpcnet's People

Contributors

h4u5, justsz, mendygral


gpcnet's Issues

Linking results in multiple definition errors

The build of GPCNET results in multiple definitions of the symbols table_outerbar, table_innerbar, and print_buffer:

$ make clean
rm -f *.o
rm -f network_test
rm -f network_load_test
$ make
cc -c -o network_test.o network_test.c -I . 
cc -c -o random_ring.o random_ring.c -I . 
cc -c -o collectives.o collectives.c -I . 
cc -c -o subcomms.o subcomms.c -I . 
cc -c -o utils.o utils.c -I . 
cc -o network_test utils.o random_ring.o collectives.o subcomms.o network_test.o -I .  -lm
/usr/bin/ld: random_ring.o:(.bss+0x0): multiple definition of `table_outerbar'; utils.o:(.bss+0x0): first defined here
/usr/bin/ld: random_ring.o:(.bss+0x60): multiple definition of `table_innerbar'; utils.o:(.bss+0x60): first defined here
/usr/bin/ld: random_ring.o:(.bss+0xc0): multiple definition of `print_buffer'; utils.o:(.bss+0xc0): first defined here
...

I suggest the following changes:

$ diff network_test.h.orig network_test.h
34c34
< char table_outerbar[TBLSIZE+1], table_innerbar[TBLSIZE+1], print_buffer[TBLSIZE+1];
---
> extern char table_outerbar[TBLSIZE+1], table_innerbar[TBLSIZE+1], print_buffer[TBLSIZE+1];
$ diff utils.c.orig utils.c
21a22,23
> char table_outerbar[TBLSIZE+1], table_innerbar[TBLSIZE+1], print_buffer[TBLSIZE+1];
> 
$ make clean; make
rm -f *.o
rm -f network_test
rm -f network_load_test
cc -c -o network_test.o network_test.c -I . 
cc -c -o random_ring.o random_ring.c -I . 
cc -c -o collectives.o collectives.c -I . 
cc -c -o subcomms.o subcomms.c -I . 
cc -c -o utils.o utils.c -I . 
cc -o network_test utils.o random_ring.o collectives.o subcomms.o network_test.o -I .  -lm
$
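
For context, these multiple-definition errors typically surface with newer toolchains: GCC 10 and later default to -fno-common, so the tentative definitions that each object file picks up from network_test.h are no longer silently merged by the linker. The diff above applies the usual C pattern for a global shared across translation units, which in outline looks like this:

/* network_test.h: declaration only, no storage is allocated here */
extern char table_outerbar[TBLSIZE+1], table_innerbar[TBLSIZE+1], print_buffer[TBLSIZE+1];

/* utils.c: exactly one translation unit owns the definition */
char table_outerbar[TBLSIZE+1], table_innerbar[TBLSIZE+1], print_buffer[TBLSIZE+1];

/* every other .c file just includes network_test.h and uses the buffers */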

Trying to understand gpcnet output

Hello,

I got the table below after running network_test.

I have two questions:

  1. What is the meaning of the Avg(Worst) column?
  2. How is it possible for the Multiple Allreduce 99% and 99.9% percentile values to fall outside the min-max range? (see the toy illustration after the table)

Kind regards,

Lucian Anton

Network Tests v1.3
  Test with 14320 MPI ranks (1790 nodes)

  Legend
   RR = random ring communication pattern
   Nat = natural ring communication pattern
   Lat = latency
   BW = bandwidth
   BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------------------------------------------------------------------+
|                                                          Isolated Network Tests                                                          |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|                            Name |          Min |          Max |          Avg |   Avg(Worst) |          99% |        99.9% |        Units |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          1.2 |         22.2 |          1.5 |          4.7 |          3.6 |          5.1 |         usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|                RR Get Lat (8 B) |          1.3 |         22.3 |          1.9 |          3.7 |          2.2 |          3.6 |         usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|      RR Two-sided BW (131072 B) |        549.7 |       3015.1 |       1199.2 |        764.5 |        460.4 |        335.0 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|            RR Put BW (131072 B) |          7.4 |      22134.8 |       2598.8 |          7.4 |          0.9 |          0.9 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        336.2 |       2031.9 |        916.5 |        769.7 |        335.5 |        186.9 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|     Nat Two-sided BW (131072 B) |        650.0 |       4913.7 |       1899.5 |       1124.1 |       1142.5 |        883.4 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         37.3 |         78.3 |         45.5 |         78.3 |        113.3 |        999.9 |         usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|      Multiple Alltoall (4096 B) |        838.9 |       1003.9 |        901.6 |        838.9 |        479.3 |        186.3 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
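
Regarding question 2, one purely speculative explanation (I have not checked how GPCNeT actually aggregates these columns): if Min/Max/Avg are taken over per-rank averages while the 99%/99.9% columns are percentiles over all raw samples, then a heavy tail on a few ranks can push the tail percentiles well outside the min-max range of the averages. A toy illustration in plain C:

#include <stdio.h>
#include <stdlib.h>

#define RANKS   4
#define SAMPLES 1000

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    static double all[RANKS * SAMPLES];
    double min_avg = 1e30, max_avg = 0.0;

    for (int r = 0; r < RANKS; r++) {
        double sum = 0.0;
        for (int s = 0; s < SAMPLES; s++) {
            /* mostly ~40-49 usec, but rank 0 sees a few 1000 usec spikes */
            double lat = 40.0 + (s % 10);
            if (r == 0 && s % 200 == 0)
                lat = 1000.0;
            all[r * SAMPLES + s] = lat;
            sum += lat;
        }
        double avg = sum / SAMPLES;
        if (avg < min_avg) min_avg = avg;
        if (avg > max_avg) max_avg = avg;
    }

    qsort(all, RANKS * SAMPLES, sizeof(double), cmp_double);
    double p999 = all[(int)(0.999 * RANKS * SAMPLES)];

    /* min/max of the per-rank averages stay below 50 usec,
       but the 99.9th percentile of the raw samples is 1000 usec */
    printf("min avg %.1f  max avg %.1f  99.9%% of samples %.1f\n",
           min_avg, max_avg, p999);
    return 0;
}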

Running gpcnet on ARM with EFA

Hello all,

We are working to run gpcnet on the new ARM offerings for AWS, using EFA, and are getting segmentation faults in the congestor portion of the tests. First question: has anyone successfully compiled and run the benchmarks on ARM, and if so, can they share any lessons learned? Second, any tips on where to start when looking into the segfaults for just the congestion portion?

Thanks!
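
Not gpcnet-specific, but one low-effort starting point on Linux/glibc is to rebuild with -g and install a SIGSEGV handler that dumps a raw backtrace from whichever rank faults; a generic sketch (untested on EFA or Graviton, so treat it only as a starting point):

#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd() */
#include <signal.h>
#include <unistd.h>

/* dump a raw backtrace to stderr when the process segfaults, then exit;
   link with -rdynamic so function names appear in the output */
static void segv_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    _exit(128 + sig);
}

int main(void)
{
    signal(SIGSEGV, segv_handler);   /* in gpcnet this could go right after MPI_Init() */

    volatile int *p = NULL;          /* deliberate fault to demonstrate the handler */
    return *p;
}

From there, running the single faulting rank under gdb usually narrows things down further.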

Set minimum time for benchmark

I tried to do this by increasing the latency iteration count, but malloc fails for larger counts (> 10,000,000 latency iterations):

Failed to allocate perf_vals in random_ring()
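
For what it's worth, a rough sizing estimate, under the assumption (not verified against random_ring()) that perf_vals stores one 8-byte value per latency iteration per rank:

#include <stdio.h>

int main(void)
{
    /* assumed: one double per latency iteration, per rank (not verified) */
    double iters          = 10000000.0;            /* count at which malloc starts failing */
    double bytes_per_buf  = iters * 8.0;           /* ~76 MiB per buffer */
    double ranks_per_node = 64.0;                  /* illustrative PPN */

    printf("per buffer : %.1f MiB\n", bytes_per_buf / (1024.0 * 1024));
    printf("per node   : %.1f GiB at %g ranks/node\n",
           ranks_per_node * bytes_per_buf / (1024.0 * 1024 * 1024),
           ranks_per_node);
    return 0;
}

A single buffer is modest, but multiplied by the number of such buffers and the ranks per node it can plausibly exhaust node memory, which would explain the allocation failure.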

GPCNeT results on small 32 node cluster

Dear colleagues,

Thank you for the very interesting article about GPCNeT presented at SC19!
In my opinion, GPCNeT looks like a good attempt to fill the existing gap in congestion-control studies of HPC networking.

I'm not sure whether GitHub is the right place for asking questions about GPCNeT, but why not try?

Since congestion control is also one of my personal research interests, I decided to evaluate GPCNeT on a typical small cluster with 32 nodes:

  • Intel Xeons, 18 cores @ 2.30 GHz
  • ConnectX-4 EDR, 100 Gb/s
  • 36-port Mellanox SB7700, 7.2 Tb/s of backplane bandwidth
  • OpenMPI 4.0.3 + UCX

I ran network_load_test using 28 of the 32 nodes in several scenarios and got the results presented below:

  • 20% vs 80% proportion of canaries and congestors, 4 congestors, default message sizes: no congestion (here I refer to the congestion impact metric for both average and tail latency)
  • 50% vs 50%, 4 congestors, default message sizes: no congestion
  • 20% vs 80%, 1 congestor, default message sizes: no congestion, whichever of the available congestors I switch to
  • 50% vs 50%, 1 congestor, default message sizes: no congestion, whichever of the available congestors I switch to
  • The same picture when I change the congestors' message size.
    These tests were done both at 18 PPN and 36 PPN (Hyper-Threading).

I'm curious why there is no congestion impact in any of these scenarios (apart from some random noise from time to time). I came up with several hypotheses:

  • Even assuming the MPI ranks on the hosts utilize the full 100 Gb/s link capacity, there is plenty of headroom in the switch buffers to process and forward packets fast enough, since I used 28 of the 32 available nodes and 4 switch ports are unused (see the rough arithmetic after this list). So the recommendation from the README is not satisfied: "network_load_test should not be run at much less than full system scale (ie, run on at least 95% of system nodes)"
  • The system scale is not big enough. But if so, what is the baseline system scale for InfiniBand at which we could see a congestion impact in GPCNeT?
  • For some strange reason the MPI ranks aren't able to push enough traffic into the network?
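
A rough back-of-the-envelope check on the first hypothesis (my own numbers, not from the GPCNeT paper):

  injection from the 28 canary + congestor nodes: 28 x 100 Gb/s       = 2.8 Tb/s
  SB7700 backplane: 36 ports x 100 Gb/s x 2 directions                = 7.2 Tb/s

In other words, a single 36-port EDR switch is non-blocking for its own ports, so whatever pattern the 28 nodes generate can be forwarded at line rate; the congestion GPCNeT is meant to expose would more plausibly come from oversubscribed or shared links in a multi-switch fabric.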

In my opinion, the first hypothesis (headroom in the switch) is the case here.

What do you think? Maybe there have been attempts to run GPCNeT on small clusters that are not mentioned in the paper?

In any case I would be grateful for any discussion or explanation :-)

Mikhail
