Comments (2)
40B model on 1 GPU.
Step 4 elaspe 34.75464367866516 s, 38.27567377588915 Tflops
CHUNK_LIST_prepare_device .... 42.78182029724121, 12.76010964694122 %
CHUNK_allocate_payload_cpu ... 219.98981094360352, 65.61418119535382 %
CLIENT_access ................ 230.78546500205994, 68.83409396529345 %
CLIENT_release ............... 5.45810604095459, 1.627935208537724 %
chunk_cpu_gpu_move ........... 47.5001916885376, 14.167411531003756 %
CLIENT_access_dist ........... 90.68537592887878, 27.047828544621243 %
chunk_gpu_cpu_move ........... 46.541945934295654, 13.881605064427777 %
CHUNK_LIST_chunk_move ........ 46.56918501853943, 13.889729396193417 %
FWD .......................... 44.62372398376465, 13.30947600947208 %
CLIENT_release_dist .......... 0.12889885902404785, 0.03844538551854307 %
BWD .......................... 60.193546295166016, 17.95333264054873 %
ADAM_prepare_data_grad_copy .. 18.356434106826782, 5.47499172084236 %
ADAM_prepare_data ............ 155.6247365474701, 46.41664820068707 %
ADAM_compute ................. 43.16696763038635, 12.874983725861044 %
ADAM_param_fp32_to_fp16 ...... 30.386908769607544, 9.063202197518356 %
ADAM_release_data ............ 0.3651151657104492, 0.1088992828228694 %
ADAM ......................... 230.46057200431824, 68.73719134997918 %
CHUNK_LIST_make_room ......... 5.053736686706543, 1.507327967840193 %
TOTAL ........................ 335.2778422832489
------------- DATA MOVE RESULTS --------------
chunk_cpu_gpu_move: 1122048.0 MB, 1461 times, 23621.967830305926 MB/s
chunk_gpu_cpu_move: 1095168.0 MB, 1426 times, 23530.77375720547 MB/s
ADAM_prepare_data_grad_copy: 387155.8593940735 MB, 4045 times, 21091.016759627062 MB/s
ADAM_param_fp32_to_fp16: 774311.718788147 MB, 4045 times, 25481.75349651229 MB/s
******************** LOSS ********************
[0.69580078125, 3.775390625, 164.875, 60.75, 6.90234375]
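For anyone reading the log, the percentage and MB/s columns can be reproduced from the raw numbers. The sketch below is only a sanity check with values copied from the output above; the helper names `percent_of_total` and `bandwidth_mb_per_s` are hypothetical and are not part of PatrickStar's profiler. The percentages add up to more than 100%, which suggests that some timers are nested inside others.

```python
# Values copied from the profiler output above.
TOTAL_S = 335.2778422832489          # TOTAL, in seconds
ADAM_S = 230.46057200431824          # ADAM, in seconds
CPU_GPU_MOVE_S = 47.5001916885376    # chunk_cpu_gpu_move, in seconds
CPU_GPU_MOVE_MB = 1122048.0          # chunk_cpu_gpu_move, data volume in MB


def percent_of_total(elapsed_s, total_s=TOTAL_S):
    """Share of the total wall time spent in one timer (hypothetical helper)."""
    return 100.0 * elapsed_s / total_s


def bandwidth_mb_per_s(moved_mb, elapsed_s):
    """Average transfer rate for one data-move timer (hypothetical helper)."""
    return moved_mb / elapsed_s


adam_pct = percent_of_total(ADAM_S)
cpu_gpu_bw = bandwidth_mb_per_s(CPU_GPU_MOVE_MB, CPU_GPU_MOVE_S)

print(f"ADAM share of total: {adam_pct:.2f} %")
print(f"chunk_cpu_gpu_move bandwidth: {cpu_gpu_bw:.1f} MB/s")
```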
from patrickstar.
I made a mistake: the statistics above include the warmup iteration.
Related Issues (20)
- Memory-centric tiling HOT 1
- Support both dynamic model data partition and static model data partition. HOT 1
- Polish memory and speed profiler.
- PatrickStar's Performance in Models Like GANs HOT 2
- Support NVMe HOT 1
- Error at runtime HOT 1
- Proposal: overlap NVMe read and write with computing. HOT 2
- Skipping ADAM in warmup affects the overall performance.
- Support communication config before training
- Search the best chunk size. HOT 1
- Accelerate Chunk List Construction Speed. HOT 1
- support using PatrickStar on MegatronDeepSpeed? HOT 3
- Error when install under python3.6 HOT 1
- FP32ChunkReadBuffer throw errors for vit training.
- Request: keep the weight of specific layers in float32 HOT 5
- A major refactor to sacrifice some performance for flexiblity and simplicity HOT 1
- RuntimeError: chunk move failed. HOT 3
- install issue HOT 2
- Hi,What is the offset do?