differentsc.github.io's Introduction

About Me

I am an AI system researcher working at FriendliAI. I received my PhD from the Software Platform Lab (SPL) at Seoul National University (SNU), advised by Prof. Byung-Gon Chun. My interests include (but are not limited to) optimizing training and inference of large-scale deep learning models, developing large-scale natural language models, and building distributed data processing systems. I've also been participating in many open-source projects, including Apache Nemo and Apache REEF.

Career

  • 2022-Present, AI system researcher, FriendliAI
  • 2015-2022, Ph.D. in Computer Science and Engineering, Seoul National University
    • Dissertation: Semantic-Aware Data Management for Data Processing and Deep Learning
  • 2015 Summer, Research Intern, Microsoft Research Asia
  • 2011-2015, B.S. in Computer Science and Engineering, Seoul National University

Technical & Research Interests

  • Systematic Optimization of DL pipelines
    • Multi-GPU & Multi-Node Training (PyTorch DDP, GPipe, Megatron-LM)
    • Data Preprocessing and Augmentation (tf.data, PyTorch DataLoader, DALI)
    • Multiprocessing for Python-based DL frameworks
  • Large-Scale Deep Neural Networks
    • Pretrained language models (GPT-3, DALL-E, Codex)
  • Large-Scale Data Processing
    • Distributed Data Processing (Spark, Nemo, REEF)
    • Real-Time Stream Processing (Flink, Storm)
    • Persistent KV stores (RocksDB, Microsoft FASTER) on SSDs

Featured Publications

  • Gyewon Lee, Jaewoo Maeng, Jinsol Park, Jangho Seo, Haeyoon Cho, Youngseok Yang, Taegeon Um, Jongsung Lee, Jae W. Lee, Byung-Gon Chun (2023). FlowKV: A Semantic-Aware Store for Large-Scale State Management of Stream Processing Engines. ACM EuroSys 2023. [Paper]
  • Dohyeon Lee, Jaeseong Lee, Gyewon Lee, Seung-Won Hwang, Byung-Gon Chun (2021). SCOPA: Soft Code-Switching and Pairwise Alignment for Zero-Shot Cross-lingual Transfer. ACM CIKM 2021. [Paper]
  • Won Wook Song, Youngseok Yang, Jeongyoon Eo, Jangho Seo, Joo Yeon Kim, Sanha Lee, Gyewon Lee, Taegeon Um, Haeyoon Cho, Byung-Gon Chun (2021). Apache Nemo: A Framework for Optimizing Distributed Data Processing. ACM TOCS 2021. [Paper]
  • Gyewon Lee, Irene Lee, Hyeonmin Ha, Kyunggeun Lee, Hwarim Hyun, Ahnjae Shin, Byung-Gon Chun (2021). Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training. USENIX ATC 2021. [Paper]
  • Taegeon Um, Gyewon Lee, Byung-Gon Chun (2021). Pluto: High-Performance IoT-Aware Stream Processing. IEEE ICDCS 2021. [Paper]
  • Gyewon Lee, Jeongyoon Eo, Jangho Seo, Taegeon Um, Byung-Gon Chun (2018). High-Performance Stateful Stream Processing on Solid-State Drives. ACM APSys 2018. [Paper]
  • Byung-Gon Chun, Tyson Condie, Yingda Chen, Brian Cho, Andrew Chung, Carlo Curino, Chris Douglas, Matteo Interlandi, Beomyeol Jeon, Joo Seong Jeong, Gyewon Lee, Yunseong Lee, Tony Majestro, Dahlia Malkhi, Sergiy Matusevych, Brandon Myers, Mariia Mykhailova, Shravan M. Narayanamurthy, Joseph Noor, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Beysim Sezgin, Taegeon Um, Julia Wang, Markus Weimer, Youngseok Yang (2017). Apache REEF: Retainable Evaluator Execution Framework. ACM TOCS 2017. [Paper]
  • Taegeon Um, Gyewon Lee, Sanha Lee, Kyungtae Kim, Byung-Gon Chun (2017). Scaling Up IoT Stream Processing. ACM APSys 2017. [Paper]

Talks

  • 2023, ACM EuroSys, Rome, Italy, FlowKV: A Semantic-Aware Store for Large-Scale State Management of Stream Processing Engines. [Program]
  • 2022, Naver Techtalk, Pangyo, Korea, Revamper: A Smart Caching System for Faster DNN Training with Data Augmentation.
  • 2021, USENIX ATC, Online, Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training. [Slides] [Video]
  • 2020, Hyperconnect Seminar, Seoul, Korea, Effective State Management in Stream Analytics using Persistent Storage.
  • 2019, Naver Techtalk, Pangyo, Korea, Data Access-Pattern Aware Streaming Analytics on SSDs.
  • 2018, ACM APSys, Jeju, Korea, High-Performance Stateful Stream Processing on Solid-State Drives.
  • 2017, Naver Deview, Seoul, Korea, MIST: A High-Performance IoT Stream Processing System.

Open-Source Projects

  • 2018-Present, Apache Nemo (incubating), PMC and committer [GitHub]
  • 2019-2021, Google Summer of Code mentor (The Apache Software Foundation)
  • 2014-2018, Apache REEF, PMC and committer [GitHub]
  • 2016-2018, MIST, core developer [GitHub]

Teaching Assistant

  • 2018-2019, DS2 (Spark SQL, Spark Streaming) @Samsung Electronics
  • 2016, Operating Systems (Tizen, Linux Kernel) @Seoul National University
