
pyengnet's Introduction

pyEnGNet: optimized reconstruction of gene co-expression networks using multi-GPU


Gene co-expression networks are valuable tools for discovering biologically relevant information within gene expression data. However, analysing large datasets is challenging: nonlinear gene–gene associations must be identified, and the number of gene pairs and potential network connections to process keeps growing. As a result, some experiments are abandoned because existing techniques cannot handle such intensive workloads. This paper presents pyEnGNet, a Python library that generates gene co-expression networks in high-performance computing environments. pyEnGNet harnesses CPU and multi-GPU parallel computing resources to handle large datasets efficiently, with optimised memory management and processing that deliver timely results. We used synthetic datasets to demonstrate the improvements in runtime and in the size of workload that can be handled. In addition, pyEnGNet was applied in a real-life study of patients with invasive aspergillosis after allogeneic stem cell transplantation, where it provided biologically relevant insights.

pyEnGNet features:

  • Unified APIs, detailed documentation, and interactive examples available to the community.
  • Complete coverage for reconstruction of massive gene co-expression networks.
  • Optimized models to generate results in the shortest possible time.
  • Optimized for High-Performance Computing (HPC) and Big Data ecosystems, using CUDA and multiprocess.

API Demo:

import os
from pyengnet.File import File
from pyengnet.Engnet import Engnet

if __name__ == "__main__":

   # Load dataset
   dataset = File.load(path=os.getcwd()+"/datasets/Spellman.csv", separator=",", nmi_th=0.6, spearman_th=0.7, kendall_th=0.7, readded_th=0.7, hub_th = 3)

   # Run pyEnGNet on CPUs
   graphFiltered, infoGraphFiltered, graphComplete, infoGraphComplete = Engnet.process(dataset, saveComplete = True)

   # Run pyEnGNet on GPU devices
   # graphFiltered, infoGraphFiltered, graphComplete, infoGraphComplete = Engnet.process(dataset, saveComplete = True, numGpus = 2, computeCapability = 61)

   # Save gene co-expression networks and additional information
   File.saveFile(path='/home/user/Desktop/graphComplete.csv', graph=infoGraphComplete) # Full network
   File.saveFile(path='/home/user/Desktop/graphFiltered.csv', graph=infoGraphFiltered) # Filtered network

   # Print gene co-expression networks
   File.showGraph(graph=graphComplete,title='Complete graph') # Full network
   File.showGraph(graph=graphFiltered,title="Filtered graph") # Filtered network

Citing pyEnGNet:

pyEnGNet paper is published in The Journal of Supercomputing. If you use pyEnGNet in a scientific publication, we would appreciate citations to the following paper:

López-Fernández, A., Gómez-Vela, F. A., del Saz-Navarro, M., Delgado-Chaves, F. M., & Rodríguez-Baena, D. S. (2024). Optimized Python library for reconstruction of ensemble-based gene co-expression networks using multi-GPU. The Journal of Supercomputing, 1-35.


Installation

It is recommended to use pip for installation. Please make sure the latest version is installed, as pyengnet is updated frequently:

pip install pyengnet            # normal install
pip install --upgrade pyengnet  # or update if needed
pip install --pre pyengnet      # or include pre-release version for new features

Alternatively, clone the repository and install from source:

git clone https://github.com/aureliolfdez/pyEnGNet.git
cd pyEnGNet
pip install .

Required Dependencies:

  • Python>=3.10
  • numpy>=1.24.0
  • tqdm>=4.64.0
  • multiprocess>=0.70.14
  • pandas>=1.5.3
  • matplotlib>=3.6.3
  • networkx>=3.0
  • scipy>=1.10.0
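
The minimum versions listed above can be checked against the current environment before running pyEnGNet. The following is a minimal sketch that uses only the standard library; the helper names `as_tuple`, `check`, and `python_ok` are hypothetical and are not part of pyengnet, and the version parsing is deliberately simple (it ignores pre-release suffixes).

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
REQUIRED = {
    "numpy": "1.24.0",
    "tqdm": "4.64.0",
    "multiprocess": "0.70.14",
    "pandas": "1.5.3",
    "matplotlib": "3.6.3",
    "networkx": "3.0",
    "scipy": "1.10.0",
}


def as_tuple(ver):
    """Turn '1.24.0' into (1, 24, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so that '3.0' and '3.0.0' compare as equal.
    return tuple(parts + [0] * (3 - len(parts)))


def python_ok(minimum=(3, 10)):
    """True if the running interpreter meets the Python>=3.10 requirement."""
    return sys.version_info[:2] >= minimum


def check(required=REQUIRED):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if as_tuple(installed) < as_tuple(minimum):
            problems.append(f"{name}: {installed} found, need >= {minimum}")
    return problems


if __name__ == "__main__":
    if not python_ok():
        print("Python >= 3.10 is required")
    for problem in check():
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when the environment satisfies the list.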

API Reference

I/O Management

  • pyengnet.File: Class used to manage file I/O operations and data visualization.
  • pyengnet.File.load(): Load a dataset from a txt or csv file.
  • pyengnet.File.saveFile(): Save a network to file (can be used to store full and/or pruned networks).
  • pyengnet.File.showGraph(): Display a specific network.

Ensemble

  • pyengnet.Engnet: Class that controls the execution of the EnGNet algorithm.
  • pyengnet.Engnet.process(): Function that runs the EnGNet algorithm. Depending on its parameters, the algorithm runs in parallel on CPU processors or on GPU devices.
  • pyengnet.Kendall: Kendall measure, implemented for parallel execution on CPUs.
  • pyengnet.NMI: NMI measure, implemented for parallel execution on CPUs.
  • pyengnet.Spearman: Spearman measure, implemented for parallel execution on CPUs.
  • pyengnet.src.correlations: Computes the Kendall, NMI, and Spearman measures in a parallel multi-GPU environment (CUDA). It also detects the gene pairs that exceed the thresholds for majority voting.
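
The ensemble idea behind these modules (three measures, one threshold per measure, and a majority vote per gene pair) can be illustrated on a single pair of expression profiles. The sketch below is not pyEnGNet's implementation. It relies on `scipy.stats` for Spearman and Kendall, estimates NMI with a simple histogram (an assumption; the library's estimator may differ), compares absolute correlation values against the thresholds, and assumes that "majority" means at least two of the three measures. The function names `binned_nmi`, `votes_for_pair`, and `keep_edge` are hypothetical; the default thresholds are the ones used in the API demo.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr


def binned_nmi(x, y, bins=8):
    """Normalised mutual information estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    nonzero = pxy > 0
    mi = np.sum(pxy[nonzero] * np.log(pxy[nonzero] / np.outer(px, py)[nonzero]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    denom = np.sqrt(hx * hy)
    return float(mi / denom) if denom > 0 else 0.0


def votes_for_pair(x, y, nmi_th=0.6, spearman_th=0.7, kendall_th=0.7):
    """Count how many of the three measures reach their threshold."""
    rho, _ = spearmanr(x, y)
    tau, _ = kendalltau(x, y)
    nmi = binned_nmi(x, y)
    return (
        int(abs(rho) >= spearman_th)
        + int(abs(tau) >= kendall_th)
        + int(nmi >= nmi_th)
    )


def keep_edge(x, y, min_votes=2, **thresholds):
    """Majority vote: keep the gene pair if at least min_votes measures agree."""
    return votes_for_pair(x, y, **thresholds) >= min_votes


if __name__ == "__main__":
    gene_a = np.linspace(0.0, 1.0, 50)
    gene_b = 2.0 * gene_a + 1.0  # perfectly monotonic relationship
    print(keep_edge(gene_a, gene_b))
```

In a full reconstruction this test would run over every gene pair, which is the part that pyEnGNet parallelises across CPU processes or GPU devices.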

Examples by Tasks

Every implemented mode has an associated example; see "pyEnGNet examples" for more information.


Run on CPU

"tests/test_integration/test_cpu.py" demonstrates the basic API for generating gene co-expression networks using CPUs.

  1. Load gene co-expression dataset from input file

    import os
    from pyengnet.File import File
    
    dataset = File.load(path=os.getcwd()+"/datasets/Spellman.csv", separator=",", nmi_th=0.6, spearman_th=0.7, kendall_th=0.7, readded_th=0.7, hub_th = 3)
  2. Run pyEnGNet on CPUs.

    from pyengnet.Engnet import Engnet
    
    graphFiltered, infoGraphFiltered, graphComplete, infoGraphComplete = Engnet.process(dataset, saveComplete = True)
  3. Save gene co-expression networks output (optional)

    from pyengnet.File import File
    
    File.saveFile(path='/home/user/Desktop/graphComplete.csv',graph=infoGraphComplete)
    File.saveFile(path='/home/user/Desktop/graphFiltered.csv',graph=infoGraphFiltered)
  4. Print gene co-expression networks output (optional)

    from pyengnet.File import File
    
    File.showGraph(graph=graphComplete,title='Complete graph')
    File.showGraph(graph=graphFiltered,title="Filtered graph")

Run on GPU devices

"tests/test_integration/test_gpu.py" demonstrates the basic API for generating gene co-expression networks using GPU devices.

  1. Load gene co-expression dataset from input file

    import os
    from pyengnet.File import File
    
    dataset = File.load(path=os.getcwd()+"/datasets/Spellman.csv", separator=",", nmi_th=0.6, spearman_th=0.7, kendall_th=0.7, readded_th=0.7, hub_th = 3)
  2. Run pyEnGNet on GPU devices.

    from pyengnet.Engnet import Engnet
    
    graphFiltered, infoGraphFiltered, graphComplete, infoGraphComplete = Engnet.process(dataset, saveComplete = True, numGpus = 2, computeCapability = 61)
  3. Save gene co-expression networks output (optional)

    from pyengnet.File import File
    
    File.saveFile(path='/home/user/Desktop/graphComplete.csv',graph=infoGraphComplete)
    File.saveFile(path='/home/user/Desktop/graphFiltered.csv',graph=infoGraphFiltered)
  4. Print gene co-expression networks output (optional)

    from pyengnet.File import File
    
    File.showGraph(graph=graphComplete,title='Complete graph')
    File.showGraph(graph=graphFiltered,title="Filtered graph")
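
The GPU call above passes `numGpus = 2` and `computeCapability = 61`. Assuming that this integer is the CUDA compute capability with the dot removed (61 for 6.1, which is an assumption and not confirmed here), the sketch below derives both values from what `nvidia-smi` reports. It also assumes a recent NVIDIA driver that supports the `compute_cap` query field. The helper names `capability_code` and `detect_gpus` are hypothetical and are not part of pyengnet.

```python
import shutil
import subprocess


def capability_code(cap):
    """Convert a 'major.minor' string such as '6.1' into the integer 61."""
    major, minor = cap.strip().split(".")
    return int(major) * 10 + int(minor)


def detect_gpus():
    """Return one capability code per visible GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    codes = []
    for line in out.splitlines():
        if not line.strip():
            continue
        try:
            codes.append(capability_code(line))
        except ValueError:
            # Skip lines that are not in 'major.minor' form.
            continue
    return codes


if __name__ == "__main__":
    codes = detect_gpus()
    if codes:
        # Use the lowest capability so the setting is valid for every device.
        print(f"numGpus = {len(codes)}, computeCapability = {min(codes)}")
    else:
        print("No GPU detected; use the CPU call instead.")
```

On a machine without NVIDIA tooling, `detect_gpus()` returns an empty list, which makes it straightforward to fall back to the CPU call.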
