Comments (7)

mmstick commented on May 29, 2024

I'll have to investigate this when I have time to put into this project. I'm still heavily engaged in Ion Shell development, which takes priority over this. I believe this may have to do with Parallel making a copy of the input file even though the input is already a file. A fix could be to check whether stdin is a file and, if so, use it directly. I'd need to get some perf profiling done to find the exact cause.
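The stdin check mentioned above could look roughly like the following. This is only a minimal sketch, not Parallel's actual code: the helper names `is_regular_file` and `stdin_is_regular_file` are hypothetical, and it assumes a Linux-style `/dev/stdin` that resolves to whatever is attached to file descriptor 0.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `path` resolves to a regular file rather than a
/// pipe, terminal, directory, or other special file.
fn is_regular_file(path: &Path) -> io::Result<bool> {
    // `fs::metadata` follows symlinks, so on Linux "/dev/stdin" resolves
    // to whatever is actually attached to file descriptor 0.
    Ok(fs::metadata(path)?.file_type().is_file())
}

/// Hypothetical helper: true when stdin was redirected from a regular file.
/// Assumes a Linux-style /dev/stdin; any error is treated as "not a file".
fn stdin_is_regular_file() -> bool {
    is_regular_file(Path::new("/dev/stdin")).unwrap_or(false)
}
```

If `stdin_is_regular_file()` returns true, the program could read its inputs straight from the redirected file instead of copying them first. When stdin is a pipe or a terminal, the existing code path would still be needed.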

Once Ion is complete, I will be integrating it directly into Parallel, as I'll ensure that Ion can be called as a library. Then there won't be a need to call an external shell to execute commands, and it will be able to use Ion as a scripting language in the same way that GNU Parallel uses Perl. Will be a major performance and feature win, given that Ion is drastically superior to Dash, both in performance and feature set.

Something else you can try, though, is to compile Parallel with musl. It eliminates the shared dependencies on glibc, which carry a high cost for short-lived parallel tasks.

rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

from parallel.

mmstick commented on May 29, 2024

d33tah commented on May 29, 2024

@mmstick

Looks like that's not the case:

[15:48:09] ➜  /tmp  cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

Installed via cargo install.

d33tah commented on May 29, 2024

@mmstick any other ideas then?

amosbird commented on May 29, 2024

Hi, just out of curiosity, why would setting transparent_hugepages to always hurt parallel's performance?

mmstick commented on May 29, 2024

@amosbird It's because THP has an issue where it severely degrades memory-related performance when a binary is using jemalloc. Especially so when that program performs a lot of forks, such as this program, where most of its time is spent forking. If set to always, THP is always in effect and aggressively purges caches that are used by jemalloc.
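To check which THP mode is active from code, the sysfs file shown earlier in this thread can be parsed: the kernel marks the current setting with square brackets. A minimal sketch follows; the function names `active_thp_mode` and `read_thp_mode` are hypothetical and not part of Parallel.

```rust
use std::fs;

/// Extracts the active mode from the contents of
/// /sys/kernel/mm/transparent_hugepage/enabled, where the current
/// setting is the word wrapped in square brackets.
fn active_thp_mode(contents: &str) -> Option<&str> {
    contents
        .split_whitespace()
        .find_map(|word| word.strip_prefix('[')?.strip_suffix(']'))
}

/// Reads the sysfs file and returns the active mode, or None when the
/// file is missing (for example on non-Linux systems) or has no
/// bracketed entry.
fn read_thp_mode() -> Option<String> {
    let contents = fs::read_to_string("/sys/kernel/mm/transparent_hugepage/enabled").ok()?;
    active_thp_mode(&contents).map(str::to_owned)
}
```

For the output posted above, `always [madvise] never`, `active_thp_mode` returns `Some("madvise")`, meaning THP applies only to memory regions that request it.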

mmstick commented on May 29, 2024

So I have a new project -- concurr. Still in its early stages, but it has a service (concurr-jobsd) and an associated client (concurr) for controlling nodes with that service running. Syntax will be very similar to Parallel, but it won't be drop-in compatible -- taking a different route.

The server is built using Tokio, and executes each command within embedded instances of the Ion shell. The client sends a command template to each configured node (which can contain multiple commands), and then asynchronously submits inputs to execute to each slot on each node, and then reads the results back in the order of submission. So distributed computing capabilities are a big feature with the new solution.
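The "submit asynchronously, read back in submission order" idea can be illustrated locally. The sketch below is not concurr's implementation: it swaps Tokio and remote nodes for plain standard-library threads, the function names are hypothetical, and the "execution" is a stand-in that only formats a string.

```rust
use std::thread;

/// Substitutes every `{}` in the command template with the given input.
fn fill_template(template: &str, input: &str) -> String {
    template.replace("{}", input)
}

/// Starts one worker per input and returns the results in submission order.
fn run_in_submission_order(template: &str, inputs: &[&str]) -> Vec<String> {
    // Spawn all workers up front so they run concurrently.
    let handles: Vec<thread::JoinHandle<String>> = inputs
        .iter()
        .map(|input| {
            let command = fill_template(template, input);
            // Stand-in for real work: a node would execute `command` here.
            thread::spawn(move || format!("ran: {}", command))
        })
        .collect();

    // Joining the handles in the order they were created yields outputs in
    // submission order, no matter which worker finishes first.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Calling `run_in_submission_order("echo {}", &["a", "b", "c"])` returns the three results in the order a, b, c, even if the worker for `c` happens to finish first.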

The client is currently very basic though. Syntax is as follows:

concurr 'COMMAND TO EXECUTE {}' : arg1 arg2 arg3 arg4
concurr ' COMMAND {}' :: file1 file2 file3

It doesn't yet support reading from stdin, permuting inputs, or any of the more advanced optional features of Parallel (I'm only on day 3 of development). I'll be working on that shortly. But it does offer TOML configuration and XDG app dir support. Example config:

# A list of nodes that the client will connect to.
nodes = [
    "127.0.0.1:31514",
    "192.168.1.3:31514",
    "192.168.1.194:31514"
]

# Defines whether the client should request outputs of inputs.
outputs = true
