Belfast LEWGI: Make all concepts sets of named requirements instead

Specifying a concept, even an exposition only one, gives implementers no leeway. We must use a table of named requirements instead, unfortunately.

Cologne LEWGI: remove environment class

of course we now need to list our requirements for this.

Unifying pipe IO

I'm working on byte IO proposal and hope to get R0 in Prague. I've actually thought about adding pipes before I found this proposal. I hope to share some experience and make us sync the API.

Right now there 2 functions for IO in the process proposal:

size_t read(span<byte> data);
size_t write(span<const byte> data);

while my API uses this:

streamsize read_some(span<byte> buffer);
streamsize write_some(span<const byte> buffer);

As you can see, the API is pretty similar but there are important differences. I return streamsize because it is defined to hold the size of IO operation. This type is also signed which is important in template code where unsigned types ruin arithmetic.

2nd - naming. The names were taken from Networking TS because they point out that that buffer may not be transferred fully. For that reason I provide high level std::io::read and std::io::write customization points that call read_some and write_some in a loop until the buffer is transferred.

Also note on vectored IO which is not in my proposal yet. From my understanding of MSDN, Windows requires perfectly aligned and sized pages so span<span<byte>> doesn't work in most cases. Also POSIX uses iovec and I've found I'd need to inherit my custom span class from iovec to pass it to POSIX functions without any memory allocations. What I'm saying is I think Networking TS design for vectored IO needs to be tweaked before it works in intuitive way.

Cancel async_wait & remove wait_for & wait_until

I have been repeatetly run into issues regarding the timed wait function on posix. I think it might be a good idea to remove them, but add a cancel function that interrupts the async_wait. This would allow a simple implementation of a timed wait without getting into the nitty-gritty of OS behaviour.

Cologne SG1: process destructor should not terminate program

    // The destructor terminates 
    ~process();

Feedback was clear on this point. std::thread was done wrong and that's part of the reason we have jthread now.

The logic of ~jthread amounts to checking if the thread is joinable and then calling request_stop() followed by join() on the thread.

So processes don't have join -- will just be wait, but could call wait.

Another option that was suggested was to call process::terminate on the child

So will make this an open issue for further committee discussion. As I see it the options are

call process::terminate on child
call wait() in the destructor
add some sort of request_stop interface to process to mirror jthread api -- this could then do either 1 or 2
a constructor option to pick from pre-defined behavior

LEWGI Cologne: add survey of other C++ process libs

Folly was brought up. I've also used the old school ACE::process in the past.

LEWGI Cologne: remove the terminate(pid) from process class interface

Reason is mostly simplification, but also some mild concerns about security issues that might arise.

LEWGI Belfast: add member function requesting orderly shutdown

Implemented with e.g. SIGSTOP or SIGPIPE. Could potentially fall back onto SIGKILL or equivalent if unimplementable on a specific platform.

We were also instructed to bikeshed names for terminate and the functionality described above (called stop for now). Suggestions include request_terminate, or request_PLACEHOLDER, where PLACEHOLDER is whatever terminate is bikeshedded to.

TODOs, notes and comments extracted from the paper

Oh man there's a lot of stuff to go through...

List of TODOs:

Wording:

"Start of execution on process create" under "Design"
"wait or join" under "Design"
"on_success" and "on_setup" under "Design"

Code:

Define concepts ProcessReadableStream and ProcessWritableStream
"Clean" process::terminate(), process_group::terminate()

TODOs added by me (@eliaskosunen):

Add rationale for not making process SemiRegular (not DefaultConstructible and Copyable)
Add rationale for making the behavior of process::~process() inconsistent with std::thread

My (@eliaskosunen) comments:

Existing comments by others from the paper:

@JeffGarland: In "Design" -> "Core language impact": "Bryce to provide information to Jeff"
@JeffGarland: In "Header <process> synopsis": "I (Jeff G) basically lifted this out of boost.process and updated"
@klemens-morgenstern: In process_io::process_io: "Additionally, a path should be possible to open a file just for the child process"
@JeffGarland: In "Open Questions -> Can we piggyback on the thread's forward progress stuff for process as well? Can we assume all threads on the system behave like C++ threads?": "Seems doubtful -- network TS"
@klemens-morgenstern: In "Open Questions -> environment and command line": "std::this_process::environment though. That would make a few things more obvious, because we have an environment class too, that shuold just be used to set it up for the subprocess."

Comments from Isabella Muerte regarding std::environment and https://wg21.link/p1275 etc:

I'd recommend you look into why the Rust community decided to deprecate things like home(), shell(), etc. due to various CVE related issues on some operating systems.
Also, the find_executable function is typically called which because it can find non-executables that can be run via the OS execute process machinery, such as a script or custom registered file type such as windows.

One thing I've considered is having std::environment be the type and then a std::env static inline instance.

LEWGI Cologne: Do we need both basic_pipe and pipe streams?

Can we just have one or the other.

LEWGI Belfast: more minor issues

constexpr default constructor
Audit noexceptness, keeping in mind the Lakos rule
process::native_exit_code return type should be implementation-defined
Remove process(pid) constructor
Error handling exploration paper (I have a number, D1977)

Cologne SG1: is std::process thread safe

We should note it as an open question, whether std::process should be thread safe.

The question started with the a discussion of if multiple threads could call 'wait' and the fact that it might update the object means synchronization. Also suggested that wait could be implemented as wait_for(0) to improve this (Chris K suggested).

After more discussion of posix and other apis seemed that the general conclusion is that it doesn't need to be thread safe.

Not adding to anything to the paper just yet.

async_pipe still has references to basic_pipe

Which has been removed from the paper.

LEWGI Cologne:tweaks to pipe interfaces (constructors, native_handle)

LEWGI suggesions:

Make basic_pipe's default constructor construct it closed.
Remove basic_pipe's copy constructor.
Remove basic_pipe::native_handle_type.

Roadmap

Just to have it in one place, the question is what phases we want to have in the evolution of the paper. Based on the assumption that we want to do async late, I put that together - we could also keep the async in from phase 1.

std::process with launcher/Initializer, without async but a simple env initializer (from char**), std::process_io, std::process_start_dir
std::process_group
std::this_process without environment
std::environment && std::this_process::environment
std::pstream (std::pstreambuf)
async_wait for std::process & std::process_group
std::pipe_end & std::pipe (the async ones)

As an alternative, I think we could split the paper up into three separate ones:

Process
Pipes
Environment

Pipes & Processes are often mentioned together, BUT there is not reason they need to be – I can use processes without pipes and I can use pipes for inter-thread communication. If we consider this code:

auto [p_in, p_out] = std::pstream();
std::process proc("foo", {}, std::process_io(p_out, {}, {}));

The only actual interaction we have between the pipe and thre process library is that std::process_io accepts and argument of type std::pstream. Now, the same is (or at least should be) true for std::fstream.

The same goes for the environment:

std::environment env = std::this_process::environment();
std::process proc("foo"; {}, env);

The only interaction (besides the namespace this_process is that environment implements the concept ProcessInitializer.

So the actual overlap in terms of specification is actually rather small. Now, please note that we have one async function in the process, namely async_wait.

LEWGI Belfast: remove `process::async_wait`

Unsure about this one, might contradict SG1 recommendations from Cologne

enhanced system -- should we remove?

So this is an issue we discussed some in Aspen. I don't think we reached consensus on what to do about it. I'm seeing issues in the paper that Elias is taking but maybe that's due to not seeing that this is a new form?

Anyway my inclination is to add to open issues/help needed or remove altogether. Objections?

// EK: Change signature; system takes a string, not an argument pack
template<class... Args>
int system(Args&&... args)
{
// EK: Change constructor call to taking a path, arguments and something similar to s
// EK: also, a rogue comma at the end
process c(forward(args)...,);
if (!c.valid())
return -1;
c.wait();
return c.exit_code();
}

LEWGI Cologne: Why don't write and read take a span?

Elias has already solved this one in his push. read/write now take span. I'm adding the issue so I have a track of the LEWGI issues and their resolutions.

Terminate in destructor

As a trivial example, why we must terminate in the constructor, the following code causes a deadlock if we wait in the destructor.

std::opstream pin:
std::ipstream pout;

std::process proc{"python", std::process_io(pin, pout)};
throw 42;

The python process will wait for the pipe to close (or the exit command), but the pipe will be closed after the process is destruced.

LEWGI Cologne: Remove process::operator bool.

not sure what the context of that was now.

representation of the launch?

So trying to document the process a bit more. Is this a fair representation?

[MSVC] Template resolution

MSVC has an issue in the current version, with the process signatures, where the process_launcher is placed at the end. The is the following code is not working on msvc, but on gcc:

proc::process proc1{target_path, {"--exit-code", "40", "--wait", "100"}, proc::default_process_launcher{}};

MSVC gives a C2440 error.

I do not know if this is an issue with MSVC or with my design, we might need to move the position of the customer initializer. I am doing the latter for some experimentation on process_groups. Either way, this should be mentioned.

Edit: I had a few more issues and it seems that the MSVC concept support is just buggy as of now.

LEWGI Cologne: Are pids fundamental?

Must we have them in version 1?

My take on this is that yes, they are fundamental - effectively like thread::id. That is to say to interact with the native OS api's it is needed.

Other rationale appreciated.

LEWGI Belfast: `std::process` constructor should take a callback called by the destructor

Note, that this direction has not been approved by SG1, which is still undecided on the matter.

[Design consideration] process_waitee

Basesd on #57 i would argue, that we need an async_wait functionality.

One design question is if we turn the process into a waitable object, like so:

std::process proc{...};

io_context ctx;
proc.async_wait(ctx, ...);
proc.cancel_async_wait();

The downside is that we need to pass the executor into the async_wait function, which is inconsisten with the networking TS design. Secondly, the process class will need data members that hold the async wait implementation, currently asio::signal_set and object_handle.

Alternatively use an object holding all the asio stuff named process_waitee or something.

std::process proc{...};

io_context ctx;

std::process_waitee w{ctx, proc};
w.async_wait(...);
w.cancel();

The second approach has a clearer design with an io object, but has a sharing problem. That is: in order for the process object to have the right exit_code, the waitee would need to keep a reference to it, or make the internal exit_code a shared_pointer. If not, the process might not only have the wrong exit code, but might also be unaware that the process ended.
Thus the first approach has a slightly unusual function signature, but the second solution introduces a bunch of complexity.

LEWGI Belfast: security review

I really have no idea how we're supposed to approach this

fstream extension

If this isn't fundamental I'd also like to remove it and possibly bring it forward as a separate small proposal. While I understand that it might be needed by the implementation, they are free to do that anyway they like. My concern is that there will be the 'abi breaking' bikeshed on this -- which currently can kill a proposal quickly.

** Extension of ~~fstream~~

The standard file streams (~~fstream~~/~~ofstream~~/~~ifstream~~) shall have a member function that return the native handle of the opened file, so that process can use those for forwarding.

I've moved it to the 'proposal_cuts.org' -- if we must have it I can restore.

wchar_t

A bit ignored, but important: windows uses wchar_t for most things. This should be supported; I think going for generic string types in the paper to have clear conversion rules is the way to go.

I think we can add this later, so that for not std::string is fine. Just something to remember.

shell_launcher

Executor thoughts:

using the shell directly is considered insecure. So std::system should in general not be used unless the programmers knows his security stuff (which most don't, sadly). Now: I did propose to keep the shell thing going for compatiblity with std::system. Another thought to consider though is: maybe don't do that and rather deprecate std::system, because of the security issue. Instead double down on the ProcessLauncher concept.

so we only have :

std::process(executable /*string or path*/, args, ... /* other stuff*/);

In which case executable must refer to an absolute path. In most cases people will then need to write

std::process(std::this_process::environment::find_exe("gcc"), {"--version"});

instead of

std::process("git", {});

The thing is: this is annoying, and I love to add shortcuts, so I added a bunch of those in boost.process. BUT how much effort is it really to write a global function in a project that uses std::process, that looks like this?

template<typename ... Args>
std::process launch(const std::string & file, typename ... Args) {return std::process(std::this_process::environment::find_exe(file), std::forward<Args>(args)...);}

I sort of changed my mind on that issue: short doesn't mean better and I rather have something expressive that's a bit longer, than a shortcut that's ambiguous. In C++ that is, don't look at my Javascript...

With that in mind, the flipside is to empower the ProcessLauncher. So, the default starts the process directly and requires a full path, BUT we can provide different launchers even in the STL. Namely the shell_launcher:

std::process("gcc --version", { /* we could pass additional arguments to the launcher, maybe */}, shell_launcher());
//same as 
std::system("gcc --version");

So the shell launcher grabs the shell command from the environment and does it's thing.

Now based on that, people can go nuts with more launchers, for example (as platform extentions).

the vfork_launcher we already discussed, which fundamentally changes the way you start the process.

You could just replace the shell command:

power_shell_launcher for windows, run processes through the powershell
bash_launcher for linux, processes through bash instead of through sh

Or alternatively, the shell_launcher can take an argument for the shell, that defaults to std::this_process::environment::shell(), i.e. I could to this:

std::process("foo", {}, std::shell_launcher("/bin/bash"));

One advantage for the shell_launcher would also be, that you could inherit from it, and implement some security rules on the inherited one. That might help a lot to enforce rules against shell injection in a large project.

Furthermore tools & libraries could provide their own launchers, like these:

docker_launcher, a launcher to inject a process into a running docker container:

docker_launcher dl("my_awesome_docker"); //starts it

const auto proc1 = std::process("foo", {}, dl);
const auto proc2 = std::process("foo", {}, dl);

python_multiprocess_launcher, allowing you to use this: https://docs.python.org/3/library/multiprocessing.html

Please let me know if that makes sense. The basic question is if we want to have the shell_launcher, skip shell execution or built it into the core, i.e. into every launch.

LEWGI Cologne: Investigate non-exception based error-handling and its implications on the process creation API.

So as we've probably already discussed, I believe the crux of this comes down to not using the process constructor to spawn. So you'd always have to construct and then call a function like 'spawn' which could report an error with a return. Something along the lines of

  process p (...);
  optional<system_error> err = p.spawn(); //noexcept

Things I can see against this:

inconsistent with std::thread or std::jthread api design
if the invariants of process are violated in constructor exception is more consistent
makes user code longer - trivially, but still

LEWGI Cologne: Investigate combining exit_code and native_exit_code.

I think we don't want to do this. The reason for exit_code is so you can write portable code. The reason for native_exit_code is so you can write platform specific code.

LEWGI cologne: remove process group from proposal

This is an easy one, just drop the process group. It can always be brought back later.

Cologne: SG1 question - does terminate on a process block?

This one is for @klemens-morgenstern to answer.

Mattias G: If you call terminate on a process does it block?
JG: I'm not sure, will get back to you.

Alternative options in text for start_dir and process_io

// Provides an initializers for the standard io. Alternative: nested as std::process::io
class process_io;

// Provides a way to set the starting directory of the new process. Alternative: nested as std::process::start_dir

What you have looks good to me - going to drop the alternative text.

LEWGI Cologne: Is there a way to get the name of a basic_pipe?

If there is I'm missing it.

Locale & wchar_t

The other pattern dealing with wchar_t seems to be in path, which allows to pass a locale in in order to modify the char conversion. We should consider if we need something similar for passing in the argument-vector.

the proposal is large...

So I pushed an early version of the pdf. I did this just so others could have a look at how things are laying out. This is not nearly as important as fixing content at this stage as I can tweak this right up to the end without external help. I clearly need to tweak the page formatting and a number of other usage factors which should improve the layout.

All that said, it quite a long proposal so we should think of what may be expendable. In my mind I'm considering removing all of the technical specification with the exception of 'header process' and 'header pstream'. So what would remain is basically a list of all the classes, functions, and types without all the details. Opinions and thoughts appreciated.

I'm a huge fan of what is now called Design Discussion and Examples. That's basically this part of the paper. Really I think that documents 99% of what people need to know.

Design Discussion & Examples
** Concept ProcessLauncher
** Concept ProcessInitializer
** Class process
** Class process_group
** Class process_io
** Class environment

[Stdio Design] Designates Initializers & Extensions

@JeffGarland Should ramblings like this go into the std proposal?

1 Designated initializers

We COULD use designated initializers like so:

template<typename In = FILE*, typename Out = FILE*, typename Err = FILE*>
struct process_io
{
    In  in  = stdin;
    Out out = stdout;
    Err err = stderr;
};

auto test() 
{
    return process_io{.in= stdin, .err = stderr};
}

This easens the syntax, but captures by copy. In boost::process everything is captured by reference, but conceptually, i actually don't dislike this. If we use this with a pipe, what you usually want to to assign one pipe end to the child, and then close it on the father. This would look like this:

auto [in, out] std::pstream{}:
auto init = std::process_io{.out = std::move(in)};

The downside is that a pstream is copyable (which clones the handle), so that the naive solution would lead to potential deadlocks:

auto [in, out] std::pstream{}:
auto init = std::process_io{.out = in};

This could be avoided by making pipes not-copyable, which might be the best solution, since fstream can't be copied either. There is however a difference between pipes & files, in that files may be reopened by their name, while there's no way to do so with a pipe. But, I don't see any application for this - plus, if we can get the native_handle from a pipe and can construct another pipe from the handle, you can easily implement this corner case on your own.

LEWGI Cologne: Why isn't is_open const?

I believe the answer here is that the process can complete during the is_open check and hence the process object needs to be updated.

@klemens-morgenstern can you confirm?

[Note] process_group is easy

I implemented a process_group (which we removed from the proposal) on posix & windows with the following API for the reference implementaiton.

struct process_group
{

    typedef ... native_handle_type;
    native_handle_type native_handle() const { return _group.native_handle(); }


    process_group() = default;
    explicit process_group(native_handle_type handle);
    process_group(const process_group &) = delete;
    process_group(process_group && ) = default;

    process_group &operator=(const process_group &) = delete;
    process_group &operator=(process_group && ) = default;


    template<detail::process_initializer<default_process_launcher>  ... Inits>
    pid_type emplace(const std::filesystem::path& exe, std::initializer_list<std::wstring_view> args, Inits&&... inits);

    // Construct a child from a property list and launch it with a custom process launcher
    template<process_launcher Launcher, detail::process_initializer<Launcher> ... Inits>
    pid_type emplace(const std::filesystem::path& exe,
                     std::initializer_list<std::wstring_view> args,
                     Inits&&... inits,
                     Launcher&& launcher);

    // Construct a child from a property list and launch it.
    template<detail::process_initializer<default_process_launcher>  ... Inits>
    pid_type emplace(const std::filesystem::path& exe, std::initializer_list<std::string_view> args, Inits&&... inits);

    // Construct a child from a property list and launch it with a custom process launcher
    template<process_launcher Launcher, detail::process_initializer<Launcher> ... Inits>
    pid_type emplace(const std::filesystem::path& exe,
                     std::initializer_list<std::string_view> args,
                     Inits&&... inits,
                     Launcher&& launcher);

    // Construct a child from a property list and launch it.
    template<typename Args, detail::process_initializer<default_process_launcher>  ... Inits>
    pid_type emplace(const std::filesystem::path& exe, Args&& args, Inits&&... inits);

    // Construct a child from a property list and launch it with a custom process launcher
    template<process_launcher Launcher, typename Args,
            detail::process_initializer<Launcher> ... Inits>
    pid_type emplace(const std::filesystem::path& exe,
                     Args&& args,
                     Inits&&... inits,
                     Launcher&& launcher);

    pid_type attach(process && proc);
    bool contains(pid_type proc);

    void terminate();

    void wait();
    pid_type wait_one();
};

The implementation is rather easy, the only thing I have not done is the async_wait since this would require a new IO object for asio. I don't want to spend too much time on that, since this was just a PoC for me.

TLDR: process_group is easy enough, but I do see why it does not need to be part of the core proposal.

Note: The process.pdf does contain this class, p1750r1.pdf not.

this_process::environment

In boost.process as well as in P1276R0 the environment of the current object is an object. I am not sure this is the right model, since the functions do actually manipulate a global value of the process. I think we could also design those as global functions in a namespace, like so:

std::this_process::environment::get("foo") -> string;//get an environment variable
std::this_process::environment::set("foo", "bar"); //set one
std::this_process::environment::erase("foo"); //delete it

//Get an object with the actual handle of the global object. In windows this has to be created & delete by a function; on posix it's a global char**
std::this_process::environment::raw() -> (char**  for posix | smart_ptr with customer deleter on windows);


std::this_process::environment::path() -> std::vector<std::filesystem::path>; 
std::this_process::environment::shell() -> std::filesystem; 
//Search through the path -> on windows, take extensions into account.
std::this_process::environment::find_executable(std::string & name) -> std::filesystem::path;

//Construct a copy we use in the child process.
std::environment env = std::this_process::environment::raw();

Thoughts?

async_pipe still refers to Networking TS buffer concepts

@eliaskosunen I think we want these now to be span correct?

  // Read some data from the handle.
    // See the Networking TS for more details.
    template<class MutableBufferSequence>
    size_t read_some(const MutableBufferSequence& buffers);

and so on...

[Environment] Generic approach with process_env

I don't think we can rely on P1275R0 to be standardized (I think the global std::arguments will not be accepted) I think we should have a more generic approach to passing environments into a process, than relying on once class. We could add an initializer, e.g. process_env or have a full featured environment class.

The big need of environment functions arises on the side of the child process, i.e. when a child queries it's own environment. For example, searching for an executable that emulates the shell behaviour would be quite common for starting processes, but is quite different between platforms:

On windows, a certain file name is search in the PATH (or Path sometimes) variable with the addition of the extensions available in PATHEXT. On posix no such extensions exist, thus a function like std::vector<std::filesystem::path> std::this_process::env::find_executable(std::string & name).
Similarly, one might want to get the default shell of the OS, which on posix is stored in SHELL while it's deducable through windir on windows. This could be handled by a std::filesystem::path std::this_process::env::shell() function.

It is to now however, that this information is queried before process start and thus usually queried from the father environment. Thus adding those specialized functions to an environment that is used to initialize a child process, should be considered of little to no priority. If another paper does provide such a class, we should support it.

If we discount specific values in an environment, what are the semantics ?

It's a list of strings, in which every string contains at least one =
The part before the first = is a unique key
The part after the first = is the value, that is a : (posix) or ; (windows) seperated list.

With these three characteristics we can define what types are acceptable as an environment to be set by the process_env:

range<convertible_to<string_view>>
range<convertible_to<pair<string_view, string_view>>
range<convertible_to<pair<string_view, range<string_view>>>

struct process_env {

    //string range
    template<ranges::range T>
        requires (
            requires(T t) {{*ranges::begin(t)} -> convertible_to<string_view>;}
            )
    process_env(T && t);


    //string pair range
    template<ranges::range T>
        requires (
            requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
         && requires(T t) {{get<1>(*ranges::begin(t))} -> convertible_to<string_view>;}
            )
    process_env(T && t);

    //string pair vector range
    template<ranges::range T>
        requires (
            requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
         && requires(T t) {{get<1>(*ranges::begin(t))} -> ranges::range;}
         && requires(T t) {{*ranges::begin(get<1>(*ranges::begin(t)))} -> ranges::range;}
            )
    process_env(T && t);

};

This this overloaded we can now use different containers, e.g. vector<string>, unordered_map<string, string> and unordered_map<string, vector<string>>.
The environment class proposed in P1275R0 does fulfill the first constraint as well.

LEWGI Cologne: Are named pipes a version 1 feature?

Unfortunately I'm not sure now if the question was 'are pipes needed' in V1. Basically there's nothing else on Windows so the use of named pipes is more for consistency than anything else if I'm not mistaken.

jeffgarland / liaw2019-process Goto Github PK

liaw2019-process's People

Contributors

Stargazers

Watchers

Forkers

liaw2019-process's Issues

1 Designated initializers

Recommend Projects

Recommend Topics

Recommend Org