jeffgarland / liaw2019-process Goto Github PK
View Code? Open in Web Editor NEWRepository for initial drafting of boost.process standards paper
License: MIT License
Repository for initial drafting of boost.process standards paper
License: MIT License
Specifying a concept, even an exposition only one, gives implementers no leeway. We must use a table of named requirements instead, unfortunately.
of course we now need to list our requirements for this.
I'm working on byte IO proposal and hope to get R0 in Prague. I've actually thought about adding pipes before I found this proposal. I hope to share some experience and make us sync the API.
Right now there 2 functions for IO in the process proposal:
size_t read(span<byte> data);
size_t write(span<const byte> data);
while my API uses this:
streamsize read_some(span<byte> buffer);
streamsize write_some(span<const byte> buffer);
As you can see, the API is pretty similar but there are important differences. I return streamsize
because it is defined to hold the size of IO operation. This type is also signed which is important in template code where unsigned types ruin arithmetic.
2nd - naming. The names were taken from Networking TS because they point out that that buffer may not be transferred fully. For that reason I provide high level std::io::read
and std::io::write
customization points that call read_some
and write_some
in a loop until the buffer is transferred.
Also note on vectored IO which is not in my proposal yet. From my understanding of MSDN, Windows requires perfectly aligned and sized pages so span<span<byte>>
doesn't work in most cases. Also POSIX uses iovec
and I've found I'd need to inherit my custom span class from iovec
to pass it to POSIX functions without any memory allocations. What I'm saying is I think Networking TS design for vectored IO needs to be tweaked before it works in intuitive way.
I have been repeatetly run into issues regarding the timed wait function on posix. I think it might be a good idea to remove them, but add a cancel
function that interrupts the async_wait
. This would allow a simple implementation of a timed wait without getting into the nitty-gritty of OS behaviour.
// The destructor terminates
~process();
Feedback was clear on this point. std::thread was done wrong and that's part of the reason we have jthread now.
The logic of ~jthread amounts to checking if the thread is joinable and then calling request_stop() followed by join() on the thread.
So processes don't have join -- will just be wait, but could call wait.
Another option that was suggested was to call process::terminate on the child
So will make this an open issue for further committee discussion. As I see it the options are
Folly was brought up. I've also used the old school ACE::process in the past.
Reason is mostly simplification, but also some mild concerns about security issues that might arise.
Implemented with e.g. SIGSTOP
or SIGPIPE
. Could potentially fall back onto SIGKILL
or equivalent if unimplementable on a specific platform.
We were also instructed to bikeshed names for terminate
and the functionality described above (called stop
for now). Suggestions include request_terminate
, or request_PLACEHOLDER
, where PLACEHOLDER
is whatever terminate
is bikeshedded to.
Oh man there's a lot of stuff to go through...
List of TODOs:
Wording:
wait
or join
" under "Design"on_success
" and "on_setup
" under "Design"Code:
ProcessReadableStream
and ProcessWritableStream
process::terminate()
, process_group::terminate()
TODOs added by me (@eliaskosunen):
process
SemiRegular
(not DefaultConstructible
and Copyable
)process::~process()
inconsistent with std::thread
My (@eliaskosunen) comments:
pid_t
as defined in POSIX, because I think that'd be misleading on non-POSIX platforms. I've replaced it with pid_type
. pid
could also be an option.ProcessReadableStream
mean the child can read from it, or that we can read from it? In other words, is it the child's stdin
or stdout
/stderr
?pid_type
satisfy TriviallyCopyable
and/or Regular
? The underlying type does satisfy both on POSIX and Windows, so it probably should.environment
to maybe process_environment
or something similar? This would prevent collisions with P1275 and avoid using up a very useful name in std
.process::native_handle_type
? TriviallyCopyable
? StandardLayout
? (Semi)Regular
? Copyable
?process
and process_group::emplace
. The parameter pack should always be the final parameter.process_group
Copyable
?environment::native_environment_type
cheap to copy, since there's a constructor taking it by value?environment::get
, set
and reset
have overloads taking params by rvalue reference?ForwardRange
the right choice for the return value of environment::keys
and path
, and environment::entry::as_vector
? I think taking an OutputIterator
or OutputRange
and returning void
or the past-the-end iterator would be a more idiomatic choice, and would allow the user to control allocations better.const string&
(e.g. environment::find_executable
, or environment::entry::entry
). Would string_view
be a better option?system
details", the system
function is defined as taking a parameter pack. It should take a string. Also, in the function body, a non-existent variadic constructor of process
is used.fstream
" be added to basic_filebuf
as well? I think it should.u8pipe
, u16pipe
and u32pipe
in addition to pipe
and wpipe
(same applies for other types templated on character type). Maybe add SG16 to Audience?basic_pipebuf
deleted, but copy constructor defaulted? This isn't consistent with basic_filebuf
and basic_stringbuf
.std::this_process::native_handle_type
and pid_type
be aliases of process::native_handle_type
and process::pid_type
, respectively? I can't see the reason to not do this.Existing comments by others from the paper:
<process>
synopsis": "I (Jeff G) basically lifted this out of boost.process and updated"process_io::process_io
: "Additionally, a path should be possible to open a file just for the child process"Comments from Isabella Muerte regarding std::environment
and https://wg21.link/p1275 etc:
I'd recommend you look into why the Rust community decided to deprecate things like
home()
,shell()
, etc. due to various CVE related issues on some operating systems.
Also, thefind_executable
function is typically called which because it can find non-executables that can be run via the OS execute process machinery, such as a script or custom registered file type such as windows.
One thing I've considered is having
std::environment
be the type and then astd::env
static inline instance.
Can we just have one or the other.
constexpr
default constructornoexcept
ness, keeping in mind the Lakos ruleprocess::native_exit_code
return type should be implementation-definedprocess(pid)
constructorWe should note it as an open question, whether std::process should be thread safe.
The question started with the a discussion of if multiple threads could call 'wait' and the fact that it might update the object means synchronization. Also suggested that wait could be implemented as wait_for(0) to improve this (Chris K suggested).
After more discussion of posix and other apis seemed that the general conclusion is that it doesn't need to be thread safe.
Not adding to anything to the paper just yet.
Which has been removed from the paper.
LEWGI suggesions:
Just to have it in one place, the question is what phases we want to have in the evolution of the paper. Based on the assumption that we want to do async late, I put that together - we could also keep the async in from phase 1.
As an alternative, I think we could split the paper up into three separate ones:
Pipes & Processes are often mentioned together, BUT there is not reason they need to be โ I can use processes without pipes and I can use pipes for inter-thread communication. If we consider this code:
auto [p_in, p_out] = std::pstream();
std::process proc("foo", {}, std::process_io(p_out, {}, {}));
The only actual interaction we have between the pipe and thre process library is that std::process_io
accepts and argument of type std::pstream
. Now, the same is (or at least should be) true for std::fstream
.
The same goes for the environment
:
std::environment env = std::this_process::environment();
std::process proc("foo"; {}, env);
The only interaction (besides the namespace this_process
is that environment
implements the concept ProcessInitializer
.
So the actual overlap in terms of specification is actually rather small. Now, please note that we have one async function in the process, namely async_wait
.
Unsure about this one, might contradict SG1 recommendations from Cologne
So this is an issue we discussed some in Aspen. I don't think we reached consensus on what to do about it. I'm seeing issues in the paper that Elias is taking but maybe that's due to not seeing that this is a new form?
Anyway my inclination is to add to open issues/help needed or remove altogether. Objections?
// EK: Change signature; system takes a string, not an argument pack
template<class... Args>
int system(Args&&... args)
{
// EK: Change constructor call to taking a path, arguments and something similar to s
// EK: also, a rogue comma at the end
process c(forward(args)...,);
if (!c.valid())
return -1;
c.wait();
return c.exit_code();
}
Elias has already solved this one in his push. read/write now take span. I'm adding the issue so I have a track of the LEWGI issues and their resolutions.
As a trivial example, why we must terminate in the constructor, the following code causes a deadlock if we wait in the destructor.
std::opstream pin:
std::ipstream pout;
std::process proc{"python", std::process_io(pin, pout)};
throw 42;
The python process will wait for the pipe to close (or the exit command), but the pipe will be closed after the process is destruced.
not sure what the context of that was now.
MSVC has an issue in the current version, with the process signatures, where the process_launcher
is placed at the end. The is the following code is not working on msvc, but on gcc:
proc::process proc1{target_path, {"--exit-code", "40", "--wait", "100"}, proc::default_process_launcher{}};
MSVC gives a C2440
error.
I do not know if this is an issue with MSVC or with my design, we might need to move the position of the customer initializer. I am doing the latter for some experimentation on process_group
s. Either way, this should be mentioned.
Edit: I had a few more issues and it seems that the MSVC concept support is just buggy as of now.
Must we have them in version 1?
My take on this is that yes, they are fundamental - effectively like thread::id. That is to say to interact with the native OS api's it is needed.
Other rationale appreciated.
Note, that this direction has not been approved by SG1, which is still undecided on the matter.
Basesd on #57 i would argue, that we need an async_wait
functionality.
One design question is if we turn the process
into a waitable object, like so:
std::process proc{...};
io_context ctx;
proc.async_wait(ctx, ...);
proc.cancel_async_wait();
The downside is that we need to pass the executor into the async_wait
function, which is inconsisten with the networking TS design. Secondly, the process
class will need data members that hold the async wait
implementation, currently asio::signal_set
and object_handle
.
Alternatively use an object holding all the asio stuff named process_waitee
or something.
std::process proc{...};
io_context ctx;
std::process_waitee w{ctx, proc};
w.async_wait(...);
w.cancel();
The second approach has a clearer design with an io object, but has a sharing problem. That is: in order for the process
object to have the right exit_code
, the waitee
would need to keep a reference to it, or make the internal exit_code
a shared_pointer
. If not, the process
might not only have the wrong exit code, but might also be unaware that the process ended.
Thus the first approach has a slightly unusual function signature, but the second solution introduces a bunch of complexity.
I really have no idea how we're supposed to approach this
If this isn't fundamental I'd also like to remove it and possibly bring it forward as a separate small proposal. While I understand that it might be needed by the implementation, they are free to do that anyway they like. My concern is that there will be the 'abi breaking' bikeshed on this -- which currently can kill a proposal quickly.
** Extension of fstream
The standard file streams (fstream/ofstream/ifstream) shall have a member function that return the native handle of the opened file, so that process can use those for forwarding.
I've moved it to the 'proposal_cuts.org' -- if we must have it I can restore.
A bit ignored, but important: windows uses wchar_t for most things. This should be supported; I think going for generic string types in the paper to have clear conversion rules is the way to go.
I think we can add this later, so that for not std::string
is fine. Just something to remember.
Executor thoughts:
using the shell directly is considered insecure. So std::system
should in general not be used unless the programmers knows his security stuff (which most don't, sadly). Now: I did propose to keep the shell thing going for compatiblity with std::system
. Another thought to consider though is: maybe don't do that and rather deprecate std::system, because of the security issue. Instead double down on the ProcessLauncher concept.
so we only have :
std::process(executable /*string or path*/, args, ... /* other stuff*/);
In which case executable
must refer to an absolute path. In most cases people will then need to write
std::process(std::this_process::environment::find_exe("gcc"), {"--version"});
instead of
std::process("git", {});
The thing is: this is annoying, and I love to add shortcuts, so I added a bunch of those in boost.process. BUT how much effort is it really to write a global function in a project that uses std::process, that looks like this?
template<typename ... Args>
std::process launch(const std::string & file, typename ... Args) {return std::process(std::this_process::environment::find_exe(file), std::forward<Args>(args)...);}
I sort of changed my mind on that issue: short doesn't mean better and I rather have something expressive that's a bit longer, than a shortcut that's ambiguous. In C++ that is, don't look at my Javascript...
With that in mind, the flipside is to empower the ProcessLauncher
. So, the default starts the process directly and requires a full path, BUT we can provide different launchers even in the STL. Namely the shell_launcher
:
std::process("gcc --version", { /* we could pass additional arguments to the launcher, maybe */}, shell_launcher());
//same as
std::system("gcc --version");
So the shell launcher grabs the shell command from the environment and does it's thing.
Now based on that, people can go nuts with more launchers, for example (as platform extentions).
the vfork_launcher
we already discussed, which fundamentally changes the way you start the process.
You could just replace the shell command:
power_shell_launcher
for windows, run processes through the powershellbash_launcher
for linux, processes through bash instead of through sh
Or alternatively, the shell_launcher
can take an argument for the shell, that defaults to std::this_process::environment::shell()
, i.e. I could to this:
std::process("foo", {}, std::shell_launcher("/bin/bash"));
One advantage for the shell_launcher
would also be, that you could inherit from it, and implement some security rules on the inherited one. That might help a lot to enforce rules against shell injection in a large project.
Furthermore tools & libraries could provide their own launchers, like these:
docker_launcher
, a launcher to inject a process into a running docker container:
docker_launcher dl("my_awesome_docker"); //starts it
const auto proc1 = std::process("foo", {}, dl);
const auto proc2 = std::process("foo", {}, dl);
python_multiprocess_launcher
, allowing you to use this: https://docs.python.org/3/library/multiprocessing.html
Please let me know if that makes sense. The basic question is if we want to have the shell_launcher
, skip shell execution or built it into the core, i.e. into every launch.
So as we've probably already discussed, I believe the crux of this comes down to not using the process constructor to spawn. So you'd always have to construct and then call a function like 'spawn' which could report an error with a return. Something along the lines of
process p (...);
optional<system_error> err = p.spawn(); //noexcept
Things I can see against this:
I think we don't want to do this. The reason for exit_code is so you can write portable code. The reason for native_exit_code is so you can write platform specific code.
This is an easy one, just drop the process group. It can always be brought back later.
This one is for @klemens-morgenstern to answer.
Mattias G: If you call terminate on a process does it block?
JG: I'm not sure, will get back to you.
// Provides an initializers for the standard io. Alternative: nested as std::process::io
class process_io;
// Provides a way to set the starting directory of the new process. Alternative: nested as std::process::start_dir
What you have looks good to me - going to drop the alternative text.
If there is I'm missing it.
The other pattern dealing with wchar_t
seems to be in path
, which allows to pass a locale
in in order to modify the char conversion. We should consider if we need something similar for passing in the argument-vector.
So I pushed an early version of the pdf. I did this just so others could have a look at how things are laying out. This is not nearly as important as fixing content at this stage as I can tweak this right up to the end without external help. I clearly need to tweak the page formatting and a number of other usage factors which should improve the layout.
All that said, it quite a long proposal so we should think of what may be expendable. In my mind I'm considering removing all of the technical specification with the exception of 'header process' and 'header pstream'. So what would remain is basically a list of all the classes, functions, and types without all the details. Opinions and thoughts appreciated.
I'm a huge fan of what is now called Design Discussion and Examples. That's basically this part of the paper. Really I think that documents 99% of what people need to know.
@JeffGarland Should ramblings like this go into the std proposal?
We COULD use designated initializers like so:
template<typename In = FILE*, typename Out = FILE*, typename Err = FILE*>
struct process_io
{
In in = stdin;
Out out = stdout;
Err err = stderr;
};
auto test()
{
return process_io{.in= stdin, .err = stderr};
}
This easens the syntax, but captures by copy. In boost::process
everything is captured by reference, but conceptually, i actually don't dislike this. If we use this with a pipe, what you usually want to to assign one pipe end to the child, and then close it on the father. This would look like this:
auto [in, out] std::pstream{}:
auto init = std::process_io{.out = std::move(in)};
The downside is that a pstream
is copyable (which clones the handle), so that the naive solution would lead to potential deadlocks:
auto [in, out] std::pstream{}:
auto init = std::process_io{.out = in};
This could be avoided by making pipes not-copyable, which might be the best solution, since fstream
can't be copied either. There is however a difference between pipes & files, in that files may be reopened by their name, while there's no way to do so with a pipe. But, I don't see any application for this - plus, if we can get the native_handle from a pipe and can construct another pipe from the handle, you can easily implement this corner case on your own.
I believe the answer here is that the process can complete during the is_open check and hence the process object needs to be updated.
@klemens-morgenstern can you confirm?
I implemented a process_group (which we removed from the proposal) on posix & windows with the following API for the reference implementaiton.
struct process_group
{
typedef ... native_handle_type;
native_handle_type native_handle() const { return _group.native_handle(); }
process_group() = default;
explicit process_group(native_handle_type handle);
process_group(const process_group &) = delete;
process_group(process_group && ) = default;
process_group &operator=(const process_group &) = delete;
process_group &operator=(process_group && ) = default;
template<detail::process_initializer<default_process_launcher> ... Inits>
pid_type emplace(const std::filesystem::path& exe, std::initializer_list<std::wstring_view> args, Inits&&... inits);
// Construct a child from a property list and launch it with a custom process launcher
template<process_launcher Launcher, detail::process_initializer<Launcher> ... Inits>
pid_type emplace(const std::filesystem::path& exe,
std::initializer_list<std::wstring_view> args,
Inits&&... inits,
Launcher&& launcher);
// Construct a child from a property list and launch it.
template<detail::process_initializer<default_process_launcher> ... Inits>
pid_type emplace(const std::filesystem::path& exe, std::initializer_list<std::string_view> args, Inits&&... inits);
// Construct a child from a property list and launch it with a custom process launcher
template<process_launcher Launcher, detail::process_initializer<Launcher> ... Inits>
pid_type emplace(const std::filesystem::path& exe,
std::initializer_list<std::string_view> args,
Inits&&... inits,
Launcher&& launcher);
// Construct a child from a property list and launch it.
template<typename Args, detail::process_initializer<default_process_launcher> ... Inits>
pid_type emplace(const std::filesystem::path& exe, Args&& args, Inits&&... inits);
// Construct a child from a property list and launch it with a custom process launcher
template<process_launcher Launcher, typename Args,
detail::process_initializer<Launcher> ... Inits>
pid_type emplace(const std::filesystem::path& exe,
Args&& args,
Inits&&... inits,
Launcher&& launcher);
pid_type attach(process && proc);
bool contains(pid_type proc);
void terminate();
void wait();
pid_type wait_one();
};
The implementation is rather easy, the only thing I have not done is the async_wait
since this would require a new IO object for asio. I don't want to spend too much time on that, since this was just a PoC for me.
TLDR: process_group
is easy enough, but I do see why it does not need to be part of the core proposal.
Note: The process.pdf does contain this class, p1750r1.pdf not.
In boost.process as well as in P1276R0 the environment of the current object is an object. I am not sure this is the right model, since the functions do actually manipulate a global value of the process. I think we could also design those as global functions in a namespace, like so:
std::this_process::environment::get("foo") -> string;//get an environment variable
std::this_process::environment::set("foo", "bar"); //set one
std::this_process::environment::erase("foo"); //delete it
//Get an object with the actual handle of the global object. In windows this has to be created & delete by a function; on posix it's a global char**
std::this_process::environment::raw() -> (char** for posix | smart_ptr with customer deleter on windows);
std::this_process::environment::path() -> std::vector<std::filesystem::path>;
std::this_process::environment::shell() -> std::filesystem;
//Search through the path -> on windows, take extensions into account.
std::this_process::environment::find_executable(std::string & name) -> std::filesystem::path;
//Construct a copy we use in the child process.
std::environment env = std::this_process::environment::raw();
Thoughts?
@eliaskosunen I think we want these now to be span correct?
// Read some data from the handle.
// See the Networking TS for more details.
template<class MutableBufferSequence>
size_t read_some(const MutableBufferSequence& buffers);
and so on...
I don't think we can rely on P1275R0
to be standardized (I think the global std::arguments
will not be accepted) I think we should have a more generic approach to passing environments into a process, than relying on once class. We could add an initializer
, e.g. process_env
or have a full featured environment
class.
The big need of environment functions arises on the side of the child process, i.e. when a child queries it's own environment. For example, searching for an executable that emulates the shell behaviour would be quite common for starting processes, but is quite different between platforms:
On windows, a certain file name is search in the PATH
(or Path
sometimes) variable with the addition of the extensions available in PATHEXT
. On posix no such extensions exist, thus a function like std::vector<std::filesystem::path> std::this_process::env::find_executable(std::string & name)
.
Similarly, one might want to get the default shell of the OS, which on posix is stored in SHELL
while it's deducable through windir
on windows. This could be handled by a std::filesystem::path std::this_process::env::shell()
function.
It is to now however, that this information is queried before process start and thus usually queried from the father environment. Thus adding those specialized functions to an environment that is used to initialize a child process, should be considered of little to no priority. If another paper does provide such a class, we should support it.
If we discount specific values in an environment, what are the semantics ?
=
=
is a unique key=
is the value, that is a :
(posix) or ;
(windows) seperated list.With these three characteristics we can define what types are acceptable as an environment to be set by the process_env
:
range<convertible_to<string_view>>
range<convertible_to<pair<string_view, string_view>>
range<convertible_to<pair<string_view, range<string_view>>>
struct process_env {
//string range
template<ranges::range T>
requires (
requires(T t) {{*ranges::begin(t)} -> convertible_to<string_view>;}
)
process_env(T && t);
//string pair range
template<ranges::range T>
requires (
requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
&& requires(T t) {{get<1>(*ranges::begin(t))} -> convertible_to<string_view>;}
)
process_env(T && t);
//string pair vector range
template<ranges::range T>
requires (
requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
&& requires(T t) {{get<1>(*ranges::begin(t))} -> ranges::range;}
&& requires(T t) {{*ranges::begin(get<1>(*ranges::begin(t)))} -> ranges::range;}
)
process_env(T && t);
};
This this overloaded we can now use different containers, e.g. vector<string>
, unordered_map<string, string>
and unordered_map<string, vector<string>>
.
The environment class proposed in P1275R0 does fulfill the first constraint as well.
Unfortunately I'm not sure now if the question was 'are pipes needed' in V1. Basically there's nothing else on Windows so the use of named pipes is more for consistency than anything else if I'm not mistaken.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.