connelldave / botocove

A simple decorator to run Python functions across multiple AWS accounts, OUs and/or regions, with or without an AWS Organization.

Home Page: https://pypi.org/project/botocove/

License: GNU Lesser General Public License v3.0

Python 100.00%

botocove's Issues

Why are JoinedTimestamp and JoinedMethod missing from the results?

For the same reason I closed #34, I'm fine with this extra metadata not being handled by Botocove. Botocove should focus on the iteration of accounts and the collection of function results.

But I wanted to check because the data is right there so maybe it was just missed.

I've just run into a situation where I need the JoinedTimestamp of the account to explain some behavior.

For now I can solve it by listing the accounts separately and joining by the account ID to compare the JoinedTimestamp to the function result.
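
The workaround above can be sketched as a plain join on account ID. This is only a sketch, assuming `accounts` holds account dicts as returned by the Organizations ListAccounts API and `cove_output` is the dict returned by a cove-wrapped function. The helper name `add_joined_metadata` is hypothetical and not part of botocove.

```python
from itertools import chain


def add_joined_metadata(cove_output, accounts):
    """Copy JoinedTimestamp and JoinedMethod onto each cove result, matched by account ID."""
    by_id = {a["Id"]: a for a in accounts}
    for r in chain.from_iterable(cove_output.values()):
        account = by_id.get(r["Id"], {})
        for key in ("JoinedTimestamp", "JoinedMethod"):
            if key in account:
                r[key] = account[key]
    return cove_output
```

Results for account IDs that are missing from `accounts` are left unchanged.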

Make the progress bar smoother

If one of the accounts takes a long time to return a result, it holds up the progress bar. In extreme cases a user may believe that cove has frozen.

This can happen if you are scanning an organization for one type of resource and you happen upon one account that has an abundance of them.

The progress bar will jump ahead when the slow account finally returns.

I've put together an experiment to test the user interface. It runs across an account with just two threads. It simulates an AWS org by repeatedly logging into one account. The function passed to cove has a short delay before returning some timing info. About halfway through the account iteration, one of the accounts will have a long pause.

With the current implementation it shows that despite the lack of feedback cove still works away with the unblocked thread.

Instead I would like to see the progress bar increment as soon as any account returns a result.

You can see how it works in this asciinema terminal capture.

Here's the experimental timing code. To run it, paste it into a file called cove_demo.py at the project root and export the COVE_ACCOUNT_ID environment variable to simulate an org using just that account.

import os
from itertools import chain
from pprint import pprint
from threading import Lock
from time import perf_counter, sleep
from typing import Final

import botocove
from boto3 import Session
from botocove import cove

# Bypass target resolution so cove uses the target list exactly as given.
botocove.cove_host_account.CoveHostAccount._resolve_target_accounts = lambda self, targets: targets

lock = Lock()
i = 0
t_0 = perf_counter()

ORG_SIZE: Final = 40
SLOW_i: Final = ORG_SIZE // 2

def counter():
    global i
    with lock:
        i += 1
        return i

def delay(session: Session):

    t_start = round(perf_counter() - t_0, 2)

    i = counter()
    
    if i == SLOW_i:
        speed = "SLOW"
        pause = 10
    else:
        speed = "FAST"
        pause = 0.5
    
    sleep(pause)

    t_end = round(perf_counter() - t_0, 2)
    
    duration = round(t_end - t_start, 2)

    return i, speed, t_start, duration


def main():
    account_id = os.environ["COVE_ACCOUNT_ID"]
    result = cove(
        delay,
        target_ids=[account_id] * ORG_SIZE,
        thread_workers=2
    )()
    for r in chain.from_iterable(result.values()):
        for k in list(r.keys()):
            if k not in ["Result", "ExceptionDetails"]:
                del r[k]
    pprint(result)


if __name__ == "__main__":
    main()

Improve visibility while botocove runs

Can we log per-thread? Both wrapping the async func call or patching the singleton logger in async func calls?
Rework the progress bar: look at how we have N threads (20 default): break out info mid-run. Can't give progress on func very easily, but we could report running threads, target account and success/fail?
In a "session plus work" queue as per #17 - we could report first pass success and then break out a retry bar?

A PyTest-style CLI to run cove-annotated functions

To make botocove useful as a data collection tool you still need to write a Python program around it to format and output the results. I believe it would be helpful to have a generalized implementation of this so that as a user I just have to think about the results I want to return from the botocove function.

PyTest is a good example of how to make the domain code clean and declarative because all the plumbing of detecting tests, running them, and presenting the results is in the PyTest CLI itself. All we as users have to provide are some test functions and fixtures.

This would be a generalization of what I'm trying to achieve with my tool aws-org-inventory.

The command line interface of aws-org-inventory is convenient, but it can query only one API at a time. This makes it really slow to collect info on multiple resources because it requires multiple scans of the organization.

Another approach could be a general CLI interface to botocove. It would run the cove-annotated function in a given file and output a JSON representation of the main result object. This would allow me to write a function that collects all the data in a single session and would allow me to format it appropriately for a subsequent JSON-to-CSV conversion.
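
A rough sketch of what such a CLI could look like, using only the standard library. It assumes the named function in the file is already wrapped with cove, takes no arguments, and returns something JSON can mostly represent. The names `load_function` and `main` are hypothetical; nothing like this ships with botocove.

```python
import argparse
import importlib.util
import json
import sys


def load_function(path, name):
    """Import the Python file at `path` and return its attribute `name`."""
    spec = importlib.util.spec_from_file_location("cove_user_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, name)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Run a function from a file and print its result as JSON."
    )
    parser.add_argument("path", help="Python file that defines the function")
    parser.add_argument("function", help="name of the function to call")
    args = parser.parse_args(argv)

    result = load_function(args.path, args.function)()
    # default=str turns exceptions and datetimes into strings instead of failing.
    json.dump(result, sys.stdout, indent=2, default=str)
    sys.stdout.write("\n")
```

Calling `main(["inventory.py", "collect"])` would run the `collect` function from `inventory.py` (both names are placeholders) and print its result.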

Add ability to set session policies for assumed roles

The default OrganizationAccountAccessRole has the AdministratorAccess policy.

For collecting inventory, I need at most a read-only policy such as ReadOnlyAccess or SecurityAudit.

Botocove supports assuming other roles, but it doesn't help if we don't already have a role that attaches the read-only policies.

This could also be solved on botocove's side by adding options to set the Policy and PolicyArns parameters of assume_role.

botocove/botocove/cove_decorator.py

Lines 57 to 59 in 77b015b

    
        creds = sts_client.assume_role(
            RoleArn=role_arn, RoleSessionName=role_session_name
        )["Credentials"]

I see that the RoleSessionName parameter gained support in a similar way.

#9 (issue)
#10 (pull request)

I think I would be able to contribute a PR for this functionality. My main doubt is about the test coverage. Can you give some guidance here?
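
To make the proposal concrete, here is a minimal sketch of how the optional parameters could be folded into the assume_role call. The helper `build_assume_role_kwargs` and its argument names are hypothetical. The only thing taken from the STS API is that `Policy` is a JSON string and `PolicyArns` is a list of `{"arn": ...}` dicts.

```python
import json


def build_assume_role_kwargs(role_arn, role_session_name, policy=None, policy_arns=None):
    """Build keyword arguments for STS assume_role, adding session policies only when given."""
    kwargs = {"RoleArn": role_arn, "RoleSessionName": role_session_name}
    if policy is not None:
        # STS expects the inline session policy as a JSON string.
        kwargs["Policy"] = policy if isinstance(policy, str) else json.dumps(policy)
    if policy_arns:
        # STS expects managed policies as a list of {"arn": ...} dicts.
        kwargs["PolicyArns"] = [{"arn": arn} for arn in policy_arns]
    return kwargs
```

The existing call would then become something like `sts_client.assume_role(**kwargs)["Credentials"]`, and callers who pass neither option would get the same request as today.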

How to reduce memory usage?

When I run botocove in an interactive session, or when I run it as part of aws-org-inventory, and query a sufficiently large organization, the process consumes so much memory that it destabilizes the operating system. (Ubuntu gets ugly when it runs out of memory.)

It appears that the amount of memory required is proportional to the number of accounts in the organization.

I haven't had time to study it carefully, but today I did watch the system activity monitor while running aws-org-inventory across a large organization (around 1000 accounts).

The memory consumption started in the low MBs, which is normal for Python, and increased steadily as it queried each account. When it finally completed processing, it had consumed 4GB of memory. (The thing is that my work machine has only 8GB of memory 😅 )

Is there a different programming technique I should use as a client to avoid this?

Or can something be changed in the library, such as using a generator to yield results instead of collecting a huge object?
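
To illustrate the generator idea, here is a minimal sketch that yields each result as soon as it is ready so the caller can write it out and drop it. It does not reflect botocove's internals. `iter_results` and `write_json_lines` are hypothetical names, and `func` takes a plain target value rather than a boto3 session to keep the example self-contained.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def iter_results(func, targets, max_workers=20):
    """Yield one result dict per target as it finishes, instead of building one big object."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(func, t): t for t in targets}
        for future in as_completed(futures):
            # Drop our reference so the finished future can be freed once it is consumed.
            target = futures.pop(future)
            try:
                item = {"Id": target, "Result": future.result()}
            except Exception as ex:
                item = {"Id": target, "ExceptionDetails": ex}
            yield item


def write_json_lines(results, fileobj):
    """Stream results to a file as JSON Lines, one result per line."""
    for r in results:
        fileobj.write(json.dumps(r, default=str) + "\n")
```

With this shape the client decides whether to keep everything in memory (`list(iter_results(...))`) or to stream it to disk.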

Raises TypeError for some custom exceptions

I run botocove with an input function that raises some custom exceptions.

When the custom exception has the same initializer signature as Exception botocove works as expected.

But one of the exceptions has a different initializer signature. And when it is raised, botocove fails with a TypeError when constructing the main CoveOutput object.

An exception like this will cause the problem:

class FancyException(Exception):
    def __init__(self, *, fancy_thing):
        self.fancy_thing = fancy_thing
        Exception.__init__(self, f"Problem with the {fancy_thing=}")

The problem occurs in the dataclass_converter function. See the TypeError in the output below.

I see the intention of the dataclass_converter is to convert all the output types to dicts, but I'm not sure why it needs to do that. A comment in the decorator function suggests it exists for backwards compatibility.

So you may reproduce the result, I have provided some demo code that defines two input functions and two dataclass_converter implementations. Botocove runs once for all possible function-converter pairs and prints the results.

The input function normal_fail raises an exception with the usual initializer signature.

The input function fancy_fail raises an exception whose initializer signature has a required keyword parameter.

The converter dataclass_converter is the same one used currently by botocove.

The converter identity returns its input unchanged.

The test also disables tqdm to avoid distractions in the output.

Demo code:

from functools import partial, update_wrapper
from itertools import product, starmap
from pprint import pprint
import sys
import traceback

import boto3
from botocove import cove
import botocove.cove_decorator


class NormalException(Exception):
    pass

class FancyException(Exception):
    def __init__(self, *, fancy_thing):
        self.fancy_thing = fancy_thing
        Exception.__init__(self, f"Problem with the {fancy_thing=}")


def normal_fail(session):
    raise NormalException("normal error")


def fancy_fail(session):
    raise FancyException(fancy_thing="fancy error")


def run_cove_and_return_result_or_exception(target_account_id, func, converter):

    botocove.cove_decorator.dataclass_converter = converter

    try:
        result = cove(func, target_ids=[target_account_id])()
        return result
    except Exception as ex:
        return traceback.format_exc()


dataclass_converter = botocove.cove_decorator.dataclass_converter


def identity(i):
    return i


def disable_tqdm():

    def tqdm_passthrough(iterable, **kwargs):
        return iterable

    botocove.cove_sessions.tqdm = tqdm_passthrough
    botocove.cove_runner.tqdm = tqdm_passthrough


def present_func_output_for_args(cove, func, conv):
    output = cove(func, conv)
    case_header = f"Output for case {func.__name__}, {conv.__name__}:"
    print(case_header)
    print("=" * len(case_header))
    print()
    print(output)
    print()


def main():

    account_id = sys.argv[1]

    disable_tqdm()

    run_cove_in_account = update_wrapper(
        partial(run_cove_and_return_result_or_exception, account_id),
        run_cove_and_return_result_or_exception
    )

    cases = product(
        (normal_fail, fancy_fail),
        (dataclass_converter, identity)
    )

    for func, conv in cases:
        present_func_output_for_args(run_cove_in_account, func, conv)


if __name__ == "__main__":
    main()

Output of demo code:

$ python demo.py 111111111111
Output for case normal_fail, dataclass_converter:
=================================================

{'FailedAssumeRole': [], 'Results': [], 'Exceptions': [{'Id': '111111111111', 'Arn': 'arn:aws:organizations::222222222222:account/o-aaaaaaaaaa/111111111111', 'Email': '[email protected]', 'Name': 'Target 1', 'Status': 'ACTIVE', 'AssumeRoleSuccess': True, 'RoleSessionName': 'OrganizationAccountAccessRole', 'ExceptionDetails': NormalException('normal error')}]}

Output for case normal_fail, identity:
======================================

{'Exceptions': [CoveSessionInformation(Id='111111111111', Arn='arn:aws:organizations::222222222222:account/o-aaaaaaaaaa/111111111111', Email='[email protected]', Name='Target 1', Status='ACTIVE', AssumeRoleSuccess=True, RoleSessionName='OrganizationAccountAccessRole', Policy=None, PolicyArns=None, Result=None, ExceptionDetails=NormalException('normal error'))],
 'FailedAssumeRole': [],
 'Results': []}

Output for case fancy_fail, dataclass_converter:
================================================

Traceback (most recent call last):
  File "demo.py", line 36, in run_cove_and_return_result_or_exception
    result = cove(func, target_ids=[target_account_id])()
  File "/home/isme/.local/share/virtualenvs/faa89da738127ef/lib/python3.8/site-packages/botocove/cove_decorator.py", line 61, in wrapper
    Exceptions=[dataclass_converter(e) for e in output["Exceptions"]],
  File "/home/isme/.local/share/virtualenvs/faa89da738127ef/lib/python3.8/site-packages/botocove/cove_decorator.py", line 61, in <listcomp>
    Exceptions=[dataclass_converter(e) for e in output["Exceptions"]],
  File "/home/isme/.local/share/virtualenvs/faa89da738127ef/lib/python3.8/site-packages/botocove/cove_decorator.py", line 17, in dataclass_converter
    return {k: v for k, v in asdict(d).items() if v}
  File "/usr/lib/python3.8/dataclasses.py", line 1073, in asdict
    return _asdict_inner(obj, dict_factory)
  File "/usr/lib/python3.8/dataclasses.py", line 1080, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "/usr/lib/python3.8/dataclasses.py", line 1114, in _asdict_inner
    return copy.deepcopy(obj)
  File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.8/copy.py", line 264, in _reconstruct
    y = func(*args)
TypeError: __init__() takes 1 positional argument but 2 were given

Output for case fancy_fail, identity:
=====================================

{'Exceptions': [CoveSessionInformation(Id='111111111111', Arn='arn:aws:organizations::222222222222:account/o-aaaaaaaaaa/111111111111', Email='[email protected]', Name='Target 1', Status='ACTIVE', AssumeRoleSuccess=True, RoleSessionName='OrganizationAccountAccessRole', Policy=None, PolicyArns=None, Result=None, ExceptionDetails=FancyException("Problem with the fancy_thing='fancy error'"))],
 'FailedAssumeRole': [],
 'Results': []}
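
The failure can be reproduced without botocove or AWS. The traceback ends in copy.deepcopy, which rebuilds the exception by calling its class with the stored positional args, and FancyException accepts none. The sketch below shows that, plus one possible converter that avoids the deep copy. `SessionInfo` is a hypothetical stand-in for CoveSessionInformation, and `shallow_converter` and `deepcopy_fails` are illustrative names, not botocove code.

```python
import copy
from dataclasses import dataclass, fields
from typing import Any, Optional


class FancyException(Exception):
    def __init__(self, *, fancy_thing):
        self.fancy_thing = fancy_thing
        Exception.__init__(self, f"Problem with the {fancy_thing=}")


@dataclass
class SessionInfo:  # hypothetical stand-in for CoveSessionInformation
    Id: str
    Result: Any = None
    ExceptionDetails: Optional[Exception] = None


def deepcopy_fails(ex):
    """Return True if copy.deepcopy cannot reconstruct the exception."""
    try:
        copy.deepcopy(ex)
    except TypeError:
        return True
    return False


def shallow_converter(d):
    """Convert a dataclass to a dict without deep-copying its field values."""
    values = {f.name: getattr(d, f.name) for f in fields(d)}
    return {k: v for k, v in values.items() if v is not None}
```

Unlike dataclasses.asdict, the shallow converter does not recurse into nested dataclasses or copy values, so it is only a drop-in replacement if the output fields are flat.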

Can we do without DescribeAccount?

CoveSession calls DescribeAccount once for each member account in the organization.

But CoveHostAccount also calls ListAccounts and pages it out fully, although everything except the account ID gets discarded.

DescribeAccount and ListAccounts return the same attributes for each account.

I'm wondering if it would be possible to retain the list of AccountTypeDef instead of just a set of IDs so that CoveSessionInformation objects can be built without calling DescribeAccount.

The practical reason for doing that would be to avoid DescribeAccount's throttling errors (see #12 and #17). They happen less now, but you can still get them if you give cove a function that returns quickly enough.

Once, to avoid errors in downstream tooling, I needed to filter my organization's account list to only those that have a working OrganizationAccountAccessRole. Cove does that almost for free :-) But because my function was almost a no-op, the DescribeAccount API would sometimes fail.

Would it be possible to use the decorator session within @cove() annotation to load the regions that are relevant for session account?

It is commonplace to have regions disabled in the account configuration (opt-in regions). Using the regions of the Management Account (or any other account) as the list often results in exceptions like this one:

An error occurred (UnrecognizedClientException) when calling the ListTrails operation: The security token included in the request is invalid

This happens because the region is disabled or not opted in under Account -> AWS Regions, or because tokens issued by the global STS endpoint are valid only in regions enabled by default, unless the user explicitly changes that under IAM -> Security Token Service (STS) -> Global endpoint.

Another side effect of not using the account's enabled regions is that you can miss regions that are not enabled or opted in in the account supplying the list, i.e. the Management Account.

There are currently ~10 regions that require opt-in:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html?icmpid=docs_iam_console#id_credentials_region-endpoints

Originally posted by @alencar in #55 (comment)

Add a progress bar for loading organization accounts

In a big org, loading the organization accounts takes several seconds.

In that time cove provides no feedback, so it's hard to know how much progress is being made.

If I know how big my org is and I can see how many accounts are loaded so far, I then have a better idea of how long to wait before my function starts running.

I think we can use tqdm for this by wrapping it around the iterator over the organization accounts.

Enabled opt-in region gives AuthFailure

Alexandre Alencar points out in issue 74 that Botocove can't access any opt-in region. The original issue got muddled so I'm restating the problem here with a simple repro so that I may fix it.

The target account's eu-central-1 region has opt-in status ENABLED_BY_DEFAULT and its eu-central-2 region has opt-in status ENABLED (was previously DISABLED). See Appendix 1 for how to query the opt-in status of the regions.

Use botocove to echo the region name given by the EC2.DescribeAvailabilityZones API.

from botocove import cove
from itertools import chain

response = cove(
    lambda s: (
        s.client("ec2").describe_availability_zones()
        ["AvailabilityZones"][0]["RegionName"]
    ),
    target_ids=["111111111111"],
    regions=["eu-central-1", "eu-central-2"]
)()

for result in chain(
            response["FailedAssumeRole"],
            response["Exceptions"],
            response["Results"],
        ):
    print(
        repr(
            {
                k: v
                for k, v in result.items()
                if k in {"Id", "Region", "Result", "ExceptionDetails"}
            }
        )
    )

Region eu-central-1 echoes its name and region eu-central-2 gives an AuthFailure error.

{'Id': '111111111111', 'Region': 'eu-central-2', 'ExceptionDetails': ClientError('An error occurred (AuthFailure) when calling the DescribeAvailabilityZones operation: AWS was not able to validate the provided access credentials')}
{'Id': '111111111111', 'Region': 'eu-central-1', 'Result': 'eu-central-1'}

The expected behavior is that region eu-central-2 also echoes its name.

Appendix 1: Query region opt-in status

You can use the AccountId parameter of the Account.ListRegions API but you first need to enable trusted access in the organization. If you haven't done that you can instead use botocove to check the opt-in status of the target regions.

from botocove import cove

response = cove(
    lambda s: s.client("account").list_regions(), target_ids=["111111111111"]
)()

[
    r
    for r in response["Results"][0]["Result"]["Regions"]
    if r["RegionName"] in {"eu-central-1", "eu-central-2"}
]

The eu-central-1 region has opt-in status ENABLED_BY_DEFAULT, which means it is always enabled. The eu-central-2 region has opt-in status ENABLED, which means it was DISABLED until I changed it.

[{'RegionName': 'eu-central-1', 'RegionOptStatus': 'ENABLED_BY_DEFAULT'},
 {'RegionName': 'eu-central-2', 'RegionOptStatus': 'ENABLED'}]

Validate regions from a known set

Today I discovered my botocove run failed because I misspelled a region name: eu-noth-1. But I only discovered it after a long run across the whole org.

I'd like to discover silly mistakes like this as early as possible in the process.

We can use boto3's model to discover the set of known regions.

I think emitting a warning would be more appropriate than raising an exception here because the model may be out of date, i.e., the named region may be so new that the user's boto3 version doesn't yet include it in the model.

You would get the set of known regions from the model like this:

import boto3

s = boto3.session.Session()
{r for p in s.get_available_partitions() for r in s.get_available_regions("ec2", p)}

Alternatively, you could get the set of known regions by calling EC2's DescribeRegions API. There is even an example of this in the README. It would require extra permissions in the cove host account.
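
A minimal sketch of the warning itself, kept independent of where the known set comes from (the boto3 model or DescribeRegions). The function name `warn_unknown_regions` is hypothetical.

```python
import warnings


def warn_unknown_regions(requested, known):
    """Warn about each requested region that is not in the known set; return the unknown names."""
    unknown = sorted(set(requested) - set(known))
    for name in unknown:
        warnings.warn(
            f"Unknown region name: {name!r}. "
            "It may be misspelled or newer than the installed boto3 model."
        )
    return unknown
```

Running this check before any role is assumed would surface a typo like eu-noth-1 within seconds instead of after a full run across the org.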

Run Actions for all contributions

Last week's PR #64 had a bit of back-and-forth because of delayed CI feedback to a new contributor.

Dave explained that we currently err on the side of safety over convenience:

To avoid being able to hijack Actions, they only run for commits directly to this repo by maintainers. Otherwise someone could fork and cat out the deploy token for pypi :)

I'd like to find a way to improve the CI experience for all contributors without compromising the project's safety.

Today I learned that there is a name for the threat: Poisoned Pipeline Execution.

Omer Gil of Cider Security has a good write up of the threat and some recommendations for how to handle it, including how to handle access to secrets.

I don't remember where I first read about the problem or potential solutions, but it's clearly a common one. There may be some existing solutions for GitHub Actions that we can reuse.

Continue to support Python 3.8

I see that the minimum Python version supported by botocove is now 3.9.

botocove/pyproject.toml

Line 12 in 2998813

python = "^3.9"

This breaks botocove for me.

I've been using Python 3.8 for everything right now because that's what ships with Ubuntu 20.

Why has the minimum version been increased to Python 3.9?

If it has to be that way, how do I set up Python 3.9 in my projects?

I now get errors like the one below when developing botocove.

$ poetry run pre-commit run --all-files

Current Python version (3.8.10) is not allowed by the project (^3.9).
Please change python executable via the "env use" command.
$ poetry env use 3.9
pyenv: python3.9: command not found

The `python3.9' command exists in these Python versions:
  3.9.9

Note: See 'pyenv help global' for tips on allowing both
      python2 and python3 to be found.

Command python3.9 -c "import sys; print(sys.executable)" errored with the following return code 127, and output:

As a workaround for now I'll pin my other projects that use botocove to version 1.7.0.

Add some helper functions or functionality

Having commonly wrapped functions importable, as well as more tricky ones, might be useful.

Some ideas:

A check-func that just tries to assume roles in accounts given func args
A multi-region helper that runs a Cove func in every region? Depends if a Boto session can inherit a regional default?

botocove sometimes crashes when describing accounts

Sometimes I get an error like this when I run botocove:

TooManyRequestsException: An error occurred (TooManyRequestsException) when calling the DescribeAccount operation (reached max retries: 4): AWS Organizations can't complete your request because another request is already in progress. Try again later.

I'm running it in an organization with 915 accounts.

Can we do something to make it more robust here?

Do you need a complete example of what I'm doing? I haven't included example code here because I'm under the impression that this could happen with any query at the right scale.

Support alternative AWS Partitions (e.g. GovCloud `aws-us-gov`)

The current version of botocove is unable to run against any non-default (i.e. other than aws) AWS Partitions.
This means that neither AWS GovCloud nor China Partitions/Regions are supported.

I recommend adding a cove decorator argument that allows an override, with a default value derived from the caller identity of the default session (i.e. the host account).

For further detail on AWS Partitions see: https://docs.amazonaws.cn/en_us/general/latest/gr/aws-arns-and-namespaces.html
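
As a sketch of how the default could be derived: the partition is the second colon-separated field of any ARN, including the one that STS GetCallerIdentity returns for the host account. The helper names `partition_from_arn` and `role_arn` are hypothetical.

```python
def partition_from_arn(arn):
    """Return the partition field of an ARN, e.g. 'aws', 'aws-us-gov' or 'aws-cn'."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {arn!r}")
    return parts[1]


def role_arn(partition, account_id, role_name):
    """Build an IAM role ARN for the given partition instead of hard-coding 'aws'."""
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"
```

An explicit decorator argument would simply take precedence over the value returned by `partition_from_arn`.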

How to track query progress?

First let me thank you for this tool. It's a game changer! botocove is the best tool I know for ad-hoc analysis across an organization.

Currently I'm working with two organizations that each have in the order of 500 to 1000 accounts.

Across such large organizations, botocove takes hundreds of seconds to return a result. Anecdotally, depending on network conditions, I can wait between 120 and 300 seconds to get a result.

That's still good enough for interactive use, but it would be helpful to get some kind of "loading bar"-style feedback to know how long I should expect to wait.

I've considered adding a counter to the function wrapped by botocove. I've not tried it yet, but I guess it would work. I would need to run botocove in a second thread to be able to check the counter value.

Another solution could be to make botocove return immediately and run in the background. It would return an object with a blocking call to get the result and other calls to get the number of queries in progress, the number completed, the number remaining, and so on.

Is that something you have already considered?
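
For discussion, here is a minimal sketch of the "return immediately" idea built on concurrent.futures. `BackgroundRun` is a hypothetical class, not an existing botocove API, and `func` takes a plain target value rather than a boto3 session.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundRun:
    """Hypothetical handle for a run that executes in background threads."""

    def __init__(self, func, targets, max_workers=20):
        pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = [(t, pool.submit(func, t)) for t in targets]
        # Accept no new work; tasks that were already submitted still run.
        pool.shutdown(wait=False)

    def total(self):
        return len(self._futures)

    def completed(self):
        return sum(1 for _, f in self._futures if f.done())

    def remaining(self):
        return self.total() - self.completed()

    def result(self):
        """Block until all targets finish; re-raises the first exception, if any."""
        return [(t, f.result()) for t, f in self._futures]
```

In an interactive session you could poll `run.completed()` and `run.remaining()` while waiting, then call `run.result()` once to collect everything.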

Allow custom RoleSessionName

Hello,

It would be quite useful for me if I was able to pass a custom RoleSessionName to the cove decorator instead of using the default (which is the rolename variable). My organization has defined some conditions on assuming roles and using the RoleName as the RoleSessionName does not satisfy these conditions.

I took a look at the code and it seems that this could be a minor change, I could make a small PR with the changes to the code and documentation if you feel like this would be welcome.

Include OrganizationId in each CoveSessionInformation

I'm aggregating inventory from multiple organizations. I do that by running cove in a loop over each management account.

To keep the information about which account came from which organization I use another loop like this over each cove result:

        for r in chain.from_iterable(result.values()):
            r["OrganizationId"] = r["Arn"][44:56]

It modifies all the CoveSessionInformation objects to include an OrganizationId key of the form o-1111111111. The value is extracted from the ARN of the account.

Would you consider adding something like this to the CoveSessionInformation itself?

I suppose it wouldn't make sense when org_master == False.
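
The fixed slice `[44:56]` depends on the partition name and the length of the organization ID. A slightly more robust sketch splits the ARN on its delimiters instead. The function name `organization_id_from_account_arn` is hypothetical.

```python
def organization_id_from_account_arn(arn):
    """Extract the organization ID from an Organizations account ARN."""
    # Expected shape:
    # arn:<partition>:organizations::<management-account-id>:account/<org-id>/<account-id>
    resource = arn.split(":", 5)[5]
    kind, org_id, _account_id = resource.split("/")
    if kind != "account":
        raise ValueError(f"Not an account ARN: {arn!r}")
    return org_id
```

This returns the same value as the slice for ARNs in the default partition and keeps working when the partition is, for example, aws-us-gov.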

Bump moto to 4.x

Upgrading moto to 4.x broke the test suite intermittently, and seemingly only in CI or on first run from a fresh container.

I lost far too much time working out what was even happening to persist any further, so moto is now pinned to a 3.x version that worked.

One day we should bump to the new moto release, but maybe after giving it a few months.

Is botocove compatible with Python 3.11, 3.12?

Sceptre wears a badge in its README to show its Python feature version compatibility.

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sceptre?logo=pypi)](https://pypi.org/project/sceptre/?logo=pypi)

The badge today shows correctly that Sceptre supports Python feature versions 3.7, 3.8, 3.9, 3.10, and 3.11.

A site called shields.io hosts the API to generate the badge. I used it to generate a badge for botocove.

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/botocove)

The result surprised me. The badge shows that botocove is compatible with Python feature versions 3.8, 3.9, and 3.10.

What about 3.11 and 3.12?

I don't see anything obvious in pyproject.toml that would block these versions, but there may be a dependency that causes it. (My guess is something unmaintained such as flakeheaven is dragging us down.)

Add support for multiple regions

I find myself doing a lot of multi-region work these days.

So in all my multi-region botocove scripts I've started using a variation of helper functions like run_across_org and run_in_each_region in the demo script below.

When it works, it's great, because you can get the output from each region in each account's result key something like this.

{'Exceptions': [],
 'FailedAssumeRole': [],
 'Results': [{'Arn': 'arn:aws:organizations::111111111111:account/o-aaaaaaaaaa/222222222222',
              'AssumeRoleSuccess': True,
              'Email': '[email protected]',
              'Id': '222222222222',
              'Name': 'Demo Account 1',
              'PolicyArns': [{'arn': 'arn:aws:iam::aws:policy/AmazonVPCReadOnlyAccess'}],
              'Result': [{'RegionName': 'eu-west-1',
                          'Result': 'vpc-22222222222222222'},
                         {'RegionName': 'us-east-1',
                          'Result': 'vpc-11111111111111111'}],
              'RoleName': 'OrganizationAccountAccessRole',
              'RoleSessionName': 'OrganizationAccountAccessRole',
              'Status': 'ACTIVE'},

from botocove import cove
from pprint import pprint


def main():
    results = get_org_default_vpc_ids()
    pprint(results)


def get_org_default_vpc_ids():
    org_func = run_across_org(
        get_default_vpc_id,
        regions=["eu-west-1", "us-east-1"],
        policy_arns=[{"arn": "arn:aws:iam::aws:policy/AmazonVPCReadOnlyAccess"}],
    )
    return org_func()


def get_default_vpc_id(session, region):
    ec2 = session.client("ec2", region_name=region)
    resp = ec2.describe_vpcs(Filters=[{"Name": "is-default", "Values": ["true"]}])
    if len(resp["Vpcs"]) > 0:
        return resp["Vpcs"][0]["VpcId"]
    return None


def run_across_org(func, target_ids=None, regions=None, policy_arns=None):
    def _across_org(*args, **kwargs):
        return cove(
            run_in_each_region(func, regions),
            target_ids=target_ids,
            policy_arns=policy_arns,
        )(*args, **kwargs)
    return _across_org


def run_in_each_region(func, region_names):

    def _in_each_region(session, *args, **kwargs):

        return [
            {
                "RegionName": rn,
                "Result": func(session, rn, *args, **kwargs)
            }
            for rn in region_names
        ]

    return _in_each_region


if __name__ == "__main__":
    main()

The main problem with this implementation is that it's not very robust. An error in any region will scupper results from all the other regions and all you'll be left with is an exception.

A lesser problem is performance because each region is accessed serially, although this is probably mitigated by the account-level concurrency.

Another problem with this implementation (perhaps more theoretical than practical) is the lack of flexibility. It will always iterate over the Cartesian product of accounts and regions. It may be useful to be able to iterate over a different set of regions per account. (Perhaps using an input mapping of account IDs to region names.)

It would be great if the cove function would handle this automatically. So I'm sharing my rough implementation here to get some feedback in the hope that it can be improved upon and ultimately integrated into the library.

The get_default_vpc_id function in the demo code shows how new client code would work. The client function would take a session and a region, and it would be the client's responsibility to create a boto3 client with the correct region set. Botocove itself would be responsible for ensuring the function gets called once for all the account-region pairs.

To be backwards compatible with client code that doesn't pass any region info or a function that takes a region parameter, I suppose the default behavior would be to behave as before and use only the main boto3 session's region.
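
The robustness problem mentioned above can be reduced in the helper itself by catching exceptions per region. This is a sketch of one possible variation of run_in_each_region from the demo script, not a proposal for the final library behavior. The `ExceptionDetails` key is borrowed from cove's own output format.

```python
def run_in_each_region(func, region_names):
    """Wrap func so it runs once per region and one failing region does not discard the rest."""

    def _in_each_region(session, *args, **kwargs):
        outputs = []
        for rn in region_names:
            try:
                result = func(session, rn, *args, **kwargs)
                outputs.append({"RegionName": rn, "Result": result})
            except Exception as ex:
                # Record the failure for this region and carry on with the others.
                outputs.append({"RegionName": rn, "ExceptionDetails": ex})
        return outputs

    return _in_each_region
```

The trade-off is that cove would then report the account under Results even when some regions failed, so the caller has to inspect each region entry.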

Handle 5k account limit throttling against org-level API calls (sts:assumerole and org:describeaccount)

If I remember correctly the hard limit for AWS accounts in an org is 5k.

Refactor the separate "get sessions" -> "run func" steps into threading "sessions and work" per call: this changes from many tiny calls -> many long calls to spreading tiny:long pairs, which should take pressure off the AWS-side API throttles
Retry queue driver rather than just working through list once and potentially hitting failures
More in-flight visibility? Per-thread logging?
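
The retry-queue idea can be sketched in a few lines. This is a single-threaded illustration only, with no backoff or threading. `run_with_retry_queue` and `max_attempts` are hypothetical names.

```python
from collections import deque


def run_with_retry_queue(func, targets, max_attempts=3):
    """Run func for each target, re-queueing failures until max_attempts is reached."""
    queue = deque((t, 1) for t in targets)
    results, failures = {}, {}
    while queue:
        target, attempt = queue.popleft()
        try:
            results[target] = func(target)
        except Exception as ex:
            if attempt < max_attempts:
                # Retry later, after the other queued targets have had their turn.
                queue.append((target, attempt + 1))
            else:
                failures[target] = ex
    return results, failures
```

Putting a failed target at the back of the queue gives a throttled API some time to recover before that target is retried.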

Latest CI is slow and fails

The latest CI run took nearly 4 hours and ended up installing ancient versions moto==1.3.6 and boto3==1.7.84.

I think deleting the lock file may have caused more problems than it solved!

I've started a discussion in the Poetry project to understand the behavior.

Someone there may be able to advise on a better way to constrain the versions.

Add contributor documentation

There was a bit of back-and-forth on last week's PR #64 because the README lacks a "developer" or "contributor" quick start guide.

We should write and link to the standard CONTRIBUTING file format that covers at least the following:

How to set up the development environment (Install Poetry and what else?)
How to install the project (poetry install)
How to run the tests (pre-commit)
What to include in the PR (e.g. no changelog required)

botocore WaiterError causes TypeError

One of my scripts runs a CloudFormation update and uses a waiter to make the operation synchronous.

Botocove appears to iterate over all the accounts, but it crashes before I can print its output.

It turns out that WaiterError is not a copyable exception (see #25 for the analysis about copyable exceptions.)

I can't share the code or the complete trace of what I'm working on. Here's the trace that's relevant to botocove.

Executing function: 100%|██████████████████████████████████████████████| 289/289 [08:52<00:00,  1.84s/it]
Traceback (most recent call last):
[...]
  File "/home/isme/.cache/pypoetry/virtualenvs/fakeproject/lib/python3.8/site-packages/botocove/cove_decorator.py", line 63, in wrapper
    Exceptions=[
  File "/home/isme/.cache/pypoetry/virtualenvs/fakeproject/lib/python3.8/site-packages/botocove/cove_decorator.py", line 64, in <listcomp>
    dataclass_converter(e)
  File "/home/isme/.cache/pypoetry/virtualenvs/fakeproject/lib/python3.8/site-packages/botocove/cove_decorator.py", line 17, in dataclass_converter
    return {k: v for k, v in asdict(d).items() if v is not None}
  File "/usr/lib/python3.8/dataclasses.py", line 1073, in asdict
    return _asdict_inner(obj, dict_factory)
  File "/usr/lib/python3.8/dataclasses.py", line 1080, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "/usr/lib/python3.8/dataclasses.py", line 1114, in _asdict_inner
    return copy.deepcopy(obj)
  File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.8/copy.py", line 264, in _reconstruct
    y = func(*args)
  File "/home/isme/.cache/pypoetry/virtualenvs/fakeproject/lib/python3.8/site-packages/botocore/exceptions.py", line 28, in _exception_from_packed_args
    return exception_cls(*args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'last_response'

You can reproduce it easily enough in the REPL:

In [1]: from botocore.exceptions import WaiterError

In [2]: error = WaiterError(
   ...:     name="FakeWaiter", reason="FakeReason", last_response="FakeResponse"
   ...: )

In [3]: from copy import copy

In [4]: copy(error)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [4], in <module>
----> 1 copy(error)

File /usr/lib/python3.8/copy.py:102, in copy(x)
    100 if isinstance(rv, str):
    101     return x
--> 102 return _reconstruct(x, None, *rv)

File /usr/lib/python3.8/copy.py:264, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    262 if deep and args:
    263     args = (deepcopy(arg, memo) for arg in args)
--> 264 y = func(*args)
    265 if deep:
    266     memo[id(x)] = y

File ~/.cache/pypoetry/virtualenvs/bp-1082-udpate-apt-002-product-kAdUhv53-py3.8/lib/python3.8/site-packages/botocore/exceptions.py:28, in _exception_from_packed_args(exception_cls, args, kwargs)
     26 if kwargs is None:
     27     kwargs = {}
---> 28 return exception_cls(*args, **kwargs)

TypeError: __init__() missing 1 required positional argument: 'last_response'

I'm going to raise an issue in the botocore repo to ask that the WaiterError be made copyable like the other client errors.

Why does botocove use the asdict method? It seems unnecessary to recursively copy and convert dataclasses. It requires that all output and errors be copyable (and as #25 shows, for exceptions it is non-obvious to implement!) and (although this is another issue) it has surprised me a couple of times when the dataclass object I returned was converted to a dict!

Would you consider changing this part to remove the use of the asdict method?

botocove/botocove/cove_decorator.py

Lines 15 to 17 in b36164b

    
def dataclass_converter(d: CoveSessionInformation) -> Dict[str, Any]:
    """Unpack dataclass into dict and remove None values"""
    return {k: v for k, v in asdict(d).items() if v is not None}
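
One possible replacement is a shallow conversion that reads each field directly instead of deep-copying it. The sketch below is only an illustration under assumptions: `SessionInfo` is a simplified stand-in for CoveSessionInformation, `UncopyableError` is a made-up exception that fails to copy in a similar way to WaiterError, and `shallow_converter` is a hypothetical name.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


class UncopyableError(Exception):
    """Stand-in exception: copying calls __init__ with too few arguments."""

    def __init__(self, name: str, reason: str, last_response: Any) -> None:
        super().__init__(f"{name}: {reason}")
        self.last_response = last_response


@dataclass
class SessionInfo:
    """Simplified stand-in for botocove's CoveSessionInformation."""

    Id: str
    Result: Optional[Any] = None
    ExceptionDetails: Optional[Exception] = None


def shallow_converter(d: Any) -> Dict[str, Any]:
    """Unpack dataclass into dict and remove None values, without copying."""
    pairs = ((f.name, getattr(d, f.name)) for f in fields(d))
    return {k: v for k, v in pairs if v is not None}


info = SessionInfo(
    Id="123456789012",
    ExceptionDetails=UncopyableError("FakeWaiter", "FakeReason", "FakeResponse"),
)

try:
    asdict(info)  # deep-copies every field value, so the exception breaks it
    asdict_failed = False
except TypeError:
    asdict_failed = True

converted = shallow_converter(info)
```

Because the values are passed through by reference, a returned dataclass object or a non-copyable exception stays exactly as the user's function produced it. The trade-off is that nested dataclasses are no longer converted to dicts.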


creds = sts_client.assume_role(
    RoleArn=role_arn, RoleSessionName=role_session_name
)["Credentials"]
