azure / azure-init

A minimal provisioning agent designed for Azure Linux VMs.

License: MIT License

Rust 95.26% Makefile 1.19% Shell 3.55%

azure-init's Issues

[RFE] retrieve the correct path for authorized key file from sshd config

Current situation

azure-init assumes that the authorized key file is $HOME/.ssh/authorized_keys, which might not be where sshd is configured to look (the AuthorizedKeysFile directive in /etc/ssh/sshd_config).

Impact

azure-init might write the provided public key to a place that is not expected by sshd, which will render the VM inaccessible via the provided key

Ideal future situation

azure-init should process the sshd config to correctly determine where to write the authorized key file.

Implementation options

We likely need a module to handle various SSH-related functionality. #67 will also benefit from such a module.
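As a rough sketch of what such a module could start with, the Rust below extracts the AuthorizedKeysFile patterns from sshd_config text and expands the %h, %u, and %% tokens. The function names are hypothetical and not part of libazureinit; the sketch deliberately ignores Include files, Match blocks, and the keyword=value syntax, and it passes the %U token through unchanged. The fallback value is sshd's documented default.

```rust
/// Returns the patterns from the first `AuthorizedKeysFile` directive in
/// `config`, or sshd's documented default when the directive is absent.
/// Assumption: `Include` files and `Match` blocks are not considered.
fn authorized_keys_patterns(config: &str) -> Vec<String> {
    for line in config.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut words = line.split_whitespace();
        if let Some(keyword) = words.next() {
            // sshd keywords are case-insensitive; the first value wins.
            if keyword.eq_ignore_ascii_case("AuthorizedKeysFile") {
                let values: Vec<String> = words.map(str::to_string).collect();
                if !values.is_empty() {
                    return values;
                }
            }
        }
    }
    vec![
        ".ssh/authorized_keys".to_string(),
        ".ssh/authorized_keys2".to_string(),
    ]
}

/// Expands `%h`, `%u`, and `%%`, then resolves relative results against the
/// user's home directory. Unknown tokens are left untouched.
fn expand_pattern(pattern: &str, user: &str, home: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('h') => out.push_str(home),
            Some('u') => out.push_str(user),
            Some('%') => out.push('%'),
            Some(other) => {
                out.push('%');
                out.push(other);
            }
            None => out.push('%'),
        }
    }
    if out.starts_with('/') {
        out
    } else {
        format!("{}/{}", home.trim_end_matches('/'), out)
    }
}
```

With a config line such as `AuthorizedKeysFile /etc/ssh/keys/%u`, a user named azureuser would get /etc/ssh/keys/azureuser instead of the hardcoded path under $HOME.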

libazureinit currently does not bounce DHCP to publish hostname to Azure platform DNS

Description

Due to starting late (after network-online), azure-init ends up setting the computer name after DHCP. This has the side effect of not registering the correct hostname to DNS (as the image's stale hostname was what got sent in DHCP).

Impact

Hostname lookup for this VM will fail for other VMs that are in the same VNET or part of VNET peering.

Additional Information

This won't be an issue if we solve #35. Otherwise we'll have to restart network stack (e.g., NetworkManager)

Refactor to support testing

Fix unit test failures of test_get_ovf_env_missing_{password,three}

Running cargo test results in failures like:

test media::tests::test_get_ovf_env_missing_password ... FAILED
test media::tests::test_get_ovf_env_missing_three ... FAILED

It should be fixed before working on GitHub Actions CI.

[RFE] support for defining default user groups during build time

Current situation

Package maintainers are unable to set default user groups during build time.

Impact

Distros vary in which groups they assign to users by default. For example, Ubuntu grants the groups adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, and video, while Azure Linux grants the groups wheel and sudo.

The recent Builder style API revisions have opened up the ability to define custom user groups when provisioning a user, via the .with_groups() method. However, the solution is not exposed at build time for distro maintainers to utilize this new feature.

Ideal future situation

During build time of azure-init, distro maintainers would be able to specify a list of default groups via build arguments or environment variables.

Implementation options

Option 1)
Introduce a new env variable, DEFAULT_USERADD_GROUPS, that is read by azure-init's main.rs and passed into a call to .with_groups() during provision(). If DEFAULT_USERADD_GROUPS is empty, continue to default to the single group "wheel".

Option 2)
Introduce a new build argument to the cargo build command that is read by azure-init's main.rs and passed into a call to .with_groups() during provision(). If the build argument is empty, continue to default to the single group "wheel".
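A minimal sketch of how Option 1 might look, assuming the variable holds a comma-separated list (the separator is an assumption; the issue does not specify one). The helper names are hypothetical. `option_env!` is the standard macro that reads an environment variable at compile time and yields `None` when it is unset.

```rust
/// Splits a comma-separated group list, falling back to "wheel" when the
/// value is unset or contains no group names.
/// Assumption: groups are separated by commas.
fn default_groups(raw: Option<&str>) -> Vec<String> {
    let groups: Vec<String> = raw
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|g| !g.is_empty())
        .map(str::to_string)
        .collect();
    if groups.is_empty() {
        vec!["wheel".to_string()]
    } else {
        groups
    }
}

/// Groups baked in at build time from DEFAULT_USERADD_GROUPS.
fn build_time_groups() -> Vec<String> {
    default_groups(option_env!("DEFAULT_USERADD_GROUPS"))
}
```

A package maintainer would then build with something like `DEFAULT_USERADD_GROUPS=adm,sudo cargo build`, and the result of `build_time_groups()` would be handed to .with_groups().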

[RFE] Provide a mechanism to configure behavior of azure-init at runtime

Currently this is not a major issue as we don't have a lot of configurable options. However, as we start adding more features to make azure-init more useful and robust (such as logging, retries), we might need to provide a way for the users to configure azure-init behavior (such as where to put logging, the level of logging that should go to console, how many minutes to retry, etc...)

We can consider providing a config file in /etc/azure-init/azure-init.conf

CONTRIBUTING.md file

Create a CONTRIBUTING.md file

reference Microsoft CLA
guidelines for building, contributing, etc.
set expectations for review of pull request, responses to issues

SUPPORT.md

Update this file with expectations for support when reporting issues with the project, etc.

image creation script needs to retry for storage blob access error when trying to retrieve boot diagnostic

The error the script throws when it times out during storage blob access is:

az vm boot-diagnostics get-boot-log -g testagent-1709140193 -n testvm-1709140193
ERROR: Client-Request-ID=7771dfa0-d65c-11ee-bf75-95e10ccbd332 Retry policy did not allow for a retry: Server-Timestamp=Wed, 28 Feb 2024 17:11:51 GMT, Server-Request-ID=f31a688f-601e-006a-5069-6a9417000000, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFoundBlobNotFoundThe specified blob does not exist.RequestId:f31a688f-601e-006a-5069-6a9417000000Time:2024-02-28T17:11:52.0165015Z.
ERROR: The specified blob does not exist. ErrorCode: BlobNotFound

BlobNotFoundThe specified blob does not exist.
RequestId:f31a688f-601e-006a-5069-6a9417000000
Time:2024-02-28T17:11:52.0165015Z

[RFE] Create testing framework to run before changes are merged

Issue for creating some kind of testing framework or pipeline that can be used to ensure changes don't break the agent's functionality ahead of merging them into main.

Ideas discussed so far:

Using a tool like mkosi to build an image locally, then running it as a VM wired up to a locally run IMDS server to test functionality.
Quite literally run the "manual" testing mechanism via some kind of github actions pipeline (spin up VM, run the changes and the image creation script, and then check to see that any VM created with the subsequent image provisions successfully)

[RFE] consumable library for integration with other provisioning agents

Current situation

The provisioning agent implements all features in a single application.

Impact

Re-using the agent's functionality from other agents, such as Afterburn, is challenging.
This impacts adoption of the agent by the larger Linux ecosystem and hinders its mission of being a reference implementation for a minimal Azure provisioning guest agent.

Ideal future situation

The provisioning agent provides a library (rust crate) which is consumable by other projects. (#39)
The library is built and tested in CI, versioned releases are published regularly.

Follow-up tasks

The library is integrated with Afterburn (possibly in a WIP fork).
Afterburn using the library is integrated with Flatcar to move away from wa-agent supplied guest configuration.
Extended Afterburn support of Azure using the library is contributed upstream.

[RFE] passwordless sudo

I believe the admin user is currently configured to be part of the sudoers group, but without passwordless sudo. If we want to maintain consistency with current behaviors from cloud-init and walinuxagent, this should be set. We can also provide this as a configurable option (#65).

Add robust error handling

Include proper handling of various timeouts for IMDS and wireserver communications

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository getting automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout command options below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

staging : This repository will ship as Open Source or go public

collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.

delete : This repository will be deleted because it is no longer needed.

other : Other reasons not specified

Need more help? 🖐️

Email [email protected]. ✉️
Post your questions in GitHub inside Microsoft Team in Microsoft Teams. 🗨️

Test coverage for all endorsed Linux distributions on Azure

Fully test azure-init on all endorsed Linux distributions on Azure

Report KVP telemetry to assist with troubleshooting

[RFE] Support pre-provisioning on Azure

Current situation

Currently azure-init does not support Azure pre-provisioning.

Impact

Lack of support for pre-provisioning will affect adoption as it does not allow the customer to benefit from better reliability and performance gained from Azure pre-provisioning service.

Ideal future situation

Add support for Azure pre-provisioning.

Additional information

Currently the agent does not support Azure pre-provisioning (PPS). Since PPS is not a customer-facing feature, there's no public documentation for it, unfortunately.
cloud-init has supported Azure PPS since version 18.5. Here's the commit that started the support: link
Other relevant commits as reference:
Network event detection support via netlink: commit
PPS v2 support via nic attach/detach: commit

[RFE] Send user-agent as part of header for IMDS calls

Current situation

Currently azure-init does not set the user agent (User-Agent) string when communicating to IMDS.

Impact

Sending a User-Agent header will make it easier to trace provisioning requests on the IMDS side, which will allow Azure to troubleshoot provisioning issues faster.

Ideal future situation

Send user agent string as part of header. When sending requests to IMDS, send the user agent azure-init (ideally with the version) as part of the property "User-Agent"

[RFE] Support for custom data (or document why it is not supported)

Current situation

azure-init does not do anything with custom-data. See this article for custom-data usage on Azure.

Impact

Unlike user-data, custom-data is only available during provisioning phase because it's only available in the ovf-env.xml file. If the provisioning agent doesn't take action on custom-data, it will not be available again.

Ideal future situation

Provide some options for how customers want to handle custom-data. A few options:

We save custom-data to a file and the customer chooses what to do with it.
We assume it's a script and execute custom-data like a script.
Detect that custom-data exists and print a WARNING message to recommend that customer use user-data instead.

Additional information

Differences between custom-data and user-data: custom-data is only available to privileged users as it's only available in the ovf-env.xml in a limited window during provisioning (the ovf-env.xml is only available through mounting of the provisioning ISO, which is a privileged action). user-data is available throughout the lifetime of the VM, and is accessible by anyone who has access to the VM. user-data can be updated throughout the lifetime of the VM while custom-data cannot be updated.

[RFE] Report failures to Azure when there's an unrecoverable error

Current situation

Currently azure-init does not report any failure to Azure. If it can't finish provisioning, it will return with an error code. From a user perspective, provisioning will eventually fail with OS provisioning timeout due to Azure platform not receiving a provisioning complete signal.

In many cases the user might not be able to access the VM if provisioning fails and as such, might have a very hard time figuring out why provisioning failed

Ideal future situation

Have the azure-init report failures to Azure, which will then fail provisioning with a useful error message indicating why provisioning failed.

Implementation options

These two options are not mutually exclusive; they complement each other.

Use wireserver to report errors to the platform. Here is how cloud-init is doing it. Essentially azure-init will need to construct a health report similar to reporting provisioning complete, but indicating the report status as NotReady, a substatus of ProvisioningFailed, and a meaningful description that will eventually show up as an error message back to the user.
I would strongly encourage azure-init to follow the error messages used by cloud-init, because we have post-processing, monitoring, and alerting mechanisms built around the errors returned by cloud-init.
A sample error returned by cloud-init

result=error|reason=http error querying IMDS|agent=Cloud-Init/23.3.3-0ubuntu0~20.04.1|http_code=410|duration=300.2051315307617|'exception=UrlError(''410 Client Error: Gone for url: http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true'')'|url=http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true|vm_id=e76f68ac-04a8-4069-be7c-7f04b01f520f|timestamp=2024-03-12T09:39:16.373226|documentation_url=https://aka.ms/linuxprovisioningerror

The failure reporting via wireserver only works if azure-init can establish communication to wireserver and can successfully post the error. In the cases where it's not working, the other option is to write a KVP with the error and Azure platform will process it.
See cloud-init implementation as reference
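To illustrate the description format in the sample error above, here is a hedged Rust sketch that joins key=value pairs with `|`. The quoting rule (wrap a field in single quotes and double any embedded single quote when the field contains a quote or the delimiter) is inferred from the sample alone, not from a published specification, and the function names are hypothetical.

```rust
/// Quotes one `key=value` field the way the cloud-init sample appears to.
/// Assumption: quoting is needed only when the field contains `|` or `'`.
fn quote_field(field: &str) -> String {
    if field.contains('|') || field.contains('\'') {
        format!("'{}'", field.replace('\'', "''"))
    } else {
        field.to_string()
    }
}

/// Encodes ordered key/value pairs as a pipe-delimited description string.
fn encode_report(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .map(|(key, value)| quote_field(&format!("{}={}", key, value)))
        .collect::<Vec<_>>()
        .join("|")
}
```

The same encoded string could be used for both the wireserver health report description and the KVP fallback, so that downstream tooling sees one format regardless of the channel.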

[RFE] decouple mount/ovf logic out of get_username

Current situation

@cjp256 pointed out in a comment that the mount/ovf logic is in get_username().
That is not ideal, as CDROM mount should be independent of authentication via IMDS.

Ideal future situation

We should consider decoupling the mount/ovf logic out of get_username().

Best usage practices: Starting early in boot

Current situation

The systemd unit provided looks like it would run in the final system.

Impact

The instance configuration can be racy: while the agent is setting up the system, other services are already starting, so, for example, an SSH provisioning helper or cloud-init would race with the agent.

Debatable whether this is desired behavior: The current usage also overwrites a static hostname because the unit is running late and because the unit runs at every boot and ignores a previously set static hostname.

Ideal future situation

Provide a systemd unit that runs in the initrd, and possibly a dracut module to pull it in. This is how Afterburn is used, too, e.g., when setting up the hostname. Ignition is used in a similar way: it creates user accounts from the initrd.

Then document how the unit should be installed in the initrd and that this is the recommended way compared to a unit on the final system.

Create script to run some basic e2e tests on a newly spun up Azure VM

Script should:

Use az cli to spin up a new Ubuntu VM
Once the VM is running, scp a locally built binary to the VM
Install that binary
Run basic e2e tests on the VM (either from the local dev machine or the VM itself)

Ensure chpasswd and useradd work with FreeBSD

Vincenzo mentioned that certain smaller Linux distros may not support commands like chpasswd and useradd. The code should have some way of checking whether these commands work before running them through process::Command.

[RFE] Output log to a known (perhaps configurable) location

All logs from azure-init are currently sent to the console/journal log. We should also channel logs to a file (INFO level logs can go to both file and console) so that complex issues are easier to investigate.

Log can default to /var/log/azure-init.log. The exact file path should be configurable (once we support config).
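One possible building block, sketched here with the standard library only: a writer that duplicates every record to two sinks, such as a log file and stderr. The `Tee` type is hypothetical and not an existing azure-init or logging-crate API; how it would plug into the project's logger is left open.

```rust
use std::io::{self, Write};

/// Duplicates everything written to it into two underlying writers,
/// e.g. a log file and the console.
struct Tee<A: Write, B: Write> {
    first: A,
    second: B,
}

impl<A: Write, B: Write> Write for Tee<A, B> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // write_all keeps both sinks in step: either the whole buffer
        // reaches a sink or an error is returned.
        self.first.write_all(buf)?;
        self.second.write_all(buf)?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.first.flush()?;
        self.second.flush()
    }
}
```

In use, `first` could be a file opened at /var/log/azure-init.log (or the configured path) and `second` could be `std::io::stderr()`.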

Add MIT Headers

Add MIT license header to each source file

VM provisioning succeeds even when azure-init returns error

Description

From the testing done in #57, it looks like provisioning still succeeds even when azure-init returns an error. The issue needs investigation to avoid false test results.

[RFE] Don't hardcode distros

Current situation

Only select distros are supported with code like Distributions::Debian | Distributions::Ubuntu => {.

Impact

Most distros don't work.

Ideal future situation

Instead of detecting the distro and hardcoding cases, detect whether a needed command is there. E.g., if hostnamectl is there, use it, if not, write to /etc/hostname directly. Similar for adding users: While there are preferred user commands for some distros, there are common ones to use otherwise, and when nothing was found, writing to /etc/passwd could be tried.
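A minimal sketch of the "detect the command, not the distro" idea, using only the Rust standard library on Unix. The names `find_command`, `HostnameMethod`, and `pick_hostname_method` are hypothetical; the lookup simply walks a PATH-style string and checks for an executable regular file.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;

/// Returns the first executable regular file named `name` found in the
/// colon-separated directory list `path_var`.
fn find_command(name: &str, path_var: &str) -> Option<PathBuf> {
    std::env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| {
            fs::metadata(candidate)
                .map(|m| m.is_file() && m.permissions().mode() & 0o111 != 0)
                .unwrap_or(false)
        })
}

/// How the hostname would be set on this system.
#[derive(Debug, PartialEq)]
enum HostnameMethod {
    Hostnamectl(PathBuf),
    WriteEtcHostname,
}

/// Prefers hostnamectl when present, otherwise falls back to writing
/// /etc/hostname directly.
fn pick_hostname_method(path_var: &str) -> HostnameMethod {
    match find_command("hostnamectl", path_var) {
        Some(path) => HostnameMethod::Hostnamectl(path),
        None => HostnameMethod::WriteEtcHostname,
    }
}
```

The same lookup could drive the choice of user-creation command, with writing to /etc/passwd as the last resort described above.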

[RFE] Disable provisioning with password

Current situation

azure-init allows customers to provision Linux VMs with an admin password.

Impact

A password is not as secure as an SSH key. Using a password leaves the VM more vulnerable to brute-force attacks.

Ideal future situation

Not supporting password provisioning.

Implementation options

A couple options

Disable password support completely. Note that Azure does allow customers to provide password to provision VM. In that case azure-init should fail provisioning if password is given.
Allow the customer to choose to keep password support as a compile-time configurable option (but disable it by default)

[RFE] Select a MSRV

Current situation

Prior to #84 we didn't declare a Minimum Supported Rust Version. Based on #92, 1.76 is too new. We should select an MSRV deemed acceptably old and add it to CI to ensure we work with it.

Impact

While it's fairly easy to get the latest versions of Rust via rustup, distributions often lag behind current Rust releases by a lot. It's possible to build with a new toolchain and use the result on older distros, but if we want to build with the distribution toolchains we'll need to be more conservative about using new stdlib features.

Implementation options

Debian 12 looks to ship 1.63. I believe the latest RHEL 9 minor version includes 1.75.

If the goal is to build with distro-shipped toolchains we'll need to go back to at least 1.63 which shipped in August of 2022.

If we're okay not building with distro toolchains we could pick whatever the project currently builds with and call it a day. That way we are at least explicit about what we support.

[BUG] media::get_environment attempts to locate a non-existent CDROM device on AzureLinux

Description

When testing #92 on Azure Linux 3.0, azure-init encounters an issue attempting to retrieve block devices using media::get_environment(). This causes azure-init to report an error and exit prematurely.

Impact

This is part of work to get azure-init supported on Azure Linux 3.0 (as well as many other distros)

Environment and steps to reproduce

Set-up:

Boot an Azure Linux 3.0 VM, if you need an image, please reach out.

Action(s):

tdnf install libudev-devel git -y
git clone https://github.com/SeanDougherty/azure-init.git
cd azure-init
git checkout azl
cargo build --all
./target/debug/azure-init

Error:
Unable to get list of block devices

Expected behavior

Azure-init can get a mountable device for its use.

Additional information

I explored more in my branch, and I can see that there are devices available, they just might not be CDROM devices. (See photo)

For reference, this JSON is the configuration file used by AzL build tools to compose the image. You can see the devices enumerated under Disks:.

libazureinit needs to update the sshd config when it provisions a user with password

Description

Most images by default have sshd's PasswordAuthentication option set to no. If the user indicates that disablePasswordAuthentication should be "False", we should set this option to yes by writing an additional config file to /etc/ssh/sshd_config.d/

Keep in mind that if the service starts later in the process, updating this config might require a restart of ssh service
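A small sketch of rendering such a drop-in, assuming the provisioning metadata has already been parsed into a boolean. The function name is hypothetical, and so is any file name chosen for the drop-in. Note that sshd uses the first value it obtains for a keyword, so the drop-in only takes effect if it is included before any conflicting PasswordAuthentication line.

```rust
/// Renders the contents of an sshd drop-in file that mirrors the
/// `disablePasswordAuthentication` setting from the provisioning metadata.
fn password_auth_dropin(disable_password_authentication: bool) -> String {
    // sshd expects yes/no for this option.
    let value = if disable_password_authentication { "no" } else { "yes" };
    format!("PasswordAuthentication {}\n", value)
}
```

The returned string would be written to a file under /etc/ssh/sshd_config.d/ before the ssh service is (re)started.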

[RFE] Interaction with Linux Guest/Extension Agent (walinuxagent)

Current situation

Currently walinuxagent checks whether cloud-init is enabled. If yes, it will wait for cloud-init to finish provisioning. Otherwise, it will proceed with provisioning.

Impact

If walinuxagent is installed/enabled in the system, it will race with azure-init to accomplish provisioning.

Ideal future situation

walinuxagent should not assume provisioning if azure-init is expected to do provisioning.

Implementation options

We should work with walinuxagent team to determine the best approach here. One option is to override the configuration in /etc/waagent.conf to ask walinuxagent to not provision at all.

Add instructions for compiling and running this agent to the README

Instructions should cover what someone would need to do from a vanilla Ubuntu VM, including:

Installing Rust
Pulling down the source code
Building and running the source code
Running some basic tests

Additions to README file

Difference between this project and cloud-init
Trademark notice

This repo is missing important files

There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

Merge this pull request

[RFE]: imds test coverage for functional_tests

Current situation

functional_tests doesn't have coverage for imds module

Impact

Lack of testing for changes against imds

Ideal future situation

Adequate test coverage for imds within functional tests

Understand Type mismatch for coref 'Clause' vs 'Description'

User > find me coref tests from 2022 that contain Locking Ring in the description 
Copilot > I'm sorry, but there seems to be no coref tests from 2022 that contain "Locking Ring" in the description.

Clause is incorrectly described in the schema, but this highlighted a bug - it should have queried 'description' but then it came back with type mismatch...

KDBIE Request:

{"table":"coref","startTS":"2022-01-01T00:00:00","endTS":"2023-01-01T00:00:00","filter":[["like","Clause",["%Locking Ring%"]]]}

{
    "header": {
        "rcvTS": "2024-04-11T15:46:39.893000000",
        "corr": "dcfe3b44-0acc-4505-90a0-99886a8e0d55",
        "protocol": "gw",
        "logCorr": "dcfe3b44-0acc-4505-90a0-99886a8e0d55",
        "client": ":10.0.131.15:5050",
        "http": "json",
        "api": ".kxi.getData",
        "userName": "automation-client",
        "userID": "018cd2cf-3c31-4495-97f5-f9357f1afeac",
        "retryCount": 0,
        "to": "2024-04-11T15:47:39.893000000",
        "agg": ":10.0.132.155:5070",
        "pvVer": 135,
        "rpID": 0,
        "refVintage": -9223372036854775807,
        "rc": 10,
        "ac": 10,
        "ai": "Unexpected error (type_mismatch) encountered executing .kxi.getData"
    },
    "payload": []
}

KDBIE Request:

{"table":"coref","startTS":"2022-01-01T00:00:00","endTS":"2023-01-01T00:00:00","filter":[["like","Description",["*Locking Ring*"]]]}

{
    "header": {
        "rcvTS": "2024-04-11T15:48:36.883000000",
        "corr": "cb9ad89a-f700-48bc-87a4-b3114ab8a4b4",
        "protocol": "gw",
        "logCorr": "cb9ad89a-f700-48bc-87a4-b3114ab8a4b4",
        "client": ":10.0.132.50:5050",
        "http": "json",
        "api": ".kxi.getData",
        "userName": "automation-client",
        "userID": "018cd2cf-3c31-4495-97f5-f9357f1afeac",
        "retryCount": 0,
        "to": "2024-04-11T15:49:36.883000000",
        "agg": ":10.0.132.29:5070",
        "pvVer": 135,
        "rpID": 0,
        "refVintage": -9223372036854775807,
        "rc": 10,
        "ac": 10,
        "ai": "Unexpected error (type_mismatch) encountered executing .kxi.getData"
    },
    "payload": []
}

[RFE] Setup ephemeral networking to report provisioning complete

Current situation

Currently azure-init relies on the VM's network to pull provisioning metadata from IMDS and to report provisioning complete to Azure platform

Impact

azure-init is vulnerable to guest network issues that could prevent it from getting provisioning metadata and reporting provisioning complete. The most common scenario is the default route being set up over a non-primary interface, which will block traffic to IMDS/wireserver. There will also be other issues, such as handling the VF (Virtual Function) network interface for VMs with accelerated networking enabled.

Ideal future situation

azure-init should set up an ephemeral DHCP lease over the correct primary NIC to pull information from IMDS and use the same lease to report provisioning complete. The lease should be released once done.

Update crates h2, regex, thread_local, term

cargo audit gives 3 security vulnerabilities and 1 warning.
As this repo does not have an automated dependabot alert, a manual update of the packages is necessary.

    Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
      Loaded 605 security advisories (from /home/dpark/.cargo/advisory-db)
    Updating crates.io index
    Scanning Cargo.lock for vulnerabilities (158 crate dependencies)
Crate:     h2
Version:   0.3.21
Title:     Resource exhaustion vulnerability in h2 may lead to Denial of Service (DoS)
Date:      2024-01-17
ID:        RUSTSEC-2024-0003
URL:       https://rustsec.org/advisories/RUSTSEC-2024-0003
Solution:  Upgrade to ^0.3.24 OR >=0.4.2
Dependency tree:
h2 0.3.21
├── reqwest 0.11.22
│   └── libazureinit 0.1.1
│       └── azure-init 0.1.1
└── hyper 0.14.27
    └── reqwest 0.11.22

Crate:     regex
Version:   0.2.11
Title:     Regexes with large repetitions on empty sub-expressions take a very long time to parse
Date:      2022-03-08
ID:        RUSTSEC-2022-0013
URL:       https://rustsec.org/advisories/RUSTSEC-2022-0013
Severity:  7.5 (high)
Solution:  Upgrade to >=1.5.5
Dependency tree:
regex 0.2.11
├── rustfmt 0.10.0
│   └── libazureinit 0.1.1
│       └── azure-init 0.1.1
└── env_logger 0.4.3
    └── rustfmt 0.10.0

Crate:     thread_local
Version:   0.3.6
Title:     Data race in `Iter` and `IterMut`
Date:      2022-01-23
ID:        RUSTSEC-2022-0006
URL:       https://rustsec.org/advisories/RUSTSEC-2022-0006
Solution:  Upgrade to >=1.1.4
Dependency tree:
thread_local 0.3.6
└── regex 0.2.11
    ├── rustfmt 0.10.0
    │   └── libazureinit 0.1.1
    │       └── azure-init 0.1.1
    └── env_logger 0.4.3
        └── rustfmt 0.10.0

Crate:     term
Version:   0.4.6
Warning:   unmaintained
Title:     term is looking for a new maintainer
Date:      2018-11-19
ID:        RUSTSEC-2018-0015
URL:       https://rustsec.org/advisories/RUSTSEC-2018-0015
Dependency tree:
term 0.4.6
├── syntex_errors 0.59.1
│   ├── syntex_syntax 0.59.1
│   │   └── rustfmt 0.10.0
│   │       └── libazureinit 0.1.1
│   │           └── azure-init 0.1.1
│   └── rustfmt 0.10.0
└── rustfmt 0.10.0

error: 3 vulnerabilities found!
warning: 1 allowed warning found

[RFE] azure-init should add retries around IMDS and Wireserver operations

Current situation

There's no retry around REST API calls to IMDS or wireserver (goal_state, report_health).

Impact

Without retry, if there's a transient issue from Azure platform, provisioning will fail

Additional information

When to retry and how many times/how long to retry is a complex topic, especially when IMDS/Wireserver does not provide any guidance. This is the current behavior from cloud-init (ref, ref), which we can use as a reference (or perhaps we can provide this as a config that can be configured within the image? e.g., /etc/azure-init/azure-init.conf)

Total retry time should be no more than 5 minutes for IMDS and no more than 20 minutes for Wireserver.
Retry around Connection timeout/Read timeout: timeout for rest call should be set at 30s
Retry around non-200 http error codes (410, 404, 503, 400, 500, 429): timeout should be set at 2s, with backoff of 1s
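The retry behavior listed above could be sketched along these lines, using only the standard library. The helper names are hypothetical; the total budget and backoff are parameters so that the 5 minute (IMDS) and 20 minute (Wireserver) limits and the 1s backoff can be supplied by the caller, and the per-request timeout is assumed to be handled by whatever HTTP client performs the call.

```rust
use std::time::{Duration, Instant};

/// Calls `op` (passing the 1-based attempt number) until it succeeds or the
/// `total` time budget would be exceeded by another `backoff` sleep.
/// Returns the last error when the budget runs out.
fn retry_until<T, E>(
    total: Duration,
    backoff: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + total;
    let mut attempt = 0;
    loop {
        attempt += 1;
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if Instant::now() + backoff >= deadline => return Err(err),
            Err(_) => std::thread::sleep(backoff),
        }
    }
}

/// HTTP status codes the list above names as worth retrying.
fn is_retryable_status(code: u16) -> bool {
    matches!(code, 400 | 404 | 410 | 429 | 500 | 503)
}
```

For IMDS this would be called with `Duration::from_secs(300)` and `Duration::from_secs(1)`, with the closure returning `Err` for timeouts and for any status where `is_retryable_status` is true.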

azure-init should not assume provisioning media is always at /dev/sr0

Description

Currently azure-init assumes that the provisioning ISO will surface at /dev/sr0. While this is true most of the time, it's not always the case. For example, in FreeBSD it will be at /dev/cd0. In some special environments, it might show up as /dev/vda1.

The right mechanism to find the device is to enumerate block devices and process all devices that have an fstype of iso9660 or udf. Because /dev/sr0 will be the correct choice in more than 99% of cases, we might default to /dev/sr0 and fall back to enumeration if /dev/sr0 isn't the right choice. See cloud-init handling of this issue for reference.
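The "default first, then enumerate" ordering can be sketched as a pure function over an already-enumerated device list. The `BlockDevice` type and function name are hypothetical, and how the list is obtained (udev, blkid, or similar) is outside the sketch.

```rust
/// A block device and its detected filesystem type.
struct BlockDevice {
    path: String,
    fstype: String,
}

/// Returns device paths to try, in order: /dev/sr0 first, then every
/// enumerated device whose filesystem is iso9660 or udf.
fn provisioning_candidates(devices: &[BlockDevice]) -> Vec<String> {
    let mut candidates = vec!["/dev/sr0".to_string()];
    for dev in devices {
        let is_optical_fs = dev.fstype == "iso9660" || dev.fstype == "udf";
        if is_optical_fs && !candidates.contains(&dev.path) {
            candidates.push(dev.path.clone());
        }
    }
    candidates
}
```

The caller would attempt to mount each candidate in turn and stop at the first one that contains ovf-env.xml.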

Impact

azure-init won't be able to mount the provisioning iso if the iso isn't showing up at /dev/sr0

Expected behavior

azure-init should be able to find the provisioning iso in all Azure environments

Add GitHub Actions for CI

Should run unit, functional (if applicable), and e2e tests

Basic GitHub Actions workflow with cargo build (#34)
Run unit tests (#34)
Run cargo fmt for coding style (#34)
Run cargo clippy for linting (#34)
Run functional tests
Run e2e tests

Report basic failures to the platform

[RFE] improve how build-time configuration variables are handled

Quote from a comment by @jeremycline:

I don't think we should have these configuration options at all. The operating systems already provide a well-known tunable for executable discovery (PATH) so providing a second way to do it feels unnecessary and potentially surprising.

If we add support for other tools some settings become irrelevant and confusing. I don't think FreeBSD has hostnamectl. From a library-user perspective, set_hostname() isn't abstracting how it sets the hostname as it takes path_hostnamectl as an argument, so it's not clear how the API can accommodate alternate tools. I think it would be better to drop these environment variables and document that tools need to be on the PATH.

The question remains on how you want to handle the library side of this. Does the library API let you select a backend to use, or do we make users select at compile time? I think it makes sense to design the API to allow callers to select a backend if we're providing a library.

Reconsider use of structs for serde XML parsing

Other options may be better:
https://crates.io/crates/libxml
sxd_xpath - Rust (docs.rs)
https://kwarc.github.io/rust-libxml/libxml/xpath/index.html

[RFE] improve image creation script to support multiple base image choices

Current situation

image_creation test script was built based on limited testing with Ubuntu as the base image with some image-specific assumptions (e.g., netplan networking). This might cause issues when used with other base images.

Impact

Testing might have undefined/unexpected behaviors when used with other distros as base images.

Ideal future situation

Test script should work against different distros as base images. It also should document which base images have been tested to work well.

Implementation options

Two important issues to handle:

Networking: we should identify which networking software is used with the existing image (ifup-down, netplan/systemd-networkd, systemd-networkd, NetworkManager, wicked, etc...) and ensure that the basic network configuration to perform dhcp on the primary interface is written since azure-init relies on primary nic's wireserver/imds routing to report provisioning complete.
Package management system: apt/yum/dnf to install required software and/or to remove existing packages (walinuxagent/cloud-init)