azure / azure-init
A minimal provisioning agent designed for Azure Linux VMs.
License: MIT License
azure-init assumes that the authorized keys file is $HOME/.ssh/authorized_keys, which might not be where sshd is configured to look (the AuthorizedKeysFile setting in /etc/ssh/sshd_config).
As a result, azure-init might write the provided public key to a location that sshd does not check, which would render the VM inaccessible via the provided key.
azure-init should process the sshd config to correctly determine where to write the authorized keys file.
We likely need a module to handle various SSH-related functionality. #67 would also benefit from such a module.
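A minimal sketch of what such a lookup could look like, assuming the config text has already been read from disk. The function name is hypothetical and not part of azure-init. It only reads the global section (it stops at the first Match block), does not follow Include directives, and does not expand tokens such as %h or %u.

```rust
/// Hypothetical helper: returns the AuthorizedKeysFile patterns found in the
/// text of an sshd_config file.
/// Assumptions: only the global section is read (parsing stops at the first
/// `Match` block), `Include` directives are not followed, and tokens such as
/// %h or %u are returned unexpanded.
pub fn authorized_keys_patterns(sshd_config: &str) -> Vec<String> {
    for line in sshd_config.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // sshd accepts both "Keyword value" and "Keyword=value".
        let is_sep = |c: char| c.is_whitespace() || c == '=';
        let mut parts = line.splitn(2, is_sep);
        let keyword = parts.next().unwrap_or("");
        let value = parts.next().unwrap_or("").trim_start_matches(is_sep).trim();
        if keyword.eq_ignore_ascii_case("Match") {
            break;
        }
        if keyword.eq_ignore_ascii_case("AuthorizedKeysFile") && !value.is_empty() {
            // For each keyword, sshd uses the first value it obtains.
            return value.split_whitespace().map(str::to_string).collect();
        }
    }
    // OpenSSH default when the keyword is absent.
    vec![
        ".ssh/authorized_keys".to_string(),
        ".ssh/authorized_keys2".to_string(),
    ]
}
```

Relative patterns are resolved by sshd against the user's home directory, so a caller would still need to join them with $HOME before writing the key.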
Because azure-init starts late (after network-online), it ends up setting the computer name after DHCP has run. As a side effect, the correct hostname is not registered with DNS, because the image's stale hostname was what got sent in DHCP.
Hostname lookup for this VM will fail for other VMs that are in the same VNET or part of VNET peering.
This won't be an issue if we solve #35. Otherwise we'll have to restart the network stack (e.g., NetworkManager).
Running cargo test results in failures like:
test media::tests::test_get_ovf_env_missing_password ... FAILED
test media::tests::test_get_ovf_env_missing_three ... FAILED
It should be fixed before working on GitHub Actions CI.
Package maintainers are unable to set default user groups during build time.
Distros vary in what groups they assign to users by default. For example, Ubuntu grants the groups adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, and video, while Azure Linux grants the groups wheel and sudo.
The recent Builder style API revisions have opened up the ability to define custom user groups when provisioning a user, via the .with_groups() method. However, the solution is not exposed at build time for distro maintainers to utilize this new feature.
At build time, distro maintainers should be able to specify a list of default groups for azure-init via build arguments or environment variables.
Option 1)
Introduce a new env variable, DEFAULT_USERADD_GROUPS, that is ingested by azure-init's main.rs, which passes its value into a call to .with_groups() during provision(). If DEFAULT_USERADD_GROUPS is empty, continue to default to the single group "wheel".
Option 2)
Introduce a new build argument to the cargo build command that is ingested by azure-init's main.rs, which passes its value into a call to .with_groups() during provision(). If the build argument is empty, continue to default to the single group "wheel".
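A rough sketch of Option 1, assuming the variable holds a comma-separated list. That format, the helper name, and the constant name are assumptions made for illustration. The standard option_env! macro captures the value when azure-init is compiled, which matches the build-time requirement.

```rust
/// Hypothetical helper: turns a comma-separated group list into the vector
/// that would be handed to `.with_groups()`. Falls back to the single group
/// "wheel" when the value is unset or contains no group names.
pub fn default_groups(raw: Option<&str>) -> Vec<String> {
    let groups: Vec<String> = raw
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|g| !g.is_empty())
        .map(str::to_string)
        .collect();
    if groups.is_empty() {
        vec!["wheel".to_string()]
    } else {
        groups
    }
}

/// Value captured at compile time; `None` if DEFAULT_USERADD_GROUPS was not
/// set in the build environment.
pub const BUILD_TIME_GROUPS: Option<&str> = option_env!("DEFAULT_USERADD_GROUPS");
```

With this shape, a distro build could run something like DEFAULT_USERADD_GROUPS="adm,sudo" cargo build, and main.rs would pass default_groups(BUILD_TIME_GROUPS) to .with_groups().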
Currently this is not a major issue as we don't have a lot of configurable options. However, as we start adding more features to make azure-init more useful and robust (such as logging and retries), we might need to provide a way for users to configure azure-init behavior (such as where to put logs, which level of logging should go to the console, how many minutes to retry, etc.).
We can consider providing a config file in /etc/azure-init/azure-init.conf
Create a CONTRIBUTING.md file
Update this file with expectations for support when reporting issues with the project, etc.
The error the script throws when it times out during storage blob access is:
az vm boot-diagnostics get-boot-log -g testagent-1709140193 -n testvm-1709140193
ERROR: Client-Request-ID=7771dfa0-d65c-11ee-bf75-95e10ccbd332 Retry policy did not allow for a retry: Server-Timestamp=Wed, 28 Feb 2024 17:11:51 GMT, Server-Request-ID=f31a688f-601e-006a-5069-6a9417000000, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFoundBlobNotFoundThe specified blob does not exist.RequestId:f31a688f-601e-006a-5069-6a9417000000Time:2024-02-28T17:11:52.0165015Z.
ERROR: The specified blob does not exist. ErrorCode: BlobNotFound
BlobNotFoundThe specified blob does not exist.
RequestId:f31a688f-601e-006a-5069-6a9417000000
Time:2024-02-28T17:11:52.0165015Z
Issue for creating some kind of testing framework or pipeline that can be used to ensure changes don't break the agent's functionality ahead of merging them into main.
Ideas discussed so far:
The provisioning agent implements all features in a single application.
Re-using the agent's functionality from other agents like e.g. Afterburn is challenging.
This impacts adoption of the agent by the larger Linux ecosystem and hinders its mission of being a reference implementation for a minimal Azure provisioning guest agent.
I believe the admin user is currently configured to be part of the sudoers group, but without passwordless sudo. If we want to stay consistent with the current behavior of cloud-init and walinuxagent, passwordless sudo should be set. We can also provide this as a configurable option (#65).
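One way this could be done is a sudoers drop-in for the provisioned user. The sketch below only builds the file contents. The function name is hypothetical, and where the file would live (for example a file under /etc/sudoers.d/) is an assumption rather than anything azure-init does today.

```rust
/// Hypothetical helper: builds the contents of a sudoers drop-in granting
/// passwordless sudo to `user`. Returns `None` for names containing
/// characters outside a conservative allow-list, so that nothing unexpected
/// is ever written into a sudoers file.
pub fn sudoers_entry(user: &str) -> Option<String> {
    let valid = !user.is_empty()
        && user
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'));
    if valid {
        Some(format!("{} ALL=(ALL) NOPASSWD:ALL\n", user))
    } else {
        None
    }
}
```

A real implementation would also need to create the file with restrictive permissions (sudo expects drop-ins not to be world-writable), which this sketch leaves out.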
Include proper handling of various timeouts for IMDS and wireserver communications
In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).
✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.
❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository getting automatically archived.🔒
Reply with a comment on this issue containing one of the following optin or optout command options below.
✅ Opt-in to migrate
@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>
Example:
@gimsvc optin --date 03-15-2023
OR
❌ Opt-out of migration
@gimsvc optout --reason <staging|collaboration|delete|other>
Example:
@gimsvc optout --reason staging
Options:
staging: This repository will ship as Open Source or gopublic
collaboration: Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
delete: This repository will be deleted because it is no longer needed.
other: Other reasons not specified
Fully test azure-init on all endorsed Linux distributions on Azure
Currently azure-init does not support Azure pre-provisioning.
Lack of support for pre-provisioning will affect adoption as it does not allow the customer to benefit from better reliability and performance gained from Azure pre-provisioning service.
Add support for Azure pre-provisioning.
Currently the agent does not support Azure pre-provisioning (PPS). Since PPS is not a customer-facing feature, there's no public documentation for it, unfortunately.
cloud-init has supported Azure PPS since version 18.5. Here's the commit that started the support: link
Other relevant commits as reference:
Network event detection support via netlink: commit
PPS v2 support via nic attach/detach: commit
Currently azure-init does not set the user agent (User-Agent) string when communicating with IMDS.
Sending a user agent header will make it easier to trace provisioning requests on the IMDS side, which will allow Azure to troubleshoot provisioning issues faster.
Send the user agent string as part of the request headers. When sending requests to IMDS, send the user agent azure-init (ideally with the version) in the "User-Agent" header.
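A small sketch of how the string could be built, assuming a product/version layout such as azure-init/0.1.1. The exact format is an assumption, since the issue only asks for the name and ideally the version. CARGO_PKG_VERSION is set by Cargo at compile time, and option_env! keeps the sketch compiling outside Cargo as well.

```rust
/// Hypothetical helper: formats the User-Agent value for a given version.
pub fn user_agent(version: &str) -> String {
    format!("azure-init/{}", version)
}

/// Uses the crate version Cargo exposes at compile time, or "unknown" when
/// the code is compiled without Cargo.
pub fn build_user_agent() -> String {
    user_agent(option_env!("CARGO_PKG_VERSION").unwrap_or("unknown"))
}
```

The resulting string would then be attached as the User-Agent header by whichever HTTP client performs the IMDS request.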
azure-init does not do anything with custom-data. See this article for custom-data usage on Azure.
Unlike user-data, custom-data is only available during provisioning phase because it's only available in the ovf-env.xml file. If the provisioning agent doesn't take action on custom-data, it will not be available again.
Provide some options for how customers want to handle custom-data. A couple options
Differences between custom-data and user-data: custom-data is only available to privileged users as it's only available in the ovf-env.xml in a limited window during provisioning (the ovf-env.xml is only available through mounting of the provisioning ISO, which is a privileged action). user-data is available throughout the lifetime of the VM, and is accessible by anyone who has access to the VM. user-data can be updated throughout the lifetime of the VM while custom-data cannot be updated.
Currently azure-init does not report any failure to Azure. If it can't finish provisioning, it exits with an error code. From a user perspective, provisioning will eventually fail with an OS provisioning timeout, because the Azure platform never receives a provisioning-complete signal.
In many cases the user might not be able to access the VM if provisioning fails and, as such, might have a very hard time figuring out why provisioning failed.
Have azure-init report failures to Azure, which will then fail provisioning with a useful error message indicating why provisioning failed.
These two options are not mutually exclusive, but rather complement each other.
result=error|reason=http error querying IMDS|agent=Cloud-Init/23.3.3-0ubuntu0~20.04.1|http_code=410|duration=300.2051315307617|'exception=UrlError(''410 Client Error: Gone for url: http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true'')'|url=http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true|vm_id=e76f68ac-04a8-4069-be7c-7f04b01f520f|timestamp=2024-03-12T09:39:16.373226|documentation_url=https://aka.ms/linuxprovisioningerror
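For illustration only, a sketch of building a string in the same key=value, pipe-delimited shape as the cloud-init sample above. The function name is hypothetical, the sketch does not reproduce the quoting the sample applies to the exception field, and how the string would be delivered to Azure is not covered here.

```rust
/// Hypothetical helper: joins key/value pairs into a `key=value|key=value`
/// report string. Values are written as-is; no quoting or escaping is
/// applied, so callers must not pass values containing '|'.
pub fn encode_report(fields: &[(&str, &str)]) -> String {
    fields
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect::<Vec<_>>()
        .join("|")
}
```

The field names (result, reason, agent, and so on) would be taken from the sample. Which of them Azure actually requires is something to confirm with the platform team.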
The systemd unit provided looks like it would run in the final system.
The instance configuration can be racy because while the system is set up by the agent other services will already start and, e.g., an SSH provisioning helper or cloud-init would race with the agent.
Debatable whether this is desired behavior: The current usage also overwrites a static hostname because the unit is running late and because the unit runs at every boot and ignores a previously set static hostname.
Provide a systemd unit that runs in the initrd, and possibly a dracut module to pull it in. This is how Afterburn is used, too, e.g., when setting up the hostname. Similar is also how Ignition is used, which does the creation of user accounts from the initrd.
Then document how the unit should be installed in the initrd and that this is the recommended way compared to a unit on the final system.
Script should:
Vincenzo mentioned that certain smaller Linux distros may not support commands like chpasswd and useradd. The code should have some way of checking whether these commands are available before running them through process::Command.
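A minimal sketch of such a check, assuming it is enough to look for a regular file with the command's name in each PATH directory. The function names are hypothetical, and the sketch does not verify the executable bit.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical helper: returns the first regular file named `command` in
/// the directories listed in `path_var` (a PATH-style string).
/// Note: this does not check that the file is executable.
pub fn find_on_path(command: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(command))
        .find(|candidate| candidate.is_file())
}

/// Convenience wrapper that consults the process's own PATH.
pub fn command_available(command: &str) -> bool {
    env::var_os("PATH").map_or(false, |path| find_on_path(command, &path).is_some())
}
```

A caller could then test command_available("useradd") before building the process::Command, and report a clear error or pick another tool when it returns false.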
All logs from azure-init are currently sent to the console/journal. We should also channel logs to a file (INFO-level logs can go to both the file and the console) so that complex issues are easier to investigate.
Log can default to /var/log/azure-init.log. The exact file path should be configurable (once we support config).
Add MIT license header to each source file
From the testing done in #57 it looks like provisioning still succeeds even when azure-init was returning error. The issue needs investigation to avoid false testing results.
Only select distros are supported, with code like Distributions::Debian | Distributions::Ubuntu => {.
Most distros don't work.
Instead of detecting the distro and hardcoding cases, detect whether a needed command is there. E.g., if hostnamectl is there, use it; if not, write to /etc/hostname directly. Similarly for adding users: while there are preferred user commands for some distros, there are common ones to use otherwise, and when nothing is found, writing to /etc/passwd could be tried.
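The hostname case could be sketched roughly as follows. The function name and its parameters are hypothetical. The tool name and file path are passed in so the fallback can be exercised without root, and would be "hostnamectl" and /etc/hostname in practice.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: tries `<tool> set-hostname <hostname>` and, only if
/// the tool cannot be found, falls back to writing the name to
/// `hostname_file`. Other failures are returned to the caller.
pub fn set_hostname_with(tool: &str, hostname_file: &Path, hostname: &str) -> io::Result<()> {
    match Command::new(tool).arg("set-hostname").arg(hostname).status() {
        Ok(status) if status.success() => Ok(()),
        Ok(status) => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", tool, status),
        )),
        // The tool is not installed: write the file directly.
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            fs::write(hostname_file, format!("{}\n", hostname))
        }
        Err(err) => Err(err),
    }
}
```

Writing the file alone does not change the running kernel's hostname, so a full implementation would probably also need to set it for the current boot.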
azure-init allows customers to provision Linux VMs with an admin password.
Passwords are not as secure as SSH keys. Using a password leaves the VM more vulnerable to brute-force attacks.
Not supporting password provisioning.
A couple options
Prior to #84 we didn't declare a Minimum Supported Rust Version. Based on #92 1.76 is too new. We should select a MSRV deemed acceptably old and add it to CI to ensure we work with it.
While it's fairly easy to get the latest versions of Rust via rustup, distributions often lag behind current Rust releases by a lot. It's possible to build with a new toolchain and use the result on older distros, but if we want to build with the distribution toolchains we'll need to be more conservative about using new stdlib features.
Debian 12 looks to ship 1.63. I believe the latest RHEL 9 minor version includes 1.75.
If the goal is to build with distro-shipped toolchains we'll need to go back to at least 1.63 which shipped in August of 2022.
If we're okay not building with distro toolchains we could pick whatever the project currently builds with and call it a day. That way we are at least explicit about what we support.
When testing #92 on Azure Linux 3.0, azure-init encounters an issue attempting to retrieve block devices using media::get_environment(). This causes azure-init to report an error and exit prematurely.
This is part of work to get azure-init supported on Azure Linux 3.0 (as well as many other distros)
Boot an Azure Linux 3.0 VM, if you need an image, please reach out.
tdnf install libudev-devel git -y
git clone https://github.com/SeanDougherty/azure-init.git
cd azure-init
git checkout azl
cargo build --all
./target/debug/azure-init
Unable to get list of block devices
Azure-init can get a mountable device for its use.
I explored more in my branch, and I can see that there are devices available, they just might not be CDROM devices. (See photo)
For reference, this JSON is the configuration file used by AzL build tools to compose the image. You can see the devices enumerated under Disks.
Most images by default have sshd's PasswordAuthentication setting set to no. If the user indicates that disablePasswordAuthentication should be "False", we should set PasswordAuthentication to yes by writing an additional config file to /etc/ssh/sshd_config.d/.
Keep in mind that if the service starts later in the boot process, updating this config might require a restart of the ssh service.
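A small sketch of generating the drop-in contents from the disablePasswordAuthentication flag. The function name is hypothetical, and the drop-in's file name is left to the caller.

```rust
/// Hypothetical helper: returns the contents of an sshd_config.d drop-in
/// that enables or disables password authentication.
pub fn password_auth_dropin(disable_password_authentication: bool) -> String {
    let value = if disable_password_authentication { "no" } else { "yes" };
    format!("PasswordAuthentication {}\n", value)
}
```

Because sshd uses the first value it obtains for each keyword, the drop-in only takes effect if the main sshd_config includes /etc/ssh/sshd_config.d/ before any PasswordAuthentication line of its own, and if no earlier drop-in sets the same keyword.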
Currently walinuxagent checks whether cloud-init is enabled. If yes, it will wait for cloud-init to finish provisioning. Otherwise, it will proceed with provisioning.
If walinuxagent is installed/enabled in the system, it will race with azure-init to perform provisioning.
walinuxagent should not take on provisioning if azure-init is expected to do it.
We should work with walinuxagent team to determine the best approach here. One option is to override the configuration in /etc/waagent.conf to ask walinuxagent to not provision at all.
Instructions should cover what someone would need to do from a vanilla Ubuntu VM, including:
There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
functional_tests doesn't have coverage for imds module
Lack of testing for changes against imds
Adequate test coverage for imds within functional tests
User > find me coref tests from 2022 that contain Locking Ring in the description
Copilot > I'm sorry, but there seems to be no coref tests from 2022 that contain "Locking Ring" in the description.
Clause is incorrectly described in the schema, but this highlighted a bug - it should have queried 'description' but then it came back with type mismatch...
KDBIE Request:
{"table":"coref","startTS":"2022-01-01T00:00:00","endTS":"2023-01-01T00:00:00","filter":[["like","Clause",["%Locking Ring%"]]]}
{
"header": {
"rcvTS": "2024-04-11T15:46:39.893000000",
"corr": "dcfe3b44-0acc-4505-90a0-99886a8e0d55",
"protocol": "gw",
"logCorr": "dcfe3b44-0acc-4505-90a0-99886a8e0d55",
"client": ":10.0.131.15:5050",
"http": "json",
"api": ".kxi.getData",
"userName": "automation-client",
"userID": "018cd2cf-3c31-4495-97f5-f9357f1afeac",
"retryCount": 0,
"to": "2024-04-11T15:47:39.893000000",
"agg": ":10.0.132.155:5070",
"pvVer": 135,
"rpID": 0,
"refVintage": -9223372036854775807,
"rc": 10,
"ac": 10,
"ai": "Unexpected error (type_mismatch) encountered executing .kxi.getData"
},
"payload": []
}
KDBIE Request:
{"table":"coref","startTS":"2022-01-01T00:00:00","endTS":"2023-01-01T00:00:00","filter":[["like","Description",["*Locking Ring*"]]]}
{
"header": {
"rcvTS": "2024-04-11T15:48:36.883000000",
"corr": "cb9ad89a-f700-48bc-87a4-b3114ab8a4b4",
"protocol": "gw",
"logCorr": "cb9ad89a-f700-48bc-87a4-b3114ab8a4b4",
"client": ":10.0.132.50:5050",
"http": "json",
"api": ".kxi.getData",
"userName": "automation-client",
"userID": "018cd2cf-3c31-4495-97f5-f9357f1afeac",
"retryCount": 0,
"to": "2024-04-11T15:49:36.883000000",
"agg": ":10.0.132.29:5070",
"pvVer": 135,
"rpID": 0,
"refVintage": -9223372036854775807,
"rc": 10,
"ac": 10,
"ai": "Unexpected error (type_mismatch) encountered executing .kxi.getData"
},
"payload": []
}
Currently azure-init relies on the VM's network to pull provisioning metadata from IMDS and to report provisioning complete to the Azure platform.
azure-init is vulnerable to guest network issues that could prevent it from getting provisioning metadata and reporting provisioning complete. The most common scenario is the default route being set up over a non-primary interface, which will block traffic to IMDS/wireserver. There are also other issues, such as handling the VF (Virtual Function) network interface for VMs with accelerated networking enabled.
azure-init should set up an ephemeral DHCP lease over the correct primary NIC to pull information from IMDS, and use the same lease to report provisioning complete. The lease should be released once done.
cargo audit reports 3 security vulnerabilities and 1 warning.
As this repo does not have automated Dependabot alerts, a manual update of the packages is necessary.
Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
Loaded 605 security advisories (from /home/dpark/.cargo/advisory-db)
Updating crates.io index
Scanning Cargo.lock for vulnerabilities (158 crate dependencies)
Crate: h2
Version: 0.3.21
Title: Resource exhaustion vulnerability in h2 may lead to Denial of Service (DoS)
Date: 2024-01-17
ID: RUSTSEC-2024-0003
URL: https://rustsec.org/advisories/RUSTSEC-2024-0003
Solution: Upgrade to ^0.3.24 OR >=0.4.2
Dependency tree:
h2 0.3.21
├── reqwest 0.11.22
│   └── libazureinit 0.1.1
│       └── azure-init 0.1.1
└── hyper 0.14.27
    └── reqwest 0.11.22
Crate: regex
Version: 0.2.11
Title: Regexes with large repetitions on empty sub-expressions take a very long time to parse
Date: 2022-03-08
ID: RUSTSEC-2022-0013
URL: https://rustsec.org/advisories/RUSTSEC-2022-0013
Severity: 7.5 (high)
Solution: Upgrade to >=1.5.5
Dependency tree:
regex 0.2.11
├── rustfmt 0.10.0
│   └── libazureinit 0.1.1
│       └── azure-init 0.1.1
└── env_logger 0.4.3
    └── rustfmt 0.10.0
Crate: thread_local
Version: 0.3.6
Title: Data race in `Iter` and `IterMut`
Date: 2022-01-23
ID: RUSTSEC-2022-0006
URL: https://rustsec.org/advisories/RUSTSEC-2022-0006
Solution: Upgrade to >=1.1.4
Dependency tree:
thread_local 0.3.6
└── regex 0.2.11
    ├── rustfmt 0.10.0
    │   └── libazureinit 0.1.1
    │       └── azure-init 0.1.1
    └── env_logger 0.4.3
        └── rustfmt 0.10.0
Crate: term
Version: 0.4.6
Warning: unmaintained
Title: term is looking for a new maintainer
Date: 2018-11-19
ID: RUSTSEC-2018-0015
URL: https://rustsec.org/advisories/RUSTSEC-2018-0015
Dependency tree:
term 0.4.6
├── syntex_errors 0.59.1
│   ├── syntex_syntax 0.59.1
│   │   └── rustfmt 0.10.0
│   │       └── libazureinit 0.1.1
│   │           └── azure-init 0.1.1
│   └── rustfmt 0.10.0
└── rustfmt 0.10.0
error: 3 vulnerabilities found!
warning: 1 allowed warning found
There's no retry when REST API calls to IMDS or wireserver (goal_state, report_health) fail.
Without retries, a transient issue on the Azure platform will cause provisioning to fail.
When to retry and how many times/how long to retry is a complex topic, especially when IMDS/Wireserver does not provide any guidance. This is the current behavior from cloud-init (ref, ref), which we can use as a reference (or perhaps we can provide this as a config that can be configured within the image? e.g., /etc/azure-init/azure-init.conf)
Total retrying time for IMDS should total no more than 5 minutes, for Wireserver 20 minutes.
Retry around Connection timeout/Read timeout: timeout for rest call should be set at 30s
Retry around non-200 http error codes (410, 404, 503, 400, 500, 429): timeout should be set at 2s, with backoff of 1s
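The overall shape (a total time budget plus a fixed backoff between attempts) could be sketched as below. The function name is hypothetical, and the per-request timeout is assumed to be handled by the HTTP client inside the operation. For IMDS, total would be 5 minutes, and for wireserver 20 minutes, per the numbers above.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `op` until it succeeds or until another
/// attempt would not fit within `total`, sleeping `backoff` between
/// attempts. Returns the last error when the budget is exhausted.
pub fn retry_until<T, E>(
    total: Duration,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + total;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Give up if sleeping would take us past the deadline.
                if Instant::now() + backoff >= deadline {
                    return Err(err);
                }
                sleep(backoff);
            }
        }
    }
}
```

A fuller version would likely distinguish retryable errors (timeouts, and the HTTP codes listed above) from permanent ones instead of retrying on every Err.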
Currently azure-init assumes that the provisioning ISO will surface at /dev/sr0. While this is true most of the time, it's not always the case. For example, on FreeBSD it will be at /dev/cd0. In some special environments, it might show up as /dev/vda1.
The right mechanism to find the device is to enumerate block devices and process all devices that have an fstype of iso9660 or udf. Because /dev/sr0 will be the correct choice in more than 99% of cases, we might default to /dev/sr0 and fall back to enumeration if /dev/sr0 isn't the right choice. See cloud-init's handling of this issue for reference.
azure-init won't be able to mount the provisioning ISO if it doesn't show up at /dev/sr0.
azure-init should be able to find the provisioning iso in all Azure environments
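A sketch of the selection step, assuming the block devices and their filesystem types have already been enumerated by some other means (that part is not shown). It is a slight variation on the proposal above: rather than trying /dev/sr0 first and enumerating only on failure, it filters the enumerated list and puts /dev/sr0 at the front. All names are hypothetical.

```rust
/// Device that is expected to hold the provisioning ISO in most cases.
pub const DEFAULT_DEVICE: &str = "/dev/sr0";

/// Hypothetical helper: given (device, fstype) pairs, returns the devices
/// whose filesystem is iso9660 or udf, with /dev/sr0 first when present and
/// the remaining devices in their original order.
pub fn provisioning_candidates(devices: &[(&str, &str)]) -> Vec<String> {
    let mut candidates: Vec<String> = devices
        .iter()
        .filter(|(_, fstype)| matches!(*fstype, "iso9660" | "udf"))
        .map(|(device, _)| device.to_string())
        .collect();
    // Stable sort: `false` (the default device) orders before `true`.
    candidates.sort_by_key(|device| device.as_str() != DEFAULT_DEVICE);
    candidates
}
```

The caller would then try to mount each candidate in order and stop at the first one that contains ovf-env.xml.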
Quote from a comment by @jeremycline:
I don't think we should have these configuration options at all. The operating systems already provide a well-known tunable for executable discovery (PATH) so providing a second way to do it feels unnecessary and potentially surprising.
If we add support for other tools some settings becomes irrelevant and confusing. I don't think FreeBSD has hostnamectl. From a library-user perspective, set_hostname() isn't abstracting how it sets the hostname as it takes path_hostnamectl as an argument, so it's not clear how the API can accommodate alternate tools. I think it would be better to drop these environment variables and document that tools need to be on the PATH.
The question remains on how you want to handle the library side of this. Does the library API let you select a backend to use, or do we make users select at compile time? I think it makes sense to design the API to allow callers to select a backend if we're providing a library.
Other options may be better:
https://crates.io/crates/libxml
sxd_xpath - Rust (docs.rs)
https://kwarc.github.io/rust-libxml/libxml/xpath/index.html
image_creation test script was built based on limited testing with Ubuntu as the base image with some image-specific assumptions (e.g., netplan networking). This might cause issues when used with other base images.
Testing might have undefined/unexpected behaviors when used with other distros as base images.
Test script should work against different distros as base images. It also should document which base images have been tested to work well.
Two important issues to handle: