azure-init's Introduction

Azure-Init

A reference implementation for provisioning Linux VMs on Azure.

Azure-init configures Linux guests from provisioning metadata. Unlike complex guest configuration and customisation systems such as cloud-init, azure-init aims to be minimal. It focuses strictly on basic instance initialisation from Azure metadata.

Azure-init has very few requirements on its environment, so it may run in a very early stage of the boot process.

Installing Rust

To install Rust see here: https://www.rust-lang.org/tools/install.

Building the Project

To build the project, go to the root of the repository on the command line and run cargo build --all. The project contains two binaries, the main provisioning agent and the functional testing binary, and this command builds both. Both binaries are quite small, but you can build just one by running cargo build --bin <binary_name>, where <binary_name> is either azure-init or functional_tests.

To run the program, enter cargo run --bin <binary_name>, specifying the binary you want to run.

Testing

There are two sets of tests: unit tests and end-to-end (e2e) tests. To run the unit tests, use cargo test. To run the end-to-end tests, use make e2e-test, which creates a test user and an ssh directory, places mock ssh keys, and then cleans up the test artifacts afterwards.

Contributing

Contributions require you to agree to Microsoft's Contributor License Agreement (CLA). Please refer to CONTRIBUTING.md for detailed instructions.

This project adheres to the Microsoft Open Source Code of Conduct. Check out CODE_OF_CONDUCT.md for a brief collection of links and references.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

libazureinit

For the common library used by this reference implementation, please refer to libazureinit.

azure-init's People

Contributors

cadejacobson, dependabot[bot], dongsupark, jeremycline, microsoft-github-policy-service[bot], microsoftopensource, nellshamrell, peytonr18, rata, t-lo

azure-init's Issues

[RFE] Output log to a known (perhaps configurable) location

All logs from azure-init are currently sent to the console/journal log. We should also channel logs to a file (INFO level logs can go to both file and console) so that complex issues are easier to investigate.

Log can default to /var/log/azure-init.log. The exact file path should be configurable (once we support config).
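
As a rough illustration of the request above, the following sketch appends each log line to a file and echoes it to the console. The function name log_line and the line format are assumptions made for this example and are not part of azure-init.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: append one log line to `path` and echo it to stderr.
/// The "LEVEL: message" layout is an assumption for illustration only.
pub fn log_line(path: &Path, level: &str, message: &str) -> io::Result<()> {
    let line = format!("{level}: {message}\n");

    // Open in append mode so earlier entries are preserved across calls.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(line.as_bytes())?;

    // Mirror the same line to the console (stderr).
    eprint!("{line}");
    Ok(())
}
```

A caller would pass the configured path, for instance /var/log/azure-init.log as suggested above, once a configuration mechanism exists.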

[RFE] Provide a mechanism to configure behavior of azure-init at runtime

Currently this is not a major issue as we don't have a lot of configurable options. However, as we start adding more features to make azure-init more useful and robust (such as logging, retries), we might need to provide a way for the users to configure azure-init behavior (such as where to put logging, the level of logging that should go to console, how many minutes to retry, etc...)

We can consider providing a config file in /etc/azure-init/azure-init.conf

libazureinit needs to update the sshd config when it provisions a user with password

Description

Most images ship with sshd's PasswordAuthentication option set to no by default. If the user indicates that disablePasswordAuthentication should be "False", we should set this option to yes by writing an additional config file under /etc/ssh/sshd_config.d/.

Keep in mind that if the service starts later in the process, updating this config might require a restart of ssh service
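
A minimal sketch of what writing such a drop-in could look like is shown here. The file name 50-azure-init.conf and both function names are hypothetical. Because sshd uses the first value it obtains for each keyword, the ordering of the drop-in relative to other files matters and would need checking per image.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Render the drop-in body. sshd's `PasswordAuthentication` takes `yes` or `no`.
pub fn sshd_password_auth_snippet(allow_password: bool) -> String {
    let value = if allow_password { "yes" } else { "no" };
    format!("PasswordAuthentication {value}\n")
}

/// Write the snippet into `dropin_dir` and return the path that was written.
/// The file name is an assumption chosen for this example.
pub fn write_sshd_dropin(dropin_dir: &Path, allow_password: bool) -> io::Result<PathBuf> {
    fs::create_dir_all(dropin_dir)?;
    let path = dropin_dir.join("50-azure-init.conf");
    fs::write(&path, sshd_password_auth_snippet(allow_password))?;
    Ok(path)
}
```

The sketch deliberately does not restart or reload the ssh service; whether that is needed depends on when the service starts, as noted above.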

[RFE] Don't hardcode distros

Current situation

Only select distros are supported with code like Distributions::Debian | Distributions::Ubuntu => {.

Impact

Most distros don't work.

Ideal future situation

Instead of detecting the distro and hardcoding cases, detect whether a needed command is there. E.g., if hostnamectl is there, use it, if not, write to /etc/hostname directly. Similar for adding users: While there are preferred user commands for some distros, there are common ones to use otherwise, and when nothing was found, writing to /etc/passwd could be tried.
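
The command-detection idea can be sketched as follows, using only the standard library. All names here (find_in_dirs, find_in_path, HostnameMethod, choose_hostname_method) are hypothetical, and the sketch only checks that a file with the right name exists; it does not verify the executable bit.

```rust
use std::env;
use std::path::PathBuf;

/// Return the first `<dir>/<command>` that exists as a regular file.
pub fn find_in_dirs<I>(command: &str, dirs: I) -> Option<PathBuf>
where
    I: IntoIterator<Item = PathBuf>,
{
    dirs.into_iter()
        .map(|dir| dir.join(command))
        .find(|candidate| candidate.is_file())
}

/// Look the command up in the directories listed in the PATH variable.
pub fn find_in_path(command: &str) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    find_in_dirs(command, env::split_paths(&path))
}

/// How the hostname would be set, decided by what is available.
#[derive(Debug, PartialEq)]
pub enum HostnameMethod {
    Hostnamectl(PathBuf),
    WriteEtcHostname,
}

/// Prefer hostnamectl when present, otherwise fall back to /etc/hostname.
pub fn choose_hostname_method() -> HostnameMethod {
    match find_in_path("hostnamectl") {
        Some(path) => HostnameMethod::Hostnamectl(path),
        None => HostnameMethod::WriteEtcHostname,
    }
}
```

The same lookup could be reused for user-management commands before they are handed to process::Command.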

Implementation options

Additional information

Update crates h2, regex, thread_local, term

cargo audit gives 3 security vulnerabilities and 1 warning.
As this repo does not have an automated dependabot alert, a manual update of the packages is necessary.

    Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
      Loaded 605 security advisories (from /home/dpark/.cargo/advisory-db)
    Updating crates.io index
    Scanning Cargo.lock for vulnerabilities (158 crate dependencies)
Crate:     h2
Version:   0.3.21
Title:     Resource exhaustion vulnerability in h2 may lead to Denial of Service (DoS)
Date:      2024-01-17
ID:        RUSTSEC-2024-0003
URL:       https://rustsec.org/advisories/RUSTSEC-2024-0003
Solution:  Upgrade to ^0.3.24 OR >=0.4.2
Dependency tree:
h2 0.3.21
├── reqwest 0.11.22
│   └── libazureinit 0.1.1
│       └── azure-init 0.1.1
└── hyper 0.14.27
    └── reqwest 0.11.22

Crate:     regex
Version:   0.2.11
Title:     Regexes with large repetitions on empty sub-expressions take a very long time to parse
Date:      2022-03-08
ID:        RUSTSEC-2022-0013
URL:       https://rustsec.org/advisories/RUSTSEC-2022-0013
Severity:  7.5 (high)
Solution:  Upgrade to >=1.5.5
Dependency tree:
regex 0.2.11
├── rustfmt 0.10.0
│   └── libazureinit 0.1.1
│       └── azure-init 0.1.1
└── env_logger 0.4.3
    └── rustfmt 0.10.0

Crate:     thread_local
Version:   0.3.6
Title:     Data race in `Iter` and `IterMut`
Date:      2022-01-23
ID:        RUSTSEC-2022-0006
URL:       https://rustsec.org/advisories/RUSTSEC-2022-0006
Solution:  Upgrade to >=1.1.4
Dependency tree:
thread_local 0.3.6
└── regex 0.2.11
    ├── rustfmt 0.10.0
    │   └── libazureinit 0.1.1
    │       └── azure-init 0.1.1
    └── env_logger 0.4.3
        └── rustfmt 0.10.0

Crate:     term
Version:   0.4.6
Warning:   unmaintained
Title:     term is looking for a new maintainer
Date:      2018-11-19
ID:        RUSTSEC-2018-0015
URL:       https://rustsec.org/advisories/RUSTSEC-2018-0015
Dependency tree:
term 0.4.6
├── syntex_errors 0.59.1
│   ├── syntex_syntax 0.59.1
│   │   └── rustfmt 0.10.0
│   │       └── libazureinit 0.1.1
│   │           └── azure-init 0.1.1
│   └── rustfmt 0.10.0
└── rustfmt 0.10.0

error: 3 vulnerabilities found!
warning: 1 allowed warning found

Ensure chpasswd and useradd work with FreeBSD

Vincenzo mentioned that certain smaller Linux distros may not support commands like chpasswd and useradd. The code should have some way of checking whether these commands are available before running them through process::Command.

image creation script needs to retry for storage blob access error when trying to retrieve boot diagnostic

The error the script throws when it times out during storage blob access is:

az vm boot-diagnostics get-boot-log -g testagent-1709140193 -n testvm-1709140193
ERROR: Client-Request-ID=7771dfa0-d65c-11ee-bf75-95e10ccbd332 Retry policy did not allow for a retry: Server-Timestamp=Wed, 28 Feb 2024 17:11:51 GMT, Server-Request-ID=f31a688f-601e-006a-5069-6a9417000000, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFoundBlobNotFoundThe specified blob does not exist.RequestId:f31a688f-601e-006a-5069-6a9417000000Time:2024-02-28T17:11:52.0165015Z.
ERROR: The specified blob does not exist. ErrorCode: BlobNotFound

BlobNotFoundThe specified blob does not exist.
RequestId:f31a688f-601e-006a-5069-6a9417000000
Time:2024-02-28T17:11:52.0165015Z

[RFE]: imds test coverage for functional_tests

Current situation

functional_tests doesn't have coverage for imds module

Impact

Lack of testing for changes against imds

Ideal future situation

Adequate test coverage for imds within functional tests

Add GitHub Actions for CI

Should run unit, functional (if applicable), and e2e tests

  • Basic GitHub Actions workflow with cargo build (#34)
  • Run unit tests (#34)
  • Run cargo fmt for coding style (#34)
  • Run cargo clippy for linting (#34)
  • Run functional tests
  • Run e2e tests

[RFE] consumable library for integration with other provisioning agents

Current situation

The provisioning agent implements all features in a single application.

Impact

Re-using the agent's functionality from other agents like e.g. Afterburn is challenging.
This impacts adoption of the agent by the larger Linux ecosystem and hinders its mission of being a reference implementation for a minimal Azure provisioning guest agent.

Ideal future situation

  • The provisioning agent provides a library (rust crate) which is consumable by other projects. (#39)
  • The library is built and tested in CI, versioned releases are published regularly.

Follow-up tasks

  • The library is integrated with Afterburn (possibly in a WIP fork).
  • Afterburn using the library is integrated with Flatcar to move away from wa-agent supplied guest configuration.
  • Extended Afterburn support of Azure using the library is contributed upstream.

Understand Type mismatch for coref 'Clause' vs 'Description'

User > find me coref tests from 2022 that contain Locking Ring in the description 
Copilot > I'm sorry, but there seems to be no coref tests from 2022 that contain "Locking Ring" in the description.

Clause is incorrectly described in the schema, but this highlighted a bug - it should have queried 'description' but then it came back with type mismatch...

KDBIE Request:

{"table":"coref","startTS":"2022-01-01T00:00:00","endTS":"2023-01-01T00:00:00","filter":[["like","Clause",["%Locking Ring%"]]]}
{
    "header": {
        "rcvTS": "2024-04-11T15:46:39.893000000",
        "corr": "dcfe3b44-0acc-4505-90a0-99886a8e0d55",
        "protocol": "gw",
        "logCorr": "dcfe3b44-0acc-4505-90a0-99886a8e0d55",
        "client": ":10.0.131.15:5050",
        "http": "json",
        "api": ".kxi.getData",
        "userName": "automation-client",
        "userID": "018cd2cf-3c31-4495-97f5-f9357f1afeac",
        "retryCount": 0,
        "to": "2024-04-11T15:47:39.893000000",
        "agg": ":10.0.132.155:5070",
        "pvVer": 135,
        "rpID": 0,
        "refVintage": -9223372036854775807,
        "rc": 10,
        "ac": 10,
        "ai": "Unexpected error (type_mismatch) encountered executing .kxi.getData"
    },
    "payload": []
}

KDBIE Request:

{"table":"coref","startTS":"2022-01-01T00:00:00","endTS":"2023-01-01T00:00:00","filter":[["like","Description",["*Locking Ring*"]]]}
{
    "header": {
        "rcvTS": "2024-04-11T15:48:36.883000000",
        "corr": "cb9ad89a-f700-48bc-87a4-b3114ab8a4b4",
        "protocol": "gw",
        "logCorr": "cb9ad89a-f700-48bc-87a4-b3114ab8a4b4",
        "client": ":10.0.132.50:5050",
        "http": "json",
        "api": ".kxi.getData",
        "userName": "automation-client",
        "userID": "018cd2cf-3c31-4495-97f5-f9357f1afeac",
        "retryCount": 0,
        "to": "2024-04-11T15:49:36.883000000",
        "agg": ":10.0.132.29:5070",
        "pvVer": 135,
        "rpID": 0,
        "refVintage": -9223372036854775807,
        "rc": 10,
        "ac": 10,
        "ai": "Unexpected error (type_mismatch) encountered executing .kxi.getData"
    },
    "payload": []
}

[RFE] Support for custom data (or document why it is not supported)

Current situation

azure-init does not do anything with custom-data. See this article for custom-data usage on Azure.

Impact

Unlike user-data, custom-data is only available during provisioning phase because it's only available in the ovf-env.xml file. If the provisioning agent doesn't take action on custom-data, it will not be available again.

Ideal future situation

Provide some options for how customers want to handle custom-data. A few options:

  1. We save custom-data to a file and the customer chooses what to do with it.
  2. We assume it's a script and execute custom-data like a script.
  3. Detect that custom-data exists and print a WARNING message to recommend that customer use user-data instead.

Implementation options

Additional information

Differences between custom-data and user-data: custom-data is only available to privileged users as it's only available in the ovf-env.xml in a limited window during provisioning (the ovf-env.xml is only available through mounting of the provisioning ISO, which is a privileged action). user-data is available throughout the lifetime of the VM, and is accessible by anyone who has access to the VM. user-data can be updated throughout the lifetime of the VM while custom-data cannot be updated.

[RFE] Support pre-provisioning on Azure

Current situation

Currently azure-init does not support Azure pre-provisioning.

Impact

Lack of support for pre-provisioning will affect adoption as it does not allow the customer to benefit from better reliability and performance gained from Azure pre-provisioning service.

Ideal future situation

Add support for Azure pre-provisioning.

Additional information

Currently the agent does not support Azure pre-provisioning (PPS). Since PPS is not a customer-facing feature, there's no public documentation for it, unfortunately.
cloud-init has supported Azure PPS since version 18.5. Here's the commit that started the support: link
Other relevant commits as reference:
Network event detection support via netlink: commit
PPS v2 support via nic attach/detach: commit

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository getting automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout command options below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : This repository will ship as Open Source or go public
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

Best usage practices: Starting early in boot

Current situation

The systemd unit provided looks like it would run in the final system.

Impact

The instance configuration can be racy because while the system is set up by the agent other services will already start and, e.g., an SSH provisioning helper or cloud-init would race with the agent.

Debatable whether this is desired behavior: The current usage also overwrites a static hostname because the unit is running late and because the unit runs at every boot and ignores a previously set static hostname.

Ideal future situation

Provide a systemd unit that runs in the initrd, and possibly a dracut module to pull it in. This is how Afterburn is used, too, e.g., when setting up the hostname. Similar is also how Ignition is used, which does the creation of user accounts from the initrd.

Then document how the unit should be installed in the initrd and that this is the recommended way compared to a unit on the final system.

Implementation options

Additional information

[RFE] azure-init should add retries around IMDS and Wireserver operations

Current situation

There is no retry around REST API calls to IMDS or wireserver (goal_state, report_health).

Impact

Without retry, if there's a transient issue from Azure platform, provisioning will fail

Additional information

When to retry and how many times/how long to retry is a complex topic, especially when IMDS/Wireserver does not provide any guidance. This is the current behavior from cloud-init (ref, ref), which we can use as a reference (or perhaps we can provide this as a config that can be configured within the image? e.g., /etc/azure-init/azure-init.conf)

Total retry time should be no more than 5 minutes for IMDS and 20 minutes for Wireserver.
Retry around Connection timeout/Read timeout: timeout for rest call should be set at 30s
Retry around non-200 http error codes (410, 404, 503, 400, 500, 429): timeout should be set at 2s, with backoff of 1s
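
The retry rules above could be sketched roughly as follows. This is a generic retry loop with a total time budget and a fixed backoff, not azure-init's or cloud-init's actual implementation; retry_until and is_retryable_status are hypothetical names, and the per-request timeout is assumed to be handled inside the closure by whatever HTTP client is used.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `attempt` until it succeeds or the total time budget would be exceeded.
/// The last error is returned when the budget runs out.
pub fn retry_until<T, E, F>(total: Duration, backoff: Duration, mut attempt: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let deadline = Instant::now() + total;
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Give up if sleeping for the backoff would pass the deadline.
                if Instant::now() + backoff >= deadline {
                    return Err(err);
                }
                sleep(backoff);
            }
        }
    }
}

/// HTTP status codes listed above as worth retrying.
pub fn is_retryable_status(code: u16) -> bool {
    matches!(code, 400 | 404 | 410 | 429 | 500 | 503)
}
```

With the numbers above, an IMDS call would use a total of Duration::from_secs(300) and a backoff of Duration::from_secs(1), while a Wireserver call would use a total of Duration::from_secs(1200).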

SUPPORT.md

Update this file with expectations for support when reporting issues with the project, etc.

[RFE] Disable provisioning with password

Current situation

azure-init allows customers to provision Linux VMs with an admin password.

Impact

A password is not as secure as an SSH key. Using a password leaves the VM more vulnerable to brute-force attacks.

Ideal future situation

Not supporting password provisioning.

Implementation options

A couple options

  1. Disable password support completely. Note that Azure does allow customers to provide password to provision VM. In that case azure-init should fail provisioning if password is given.
  2. Allow the customer to choose to keep password support as a compile-time configurable option (but disable it by default)

[RFE] Setup ephemeral networking to report provisioning complete

Current situation

Currently azure-init relies on the VM's network to pull provisioning metadata from IMDS and to report provisioning complete to Azure platform

Impact

azure-init is vulnerable to guest network issues that could prevent it from getting provisioning metadata and reporting provisioning complete. The most common scenario is the default route being set up over a non-primary interface, which blocks traffic to IMDS/wireserver. There are also other issues, such as handling the VF (Virtual Function) network interface for VMs with accelerated networking enabled.

Ideal future situation

azure-init should set up an ephemeral DHCP lease over the correct primary NIC to pull information from IMDS, and use the same lease to report provisioning complete. The lease should be released once done.

[RFE] decouple mount/ovf logic out of get_username

Current situation

@cjp256 pointed out in a comment that the mount/ovf logic is in get_username().
That is not ideal, as CDROM mount should be independent of authentication via IMDS.

Ideal future situation

We should consider decoupling the mount/ovf logic out of get_username().

libazureinit currently does not bounce DHCP to publish hostname to Azure platform DNS

Description

Due to starting late (after network-online), azure-init ends up setting the computer name after DHCP. This has the side effect of not registering the correct hostname to DNS (as the image's stale hostname was what got sent in DHCP).

Impact

Hostname lookup for this VM will fail for other VMs that are in the same VNET or part of VNET peering.

Additional Information

This won't be an issue if we solve #35. Otherwise we'll have to restart network stack (e.g., NetworkManager)

[RFE] passwordless sudo

I believe currently the admin user is configured to be part of sudoers group, but not passwordless sudo. If we want to maintain consistency with current behaviors from cloud-init and walinuxagent, this should be set. We can also provide this as configurable option (#65)
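
For illustration, a passwordless sudo entry is typically a single sudoers line. The sketch below only renders that line; the function name and the conservative username check are assumptions, and where the line gets written (for example a drop-in under /etc/sudoers.d/) is left open.

```rust
/// Render a sudoers entry granting `user` passwordless sudo.
/// Returns None for names outside a conservative character set (an assumption
/// made here to avoid emitting a malformed sudoers line).
pub fn sudoers_nopasswd_line(user: &str) -> Option<String> {
    let valid = !user.is_empty()
        && user
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'));
    if valid {
        Some(format!("{user} ALL=(ALL) NOPASSWD:ALL\n"))
    } else {
        None
    }
}
```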

azure-init should not assume provisioning media is always at /dev/sr0

Description

Currently azure-init assumes that the provisioning ISO will surface at /dev/sr0. While this is true most of the time, it's not always the case. For example, on FreeBSD it will be at /dev/cd0. In some special environments, it might show up as /dev/vda1.

The right mechanism to find the device is to enumerate block devices and process all devices that have fstype of iso9660 and udf. Because /dev/sr0 will be the correct choice for > 99% of cases, we might default to /dev/sr0 and fall back to enumeration if /dev/sr0 isn't the right choice. See cloud-init handling of this issue for reference
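
The default-then-enumerate approach might be sketched like this. The BlockDevice type and candidate_devices function are hypothetical, and how the device list and filesystem types are obtained (which differs per platform) is deliberately left out.

```rust
/// A block device and its detected filesystem type, if any.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockDevice {
    pub path: String,
    pub fstype: Option<String>,
}

/// Return device paths to try, in order: /dev/sr0 first, then every other
/// enumerated device whose filesystem type is iso9660 or udf.
pub fn candidate_devices(devices: &[BlockDevice]) -> Vec<String> {
    const DEFAULT: &str = "/dev/sr0";
    let mut candidates = vec![DEFAULT.to_string()];
    for device in devices {
        let is_iso = matches!(device.fstype.as_deref(), Some("iso9660") | Some("udf"));
        // Skip the default so it is not tried twice.
        if is_iso && device.path != DEFAULT {
            candidates.push(device.path.clone());
        }
    }
    candidates
}
```

A caller would attempt to mount each candidate in turn and stop at the first one that contains ovf-env.xml.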

Impact

azure-init won't be able to mount the provisioning iso if the iso isn't showing up at /dev/sr0

Expected behavior

azure-init should be able to find the provisioning iso in all Azure environments

[RFE] Send user-agent as part of header for IMDS calls

Current situation

Currently azure-init does not set the user agent (User-Agent) string when communicating to IMDS.

Impact

Sending user agent header will make it easier to track the request from provisioning from IMDS, which will allow Azure to troubleshoot provisioning issues faster

Ideal future situation

Send user agent string as part of header. When sending requests to IMDS, send the user agent azure-init (ideally with the version) as part of the property "User-Agent"
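
A small sketch of building such a header value follows, using the common product/version form. The function name is hypothetical, and passing the version as a parameter is an assumption; in a Cargo build it could come from the package version instead.

```rust
/// Build the User-Agent value: "azure-init/<version>" when a version is known,
/// plain "azure-init" otherwise.
pub fn user_agent(version: Option<&str>) -> String {
    match version {
        Some(v) if !v.is_empty() => format!("azure-init/{v}"),
        _ => "azure-init".to_string(),
    }
}
```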

[RFE] Interaction with Linux Guest/Extension Agent (walinuxagent)

Current situation

Currently walinuxagent checks whether cloud-init is enabled. If yes, it will wait for cloud-init to finish provisioning. Otherwise, it will proceed with provisioning.

Impact

If walinuxagent is installed/enabled in the system, it will race with azure-init to accomplish provisioning.

Ideal future situation

walinuxagent should not assume provisioning if azure-init is expected to do provisioning.

Implementation options

We should work with walinuxagent team to determine the best approach here. One option is to override the configuration in /etc/waagent.conf to ask walinuxagent to not provision at all.

Additional information

CONTRIBUTING.md file

Create a CONTRIBUTING.md file

  • reference Microsoft CLA
  • guidelines for building, contributing, etc.
  • set expectations for review of pull request, responses to issues

[RFE] Report failures to Azure when there's an unrecoverable error

Current situation

Currently azure-init does not report any failure to Azure. If it can't finish provisioning, it will return with an error code. From a user perspective, provisioning will eventually fail with OS provisioning timeout due to Azure platform not receiving a provisioning complete signal.

In many cases the user might not be able to access the VM if provisioning fails and as such, might have a very hard time figuring out why provisioning failed

Ideal future situation

Have the azure-init report failures to Azure, which will then fail provisioning with a useful error message indicating why provisioning failed.

Implementation options

These two options are not mutually exclusive; they complement each other.
 

  1. Use wireserver to report errors to the platform. Here is how cloud-init is doing it. Essentially azure-init will need to construct a health report similar to reporting provisioning complete, but indicating the report status as NotReady, a substatus of ProvisioningFailed, and a meaningful description that will eventually show up as an error message back to the user.
    I would strongly encourage azure-init to follow the error messages used by cloud-init, because we have post-processing, monitoring, and alerting mechanism built around the errors returned by cloud-init.
    A sample error returned by cloud-init

result=error|reason=http error querying IMDS|agent=Cloud-Init/23.3.3-0ubuntu0~20.04.1|http_code=410|duration=300.2051315307617|'exception=UrlError(''410 Client Error: Gone for url: http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true'')'|url=http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true|vm_id=e76f68ac-04a8-4069-be7c-7f04b01f520f|timestamp=2024-03-12T09:39:16.373226|documentation_url=https://aka.ms/linuxprovisioningerror

  2. The failure reporting via wireserver only works if azure-init can establish communication to wireserver and can successfully post the error. In the cases where it's not working, the other option is to write a KVP with the error and Azure platform will process it.
    See cloud-init implementation as reference
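
The sample error above is a pipe-delimited list of key=value pairs. A minimal sketch of producing a description in that shape is shown here; encode_report is a hypothetical name, the field set is chosen by the caller, and the sketch does not quote or escape values that themselves contain a pipe, which the cloud-init sample appears to handle by quoting.

```rust
/// Join key/value pairs as "k1=v1|k2=v2|...".
/// Values containing '|' are not escaped in this simplified sketch.
pub fn encode_report(fields: &[(&str, &str)]) -> String {
    fields
        .iter()
        .map(|(key, value)| format!("{key}={value}"))
        .collect::<Vec<_>>()
        .join("|")
}
```

Keeping the keys identical to those used by cloud-init (result, reason, agent, and so on) would let existing post-processing pick up azure-init errors without changes.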
