
hcat's Introduction

This package is unreleased and alpha quality. Expect breaking API changes as we get it into shape. We'll do an official release when it is ready.

Hashicorp Configuration And Templating (hashicat) library


This library provides a means to fetch data managed by external services and render templates using that data. It also enables monitoring those services for data changes to trigger updates to the templates.

It currently supports Consul and Vault as data sources, but we expect to add more soon.

This library was originally based on the code from Consul-Template with a fair amount of refactoring.

Community Support

If you have questions about hashicat, its capabilities, or anything other than a bug or feature request (use GitHub's issue tracker for those), please see our community support resources.

Community portal: https://discuss.hashicorp.com/c/consul

Other resources: https://www.consul.io/community.html

Additionally, for issues and pull requests we'll be using the 👍 reactions as a rough voting system to help gauge community priorities. So please add 👍 to any issue or pull request you'd like to see worked on. Thanks.

Diagrams

While the primary documentation for Hashicat is intended to use official godocs formatting, I thought a few diagrams might help get some aspects across better and have been working on a few. I'm not great at it but with mermaid I'm hoping to incrementally improve them over time. Please feel free to file issues/PRs against them if you have ideas. Thanks.

Overview

These are some general attempts to get a high-level view of what's going on, with mixed results. Might be useful...

This diagram is "thing" (struct) oriented. It shows the main structs and the contact points between them.

graph TB
    Watcher((Watcher))
    View[View]
    Template[Template]
    TemplateFunction[Template Function]
    Tracker[Tracker]
    Resolver[Resolver]
    Event[Event Notifier]
    Dependency[Dependency]
    Consul{Consul}
    Vault{Vault}

    Watcher --> Template
    Watcher --> Resolver
    Resolver --> Template
    Template --> TemplateFunction
    TemplateFunction --> Dependency
    Template --> Watcher
    Watcher --> View
    View --> Dependency
    Watcher --> Event
    Watcher --> Tracker
    Tracker --> View
    Dependency --> Vault
    Dependency --> Consul

This diagram was another attempt at the above but including more information on what the contact points are and the general flow of things. In it the squares are structs and the ovals are calls/things-happening.

flowchart TB
    NW([NewWatcher])
    W[Watcher]
    T[Templates]
    R([Register])
    TN[TrackedNotifiers]
    TE([TemplatesEvaluated])
    TF[TemplateFunctions]
    D[Dependencies]
    Rc([Recaller])
    TD[TrackedDependencies]
    V[View]

    NW --> W
    T --> R --> W --> TN
    W --> TE --> TF
    TF --> D--> Rc
    D --> TD
    W --> Rc
    Rc --> V
    V --> W
    TD --- TN


Channels

This shows the main internal channels.

flowchart TB
    W[Watcher]
    V[View]
    Ti[Timer]

    V -. err-from-dependencies .-> W
    V -.data-from-dependencies.-> W
    Ti -.buffer-period.-> W
    W -.internal-stop.-> W

States

I thought a state diagram was a good idea until I realized there just aren't that many states.

stateDiagram-v2
    [*] --> Initialized
    Initialized --> NotifiersTracked: templates registered
    NotifiersTracked --> ResolvingDependencies: templates run
    ResolvingDependencies --> ResolvingDependencies: templates run
    ResolvingDependencies --> Watching: steady state achieved
    Watching --> ResolvingDependencies: data updates
    Watching --> [*]: stop

Template.Execute() Flow

This is probably one of the more useful diagrams, depicting the call flow of a Template execution. Note that "Dirty" is a term I swiped from filesystems; it denotes that some data the template uses has changed.

flowchart TB
    Start --> Execute
    Execute --> D{Dirty?}
    D -->|no| Rc[Return Cache]
    D -->|yes| TE[Template Exec]
    TE --> TF[Template Functions]
    TF --> R[Recaller]
    R --> Tr[Tracker]
    R --> Ca{Cache?}
    Ca -->|hit|Rd[Return Data]
    Ca -->|miss| Poll
    Poll --> Dep[Dependency]
    Dep --> Cl((Cloud))
    Cl --> Dep
    Dep --> Poll
    Poll --> Ca

Watcher.Wait() Flow

Similar to the above: what happens when you call watcher.Wait()?

flowchart TB
    Start --> Wait
    Wait --> S{Select?}
    S -->|dataChan| NewData
    S -->|bufferTimer| Return
    S -->|stopChan| Return
    S -->|errChan| Return
    S -->|context.Done| Return
    NewData --> SC[Save To Cache]
    NewData --> N{Notifier approved?}
    N -->|yes| B{Buffering?}
    B -->|yes| S
    B -->|no| Return
    N -->|no| Return

hcat's People

Contributors

alvin-huang, dependabot[bot], eikenb, findkim, hashi-derek, hashicorp-copywrite[bot], hashicorp-tsccr[bot], lornasong, mkam, nodyhub, wilkermichael


hcat's Issues

Goroutine leak in watcher Wait/Stop

You can Stop the watcher, which shuts everything down unless Wait() is in use. In that case the goroutine blocked in Wait() escapes, and if the caller didn't specify a timeout it never exits.

Wait() needs a new internal "stopped" channel so that it can exit cleanly.

Add Detailed HashiCAT Documentation

It would be great to have detailed documentation about HashiCAT, what it does and how it works.

This could be somewhat along the lines of the Raft documentation (idea by @findkim). Maybe not as detailed at the start but something that can help others who are new to the project understand it more easily.

Hcat does not reset index for blocking queries with Consul when Consul restarts

Summary

When a Consul instance is restarted while Hcat is connected to it, Hcat can reuse the same index when it reconnects, causing the next query to block.

Investigation

In view.go, when Hcat suspects Consul was restarted, it is supposed to reset the index:

hcat/view.go

Lines 218 to 227 in 4bf0597

if strings.Contains(err.Error(), "connection refused") {
	// This indicates that Consul may have restarted. If Consul
	// restarted, the current lastIndex will be stale and cause the
	// next blocking query to hang until the wait time expires. To
	// be safe, reset the lastIndex=0 so that the next query will not
	// block and retrieve the latest lastIndex
	v.dataLock.Lock()
	v.lastIndex = 0
	v.dataLock.Unlock()
}

This case is only covered when the expected string "connection refused" is present. While Consul is shutting down, it returns a gRPC closing error, which does not match the expected string and so does not reset the index to 0. If Consul starts up again immediately, Hcat can then reconnect using the previous index, and the next query blocks until new changes arrive.

Solution

Update hcat/view.go (line 218 in 4bf0597):

if strings.Contains(err.Error(), "connection refused") {

to use the Consul API StatusError:

type StatusError struct {
	Code int
	Body string
}

https://github.com/hashicorp/consul/blob/fed112e51ee38eee5eb7d7d46bf9b3dc308b70cf/api/api.go#L85-L88

Reset index on code 500 rather than on a particular string

Move remaining template functions into tfunc library

Move the core/dependency based template functions into the library as well. They were initially left in the template's function map as built-ins but with the advent of newer versions of the methods we should just move them all into the library.

Data structures returned by template functions should be public

Template functions return data structures defined in internal/dependency/, which products using the library cannot reference because they are internal only. Products need to reference these types when they add their own template functions that take the returned values and filter them (like byMeta in consul-template).

To allow application-added template functions to work with these returned dependency structures, the structures need to be made public.

Move them into dep/ and fix all references.
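To illustrate why the types need to be public, here is a sketch of an application-defined filter registered with the standard text/template package. Service is a hypothetical stand-in for a public dependency struct, and byMetaValue is an invented function, not consul-template's byMeta.

```go
package main

import (
	"os"
	"text/template"
)

// Service is a hypothetical stand-in for a dependency struct that a
// template function returns. An application can only write the filter
// below if it is able to name this type.
type Service struct {
	Name string
	Meta map[string]string
}

// byMetaValue is an application-defined filter that keeps the services
// whose metadata has the given key set to the given value.
func byMetaValue(key, value string, services []Service) []Service {
	var out []Service
	for _, s := range services {
		if s.Meta[key] == value {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	services := []Service{
		{Name: "api", Meta: map[string]string{"env": "prod"}},
		{Name: "api-canary", Meta: map[string]string{"env": "staging"}},
	}
	funcs := template.FuncMap{"byMetaValue": byMetaValue}
	tmpl := template.Must(template.New("example").Funcs(funcs).Parse(
		`{{ range byMetaValue "env" "prod" . }}{{ .Name }}{{ "\n" }}{{ end }}`))
	if err := tmpl.Execute(os.Stdout, services); err != nil {
		panic(err)
	}
}
```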

Non-template based dependency watching

Create a non-template notifier and supporting code to allow monitoring arbitrary dependencies without the need for a template.

Use cases in CTS and Envconsul as well as internal vault token watcher.

Not necessary but something to keep an eye on... could template dependencies reuse this system as a lower level abstraction?

Check for Leader election before using Consul

Before connecting to Consul, the library should be sure a leader has been elected; otherwise you can get partial data. This is only possible during the Consul startup cycle, but that could well be a significant amount of time in a live scenario. It is also a PITA when testing.

Here is how consul-esm handles it. Something similar should suffice in this case as well.

https://github.com/hashicorp/consul-esm/blob/89fa74c077d10b3843f755f90442cecb4876c3b4/agent.go#L97-L110

for {
	leader, err := client.Status().Leader()
	if err != nil {
		logger.Printf("[ERR] error getting leader status: %q, retrying in %s...", err.Error(), retryTime.String())
	} else if leader == "" {
		logger.Printf("[INFO] waiting for cluster to elect a leader before starting, will retry in %s...", retryTime.String())
	} else {
		break
	}

	time.Sleep(retryTime)
}

return &agent, nil

Add example for watcher.Watch()

PR #84 adds a watcher.Watch() method that implements a pub-sub like interface for monitoring the dependency updates. It needs an example in doc_test.go.

update doc_test.go

Update doc_test.go with changes from dependency reworking, #26.

Either update existing examples or add a new one.

Convert primary branch from "master" to "main"

This issue is to mark the change of the primary branch from master to main. I'm doing it today but didn't like that there was no way to record the event. This is that.

If you happen to be working off master and need help switching and you see this... feel free to respond to this one or file a new issue. Thanks.

Cleanup monitoring unused dependencies

Context in #40 identified that the Watcher.sweep() and trackedPair.isUsed logic needs to be fleshed out to clean up nested dependencies that are no longer associated with a template.

Allow for more async template/watcher/wait behavior

Currently when checking for template updates we use the list of fields updated since the last wait call (it clears this list early in the Wait call). If we could make the update tracking work more per-template instead of global to the watcher we should be able to support a more async style without losing the shared cache.

My initial thoughts are to try to rework things to check the dependencies against those used by a template as they come in and mark the template as 'dirty' and in need of a re-rendering.

See the comments in this PR for a little context.
hashicorp/consul-terraform-sync#63 (review)
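A rough sketch of the per-template idea described above: updates are matched against the templates that use each dependency as they come in, and each template carries its own dirty flag. The tracker type and its methods are hypothetical names, not existing hcat API.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker records which templates use which dependencies and keeps a
// dirty flag per template instead of one update list for the whole watcher.
type tracker struct {
	mu    sync.Mutex
	users map[string]map[string]struct{} // dependency ID -> template IDs
	dirty map[string]bool                // template ID -> needs re-render
}

func newTracker() *tracker {
	return &tracker{
		users: make(map[string]map[string]struct{}),
		dirty: make(map[string]bool),
	}
}

// Register notes that a template uses the given dependencies.
func (t *tracker) Register(templateID string, depIDs ...string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, d := range depIDs {
		if t.users[d] == nil {
			t.users[d] = make(map[string]struct{})
		}
		t.users[d][templateID] = struct{}{}
	}
}

// DataUpdated marks every template that uses depID as dirty.
func (t *tracker) DataUpdated(depID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for tmplID := range t.users[depID] {
		t.dirty[tmplID] = true
	}
}

// TakeDirty reports whether the template needs re-rendering and clears
// only that template's flag, leaving other templates untouched.
func (t *tracker) TakeDirty(templateID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	d := t.dirty[templateID]
	delete(t.dirty, templateID)
	return d
}

func main() {
	tr := newTracker()
	tr.Register("nginx.conf", "service(web)", "kv(config)")
	tr.Register("haproxy.cfg", "service(web)")
	tr.DataUpdated("kv(config)")
	fmt.Println(tr.TakeDirty("nginx.conf"), tr.TakeDirty("haproxy.cfg")) // true false
}
```

Because clearing one template's flag does not affect the others, callers could check templates at different times without a global list being reset under them.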

Public interfaces to support blocking queries for external dependencies

Description

Custom dependency types external to the library cannot use the blocking query functionality supported by hcat, because the BlockingQuery and QueryOptionsSetter interfaces are internal. As a result, the index query param is never set during the view fetch for CTS dependencies like ServicesRegex and CatalogServicesRegistration. This results in constant querying of the Consul API.

Expected Behavior

For example, the health service dependency is internal and satisfies both interfaces. The index is set on subsequent fetches as seen in the Consul logs.

    2021-10-14T17:24:33.549-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/api?filter=Checks.Status+%3D%3D+%22passing%22 from=127.0.0.1:52798 latency=5.96916ms
    2021-10-14T17:24:51.074-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/api?filter=Checks.Status+%3D%3D+%22passing%22&index=13 from=127.0.0.1:52798 latency=17.421803958s

Actual Behavior

In contrast, the catalog services dependency is external (CTS), so the endpoint is queried continuously without blocking because the index query parameter is not set.

    2021-10-14T17:20:58.717-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=134.509µs
    2021-10-14T17:20:58.839-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=138.195µs
    2021-10-14T17:20:58.954-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=125.117µs
    2021-10-14T17:20:59.062-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=117.572µs
    2021-10-14T17:20:59.183-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=131.385µs
    2021-10-14T17:20:59.299-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=102.551µs
    2021-10-14T17:20:59.407-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=119.231µs
    2021-10-14T17:20:59.512-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=135.086µs
    2021-10-14T17:20:59.619-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=121.265µs
    2021-10-14T17:20:59.742-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=155.943µs
    2021-10-14T17:20:59.857-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=131.342µs
    2021-10-14T17:20:59.968-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=92.777µs
    2021-10-14T17:21:00.080-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=60.278µs
    2021-10-14T17:21:00.203-0500 [DEBUG] agent.http: Request finished: method=GET url=/v1/catalog/services from=127.0.0.1:52716 latency=91.272µs

Related to #67
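A sketch of how a public interface could let an external dependency opt in to blocking queries. The interface name QueryOptionsSetter comes from the description above, but the method signature, QueryOptions, Dependency and fetchOnce are assumptions made for illustration. The example builds request URLs rather than calling Consul.

```go
package main

import "fmt"

// QueryOptions holds only the option relevant to this sketch.
type QueryOptions struct {
	WaitIndex uint64
}

// Dependency is a minimal, hypothetical fetch contract.
type Dependency interface {
	Fetch() (data string, index uint64, err error)
}

// QueryOptionsSetter is named after the interface in the description; the
// method signature here is an assumption.
type QueryOptionsSetter interface {
	SetOptions(QueryOptions)
}

// catalogServices plays the role of an external (CTS-style) dependency.
type catalogServices struct {
	opts        QueryOptions
	serverIndex uint64 // index the fake server reports back
}

func (c *catalogServices) SetOptions(o QueryOptions) { c.opts = o }

// Fetch returns the URL it would request so the effect is visible.
func (c *catalogServices) Fetch() (string, uint64, error) {
	url := "/v1/catalog/services"
	if c.opts.WaitIndex > 0 {
		url = fmt.Sprintf("%s?index=%d", url, c.opts.WaitIndex)
	}
	return url, c.serverIndex, nil
}

// fetchOnce is what a view's fetch step might do: if the dependency
// implements the public interface, pass it the last seen index.
func fetchOnce(d Dependency, lastIndex uint64) (string, uint64, error) {
	if s, ok := d.(QueryOptionsSetter); ok {
		s.SetOptions(QueryOptions{WaitIndex: lastIndex})
	}
	return d.Fetch()
}

func main() {
	dep := &catalogServices{serverIndex: 13}
	url, idx, _ := fetchOnce(dep, 0)
	fmt.Println(url) // first fetch carries no index
	url, _, _ = fetchOnce(dep, idx)
	fmt.Println(url) // later fetches carry index=13 and can block
}
```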

Stop() -> context.Cancel

Consider the API change to remove all the Stop() methods and replace them with a passed in Context that is cancelled. This seems like it would be a cleaner API and fits with more modern/common Go practices that would be good to emulate.

What should resolver.Run return when buffering.. error or content?

The ResolveEvent that is returned during buffering doesn't have the content, which would otherwise always be available with the caching template (PR #59), and the error is nil. This breaks with the other return semantics, where it either returns a meaningful ResolveEvent with content OR returns an empty/default ResolveEvent and an error.

This can be addressed in either direction.

  1. add a method to the template to let the resolver grab the cached content and return a meaningful ResolveEvent (w/ content and NoChange=true).
  2. have it return an error when buffering with the empty/default ResolveEvent (like other errors)

I'm leaning towards option 2 but am not 100% yet.

Add support to query Consul services by namespace

The namespace attribute is available in health service and catalog service objects to be consumed by templates.

I believe namespace support currently relies on the namespace inherited from the ACL token used for the queries. However, if the ACL token is not bound to one namespace but has permissions to access multiple namespaces, there is no way to use that token to query for services under a specific namespace with the current syntax.

{{ service "<TAG>.<NAME>@<DATACENTER>~<NEAR>|<FILTER>" }}

To extend support for namespaces further, it would be useful to query services by their namespace. This might mean adding a namespace query option plus new template syntax to make use of the Consul API ns param.

Add de-dup to Watch

More of a note to myself. When refactoring the Wait/Watch methods on the watcher, look at deduplicating the notifier ids sent down the channel after collecting more due to draining/buffering.

Related to PR: #90

Feature: support Consul's Event API

Consul's Event API description...

The event command provides a mechanism to fire a custom user event to an entire data-center. These events are opaque to Consul, but they can be used to build scripting infrastructure to do automated deploys, restart services, or perform any other orchestration action. Events can be handled by using a watch.

Docs:
https://www.consul.io/api-docs/event
https://www.consul.io/commands/event

I'm searching for more data on potential use cases but haven't found much so far. If you know of any please let me know in a comment. Thanks.

Race between data fetches and template complete checks

There is a services benchmark issue in CTS that I tracked down to a bad check on the template rendering completion. It appears to be a race between fetching and rendering.

When the data is fetched, it is also marked as having been received via the receivedData flag on the view. Once the template has made a rendering pass, it checks for completeness by, among other things, checking that the receivedData flag is set on all views. I think that while the template is being processed and those view entries are being filled in, the other views retrieve their data, so that by the time it gets to the completeness check it thinks it had everything during the render and marks the template as completed.

Strip out non-dependency related template functions

We want this module to focus on managing the external/dependency related template functions, that is, all those template functions that need to fetch data from external sources and track/cache it locally. We don't want to pollute the code base with the 50+ utility template functions that are more specific to the use case of consul-template.

The open question is which, if any, utility functions should be included. The main ones I'm considering are (and why)...

  • byMeta, byKey, byTag - these are about working with dependencies (consul stuff)
  • env - was originally thinking of the environment as another external dependency... not 100% sure on this

The file one was on this list as it used the cache... but I'm specifically doing it in consul-template to make sure that template functions that use the cache are possible in extensions.

Indefinite hanging due to timeout error

Creating a new Consul client hangs indefinitely when the connection times out. A faulty network state of a machine connecting to a remote Consul cluster could result in the timeout below:

$ curl https://10.0.0.4:8500/v1/status/leader
curl: (7) Failed to connect to 10.0.0.4 port 8500: Connection timed out

Working with @npearce, we found that this causes Consul-Terraform-Sync to halt during startup when initializing the Consul client (source) -- spinning for over 30 mins. We suspect the Consul leader check ends up looping indefinitely when net.Error.Temporary() returns true for timeouts (source and DNSError).

func hasLeader(client *consulapi.Client) error {
	// spin until Consul cluster has a leader
	retryTime := time.Second
	for {
		leader, err := client.Status().Leader()
		switch e := err.(type) {
		case net.Error:
			if e.Temporary() {
				continue
			}
			return e
		default:
		}
		if leader != "" { // will contain the url of leader if good
			return nil
		}
		retryTime = retryTime * 2
		if retryTime > time.Minute {
			return fmt.Errorf("client set: no consul leader detected")
		}
		time.Sleep(retryTime)
	}
}

Create external dependency module API/Interface(s)

We eventually want to support more monitored dependency back-ends than just Consul and Vault. To enable this we need a standard API/Interface(s) that they could implement to work. This ticket is to track initial thoughts on what features each of these modules would need to implement to work as dependencies in hashicat.

Initial list of things that come to mind:

  1. client
  2. backend/client specific config
  3. retry functions
  4. views (they have consul specific fields)
  5. ???

Documentation on template function library

With the majority of template library functions moving into hashicat, it would be good if the downstream applications had decent reference documentation to refer to. Currently the template functions have basic docs, but nothing compared to the more extensive docs as they currently exist in consul-template.

I'd like to hit somewhere between the verbosity of the current consul-template docs and the brief hashicat docs. I'm thinking the standard godoc format with examples should work as a good starting point.

Review Defaults for idle connections

Related to: hashicorp/consul-terraform-sync#164

Review the defaults (if any) and consider applying the defaults CTS uses. These are the default values changed in CTS, as copied from Consul-template. See that ticket for details about why and impact of each change.

DefaultMaxIdleConnsPerHost: 1 -> 100
DefaultMaxIdleConns: 100 -> 0
DefaultIdleConnTimeout: 90s -> 5s

Renderer "middleware" for additional file modifications

Add a way to make the renderer, the part that takes the evaluated template text and writes it to a file, extensible with post (and pre?) write effects. Like creating a backup, changing permissions or owners.

It came up in a recent issue hashicorp/consul-template#1497 that Nomad already uses the file permissions code in consul-template and there is a need for them to set the UID/GID of the files as well. In order for hashicat to replace consul-template it will need to support these sorts of features.

The code currently has a placeholder/reminder setup for this with the backup command, replace that with this.

Standard library of template utility functions

In #15 and #16 we strip out all the template functions that don't target external dependencies from the library. After some discussion we decided to re-include them, but in a sub-package standard library of template functions that you have the option to use if you want.

Having them in a sub-package gives some advantages...

  • no access to internal data validates that the API allows developing all types of useful functions
  • splitting them out keeps the modules smaller and easier to maintain
  • keeps a standard set of functions that can be maintained with constant behavior
  • keeps focus and eases maintenance of more complex dependency oriented functions

The tentative plan is to take everything from consul-template except for scratch, as its purpose is to allow for functionality in a template that would be better done via code. As this is a library, the developer using it will already be writing code, where this sort of functionality is better implemented. If any other functions meet this criterion they will probably also be left out (evaluating them as they are migrated).

un-hardcode vault token monitoring in watcher

The watcher has a hardcoded field for monitoring the vault-token as part of its process. This means for testing the vault-token you need to spin up a watcher with that config.

Seems like it would be better to integrate it without the hardcoding. In a way that would be easier to test.

I've refactored this already in consul-template and should use that as a guide to convert here after the non-template notifiers work.
