nypl / engineering-general
Standards, values, and other information relevant to the NYPL Engineering Team.
I definitely ❤️ the "Fixing Quickly vs Ideally" sentiment but I wonder if there's a way to express this in a more generic way so it's more universally applicable. Maybe more in-line with some of the principles expressed in the Agile Manifesto:
Two relevant principles come to mind:
Simplicity--the art of maximizing the amount of work not done--is essential.
Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
I'll take a first try though it's really rough:
Velocity vs. Ideal
Simple solutions, delivered quickly, are often better than ideal solutions. “Quick fixes” are acceptable if they are documented and agreed upon by other developers.
tldr; Teams are a mess. By project maybe?
Our Github Teams situation is a mess. Our teams include a mix of teams created for individual projects and teams reflecting prior institutional hierarchy. I wasn't able to find useful online opinion on patterns for using Github Teams to simplify repository access across multiple interrelated projects & repos. Our challenge is to 1) maximize the ease with which we can grant `Read` and `Write` access to relevant staff while simultaneously 2) withholding access that is unwanted, unnecessary, or dangerous. We want things locked-down in general, but we're willing to be a little permissive with permissions if it means we save time tracking down owners/admins to manage account-specific access.
As an aside, I think our repositories should be public by default. If we're following our own recommendations, there's no danger making them so. This simplifies `Read` access concerns, although public repos would still benefit from better organized Teams for granting `Write` access.
The major patterns of Team organization seem to be:
The vanilla use case favored by Github's Nested teams documentation doesn't seem well suited to our environment on its own. I don't think we'd benefit from teams solely organized by role like "NYPL Digital Engineers", "NYPL Digital PMs", "NYPL Digital Design/UX", etc. We don't often need to grant read/write access only to Developers - and to all Developers. In practice our projects involve certain developers, only one PM, and a selection of Design/UX, and subject specialists - some outside Digital.
Our work is typically organized by "project", a thematically squashed collection of products supported by a number of repositories. A project tends to have a regular stand-up and a dedicated Jira/Waffle board. Contributors to a project often benefit from having unified access to all of the repositories involved in that project. Thus, one pattern we could adopt is to create Github teams around projects. At the start of a project, for the purpose of easily giving people read/write access to the repositories for a project, a Github team could be created that matches the product name (e.g. "Best Books", "Discovery/ReCAP", "SimplyE"). The team would include every Github account that might be interested in seeing the contents of the repo. A more selective "Write-Access" sub-team could be created to represent those contributors likely to need write access to the majority of the repositories associated with the project (e.g. "Best Books Write-Access", "Discovery/ReCAP Write-Access"). Anyone added to the project would need only be added to the relevant project team to gain read/write access to all relevant repositories.
(I really don't like "Write-Access" as a suffix, but "Developer"/"Contributor" are presumptuous/limiting.)
Github project teams would not be deleted when project work is complete because doing so may revoke read/write access to the affected repositories, which may make troubleshooting issues post-launch difficult/impossible. Thus, under this model, the list of Github teams is expected to increase steadily forever and that's okay. Teams are free and their persistence may be valuable as a kind of rolling secondary documentation of repository-project relationships (in lieu of using Github's "Projects" feature).
Note that a given repository may be implicated in multiple projects and thus may grant `Read`/`Write` access to multiple Teams. For example, the repositories representing the Inventory Service may grant `Write` access to both "Discovery/ReCAP Write-Access" and "New Arrivals Write-Access".
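To make the project-team model easier to reason about, here is a minimal Python sketch of how a user's access to a repository could be resolved when several teams grant access to it. It is a flat simplification and does not model GitHub's nested-team inheritance; the account names and the `inventory-service` repository name are hypothetical.

```python
from dataclasses import dataclass, field

# Permission levels ordered from weakest to strongest.
PERMISSION_RANK = {"none": 0, "read": 1, "write": 2}


@dataclass
class Team:
    name: str
    members: set[str] = field(default_factory=set)
    # Maps repository name -> permission granted to this team ("read" or "write").
    grants: dict[str, str] = field(default_factory=dict)


def effective_permission(user: str, repo: str, teams: list[Team]) -> str:
    """Return the strongest permission any of the user's teams grants on repo."""
    best = "none"
    for team in teams:
        if user not in team.members:
            continue
        granted = team.grants.get(repo, "none")
        if PERMISSION_RANK[granted] > PERMISSION_RANK[best]:
            best = granted
    return best


# Hypothetical accounts and repository name, for illustration only.
teams = [
    Team("Discovery/ReCAP", {"alice", "bob", "carol"}, {"inventory-service": "read"}),
    Team("Discovery/ReCAP Write-Access", {"alice"}, {"inventory-service": "write"}),
    Team("New Arrivals Write-Access", {"bob"}, {"inventory-service": "write"}),
]

print(effective_permission("alice", "inventory-service", teams))  # write
print(effective_permission("carol", "inventory-service", teams))  # read
print(effective_permission("dave", "inventory-service", teams))   # none
```

The point of the sketch is that adding someone to one project team is the only step needed; the repository-level grants never change per person.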
For common repositories not tied to a specific project (e.g. NYPL/nypl-data-api-client, NYPL/nypl-core, NYPL/engineering-general, etc.), we would be forced to make them Public or grant `Read` access to a "NYPL" catch-all team and `Write` access to either:
a) individual accounts on a case by case basis, or
b) role-based teams like "NYPL Engineers", "NYPL Metadata Services".
`Read` to "NYPL", `Write` by project
This option abandons granular, project-specific `Read` Teams in favor of simply granting `Read` to every member of a catch-all "NYPL" team. We gain the convenience of not having to manage project or role-specific teams for `Read` access. The cost of this is that every member of the "NYPL" catch-all team gets `Read` access to every NYPL private repo. (I see no danger in that provided we're not storing secrets.) To retain granular control over `Write` access, project specific Teams would be created (e.g. "Best Books Write-Access", "Discovery/ReCAP Write-Access").
Our Teams listing might resemble:
For common repositories not tied to a specific project, our options resemble those in II.
`Admin` access, which by default is held solely by repository creators. If we create project based teams, the only time we should need `Admin` access is when initially granting team access to the repo, so perhaps it is sufficient for that power to be shared solely by the repo owner and NYPL org admin.
Hello (especially @NYPL/recap-data-search, @NYPL/recap-request, @NYPL/recap-ui),
There has been some discussion about logging and standardizing our logging messages. Currently, we're using a few different libraries (`winston`, `bunyan`, and `monolog`). Instead of standardizing on the library, maybe we can agree on a log message standard with a "minimum" set of key/values. I'll kick off the discussion with the following proposal:
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
The message format MUST be JSON.
The message MUST contain the following top-level keys: `level`, `message`:
- `level` MUST be a string of one of the following values (case-sensitive) and MUST follow this order of severity (from least to greatest): `DEBUG`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `CRITICAL`, `ALERT`, or `EMERGENCY`.
- `message` MUST be a string and SHOULD contain a message useful for debugging/error reporting.

Additional key/values of any type MAY be included and MUST NOT break functionality.
Any thoughts? Should we RECOMMEND or REQUIRE additional key/values?
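To help the discussion, here is a minimal Python sketch of a checker for the proposed format. It only covers the rules stated above (JSON, a `level` from the listed values, a string `message`); the function name `validate_log_line` is made up for illustration, and treating the top-level JSON value as an object is my reading of "top-level keys".

```python
import json

# Severity levels from the proposal, least to most severe.
LEVELS = ["DEBUG", "INFO", "NOTICE", "WARNING", "ERROR", "CRITICAL", "ALERT", "EMERGENCY"]


def validate_log_line(line: str) -> list[str]:
    """Return a list of problems; an empty list means the line conforms."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["message is not valid JSON"]
    if not isinstance(record, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    # The comparison is case-sensitive, as the proposal requires.
    if record.get("level") not in LEVELS:
        problems.append("level must be one of: " + ", ".join(LEVELS))
    if not isinstance(record.get("message"), str):
        problems.append("message must be a string")
    return problems


# Extra keys such as "requestId" are allowed and ignored.
print(validate_log_line('{"level": "INFO", "message": "Request received", "requestId": "abc"}'))
print(validate_log_line('{"level": "info", "message": 42}'))
```

The first call prints an empty list; the second reports both a bad `level` (wrong case) and a non-string `message`.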
I think it might be a good idea to add an "Onboarding" document to this repo to help new developers get up to speed.
Do you know of any existing documents that we can build on for @gkallenberg @nodanaonlyzuul?
@thisisstephenbetts, what was helpful to you in the onboarding process?
Maybe we can use this issue to help create this document.
CC: @jvarghese01
After a wonderful conversation with @winniequinn regarding offboarding staff on Slack, she suggested the following wonderful ideas:
That way, we can DM the guest while the account is still active for those three months. The guest access only to #general channel will not incur new charges to our nypl.slack.com account.
Let me know what you think, @kfriedman, @nodanaonlyzuul. Please ping anyone who you think should join this conversation!
I will open a first draft PR that tackles naming conventions.
At the moment it can be difficult to find out where code that is deployed to AWS is stored on GitHub just from looking at AWS. I thought it would be useful to add something to our standards about indicating corresponding GitHub repos on AWS. For example, perhaps we could ask to have the repo url stored in an environment variable with a standard name, or included in the "description" field.
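As a rough illustration of the environment-variable option, here is a small Python sketch of how a deployed app (or a script inspecting its configuration) could report its source repo. The variable name `GITHUB_REPO_URL` is hypothetical; the suggestion above only asks for "a standard name".

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; no standard name has been agreed on yet.
REPO_URL_VAR = "GITHUB_REPO_URL"


def source_repo(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the repository URL recorded in the environment, or None if unset."""
    url = env.get(REPO_URL_VAR, "").strip()
    return url or None


print(source_repo({"GITHUB_REPO_URL": "https://github.com/NYPL/engineering-general"}))
print(source_repo({}))  # None
```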
Is it okay to include this quote at the beginning of the documentation standards page? It's been on my mind. 😄
“Documentation is a love letter that you write to your future self.”
— Damian Conway
Perhaps we should add a CI section to the testing standard?
Or, if we're not doing full CI, at least spell out our usage of a CI service like Travis?
I noticed on some of our machines we used different names for our AWS profiles. As a recommendation, I'd like to suggest that the profile names be the Account ID or alias, e.g. `nypl-sandbox`, `nypl-digital-dev`, and `nypl` for the AWS account run by Systems Engineering, when we log into AWS, so it is
Right now I'm only concerned about those three profiles, and engineers should be free to name other profiles to their preference.
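If it helps, here is a minimal Python sketch that checks the text of an `~/.aws/config` file for the three recommended profile names. It assumes the usual `[profile NAME]` section headers in that file; the function names and the sample config are illustrative only.

```python
import configparser

# Profile names recommended in this issue.
RECOMMENDED_PROFILES = {"nypl-sandbox", "nypl-digital-dev", "nypl"}


def profile_names(config_text: str) -> set[str]:
    """Extract profile names from the text of an ~/.aws/config file."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = set()
    for section in parser.sections():
        # Named profiles in ~/.aws/config use "[profile NAME]" headers.
        if section.startswith("profile "):
            names.add(section[len("profile "):].strip())
        else:
            names.add(section)
    return names


def missing_profiles(config_text: str) -> set[str]:
    """Return the recommended profile names not present in the config."""
    return RECOMMENDED_PROFILES - profile_names(config_text)


# Illustrative config text; the region values are placeholders.
sample = """
[profile nypl-sandbox]
region = us-east-1

[profile nypl-digital-dev]
region = us-east-1
"""
print(missing_profiles(sample))  # {'nypl'}
```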
Hello, @nonword and I went on an adventure with AWS Lambda deployment, and we discovered that if our machines have a `[default]` profile inside our `.aws/config` and `.aws/credentials` files, the `node-lambda` npm library will look for default settings first and override the necessary KMS decryption settings. I recommend that we include a note on AWS Lambda to make sure our machines have no `[default]` profile in our AWS credentials.
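A quick way to check a machine for this is sketched below in Python. It only looks for a `[default]` section in the two files named above and says nothing about how `node-lambda` itself resolves profiles; the function names are made up for illustration.

```python
import configparser
from pathlib import Path
from typing import Optional


def has_default_profile(path: Path) -> bool:
    """Return True if the AWS file at path contains a [default] section."""
    parser = configparser.ConfigParser()
    parser.read(path)  # files that do not exist are silently skipped
    # Lowercase "default" is an ordinary section for configparser;
    # only uppercase "DEFAULT" is treated specially.
    return parser.has_section("default")


def files_with_default(aws_dir: Optional[Path] = None) -> list[str]:
    """List which of config/credentials under aws_dir define [default]."""
    if aws_dir is None:
        aws_dir = Path.home() / ".aws"
    return [name for name in ("config", "credentials")
            if has_default_profile(aws_dir / name)]
```

Calling `files_with_default()` with no arguments inspects `~/.aws`; an empty list means neither file defines a `[default]` profile.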
Understand that 100% might be counter-productive or not so feasible, still seems like we should hold our own feet to the fire. We're developing something that people are going to need to count on, and with as many pieces as this thing has, we need to make sure they all work.
Hi.
Our current logging guides address an optional `levelCode` key and give a guide to how those integers map to the `level` key.
That mapping is different across programming languages. Ruby's `Logger::Severity`:

    # Logging severity.
    module Severity
      DEBUG = 0
      INFO = 1
      WARN = 2
      ERROR = 3
      FATAL = 4
      UNKNOWN = 5
    end

Python's `logging` module:

Level | Numeric Value |
---|---|
CRITICAL | 50 |
ERROR | 40 |
WARNING | 30 |
INFO | 20 |
DEBUG | 10 |
NOTSET | 0 |
So - I feel weird that our existing documentation calls out the log levels that we do.
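One possible direction, sketched in Python: keep the string `level` as the shared vocabulary and translate each runtime's integers into it. The tables below use the Ruby and Python values shown above; treating Ruby's `FATAL` as `CRITICAL` and leaving `UNKNOWN` unmapped are assumptions on my part, not something we've agreed on.

```python
import logging
from typing import Optional

# Ruby Logger::Severity integers -> level names from our logging guide.
# FATAL -> CRITICAL is an assumption; UNKNOWN (5) is deliberately left unmapped.
RUBY_TO_STANDARD = {0: "DEBUG", 1: "INFO", 2: "WARNING", 3: "ERROR", 4: "CRITICAL"}

# Python logging integers -> level names from our logging guide.
PYTHON_TO_STANDARD = {
    logging.DEBUG: "DEBUG",
    logging.INFO: "INFO",
    logging.WARNING: "WARNING",
    logging.ERROR: "ERROR",
    logging.CRITICAL: "CRITICAL",
}

TABLES = {"ruby": RUBY_TO_STANDARD, "python": PYTHON_TO_STANDARD}


def standard_level(runtime: str, level_code: int) -> Optional[str]:
    """Translate a runtime-specific integer into a shared level name, if known."""
    return TABLES[runtime].get(level_code)


print(standard_level("ruby", 2))     # WARNING
print(standard_level("python", 30))  # WARNING
print(standard_level("ruby", 5))     # None
```

The same integer means different things per runtime (Ruby's 2 and Python's 30 are both warnings), which is why the string key is the safer one to standardize on.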
I'm not sure what the answer is:
- `logLevel` key more conspicuous?
- `logLevel` all together?

In our travis coverage, we should note ways to better leverage our common travis acct. Suggestion from @nodanaonlyzuul and @gkallenberg :
I think we can get more mileage out of the limited number of concurrent builds our Travis plan gives us by:
- Doing fewer builds.
- Making builds faster.
Doing fewer builds is easy - by doing:

    branches:
      only:
        - master
        - deployable-branch1
        - deployable-branch2

If you do this - TravisCI will not do builds for feature branches but WILL do builds for PRs.
We can make builds faster by asking travis to cache gems, node modules, (maybe even apt packages) https://docs.travis-ci.com/user/caching/.
In order to provide guidance and a common approach to building microservices at NYPL, a set of guidelines was initially drafted. This is a first, rough draft:
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
Services SHOULD follow common microservice patterns.
It's RECOMMENDED that they:
See: Best Practices for Building a Microservice Architecture and Pattern: Microservice Architecture
Services with HTTP endpoints SHOULD follow RESTful design principles.
It's RECOMMENDED that they follow Best Practices for Designing a Pragmatic RESTful API.
Services SHOULD provide adequate documentation.
It's RECOMMENDED that they:
Services SHOULD publish data events to the NYPL data-streaming platform.
Services that publish events MUST:
Services MUST be designed with adequate monitoring/alerting.
It's RECOMMENDED that they:
Services MUST log error messages.
Error messages MUST:
Services SHOULD provide a SLA/performance standard.
It's RECOMMENDED that they are load-tested with results published.
Services with HTTP endpoints SHOULD be deployable via an API Gateway
It's RECOMMENDED that they:
`X-NYPL-Identity` header for authentication.
Services MUST follow security practices.
It's RECOMMENDED that they:
Services MUST follow agreed-upon engineering practices.
It's RECOMMENDED that they:
Please add your thoughts, suggestions, and comments!
I'd like to continue the discussion started in NYPL/discovery-api#93 , which proposed placing common deployment config in a central place like engineering-general. There are several patterns for deployment emerging from our work, with lots of common config. I like @nodanaonlyzuul's idea that we collect that information in a central place, perhaps organized by technology (e.g. EB, lambda, EKS). For any kind of deployment, we could document the distinct strategies we consider best practice and link to a boilerplate repo, yeoman generator, etc. This would help bring up new apps, but could also allow us to slim down our READMEs by letting us link directly to the deployment standard the app uses, reducing the app's own responsibility for documenting all of the relevant `eb` subcommands, for example. (Separately, but in that same spirit, I'd love for us to develop a vocabulary of git deploy strategies to make it easier to read at a glance what kind of app we're dealing with.)
Several links in the `README.md` file are broken, namely:
- Coding Style/Javascript
- Coding Style/Python
- Coding Style/Ruby on Rails
- Coding Style/PHP
- Production Readiness (was this meant to link to https://github.com/NYPL/engineering-general/blob/main/standards/deployment.md#production-readiness?)

In addition, since this repo builds out a GitHub Pages site here: https://nypl.github.io/engineering-general/ it would be nice to embed that link directly into the `README.md` file for folks to read the information there, if they wish.
Based on our meeting on 7/28, we agreed to create a first draft of standards (and assigned owners) for the following:
^ indicates first priority as it's a blocker/critical for the ReCAP project
Next steps:
@EdwinGuzman, @nonword, @ktp242, and @rhernand3z: Since you weren't able to attend: 1. Do you have any standards you'd like to see created? 2. Would you like to volunteer for any of the unassigned standards above (or any other standard)?
Thanks everyone. Please let me know if I missed anything or if you have any more thoughts. Y'all rock! 🎸
cc: @jvarghese01, @ablwr
I know this might be a little too touchy-feely for some (or splitting hairs), but "Rights & Responsibilities" sounds a little too, um, rigid for my liking.
Maybe "Engineering Values" like Medium or Buffer:
https://medium.engineering/engineering-values-7143c0db0bd6
https://buffer.com/about
Or "Principles" like the Agile Manifesto:
http://agilemanifesto.org/principles.html
I recognize this could just be personal preference though. 😄
I think it might be helpful to have a standard that describes how services should be discovered.
Right now, it could just be a file in this Github repo somewhere. Maybe it should list:
This would greatly help with discovery and uniform naming of things like SNS topics, etc.
I was having a conversation with @katesweeney on who needs to be reviewers of nypl-core, and learned that I should include @saverkamp as a reviewer for pull requests.
I'm thinking that, rather than asking around every time about who should be included as reviewers, a Markdown file in each code repository listing the maintainers/points of contact I should include would work wonders. I have learned recently that Github gives special meaning to CONTRIBUTING.md at a code repository's root level, and it is also a standard practice used at companies such as Pantheon and CircleCI.
While we can figure out which parts of the standards we can and should adopt, some instructions on how to submit a Pull Request within a file such as CONTRIBUTING.md would give me pointers on the workflow of a repo.
@jobinthomasnypl I got a little confused thinking that the file `ci-and-deployment.md` covers both CI and Deployment. IMO Deployment may need another page. Is it possible to rename the file to `ci-coverage.md` or something else? Thanks!
I've been staring at some old documentation, and these pages are everywhere: on Confluence via Atlassian Cloud, on confluence.nypl, various Google Docs, etc. Our newest spot was to park the documentation on GitHub Wiki pages. In a discussion with @nonword, he pointed out that the search function on the Wiki is broken. My own thought was that if we ever have to move our repos somewhere else, we will lose the GitHub Wiki pages because they are specific to GitHub.
I have a suggestion: I think it'll make more sense to embed documentation as part of the repos, e.g. a `doc` or `docs` folder in the code repo, and park there all our Markdown documentation that doesn't belong in the README. That way the file structure is vendor-agnostic, and we also have the plus that most repo management vendors such as BitBucket and GitHub render Markdown files.