Comments (13)
Wouldn't it possible to have an action that kicks in immediately after the files from the PR has been merged into the main?
yes, I'm pretty sure that's possible.
That is, it will be clear who authored the changes and when they where merged from a separate branch.
That's the problem -- for example:
- Person A makes some changes, with some extra whitespace, etc.
- They make a PR, and it gets merged in to main with commits "squashed"
- The git log shows something like: (taken from today)
$ git log
commit 44e7c0784007ebf0f21f67e3e891a746f30edda5 (HEAD -> main, origin/main, origin/HEAD)
Author: Daniel Lee <[email protected]>
Date: Wed Jan 24 17:04:11 2024 +0100
Update line breaks (and minor corrections) (#401)
* Update line breaks (and minor corrections)
* Homogenise monotype and bolding
And git blame shows:
(small excerpt)
44e7c078 (Daniel Lee 2024-01-24 17:04:11 +0100 93) The format for the `formula_terms` attribute is
940afbc0 (Steve Jones 2022-09-14 16:26:49 +0200 94)
44e7c078 (Daniel Lee 2024-01-24 17:04:11 +0100 95) ----
44e7c078 (Daniel Lee 2024-01-24 17:04:11 +0100 96) formula_terms = "a: var1 b: var2 ps: var3 p0: var4"
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100 97) ----
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100 98)
All good. We know that lines 95 and 96 were touched by Daniel on 2024-01-24.
Now the bot runs, and corrects whitespace, etc.
Now git log will look like:
$ git log
commit bf0f21f67e3e891a74644e7c0784007ef30edda5 (HEAD -> main, origin/main, origin/HEAD)
Author: cf-autoformat-bot
Date: Wed Jan 24 17:06:00 2024 +0100
Auto-formatting run
not too bad, you can always look at previous commits.
But now git blame looks like this:
44e7c078 (cf-autoformat-bot 2024-01-24 17:06:00 +0100 93) The format for the `formula_terms` attribute is
940afbc0 (Steve Jones 2022-09-14 16:26:49 +0200 94)
44e7c078 (cf-autoformat-bot 2024-01-24 17:06:00 +0100 95) ----
44e7c078 (cf-autoformat-bot 2024-01-24 17:06:00 +0100 96) formula_terms = "a: var1 b: var2 ps: var3 p0: var4"
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100 97) ----
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100 98)
So we know when those lines were changed, but not by who.
Anyway, not fatal -- in some cases with real code, you really want to know who made the changes, but for this kind of thing, maybe we don't care (and it is there in the history if you really want to dig for it) -- but that's my point.
If we can find a way to get the autoformat changes to be squashed with the other changes as one merge, then it'll be a clearer history. maybe that's not important, though.
from cf-conventions.
TL;DR: I'd keep it simple for the merging procedures because that takes place a lot more often than forensics do, and I don't think we incur huge debts by having a bit of a dirty history. Note that we have a lot of commits that have very undescriptive messages as they are.
I can unpack that a bit:
Yes, in essence you can still find out who really authored each line by going back to the commit previous to the one made by the autoformatter and then you'd still get the line-by-line allocated to (probably) humans.
In the end though I think the squashing or rebasing or whatever is probably down to the release procedures, because there you have similar concerns. One could e.g. rebase such that the autoformatter changes are always squashed into the previous commit. That'd be how I'd do it, personally. It really depends on how we do the branching and merging - currently after agreement essentially the work branch is merged into master
which will eventually be the next tag. If we wanted great traceability and great formatting, options I see are to
- have a community that abides by the formatting rules (I have shown that I'm not personally trustworthy here because in the absence of an IDE I used the GitHub editor, which introduces trailing white space - mea culpa!) or
- have a pre-commit hook that forces people to abide - not in favour because it could be confusing to many contributors or
- run a job before merging to
main
that would squash the commit. That's a bit more cumbersome for the merger, and given that the Conventions Committee are not all git experts, that could hamper their efforts, or - involve the IM team to finalise merges - then they squash and merge after e.g. the committee has merged to a
next
branch or something, which means that we have an overall higher complexity in our branching approach but it doesn't really hurt the Committee's work
In my view all of those are feasible but come at costs that outweigh the benefits. In the end, in the event that one really wants line-by-line forensics, it's gonna involve a bit of digging anyway, but that digging doesn't take place often and people who are expert in git can be asked for advise at that point.
from cf-conventions.
Hi Sadie,
Thanks for taking the lead here - if we were all devs I'd say let's use a pre-commit hook, but considering the diversity of technical setups that our contributors have, my recommendation would be to setup an automation that remotes trailing whitespace on top of PRs. Thinking out loud here, not sure about feasibility, what if at every PR there's a check that looks for trailing whitespace and if it's found adds a commit on top of that that kills the trailing whitespace?
In fact with such an action, a manual blitz would no longer be necessary...
from cf-conventions.
I"m no expert in git -- but I'm pretty sure that tehre is a way to set up a pre-commit hook that does this beofre the commit gets pushed.
what if at every PR there's a check that looks for trailing whitespace and if it's found adds a commit on top of that that kills the trailing whitespace
This is less than ideal, because you get an extra "wasted" commit -- a somewhat ugly history -- as an example "git blame" would show that every line that had whitespace removed has been last edited by the system, rather than whoever actually made a change.
Some quick googling did not reveal an answer, but this is a VERY common issue, there has got to be a standard solution out there!
from cf-conventions.
I agree that pre-commit probably will be a barrier. Is the auto-blitz intended to happen when opening a PR, and every time time the PR is updated? Then I agree agree with @ChrisBarker-NOAA that this will complicate the history, and that there ought to be a smart solution already out there. Could the auto-bliz happen when the PR is merged?
from cf-conventions.
I expect most PRs contain more than one commit. So I don't think we should worry too much about an extra commit (or two) on a PR messing up the commit history.
There are lots of style guide checkers for code (PEP8, Spotless, etc.) but I haven't found any for Asciidoc. I've work on other projects where the workflow involves a test on PRs that fail if the style isn't followed. A committer must then push another commit with the fix so the test passes before the PR can be merged. Not sure if that is better or easier than an automated process but it is a bit more visible (which might help with awareness among contributors of the style guide).
from cf-conventions.
Thanks @sadielbartholomew , excellently summarised :)
I'm happy with both 2 and 3, but I could see a certain charm in traceability of implementing 2 and agreeing to do this before minting a new release. That means that it would take place en bloc before the release goes out. Of course, I don't mint the releases so that's easy for me to say; @davidhassell and others may have a different perspective.
from cf-conventions.
(2) has the disadvantage of creating gaps in the history -- any lines that were fixed by the job will show as hvng been edited by the CI, not whoever actually made the change (and the data will be wring, too, though if it's run regularly, not by much). On the other hand, maybe history isn't important for this kind of project. If there's something of concern with a part of the doc, does it matter exactly who last touched it or when they did?
I think (3 i. ) is not great -- similar to commit hooks, it puts a burden on the author of the PR -- and depending on their skill set could be a barrier to entry.
I'm not sure I fully understand (3 ii.) -- where / when would the fixing occur? Depending on the answer to that, it could introduce similar history issue.
So I suggest (4) (which might be 3 ii, if I've misunderstood:
(4) Summary: all changed, including auto-linting, are squashed as a single commit when merged into main
Some of this is (hopefully) policy already, but I'm putting it here, as it's important to the process.
- main is never updated directly -- the only way to update main is via a merge from another branch.
- In all non-main branches, a CI job is run that fixes style issue on every push: those branches will always be "clean".
- When a branch is merged, commits are "squashed", so appear as a single commit -- so the history shows the whole thing as a single, correct, change, including style fixing.
This could be a bit complicated by PRs from a third-party repo -- I'd have to think about that more, as asking folks to first merge into a temporary branch,and then we merge into main could be a bit challenging.
from cf-conventions.
I am nothing like an expert in git or github, more the opposite. Maybe that is why I do not fully see the same disadvantage alternative 2 as @ChrisBarker-NOAA does:
Wouldn't it possible to have an action that kicks in immediately after the files from the PR has been merged into the main
? That is, it will be clear who authored the changes and when they where merged from a separate branch. Immediately after the merge the action kicks in and does any cleaning of those specific files in the main branch. I.e. the action is the only one that do changes directly in the main branch, and they happens seconds/minutes after the merge.
--- Or am I envisioning the impossible here? There seems to to be some disussion on this subject over at Stack Overflow, but now I am definitively out of my depth.
from cf-conventions.
Hi all - I favor option 3.1 (test and fix by hand-ish [1]) because it increases the visibility of the style guide rather than hiding it behind an automatic process. I think awareness of the style guide/rules is worth a bit of extra effort. Also, that effort doesn't need to be provided by the original committer. The committer can allow CF repo maintainers to push to the PR branch.
[1] By hand or by a CI job that runs after the failed test and adds a commit to the PR.
from cf-conventions.
After reading @ChrisBarker-NOAA's "squash" comment, I took a closer look and notice that we have a combination of PRs being merged and PRs being squashed onto the 'main' branch. While I think a clean commit history is important to work towards, I think we might want to develop some process around how to decide on what makes a clean commit for a given PR. Encouraging PR committer(s) to decide on and rebase to get a clean (minimal but informative) commit history would be my go to preference. But that adds yet another barrier. Though, again, I think it could be handled by CF repo maintainers.
This should probably be a separate discussion. I'll try and start an issue on PR merge process and clean commit history later today.
from cf-conventions.
Thanks @erget, I agree that pre-commit hooks are probably too much of a barrier to employ and that some automation solution is best.
what if at every PR there's a check that looks for trailing whitespace and if it's found adds a commit on top of that that kills the trailing whitespace
Sounds good to me. I will look into options and see if I can get a job running that implements this, once I've waited a few working days to see if anyone has any issue with the idea in general.
from cf-conventions.
Thanks all for your thoughts and ideas. And it is good to hear some background to this issue e.g. from @ethanrd. It's probably wise as Ethan suggests to consider style linting as a whole whilst we think about this, because if we implement a solution here we would want it to be extensible to any further style fixes we might want to use too, eventually at least. And I think trailing whitespace removal is a good thing to start with, since it's non-controversial, easy to run operations to remove, and clear to review for manually to check our solution is doing what we want.
As everyone's indicated, there's no perfect solution, but plenty of (fairly-)standard approaches to choose from. It seems we have collectively thought of the following possibilities, which I've numbered to help us discuss, and moreover are agreed (please do correct me if anyone feels this is wrong or mis-representative) that we don't want the first so I've crossed it out:
pre-commit hooks: would work but are too much of a barrier to those who contribute, not all of who are (very) familiar with the command-line orgit
etc., so avoid;- a regular PR/commit dedicated to style fixes, ideally by a CI workflow and/or cron or similar running a regular job;
- linting checks on (all) PRs which warn of style issues such as trailing whitespace could be a solution where either:
- the PR author needs to add extra commits to fix any issues before it is allowed to be merged;
- a CI job runs after every PR or set of PRs merge to check for any style issues and automate the fixing of any that have been introduced by the PR.
Maybe if everyone could outline their favourite (or unfavoured) option(s), or ask any questions about them, then we can determine the best way forward? If it helps, based on past experience I have I think it would be simple enough for me to set up any of these solutions, so ease of implementation isn't a concern (to me, at least).
from cf-conventions.
Related Issues (20)
- Bulk change "http://..." to "https://..." HOT 15
- Small update to text in section 2.3 regarding character sets HOT 4
- Incorrect formating for some `<=` in Appendix D HOT 2
- Allow period and hyphen in attribute names HOT 21
- Appendix F: 14 `geotiff.maptools.org` domain links redirecting HOT 4
- Introduce `units_metadata` attribute to clarify the meaning of quantities involving temperature HOT 5
- Add a missing author to the list HOT 1
- Fix affiliation for Dave Allured HOT 2
- Problems in the github document build process HOT 7
- Simple correction to Example 6.1.2 HOT 5
- corrections to `units_metadata` text HOT 2
- Formatting of local links in text; Lists of Figures, Tables and Examples HOT 1
- Clarification of the use of `long_name`, `standard_name`, `cf_role` and non-standard attributes HOT 4
- Incorporating the CFA convention for aggregated datasets into CF HOT 3
- In exceptional cases allow a standard name to be aliased into two alternatives HOT 6
- Appendix B: New element in XML file header to record the "first published date" HOT 5
- Include DOI and License information in the conventions document HOT 20
- recommendation of `standard_name` or `long_name` HOT 4
- Update the XML format specification in Appendix B to provide a robust link to the XML schema file HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cf-conventions.