Code Monkey home page Code Monkey logo

Comments (13)

ChrisBarker-NOAA avatar ChrisBarker-NOAA commented on July 17, 2024 2

Wouldn't it possible to have an action that kicks in immediately after the files from the PR has been merged into the main?

yes, I'm pretty sure that's possible.

That is, it will be clear who authored the changes and when they where merged from a separate branch.

That's the problem -- for example:

  • Person A makes some changes, with some extra whitespace, etc.
  • They make a PR, and it gets merged in to main with commits "squashed"
  • The git log shows something like: (taken from today)
$ git log
commit 44e7c0784007ebf0f21f67e3e891a746f30edda5 (HEAD -> main, origin/main, origin/HEAD)
Author: Daniel Lee <[email protected]>
Date:   Wed Jan 24 17:04:11 2024 +0100

    Update line breaks (and minor corrections) (#401)
    
    * Update line breaks (and minor corrections)
    
    * Homogenise monotype and bolding

And git blame shows:
(small excerpt)

44e7c078 (Daniel Lee         2024-01-24 17:04:11 +0100  93) The format for the `formula_terms` attribute is 
940afbc0 (Steve Jones        2022-09-14 16:26:49 +0200  94) 
44e7c078 (Daniel Lee         2024-01-24 17:04:11 +0100  95) ----
44e7c078 (Daniel Lee         2024-01-24 17:04:11 +0100  96) formula_terms = "a: var1 b: var2 ps: var3 p0: var4" 
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100  97) ----
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100  98) 

All good. We know that lines 95 and 96 were touched by Daniel on 2024-01-24.

Now the bot runs, and corrects whitespace, etc.

Now git log will look like:

$ git log
commit bf0f21f67e3e891a74644e7c0784007ef30edda5 (HEAD -> main, origin/main, origin/HEAD)
Author: cf-autoformat-bot 
Date:   Wed Jan 24 17:06:00 2024 +0100

    Auto-formatting run

not too bad, you can always look at previous commits.

But now git blame looks like this:

44e7c078 (cf-autoformat-bot    2024-01-24 17:06:00 +0100  93) The format for the `formula_terms` attribute is 
940afbc0 (Steve Jones        2022-09-14 16:26:49 +0200  94) 
44e7c078 (cf-autoformat-bot    2024-01-24 17:06:00 +0100  95) ----
44e7c078 (cf-autoformat-bot    2024-01-24 17:06:00 +0100  96) formula_terms = "a: var1 b: var2 ps: var3 p0: var4" 
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100  97) ----
^eca16f4 (Richard Hattersley 2015-06-04 11:42:26 +0100  98) 

So we know when those lines were changed, but not by who.

Anyway, not fatal -- in some cases with real code, you really want to know who made the changes, but for this kind of thing, maybe we don't care (and it is there in the history if you really want to dig for it) -- but that's my point.

If we can find a way to get the autoformat changes to be squashed with the other changes as one merge, then it'll be a clearer history. maybe that's not important, though.

from cf-conventions.

erget avatar erget commented on July 17, 2024 2

TL;DR: I'd keep it simple for the merging procedures because that takes place a lot more often than forensics do, and I don't think we incur huge debts by having a bit of a dirty history. Note that we have a lot of commits that have very undescriptive messages as they are.

I can unpack that a bit:

Yes, in essence you can still find out who really authored each line by going back to the commit previous to the one made by the autoformatter and then you'd still get the line-by-line allocated to (probably) humans.

In the end though I think the squashing or rebasing or whatever is probably down to the release procedures, because there you have similar concerns. One could e.g. rebase such that the autoformatter changes are always squashed into the previous commit. That'd be how I'd do it, personally. It really depends on how we do the branching and merging - currently after agreement essentially the work branch is merged into master which will eventually be the next tag. If we wanted great traceability and great formatting, options I see are to

  • have a community that abides by the formatting rules (I have shown that I'm not personally trustworthy here because in the absence of an IDE I used the GitHub editor, which introduces trailing white space - mea culpa!) or
  • have a pre-commit hook that forces people to abide - not in favour because it could be confusing to many contributors or
  • run a job before merging to main that would squash the commit. That's a bit more cumbersome for the merger, and given that the Conventions Committee are not all git experts, that could hamper their efforts, or
  • involve the IM team to finalise merges - then they squash and merge after e.g. the committee has merged to a next branch or something, which means that we have an overall higher complexity in our branching approach but it doesn't really hurt the Committee's work

In my view all of those are feasible but come at costs that outweigh the benefits. In the end, in the event that one really wants line-by-line forensics, it's gonna involve a bit of digging anyway, but that digging doesn't take place often and people who are expert in git can be asked for advise at that point.

from cf-conventions.

erget avatar erget commented on July 17, 2024 1

Hi Sadie,

Thanks for taking the lead here - if we were all devs I'd say let's use a pre-commit hook, but considering the diversity of technical setups that our contributors have, my recommendation would be to setup an automation that remotes trailing whitespace on top of PRs. Thinking out loud here, not sure about feasibility, what if at every PR there's a check that looks for trailing whitespace and if it's found adds a commit on top of that that kills the trailing whitespace?

In fact with such an action, a manual blitz would no longer be necessary...

What could possibly go wrong?
image

from cf-conventions.

ChrisBarker-NOAA avatar ChrisBarker-NOAA commented on July 17, 2024 1

I"m no expert in git -- but I'm pretty sure that tehre is a way to set up a pre-commit hook that does this beofre the commit gets pushed.

what if at every PR there's a check that looks for trailing whitespace and if it's found adds a commit on top of that that kills the trailing whitespace

This is less than ideal, because you get an extra "wasted" commit -- a somewhat ugly history -- as an example "git blame" would show that every line that had whitespace removed has been last edited by the system, rather than whoever actually made a change.

Some quick googling did not reveal an answer, but this is a VERY common issue, there has got to be a standard solution out there!

from cf-conventions.

larsbarring avatar larsbarring commented on July 17, 2024 1

I agree that pre-commit probably will be a barrier. Is the auto-blitz intended to happen when opening a PR, and every time time the PR is updated? Then I agree agree with @ChrisBarker-NOAA that this will complicate the history, and that there ought to be a smart solution already out there. Could the auto-bliz happen when the PR is merged?

from cf-conventions.

ethanrd avatar ethanrd commented on July 17, 2024 1

I expect most PRs contain more than one commit. So I don't think we should worry too much about an extra commit (or two) on a PR messing up the commit history.

There are lots of style guide checkers for code (PEP8, Spotless, etc.) but I haven't found any for Asciidoc. I've work on other projects where the workflow involves a test on PRs that fail if the style isn't followed. A committer must then push another commit with the fix so the test passes before the PR can be merged. Not sure if that is better or easier than an automated process but it is a bit more visible (which might help with awareness among contributors of the style guide).

from cf-conventions.

erget avatar erget commented on July 17, 2024 1

Thanks @sadielbartholomew , excellently summarised :)

I'm happy with both 2 and 3, but I could see a certain charm in traceability of implementing 2 and agreeing to do this before minting a new release. That means that it would take place en bloc before the release goes out. Of course, I don't mint the releases so that's easy for me to say; @davidhassell and others may have a different perspective.

from cf-conventions.

ChrisBarker-NOAA avatar ChrisBarker-NOAA commented on July 17, 2024 1

(2) has the disadvantage of creating gaps in the history -- any lines that were fixed by the job will show as hvng been edited by the CI, not whoever actually made the change (and the data will be wring, too, though if it's run regularly, not by much). On the other hand, maybe history isn't important for this kind of project. If there's something of concern with a part of the doc, does it matter exactly who last touched it or when they did?

I think (3 i. ) is not great -- similar to commit hooks, it puts a burden on the author of the PR -- and depending on their skill set could be a barrier to entry.

I'm not sure I fully understand (3 ii.) -- where / when would the fixing occur? Depending on the answer to that, it could introduce similar history issue.

So I suggest (4) (which might be 3 ii, if I've misunderstood:

(4) Summary: all changed, including auto-linting, are squashed as a single commit when merged into main

Some of this is (hopefully) policy already, but I'm putting it here, as it's important to the process.

  • main is never updated directly -- the only way to update main is via a merge from another branch.
  • In all non-main branches, a CI job is run that fixes style issue on every push: those branches will always be "clean".
  • When a branch is merged, commits are "squashed", so appear as a single commit -- so the history shows the whole thing as a single, correct, change, including style fixing.

This could be a bit complicated by PRs from a third-party repo -- I'd have to think about that more, as asking folks to first merge into a temporary branch,and then we merge into main could be a bit challenging.

from cf-conventions.

larsbarring avatar larsbarring commented on July 17, 2024 1

I am nothing like an expert in git or github, more the opposite. Maybe that is why I do not fully see the same disadvantage alternative 2 as @ChrisBarker-NOAA does:
Wouldn't it possible to have an action that kicks in immediately after the files from the PR has been merged into the main? That is, it will be clear who authored the changes and when they where merged from a separate branch. Immediately after the merge the action kicks in and does any cleaning of those specific files in the main branch. I.e. the action is the only one that do changes directly in the main branch, and they happens seconds/minutes after the merge.
--- Or am I envisioning the impossible here? There seems to to be some disussion on this subject over at Stack Overflow, but now I am definitively out of my depth.

from cf-conventions.

ethanrd avatar ethanrd commented on July 17, 2024 1

Hi all - I favor option 3.1 (test and fix by hand-ish [1]) because it increases the visibility of the style guide rather than hiding it behind an automatic process. I think awareness of the style guide/rules is worth a bit of extra effort. Also, that effort doesn't need to be provided by the original committer. The committer can allow CF repo maintainers to push to the PR branch.

[1] By hand or by a CI job that runs after the failed test and adds a commit to the PR.

from cf-conventions.

ethanrd avatar ethanrd commented on July 17, 2024 1

After reading @ChrisBarker-NOAA's "squash" comment, I took a closer look and notice that we have a combination of PRs being merged and PRs being squashed onto the 'main' branch. While I think a clean commit history is important to work towards, I think we might want to develop some process around how to decide on what makes a clean commit for a given PR. Encouraging PR committer(s) to decide on and rebase to get a clean (minimal but informative) commit history would be my go to preference. But that adds yet another barrier. Though, again, I think it could be handled by CF repo maintainers.

This should probably be a separate discussion. I'll try and start an issue on PR merge process and clean commit history later today.

from cf-conventions.

sadielbartholomew avatar sadielbartholomew commented on July 17, 2024

Thanks @erget, I agree that pre-commit hooks are probably too much of a barrier to employ and that some automation solution is best.

what if at every PR there's a check that looks for trailing whitespace and if it's found adds a commit on top of that that kills the trailing whitespace

Sounds good to me. I will look into options and see if I can get a job running that implements this, once I've waited a few working days to see if anyone has any issue with the idea in general.

from cf-conventions.

sadielbartholomew avatar sadielbartholomew commented on July 17, 2024

Thanks all for your thoughts and ideas. And it is good to hear some background to this issue e.g. from @ethanrd. It's probably wise as Ethan suggests to consider style linting as a whole whilst we think about this, because if we implement a solution here we would want it to be extensible to any further style fixes we might want to use too, eventually at least. And I think trailing whitespace removal is a good thing to start with, since it's non-controversial, easy to run operations to remove, and clear to review for manually to check our solution is doing what we want.

As everyone's indicated, there's no perfect solution, but plenty of (fairly-)standard approaches to choose from. It seems we have collectively thought of the following possibilities, which I've numbered to help us discuss, and moreover are agreed (please do correct me if anyone feels this is wrong or mis-representative) that we don't want the first so I've crossed it out:

  1. pre-commit hooks: would work but are too much of a barrier to those who contribute, not all of who are (very) familiar with the command-line or git etc., so avoid;
  2. a regular PR/commit dedicated to style fixes, ideally by a CI workflow and/or cron or similar running a regular job;
  3. linting checks on (all) PRs which warn of style issues such as trailing whitespace could be a solution where either:
    1. the PR author needs to add extra commits to fix any issues before it is allowed to be merged;
    2. a CI job runs after every PR or set of PRs merge to check for any style issues and automate the fixing of any that have been introduced by the PR.

Maybe if everyone could outline their favourite (or unfavoured) option(s), or ask any questions about them, then we can determine the best way forward? If it helps, based on past experience I have I think it would be simple enough for me to set up any of these solutions, so ease of implementation isn't a concern (to me, at least).

from cf-conventions.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.