
git-repo-sync's Introduction

git-repo-sync

Synchronization of Branches between Remote Git-repositories

  • git-repo-sync is a bash script that synchronizes branches between two remote Git-repositories.
    • Git-tags are not synchronized.
  • You configure once which branches to synchronize and how.
    • You need to spend some time understanding git-repo-sync's conflict solving strategies and configuration.
  • You run git-repo-sync periodically, preferably every few minutes.

Warning! Before reading further, keep in mind the difference between local and remote Git repositories.

If your team often pushes to a single synchronized Git-branch through different remote Git-repositories, then:

  • Run git-repo-sync before pushing to such a branch.

Manual Action of Synchronization

If someone pushed to the same branch (through the other remote repo) between your last git-repo-sync run and your push, then:

  • Run git-repo-sync
  • Update your local repository (git fetch).
  • Check whether your pushed commits were deleted from your remote Git-repository. (FYI: the commits in your local repository will not be changed!)
  • If they were deleted in the remote repo:
    • merge, rebase, etc., your local branch onto the latest remote commits;
    • repeat Git-push for your branch.
  • Repeat these steps until your branch in your remote Git-repository points to the expected commits.
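A minimal sketch of the "check whether your commits were deleted" step. It assumes you already ran git-repo-sync and git fetch, and that your remote is named origin (an assumption). The helper names count_unsynced_commits and needs_repush are hypothetical; they rely only on git rev-list --count over a <remote>/<branch>..<branch> range, which counts commits reachable from your local branch but not from the remote-tracking branch.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "check your commits" step.
# Run them after git-repo-sync and "git fetch". "origin" is an assumed remote name.

# Prints how many commits of the local branch are missing from the
# remote-tracking branch <remote>/<branch>.
count_unsynced_commits() {
  local branch="$1" remote="${2:-origin}"
  git rev-list --count "${remote}/${branch}..${branch}"
}

# Succeeds (status 0) when at least one local commit is missing remotely,
# i.e. you need to merge/rebase onto the remote branch and push again.
# Returns 2 if git itself failed (e.g. unknown branch).
needs_repush() {
  local branch="$1" remote="${2:-origin}" count
  count=$(count_unsynced_commits "$branch" "$remote") || return 2
  [ "$count" -gt 0 ]
}
```

For example, if needs_repush @dev succeeds, git log --oneline origin/@dev..@dev lists the commits to merge/rebase and push again. A non-zero count also appears when you simply have not pushed yet, so interpret it in the context of a push you already made.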

This situation is covered by notifications, but you have to configure them yourself in your enterprise environment.

Use Cases

  • Linking the remote Git-repositories of clients and their software/support suppliers, temporarily or permanently.
  • Independence from an external remote Git repository that is slow or goes out of service from time to time.
  • Your software teams have independent remote Git repositories.

Requirements

  • Install Git.
  • Use bash to run git-repo-sync. (It is not tested with zsh.)
  • Set up any automation to run git-repo-sync periodically (cron, schedulers, Jenkins, GitLab-CI, etc.), or run it periodically yourself.

This is enough for Windows, Arch-based Linux (Manjaro), and GNU-based Linux.

macOS Additional Requirements

  • Update bash by running the following (restart your shell afterwards):
    • brew install bash
  • Install gAWK (GNU AWK):
    • brew install gawk

Ubuntu Additional Requirements

  • Install gAWK (GNU AWK).

Other Linux Additional Requirements

How to use

Copy git-repo-sync somewhere

git clone https://github.com/it3xl/git-repo-sync.git

Tell git-repo-sync the locations of your remote Git repositories.
Modify the url_a and url_b variables in default_sync_project.sh.
You can use URLs and file paths.

url_a=https://example.com/git/my_repo.git

url_b='/c/my-folder/my-local-git-repo-folder'

Periodically run the git-sync.sh file, which is located in the root of the git-repo-sync folder.

bash git-sync.sh
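If you have no cron, scheduler, or CI server, a small bash loop can do the periodic run. This is a sketch: run_locked and run_periodically are hypothetical helpers, and the interval and lock path are illustrative choices, not git-repo-sync settings. The mkdir-based lock (mkdir is atomic) skips a run while the previous one is still working, so two runs never work on the same sync project at once.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for running git-sync.sh on a schedule without cron,
# Jenkins or GitLab-CI. Lock path, interval and iteration count are illustrative.

# Runs a command while holding a lock directory. A second run that starts while
# the first is still working is skipped (status 75) instead of running in parallel.
run_locked() {
  local lock_dir="$1"
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Lock $lock_dir is held by another run; skipping." >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Calls run_locked every <interval> seconds; <iterations> = 0 means forever.
run_periodically() {
  local interval="$1" iterations="$2" lock_dir="$3"
  shift 3
  local i=0
  while [ "$iterations" -eq 0 ] || [ "$i" -lt "$iterations" ]; do
    run_locked "$lock_dir" "$@" || echo "Run failed with status $?" >&2
    i=$((i + 1))
    sleep "$interval"
  done
}
```

For example, from the git-repo-sync folder, run_periodically 300 0 /tmp/git-repo-sync.lock bash git-sync.sh runs a sync every five minutes. If the process is killed mid-run, the lock directory stays behind and must be removed by hand; a sturdier setup would add a trap or use flock where available.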

git-sync.sh will tell you if anything is missing. For example, on Ubuntu you need to replace awk with gAWK.

The Trade-off

The Trade-off is git-repo-sync's automated Git-conflict solving logic.

Even if you run git-repo-sync often, there is still a small chance of a Git-conflict.
So you must know what to do when git-repo-sync has solved a Git-conflict.

Minimize chances of The Trade-off

Run git-repo-sync before Git-pushing, i.e. synchronize both remote Git-repos before pushing to either of them.
In this case, Git, not git-repo-sync, is responsible for conflict resolution.
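A minimal sketch of "sync before you push", assuming git-repo-sync is cloned in the folder named by a hypothetical GIT_REPO_SYNC_DIR variable and your remote is named origin (both assumptions). The point of the order is that if your branch has diverged, git push refuses the non-fast-forward update and you resolve it with Git, instead of git-repo-sync resolving it later with a conflict solving strategy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: synchronize both remotes, refresh, then push.
# GIT_REPO_SYNC_DIR (path to the git-repo-sync folder) is an assumed variable.
sync_then_push() {
  local branch="$1" remote="${2:-origin}"
  local sync_dir="${GIT_REPO_SYNC_DIR:-}"
  if [ -z "$sync_dir" ]; then
    echo "Set GIT_REPO_SYNC_DIR to the git-repo-sync folder." >&2
    return 1
  fi

  # 1. Synchronize both remote repositories.
  (cd "$sync_dir" && bash git-sync.sh) || return 1
  # 2. Fetch whatever the sync brought into your remote.
  git fetch "$remote" || return 1
  # 3. Push; Git rejects a non-fast-forward push, so you merge/rebase first.
  git push "$remote" "$branch"
}
```

Usage, for example: GIT_REPO_SYNC_DIR=~/git-repo-sync sync_then_push @dev.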

When git-repo-sync solves conflicts

Both of the following must be true:

  • You run git-repo-sync rarely, i.e. someone already pushed commits to your branch after the last git-repo-sync run.
  • And you and your teammate pushed changes to the same Git-branch but through different remote repositories, and your remote repositories were not synchronized between your Git-pushes.

Basically, you won't notice git-repo-sync until you are in this situation.

Behavior of git-repo-sync in case of Git-conflicts

git-repo-sync detects a Git-conflict and applies one of the Conflict Solving Strategies described below.
As a result, you should follow the steps below to fix The Trade-off.

Your steps to fix The Trade-off

The main idea is "Re-push your local Git-commit in case of a conflict".

  • Run git-repo-sync to synchronize both Git-remote-repositories (if you have no periodic automatic runs).

  • Fetch changes from your remote Git-repository into your local repository.

  • Check whether your local commit has lost its remote counterpart, i.e. the commit exists only in your local repository. If so:

    • Perform Git-merge/rebase of your local commit.
    • Perform Git-push of your changes.
  • Run git-repo-sync to synchronize your changes with changes from the other side's Git-remote-repository (if you have no periodic automatic runs).
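The steps above can be sketched as one bash function. This is a hedged illustration, not part of git-repo-sync: repush_after_conflict and GIT_REPO_SYNC_DIR are hypothetical names, origin is an assumed remote, and it picks rebase where the steps allow either merge or rebase. The "lost its remote counterpart" test uses git merge-base --is-ancestor, which succeeds only if your branch tip is already contained in the remote-tracking branch.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "re-push your local Git-commit in case of a conflict".
# Assumptions: remote "origin", git-repo-sync folder in GIT_REPO_SYNC_DIR,
# rebase chosen over merge.
repush_after_conflict() {
  local branch="$1" remote="${2:-origin}"
  local sync_dir="${GIT_REPO_SYNC_DIR:-}"
  if [ -z "$sync_dir" ]; then
    echo "Set GIT_REPO_SYNC_DIR to the git-repo-sync folder." >&2
    return 1
  fi

  # Synchronize both remote repositories (skip if you have periodic auto-runs).
  (cd "$sync_dir" && bash git-sync.sh) || return 1
  # Fetch changes from your remote into your local repository.
  git fetch "$remote" || return 1

  # If the local tip is an ancestor of the remote branch, nothing was lost.
  if git merge-base --is-ancestor "$branch" "$remote/$branch"; then
    echo "$branch is already contained in $remote/$branch; nothing to re-push."
    return 0
  fi

  # Replay local commits on the latest remote commits (stops on rebase conflicts).
  git rebase "$remote/$branch" "$branch" || return 1
  git push "$remote" "$branch" || return 1
  # Propagate your changes to the other side's remote repository.
  (cd "$sync_dir" && bash git-sync.sh)
}
```

If the rebase stops on a conflict, resolve it as usual (git rebase --continue), then push and run git-repo-sync again.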

How do I know if there were Git-conflicts?

  • Check manually, as described in the steps above.
  • git-repo-sync writes notifications to plain text files. Ask your DevOps team to distribute them.

Using on Linux

Run git-sync.sh and it will tell you what git-repo-sync needs.
In most cases, e.g. on Ubuntu, you have to install gAWK.
Docker Alpine Linux images require bash and gAWK to be installed.
You have to update bash if you use a very old Linux distro.
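Before the first run you can do a quick pre-flight check yourself; git-sync.sh remains the authoritative check. This sketch assumes git-repo-sync needs bash 4 or newer (an assumption, suggested by the README asking macOS users, whose stock bash is 3.2, to update bash) plus git and gawk. check_prerequisites is a hypothetical helper; it uses only the bash variable BASH_VERSINFO and the builtin command -v.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check. Assumption: bash 4+ is required.
# Pass tool names to check; defaults to git and gawk.
check_prerequisites() {
  local missing=0 tool
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(git gawk)
  fi
  if [ "${BASH_VERSINFO[0]:-0}" -lt 4 ]; then
    echo "bash ${BASH_VERSION:-?} is too old; install a newer bash." >&2
    missing=1
  fi
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not installed." >&2
      missing=1
    fi
  done
  return "$missing"
}
```

If it reports gawk missing, install it with sudo apt-get install gawk on Ubuntu, or apk add bash gawk git on Alpine.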

Using on Windows

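Ha! You're lucky. You don't have to install anything extra, and you have five options to run git-repo-sync.

Open CMD or PowerShell in the git-repo-sync folder and run one of the following three commands. (In PowerShell, prefix each command with the call operator &, e.g. & "C:\Program Files\Git\bin\bash.exe" git-sync.sh)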

"C:\Program Files\Git\bin\bash.exe" git-sync.sh
"C:\Program Files\Git\usr\bin\bash.exe" git-sync.sh
"C:\Program Files\Git\git-bash.exe" git-sync.sh

Or you can reinstall Git and integrate bash into Windows (add it to the PATH) during installation. Then run:

bash git-sync.sh

Or you can try updating the PATH environment variable by appending the following (I haven't tested this):

;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin

Do not synchronize all branches

Although there are valid cases when it is useful to sync all branches, it is not always a good idea.
Some well-known Git-servers block certain branches in various ways. Some of them create "trash" branches that you do not want to see synchronized.

So you can synchronize only branches that have special prefixes.
You configure these prefixes in the default_sync_project.sh configuration file.
Importantly, each prefix is tied to a corresponding conflict solving strategy.

Conflict Solving Strategies

The Victim Strategy

By default, all branches are synced under this strategy.
You can do whatever you want with such branches from both sides (repositories).
In case of commit conflicts, the newest commit wins.
You can relocate branches to any position, delete them, and move them back in history if you run git-repo-sync regularly.

Use the following variable to limit branches synchronized by this strategy.

victim_branches_prefix=@

The most common value for victim_branches_prefix is "@".
In this case only branches that start with @ will be synchronized. E.g. @dev, @dev-staging, @test, @test-staging, @my-feature, etc.
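To picture how a prefix selects branches, here is an illustration only: select_branches_by_prefix is a hypothetical helper, not git-repo-sync's code. It reads branch names on stdin and keeps those that start with the given prefix; in this sketch an empty prefix keeps every branch.

```bash
#!/usr/bin/env bash
# Illustration only (hypothetical helper): keep branch names, read from stdin,
# that start with the given prefix, e.g. victim_branches_prefix=@.
select_branches_by_prefix() {
  local prefix="$1" branch
  while IFS= read -r branch; do
    # The quoted part of the pattern is literal; * matches the rest.
    case "$branch" in
      "$prefix"*) printf '%s\n' "$branch" ;;
    esac
  done
}
```

To preview which branches of a remote a prefix would cover, you could feed it real branch names: git ls-remote --heads "$url_a" | sed 's#.*refs/heads/##' | select_branches_by_prefix @.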

The Conventional Strategy

With this strategy, you limit what your teammates can do, through the other side's repository, with branches that belong to your side's remote repository.

Branches with the following prefix will be owned by the repo from the url_a variable. Let's call it the A side.

side_a_conventional_branches_prefix=client-

Branches with the following prefix will be owned by the repo from the url_b variable. Let's call it the B side.

side_b_conventional_branches_prefix=vendor-

Other examples of prefix pairs: a-, b-; microsoft/, google/; foo-, bar-.

On the owning side's repo, you can do whatever you want with such branches.

On the other side's repo:
You can do fast-forward updates and merges.
You can move such branches back in Git-history if you run git-repo-sync periodically.

All commit conflicts will be solved in favor of the owning side.
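The ownership rule can be illustrated with a tiny function. This is a sketch, not git-repo-sync's implementation: owning_side is a hypothetical helper that reads the side_a_conventional_branches_prefix and side_b_conventional_branches_prefix variables, falling back to the example values client- and vendor-. Under the Conventional strategy, the side it prints is the side whose commits win a conflict.

```bash
#!/usr/bin/env bash
# Illustration only (hypothetical helper): which side owns a branch under the
# Conventional strategy. Falls back to the README's example prefixes.
owning_side() {
  local branch="$1"
  local a_prefix="${side_a_conventional_branches_prefix:-client-}"
  local b_prefix="${side_b_conventional_branches_prefix:-vendor-}"
  case "$branch" in
    "$a_prefix"*) echo A ;;     # owned by the url_a repo; A wins conflicts
    "$b_prefix"*) echo B ;;     # owned by the url_b repo; B wins conflicts
    *) echo none ;;             # not a conventional branch
  esac
}
```

For example, owning_side client-master prints A, so on a conflict the A side's commit would be kept, while the B side could only fast-forward or merge that branch.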

Other Unimplemented Strategies

Just propose something interesting.
BTW, I believe the Victim and Conventional approaches cover 80% of the cases you need.

Disaster Protection

People have to make mistakes to become better. This is normal. But let's protect our clients from such mistakes.
Define the sync_enabling_branch variable:

sync_enabling_branch=it3xl_git_repo_sync_enabled

Its value may represent any branch name.
Examples: @test, client-prod, vendor-master, it3xl_git_repo_sync_enabled.

git-repo-sync checks that such a branch exists in both remote repositories and that it has the same or related commits, i.e. its commits are located in the same Git-tree.
This protects you from accidentally linking unrelated Git-repositories and from deleting branches that happen to have the same names.
Git may store many independent projects (trees) in the same repository, which is unexpected for many users.

I advise using the it3xl_git_repo_sync_enabled branch name to make it explicit to others that their remote Git-repo is synchronized with another remote repo.
They can search the Internet for it3xl_git_repo_sync_enabled and learn about the applied sync solution.

Be aware that the branch named in the sync_enabling_branch variable will always be synchronized by git-repo-sync.
It is probably not a good idea to specify the master branch here, because the branch named in sync_enabling_branch will be synchronized under the Victim strategy. But you can specify a branch with one of your conventional prefixes to have it synchronized under the Conventional strategy, for example client-master.
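If you want to verify the same condition by hand before the first sync, a rough sketch is below. It is a hypothetical manual check, not git-repo-sync's own code: check_sync_enabling_branch fetches the branch from both URLs into a throw-away repository (the remote-tracking names a/ and b/ are illustrative) and asks git merge-base for a common ancestor. It downloads that branch's full history, which can be slow for very large repositories.

```bash
#!/usr/bin/env bash
# Hypothetical manual version of the sync_enabling_branch safety check.
# Arguments mirror url_a, url_b and sync_enabling_branch.
check_sync_enabling_branch() {
  local url_a="$1" url_b="$2" branch="$3" tmp status
  tmp=$(mktemp -d) || return 1
  # A failed fetch means the branch is missing on that side; a failed
  # merge-base means the two copies share no history.
  (
    cd "$tmp" &&
      git init -q &&
      git fetch -q "$url_a" "refs/heads/$branch:refs/remotes/a/$branch" &&
      git fetch -q "$url_b" "refs/heads/$branch:refs/remotes/b/$branch" &&
      git merge-base "a/$branch" "b/$branch" >/dev/null
  )
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Usage, for example: check_sync_enabling_branch "$url_a" "$url_b" it3xl_git_repo_sync_enabled && echo "related histories".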

Notes, Drawbacks & Limitations

  • Usage with SSH isn't tested, but it is possible.
  • git-repo-sync is resilient to HTTP failures and interruptions.
  • It has protections against accidental deletion of your entire remote repository.
  • Arbitrary Git-history rewriting is supported.
  • Within a single installation, git-repo-sync can synchronize as many pairs of Git-repositories as you want. Every sync pair is a sync project for git-repo-sync.
  • Git-tags are not synchronized.
    • Why: some Git-servers block manipulations with Git-tags; researching and covering all possible cases would take too much time; and the other repo's tags created issues for Git-tag based CI/CD pipelines.
  • git-repo-sync doesn't attempt to do Git-merge or rebase. Just FYI.

Support Operations

Remote Repo Replacing Support

This is a real case from one of my customers. You may want to synchronize your existing Git-repo with the Git-repo of a new software partner.

Option 1.
Create a new git-repo-sync project and use it (project description file or environment variables).

Option 2.
Modify your existing project. Update its description file or environment variables.
Delete git-repo-sync/sync-projects/<your-sync-project-name> directory.
Start synchronization as usual.
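The deletion step of Option 2 is an rm -rf, so a guard is worth having. A minimal sketch: reset_sync_project is a hypothetical helper that takes the git-repo-sync folder and your sync project name, and refuses an empty name or one containing a slash, so it can never remove the whole sync-projects folder or a path outside it.

```bash
#!/usr/bin/env bash
# Hypothetical helper for Option 2: delete one sync project's working state.
# $1 = git-repo-sync folder, $2 = sync project name.
reset_sync_project() {
  local sync_dir="$1" project="$2"
  # Refuse names that could widen the deletion beyond one project folder.
  if [ -z "$project" ] || [[ "$project" == */* ]] || [ "$project" = . ] || [ "$project" = .. ]; then
    echo "Refusing to delete: invalid project name '$project'." >&2
    return 1
  fi
  rm -rf -- "$sync_dir/sync-projects/$project"
}
```

After reset_sync_project ~/git-repo-sync <your-sync-project-name>, start synchronization as usual with bash git-sync.sh.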

Option 3.
Your Git-repository is extremely large and you can't recreate it. This case is too long to describe here (TL;DR). Ask a Git professional for help.

Known Issues

It is still untracked

Something went wrong for <your-branch>. It is still untracked. Possibly the program or the network were interrupted.

  • Disable your antivirus or Check Point.
  • Check if this branch is blocked by your Git-server.

Automation support

  • git-repo-sync works with remote Git repositories asynchronously by default.
  • It runs much faster on *nix OSes because Git-bash on Windows is slower. But compared to network latency, this is negligible.
  • You can separate the change detection and synchronization phases of git-repo-sync for readability of CI/CD logs.
  • Multiple configuration methods are supported: environment variables, configuration files, or a combination of them.
  • Integration with the bash Git Credential Helper (git-cred) to obtain credentials from the parent shell environment.
  • You don't need to do anything after connectivity failures. Continue to run git-repo-sync periodically and everything will be restored automatically.
  • After every synchronization, analyze notification files to send notifications about branch deletions or commit conflict solving.
    See git-repo-sync/sync-projects/<your-sync-project-name>/file-signals/
    • notify_solving - for conflict solving
    • notify_del - for deletions
  • See instructions on how to configure more synchronization pairs of remote Git repositories.
  • Number of pairs is unlimited. Every pair is a separate sync project.
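A minimal sketch for the "analyze notification files after every synchronization" item. It assumes (an assumption, not confirmed by the README) that a non-empty notify_solving or notify_del file in the file-signals folder means there is something to report. report_sync_signals is a hypothetical helper that only prints; delivering the message by mail or chat is up to your environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print notify_solving / notify_del contents, if any.
# $1 = git-repo-sync/sync-projects/<your-sync-project-name>/file-signals
# Returns 0 when at least one signal file has content, 1 otherwise.
report_sync_signals() {
  local signals_dir="$1" signal found=1
  for signal in notify_solving notify_del; do
    # Assumption: a non-empty file means there is an event to report.
    if [ -s "$signals_dir/$signal" ]; then
      echo "== $signal =="
      cat "$signals_dir/$signal"
      found=0
    fi
  done
  return "$found"
}
```

For example, in a CI job after bash git-sync.sh: if report_sync_signals "$signals_dir" > report.txt; then send report.txt to your team; fi.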

git-repo-sync's People

Contributors

it3xl


git-repo-sync's Issues

Protection from things like KDE near-disaster

This is implemented already. But it would be great to add protection against a "slow", branch-by-branch deletion.
Probably we can use Git's reflog for recovery.

Recheck this. Think about this.

Preventing push/pull from only side_b

Hello!
First of all, I'd like to thank you for this useful tool. I appreciate its usefulness.

I'm using it in one of my hobby projects already successfully, but I have encountered special use case, where I'm not sure if existing configuration can solve it. I suspect side_X_conventional_branches_prefix could be a solution, but I'd like to ask for your opinion about it - if I understand it correctly. Here is the scenario:

RepoA - hosted in primary location as single source of truth.
RepoB - sitting on "other side" - I would like to allow guys on other side to experiment with as many branches they want, but only some branches with known prefix are important to be stored securely also in hosted site.

I wish, that all branches from RepoA are pushed to RepoB during sync, but only prefixed branches are pushed from RepoB to RepoA.

Is this exactly what side_b_conventional_branches_prefix is for or am I misunderstanding it?

Thank you for your answer!

Docker image

I would like to create a community Docker image for git-repo-sync.

But I need your advice.

For a Jenkins base image, the resulting image will be approximately 750 MB. Really not bad.
But for the Alpine base image it will take 7 MB.

But for Alpine I need to solve the following:

  • What is the best way to store your secrets? I.e. passwords from your Git-repositories.
  • How and where it will be convenient to configure git-repo-sync?
    • Should it be a plain file?
    • Or will it be a Node.js application with a Web wizard?

Interrupt parallel runs.

Recheck interruption of erroneous parallel runs for the same sync project.
Protect users from mistakes and errors in their auto-run automation.

It is assumed that an external automation server is responsible for preventing parallel runs.
But this will be useful for those who use cron-like approaches to run git-repo-sync periodically.
For example, in a Docker image.

Bash Credential Helper not working

I have set the use_bash_git_credential_helper value to 1. However, it still requires me to input the username and password. I have exported the credentials to environment variables as suggested in the documentation. As far as I understand, setting "use_bash_git_credential_helper=1" should not require manual input of credentials.

One way sync feature

Hey!
Checked out your project and I can definitely say that it is amazing. Thanks for your work!
Also wanted to propose a feature to enable "one way sync", in other words a mirror feature that will pull and push changes from Repo A to Repo B.
Thanks in advance for reply!

Git Large File Storage (LFS) Extension Support

Hello!

Thanks for that repo, it's almost exactly what we are looking for.

Only thing missing for us would be LFS support. Currently LFS files are not synced which means that the raw pointer files are uploaded without any related LFS files.

Would it be possible to add this / how much work would it be?

Thanks!

Fight with excessive configuring

Make git-repo-sync work with minimal or zero configuration.

Remove sync_enabling_branch config variable as its role could be fully automated.

Fast-Forward updates treated with conflict solving strategies

git-repo-sync sometimes applies conflict solving strategies to simple fast-forward Git-branch changes.
This creates excessive notifications (email spam) for development teams.

This behavior could currently be correct when restarting after network failures.
It looks like we can identify fast-forward update situations even after network troubles.
Or it is a state interpretation bug.

There are no syncing problems in any case. Everything works right.
Just a place to improve.

Define multiple branch prefixes and use the conventional strategy only

First of all, thank you for this helpful tool!

In our use case we want to synchronize a local copy of a customer's fairly large Git repository. For this we only want to synchronize a few branches. But these do not have a common prefix.

Is it possible to define multiple branch names/prefixes?

Moreover, is it possible to select the conventional conflict solving strategy only, and as the default? For now it seems that you have to set a branch prefix in victim_branches_prefix, or the whole repository is synced.

Thank you in advance.

Synchronizing all branches?

I'm trying to synchronize all branches of two servers, everything including master and other branches.

Is that possible? How can this be achieved?
Currently I can only set prefixes for certain strategies, but I can't seem to set an empty prefix.

Git-rebase is dangerous and inappropriate.

I just used multiple Git-rebases and something strange, phantom-like happened.
Some code that others had added and then deleted became my code.
It looked like I had added this code, but I had no such intent. It was code from others' commits.

Another effect was that some code added by others disappeared from the project.

I looked over everything thoroughly and realized that the trouble was multiple Git-rebases while working simultaneously with other people on the same project.

But I want to know whether you use such a dangerous thing as Git-rebase in your git-repo-sync implementation.
