
objinsync's Issues

Add --exclude flag to pull command similar to rsync

There are cases where it would be useful to sync multiple directories without deleting certain content that already exists in the target location. For example, I have a lib/ directory with custom operators, hooks, etc. that needs to be synced in addition to my dags/ folder, and both live under s3://<bucket>/airflow/. Currently, if I were to run the pull command objinsync pull s3://<bucket>/airflow/ ${AIRFLOW_HOME}/, I would delete other important files such as my airflow.cfg. It would therefore be nice to implement an --exclude flag that preserves files or directories in the target location, so the pull command could be used with directories that have other files present.

Final command might look something like this:
objinsync pull s3://<bucket>/airflow/ ${AIRFLOW_HOME}/ --exclude "airflow.cfg" --exclude "some_dir/*"
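Not the project's actual code, just a sketch of how the proposed flag could filter keys before the delete pass. The function name `excluded` and the prefix handling for `dir/*` patterns are my own assumptions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// excluded reports whether a relative key matches any --exclude
// pattern. Plain patterns use filepath.Match globs (which do not
// cross path separators); a trailing "/*" is additionally treated
// as a prefix match so it covers nested paths.
func excluded(key string, patterns []string) bool {
	for _, p := range patterns {
		if ok, _ := filepath.Match(p, key); ok {
			return true
		}
		if strings.HasSuffix(p, "/*") &&
			strings.HasPrefix(key, strings.TrimSuffix(p, "*")) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"airflow.cfg", "some_dir/*"}
	fmt.Println(excluded("airflow.cfg", patterns))     // true
	fmt.Println(excluded("some_dir/a/b.py", patterns)) // true
	fmt.Println(excluded("dags/main.py", patterns))    // false
}
```

A sync tool would call this both when deciding which local files to delete and which remote keys to skip.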

Object not synced if you remove and add the same file in the bucket

In the following scenario, object FOO doesn't sync properly:

  1. 🟢 FOO object is uploaded to the bucket -> objinsync adds it
  2. 🟢 FOO object is deleted from the bucket -> objinsync removes it
  3. 🔴 The same FOO object is uploaded again to the bucket -> objinsync doesn't add it
    If I restart objinsync after this step, it syncs properly.
  4. 🟢 FOO object is uploaded with modified content -> objinsync adds it

I think it's because the uidCache is not purged when a file is deleted.
Maybe adding a delete inside this loop would work:

		// Purge the uid cache entry along with the local file, so a
		// re-uploaded object with the same key is treated as new.
		for f := range self.filesToDelete {
			os.Remove(f)
			uidKey, err := uidKeyFromLocalPath(self.LocalDir, f)
			if err == nil {
				self.uidLock.Lock()
				delete(self.uidCache, uidKey)
				self.uidLock.Unlock()
			}
		}

Disclaimer: I'm not a Go expert

airflow 2 deploying info

Hi,
we are trying to deploy the objinsync solution with Airflow 2, using this chart:
https://artifacthub.io/packages/helm/airflow-helm/airflow

but with no success. We added the environment variables according to https://tech.scribd.com/blog/2020/breaking-up-the-dag-repo.html,

and I even changed the chart's "git sync" section:

## the git-sync container image
image:
  repository: ghcr.io/scribd/objinsync  # was k8s.gcr.io/git-sync/git-sync
  tag: latest
repo: "s3://airflow-stg-s3-log-bucket/airflow_home/dags"  # was "[email protected]:lusha_team/airflow-dags.git"
branch: "dummy"

Can we get more instructions on how to do this?

Any ideas?

thanks

When I run two objinsync I get an error

Hi,
This tool is wonderful! Thank you. Sorry to be the first to open an issue, but I can't run two instances from systemd at the same time:

/opt/objinsync pull --disable-ssl -i 120s  s3://airflow-us-east-1/datawarehouse/profiles /opt/airflow/images/dbt/profiles
/opt/objinsync pull --disable-ssl -i 120s  s3://airflow-us-east-1/datawarehouse/dags /opt/airflow/images/dbt/dags
{"level":"info","ts":1629897460.9748888,"caller":"objinsync/main.go:87","msg":"SENTRY_DSN not found, sentry integration disabled."}
{"level":"info","ts":1629897460.9751706,"caller":"objinsync/main.go:147","msg":"Serving health check endpoints at: :8087."}
{"level":"info","ts":1629897460.9752388,"caller":"objinsync/main.go:148","msg":"Pulling from s3://airflow-us-east-1.rifiniti.com/datawarehouse/profiles to /opt/airflow/images/dbt/profiles every 2m0s..."}
{"level":"info","ts":1629897460.9752793,"caller":"objinsync/main.go:126","msg":"Pull started."}
2021/08/25 13:17:40 listen tcp :8087: bind: address already in use

Can you help?

Pulled files are empty

Hi! Thanks for this tool! It looks like it's exactly what I need! I'm having some issues using it though:

Any file I pull is created as an empty (0 byte) file.

I have an s3 bucket called my-bucket which holds two files: foo.txt and image.png.
When I run a pull command the files get created but they are both empty:

root@gosand:/tmp# ls -l /tmp
total 0
root@gosand:/tmp# objinsync pull --once s3://my-bucket/git /tmp
{"level":"info","ts":1595423591.3152442,"caller":"objinsync/main.go:86","msg":"SENTRY_DSN not found, sentry integration disabled."}
{"level":"info","ts":1595423591.3153822,"caller":"objinsync/main.go:141","msg":"Pulling from s3://my-bucket/git to /tmp..."}
{"level":"info","ts":1595423591.3154054,"caller":"objinsync/main.go:125","msg":"Pull started."}
{"level":"info","ts":1595423591.317076,"caller":"sync/pull.go:357","msg":"Listing objects","bucket":"my-bucket","dirpath":"git"}
{"level":"info","ts":1595423591.393758,"caller":"sync/pull.go:203","msg":"Object list page contains 3 objects."}
{"level":"info","ts":1595423591.423097,"caller":"objinsync/main.go:137","msg":"Pull finished in 107.686584ms seconds."}
root@gosand:/tmp# ls -l /tmp
total 0
-rw-r--r-- 1 root root 0 Jul 22 13:13 foo.txt
-rw-r--r-- 1 root root 0 Jul 22 13:13 image.png
  • I'm running objinsync on a container based off golang:1.14.
  • I've tried using a prebuilt binary (v2.3.0) and also using go get.
  • The container is running on an EKS cluster (kubernetes 1.16)
  • It receives permissions to access this bucket with the following IAM policy:
{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "VisualEditor0",
           "Effect": "Allow",
           "Action": [
               "s3:GetObject",
               "s3:ListBucket"
           ],
           "Resource": [
               "arn:aws:s3:::my-bucket",
               "arn:aws:s3:::my-bucket/git"
           ]
       }
   ]
}

Is there any other info I can provide to help debug this?
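Not from the maintainers, but one thing worth checking: in IAM, s3:GetObject is evaluated against object ARNs, and arn:aws:s3:::my-bucket/git only matches an object literally named git. If GetObject is denied for the actual keys under the prefix, a sync tool may still be able to list the objects (ListBucket is granted on the bucket ARN) but fail to download their bodies, which could explain the zero-byte files. A resource list that covers the prefix might look like:

```json
"Resource": [
    "arn:aws:s3:::my-bucket",
    "arn:aws:s3:::my-bucket/git/*"
]
```

This is only a guess based on how S3 ARN matching works; CloudTrail or the tool's debug logs would confirm whether GetObject is being denied.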

Not able to make this work with minio!

Hi Team,

I am trying to make this utility work with MinIO and am running into some issues. Any help or pointers would be highly appreciated.

I have installed MinIO via Docker on the same server where my Airflow is installed. I am able to access MinIO via publicip:9001, and I created a bucket named dags and am able to upload files to it.

Now I want to sync the dags volume of the Airflow container with MinIO. For test purposes I am running it with the following arguments, but I always get an error, and I am having a hard time understanding how to fix it.

objinsync pull --once --disable-ssl --s3-endpoint http://localhost:9000 s3://dags/*.* ~/Projects/airflow/dags

{"level":"info","ts":1657540886.0054412,"caller":"objinsync/main.go:87","msg":"SENTRY_DSN not found, sentry integration disabled."}
{"level":"info","ts":1657540886.009082,"caller":"objinsync/main.go:142","msg":"Pulling from s3://dags/*.* to /home/lyridadmin/Projects/airflow/dags..."}
{"level":"info","ts":1657540886.0091188,"caller":"objinsync/main.go:126","msg":"Pull started."}
ERROR: failed to pull objects from remote store: Failed to detect AWS region: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I tried with this as well, but I get the same error as above:
objinsync pull --once --disable-ssl --s3-endpoint http://publicip:9000 s3://dags/*.* ~/Projects/airflow/dags

Edit:
Also, how do I provide the accessKey and secretKey for MinIO?
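Assuming objinsync uses the AWS SDK for Go's standard credential chain (the "EC2 instance identity document" error above suggests it fell through to the metadata service), MinIO credentials and a static region can be supplied through the usual AWS environment variables; setting a region explicitly should also avoid the failing EC2 metadata region lookup:

```shell
export AWS_ACCESS_KEY_ID="minio-access-key"      # your MinIO access key (placeholder)
export AWS_SECRET_ACCESS_KEY="minio-secret-key"  # your MinIO secret key (placeholder)
export AWS_REGION="us-east-1"                    # any valid region string works for MinIO

# then run the same command as above:
# objinsync pull --once --disable-ssl --s3-endpoint http://localhost:9000 s3://dags/*.* ~/Projects/airflow/dags
```

Whether the `s3://dags/*.*` glob is interpreted as a key prefix is a separate question; a plain `s3://dags` prefix may be safer.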

Perform atomic writes when pulling files

I'm running into issues where objinsync intermittently causes imports not to be found. I believe the reason is that the in-place file replacement races with the scheduler's parsing loop, causing intermittent import-not-found exceptions.

Publish to Docker Registry with GitHub workflow

Thanks for the tool, it's really useful!

It would be great to publish to a Docker registry using the existing GitHub workflow, tagging the image with the specific release version v#.#.# and latest.
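A sketch of what such a workflow could look like. The job layout, the ghcr.io/scribd/objinsync image path, and the trigger are assumptions, not the repo's actual workflow:

```yaml
name: publish-image
on:
  release:
    types: [published]

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ghcr.io/scribd/objinsync:${{ github.event.release.tag_name }}
            ghcr.io/scribd/objinsync:latest
```

Triggering on `release: published` makes the version tag come straight from the release, with `latest` updated alongside it.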

Configure the pull interval

Is it possible to configure the interval between synchronizations with the S3 buckets?

From the code, I have seen that it is 5 seconds, but, since I have never used the Go language, I don't know whether it can be configured via the CLI like the --once or --exclude arguments.

I would like to be able to write:

objinsync pull --interval 60 s3://bucket/keyprefix ./localdir
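An interval flag already appears in the systemd example from the "two objinsync instances" issue above (`-i 120s`), so the knob seems to exist as `-i` taking a Go duration string rather than a bare number of seconds:

```shell
objinsync pull -i 60s s3://bucket/keyprefix ./localdir
```

This is inferred only from that example; `objinsync pull --help` would confirm the flag's exact name and format.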
