
airbyte_serverless's Introduction

Hi there 👋, I am Paul, an open-source Data-Product builder 😃



From Head of Data to open-source Data-Product builder

As Head of Data at Nickel, I scaled the data organization from 3 to 100+ data practitioners.

My vision is to give data power to data analysts by making them autonomous across the whole data chain, from data collection to data-algorithm deployment
👉 we build data products for that.

I created Unytics as a personal project to go further
👉 providing open-source data products to the worldwide data community. 🚀


airbyte_serverless's People

Contributors

maelstorm19, unytics


airbyte_serverless's Issues

Streams can't be defined

When using it with the following config, I get TypeError: 'in <string>' requires string as left operand, not dict:

source:
  docker_image: "airbyte/source-github" # GENERATED | string | A Public Docker Airbyte Source. Example: `airbyte/source-faker:0.1.4`. (see connectors list at: "https://hub.docker.com/search?q=airbyte%2Fsource-" )
  config: # PREGENERATED | object | PLEASE UPDATE this pre-generated config by following the documentation https://docs.airbyte.com/integrations/sources/github
    credentials:
      personal_access_token: ${GITHUB_TOKEN} # SECRET (please store in environment variables) | REQUIRED | string | Log into GitHub and then generate a personal access token (https://github.com/settings/tokens). To load balance your API quota consumption across multiple API tokens, input multiple tokens separated with ","
    repositories: 
      - "ORG/REPO"
    start_date: "2023-10-09T00:00:00Z" 
  streams: workflows_run
destination:
  connector: "print"
  config:
    buffer_size_max: 100
remote_runner:
  type: "cloud_run_job" # GENERATED | string | Runner Type. Must be one of ['direct', 'cloud_run_job']
  config: # PREGENERATED | object | PLEASE UPDATE this pre-generated config
    project:  # REQUIRED | string | GCP Project where cloud run job will be deployed
    region: "europe-west1" # REQUIRED | string | Region where cloud run job will be deployed
    service_account: "" # OPTIONAL | string | Service account email used by the Cloud Run Job. If empty, the default compute service account will be used

If you don't set streams, it works, but then it retrieves all the data.
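
For context, a hypothetical minimal reproduction of the error (the variable names are mine, not necessarily the library's): Python raises exactly this TypeError when the left operand of `in` is a dict and the right operand is a string, which suggests that somewhere in the catalog handling a stream parsed as a dict is being tested for membership in the configured streams string.

configured_streams = "workflows_run"  # the value of the `streams` field above
stream = {"name": "workflows_run"}    # a stream object parsed from the Airbyte catalog
stream in configured_streams          # TypeError: 'in <string>' requires string as left operand, not dict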

The catalog file is not written in full by the time the connector starts

Hi!

We're using airbyte-serverless in the Pathway framework as a connector to Airbyte sources.
Recently we've run into an issue with the internally serialized catalog file not being JSON-readable. We're using the Airbyte GitHub connector, but that doesn't seem to be an important detail.

I've analyzed the stack trace we got and found a suspicious place:

ValueError: Could not read json file /mnt/temp/catalog.json: Expecting ':' delimiter: line 1 column 8192 (char 8191).

So what happens is that the code tries to read the catalog file, which is created with json.dump, but it stumbles at character 8192 out of the ~65K the file should contain (I dumped the catalog locally to estimate the expected size). That looks like the boundary of a filesystem block or, more likely, Python's default 8 KiB I/O buffer (io.DEFAULT_BUFFER_SIZE = 8192).

My guess is that the opened file is not closed (and hence not flushed) straight away, which leaves a window during which the file is not yet fully written; in rare unlucky cases, the Airbyte connector's Docker container starts before the write completes.

If so, an explicit close or a context manager should help here (a sketch is below). Could you please look into the issue and confirm or reject my assumptions? I can send a PR with the proposed fix if that helps.
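
A minimal sketch of the suggested fix, assuming the catalog dict is written to a temp path like the /mnt/temp/catalog.json from the traceback (the `write_catalog` name is mine). A `with` block guarantees the buffer is flushed and the file closed before the connector's container is launched:

import json

def write_catalog(catalog: dict, path: str) -> None:
    # The context manager flushes and closes the file on exit, so a reader
    # started afterwards can never observe a partially written catalog.
    with open(path, "w") as f:
        json.dump(catalog, f)

# Usage (path taken from the traceback above):
# write_catalog(configured_catalog, "/mnt/temp/catalog.json")
# ...and only start the connector's Docker container after it returns.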

Thank you in advance!
