
beam-mysql-connector's Introduction

Beam - MySQL Connector


Beam - MySQL Connector is an I/O connector for Apache Beam that reads from and writes to MySQL databases.

Installation

pip install beam-mysql-connector

Getting Started

  • Read From MySQL
from beam_mysql.connector import splitters
from beam_mysql.connector.io import ReadFromMySQL


read_from_mysql = ReadFromMySQL(
        query="SELECT * FROM test_db.tests;",
        host="localhost",
        database="test_db",
        user="test",
        password="test",
        port=3306,
        splitter=splitters.NoSplitter(),  # selects how the query is split for parallel reads; NoSplitter does not split
)
  • Write To MySQL
from beam_mysql.connector.io import WriteToMySQL


write_to_mysql = WriteToMySQL(
        host="localhost",
        database="test_db",
        table="tests",
        user="test",
        password="test",
        port=3306,
        batch_size=1000,
)

License

MIT License. See LICENSE.txt for details.

beam-mysql-connector's People

Contributors

dependabot[bot], esakik, igera97, mkdotam, nlenepveu, satokiyo


beam-mysql-connector's Issues

AttributeError: 'NoSplitter' object has no attribute 'source'

I am still getting 'NoSplitter' object has no attribute 'source', even on version 1.8.2.
To get it running, I had to make a makeshift change in splitter.py:

class NoSplitter(BaseSplitter):
    """No split bounded source so not work parallel."""

    def estimate_size(self):
        # return self.source.client.rough_counts_estimator(self.source.query)
        return 0x100000 # may be any number
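
A slightly more defensive variant of the workaround above can be sketched as follows. It assumes the error occurs because `source` has not been attached to the splitter when `estimate_size` is called, which the issue does not confirm. `BaseSplitter` here is a hypothetical stand-in so the snippet runs on its own, and `DEFAULT_SIZE_ESTIMATE` is an illustrative name, not part of the library.

```python
DEFAULT_SIZE_ESTIMATE = 0x100000  # arbitrary fallback, same value as the workaround


class BaseSplitter:
    """Hypothetical stand-in for the library's base class."""


class NoSplitter(BaseSplitter):
    """Does not split the bounded source, so reads do not run in parallel."""

    def estimate_size(self):
        # Fall back to a constant only when no source has been attached yet.
        source = getattr(self, "source", None)
        if source is None:
            return DEFAULT_SIZE_ESTIMATE
        # Otherwise keep the original estimate from the commented-out line.
        return source.client.rough_counts_estimator(source.query)


print(NoSplitter().estimate_size())
```

Unlike the hard-coded return value, this keeps the real row-count estimate whenever a source is available.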

Unable to write null values into DB

None values are converted into the string 'None' instead of NULL.
The bug can be fixed using this patch:

--- a/beam_mysql/connector/io.py
+++ b/beam_mysql/connector/io.py
@@ -115,7 +115,8 @@ class _WriteToMySQLFn(beam.DoFn):
             values.append(value)

         column_str = ", ".join(columns)
-        value_str = ", ".join([f"{value}" if isinstance(value, (int, float)) else f"'{value}'" for value in values])
+        value_str = ", ".join([f"{'NULL' if value is None else value}" if isinstance(value, (type(None), int, float))
+                               else f"'{value}'" for value in values])
         query = f"INSERT INTO {self._config['database']}.{self._table}({column_str}) VALUES({value_str});"

         self._queries.append(query)
--
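
The value-rendering logic of the patch can be tried in isolation with a short sketch. `render_value` and `build_insert` are hypothetical helper names, not functions from the connector; they only mirror the f-string logic in the diff above.

```python
def render_value(value):
    """Render one Python value as a SQL literal, following the patch above."""
    if value is None:
        return "NULL"          # unquoted, so MySQL stores a real NULL
    if isinstance(value, (int, float)):
        return f"{value}"      # numbers are emitted without quotes
    return f"'{value}'"        # everything else is quoted as a string


def build_insert(database, table, record):
    """Build a single-row INSERT statement from a dict of column -> value."""
    column_str = ", ".join(record.keys())
    value_str = ", ".join(render_value(value) for value in record.values())
    return f"INSERT INTO {database}.{table}({column_str}) VALUES({value_str});"


query = build_insert("test_db", "tests", {"id": 1, "name": "test data", "note": None})
print(query)
# INSERT INTO test_db.tests(id, name, note) VALUES(1, 'test data', NULL);
```

Like the code it mirrors, this sketch does not escape quotes inside string values; passing the values as query parameters to the driver would avoid that problem and would also map None to NULL.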

MySQLInterfaceError

Hello! I just ran the example code in beam-mysql-connector/examples/write_records_pipeline.py on my computer to see whether I can write to a MariaDB table. It returned an error: _mysql_connector.MySQLInterfaceError: Commands out of sync; you can't run this command now [while running 'WriteToMySQL/ParDo(_WriteToMySQLFn)'].

However, if I run the same code with only one row in beam.Create([...]), like beam.Create([{"name": "test data3"}]), it will work just fine.

Is there a way to avoid this error with beam-mysql-connector and load multiple rows?

Bulk Insert to MySql

Hi,
I am using the connector to write to MySQL with a batch size of 1000, but I get an error suggesting that I use bulk insert with the cmd_insert option.
Could you please share a few examples of bulk insert? The example provided uses the insert option with a batch size of 0.
Thanks!
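
As a general illustration of batched inserts, and not the connector's own API, the sketch below groups records into batches and sends each batch with the DB-API `executemany` call. It uses the standard-library `sqlite3` module as a stand-in so it runs without a MySQL server; with a MySQL driver the placeholder style would differ. The `batched` helper and the table layout are assumptions made for the example.

```python
import sqlite3
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# In-memory database standing in for the MySQL table from the examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (name TEXT)")

records = [{"name": f"test data{i}"} for i in range(2500)]

batch_sizes = []
for batch in batched(records, 1000):
    # One executemany call per batch instead of one statement per row.
    conn.executemany("INSERT INTO tests (name) VALUES (:name)", batch)
    conn.commit()
    batch_sizes.append(len(batch))

row_count = conn.execute("SELECT COUNT(*) FROM tests").fetchone()[0]
print(batch_sizes, row_count)
```

Committing once per batch keeps each transaction bounded while avoiding a round trip per row.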
