
embulk-input-salesforce_bulk's Introduction

Salesforce Bulk input plugin for Embulk

Retrieves the results of bulk queries from the Salesforce Bulk API.

Overview

  • Plugin type: input
  • Resume supported: no
  • Cleanup supported: no
  • Guess supported: no

Configuration

  • userName: Salesforce user name. (string, required)
  • password: Salesforce password. (string, required)
    • Set this to the password concatenated with the security token.
  • authEndpointUrl: Salesforce login endpoint URL. (string, default is "https://login.salesforce.com/services/Soap/u/39.0")
  • objectType: object type of the JobInfo. (string, required)
    • Usually the same as the object in the query. (If querySelectFrom is (snip) FROM xxx (snip), then objectType is xxx.)
  • pollingIntervalMillisecond: polling interval in milliseconds. (string, default is 30000)
  • querySelectFrom: the SELECT and FROM part of the query. (string, required)
  • queryWhere: the WHERE part of the query. (string, default is "")
  • queryOrder: the ORDER BY part of the query. (string, default is "")
  • columns: schema config. (SchemaConfig, required)
  • startRowMarkerName: name of the column used as the marker for identifying the first record to extract. (string, default is null)
  • start_row_marker: adds the condition "column startRowMarkerName is greater than this value" to the extraction conditions. (string, default is null)
  • queryAll: if true, uses the queryAll operation so that deleted records are returned. (boolean, default is false)
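
As a rough illustration of how the three query options fit together, the sketch below assembles one SOQL string from querySelectFrom, queryWhere and queryOrder. It is written in Python for brevity and is not taken from the plugin's source. The function name build_soql is hypothetical, and the sketch assumes that an empty queryWhere or queryOrder simply means the corresponding clause is left out.

```python
def build_soql(query_select_from: str, query_where: str = "", query_order: str = "") -> str:
    """Join the configured query parts into a single SOQL statement.

    Hypothetical helper for illustration only; not the plugin's actual code.
    """
    parts = [query_select_from.strip()]
    if query_where.strip():
        # queryWhere holds only the condition, so the keyword is added here.
        parts.append("WHERE " + query_where.strip())
    if query_order.strip():
        # queryOrder holds only the ordering, so the keyword is added here.
        parts.append("ORDER BY " + query_order.strip())
    return " ".join(parts)


print(build_soql("SELECT Id,Name,LastModifiedDate FROM Account",
                 "Name like 'Test%'", "Name desc"))
# prints: SELECT Id,Name,LastModifiedDate FROM Account WHERE Name like 'Test%' ORDER BY Name desc
```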

More information about objectType:

objectType is a field of JobInfo. See: JobInfo | Bulk API Developer Guide | Salesforce Developers

These documents will aid in understanding.

Example

Extract everything matched by the query

in:
  type: salesforce_bulk
  userName: USER_NAME
  password: PASSWORD
  authEndpointUrl: https://login.salesforce.com/services/Soap/u/39.0
  objectType: Account
  pollingIntervalMillisecond: 5000
  querySelectFrom: SELECT Id,Name,LastModifiedDate FROM Account
  queryWhere: Name like 'Test%'
  queryOrder: Name desc
  columns:
  - {type: string, name: Id}
  - {type: string, name: Name}
  - {type: timestamp, name: LastModifiedDate, format: '%FT%T.%L%Z'}

Fetch only the objects changed since the previous run

Set startRowMarkerName to LastModifiedDate, then run embulk with the -o option.

config.yaml

in:
  type: salesforce_bulk
  userName: USER_NAME
  password: PASSWORD
  authEndpointUrl: https://login.salesforce.com/services/Soap/u/39.0
  objectType: Account
  pollingIntervalMillisecond: 5000
  querySelectFrom: SELECT Id,Name,LastModifiedDate FROM Account
  queryOrder: Name desc
  columns:
  - {type: string, name: Id}
  - {type: string, name: Name}
  - {type: timestamp, name: LastModifiedDate, format: '%FT%T.%L%Z'}
  startRowMarkerName: LastModifiedDate

Command

embulk run config.yaml -o config.yaml
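
The effect of startRowMarkerName together with start_row_marker can be sketched as follows. This is a Python illustration under stated assumptions, not the plugin's implementation: the helper name add_marker_condition is hypothetical, the marker value is assumed to be inserted unquoted (as a SOQL datetime literal would be), and it is assumed to be combined with any existing queryWhere using AND.

```python
from typing import Optional


def add_marker_condition(query_where: str,
                         marker_name: Optional[str],
                         marker_value: Optional[str]) -> str:
    """Return the WHERE condition with 'marker_name > marker_value' added.

    Hypothetical helper; if either the name or the value is missing,
    the original condition is returned unchanged.
    """
    if not marker_name or not marker_value:
        return query_where
    condition = f"{marker_name} > {marker_value}"
    if query_where.strip():
        # Parenthesize the existing condition so an OR inside it keeps its meaning.
        return f"({query_where.strip()}) AND {condition}"
    return condition


# First run: no marker value is known yet, so the condition stays empty.
print(add_marker_condition("", "LastModifiedDate", None))
# Later run: only rows whose marker column exceeds the stored value match.
print(add_marker_condition("", "LastModifiedDate", "2015-08-01T00:00:00.000Z"))
```

Under these assumptions the first run extracts everything, and later runs are narrowed by whatever start_row_marker value has been written back to config.yaml.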

TODO

  • Implement proper error log output
  • Support guess
  • Improve efficiency

Build

$ ./gradlew gem

embulk-input-salesforce_bulk's People

Contributors

irotoris, mikoto2000, yuuna

embulk-input-salesforce_bulk's Issues

Convert date to timestamp when null throws an error

We have a custom date field on the Account object. Some of the values are null.

  • If I try to get the column out with - { type: timestamp, name: Contract_Start_Date__c, format: '%Y-%m-%d' }, I get an error: org.embulk.exec.PartialExecutionException: org.jruby.exceptions.RaiseException: (TypeError) can't dup NilClass
  • If I get the column out with the same column conversion/definition, but filtering out NULL values in my SOQL query, the conversion and ETL succeeds.
  • If I try to get the column out as a string (no date conversion), the ETL succeeds.

So, I believe there is a conversion error in the code, for timestamps that are NULL.

Full stack trace here:

org.embulk.exec.PartialExecutionException: org.jruby.exceptions.RaiseException: (TypeError) can't dup NilClass
    at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(org/embulk/exec/BulkLoader.java:373)
    at org.embulk.exec.BulkLoader.doRun(org/embulk/exec/BulkLoader.java:591)
    at org.embulk.exec.BulkLoader.access$000(org/embulk/exec/BulkLoader.java:33)
    at org.embulk.exec.BulkLoader$1.run(org/embulk/exec/BulkLoader.java:389)
    at org.embulk.exec.BulkLoader$1.run(org/embulk/exec/BulkLoader.java:385)
    at org.embulk.spi.Exec.doWith(org/embulk/spi/Exec.java:25)
    at org.embulk.exec.BulkLoader.run(org/embulk/exec/BulkLoader.java:385)
    at org.embulk.EmbulkEmbed.run(org/embulk/EmbulkEmbed.java:180)
    at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)
    at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:453)
    at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:314)
    at RUBY.run(uri:classloader:/embulk/runner.rb:84)
    at RUBY.run(uri:classloader:/embulk/command/embulk_run.rb:307)
    at RUBY.<main>(uri:classloader:/embulk/command/embulk_main.rb:2)
    at org.jruby.Ruby.runInterpreter(org/jruby/Ruby.java:850)
    at org.jruby.Ruby.loadFile(org/jruby/Ruby.java:2976)
    at org.jruby.RubyKernel.requireCommon(org/jruby/RubyKernel.java:963)
    at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:956)
    at org.jruby.RubyKernel$INVOKER$s$1$0$require19.call(org/jruby/RubyKernel$INVOKER$s$1$0$require19.gen)
    at Users.bcipolli.code.insights.etl.embulk.bin.embulk.embulk.command.embulk_bundle.invokeOther35:require(Users/bcipolli/code/insights/etl/embulk/bin/embulk/embulk/command/file:/Users/bcipolli/code/insights/etl/embulk/bin/embulk!/embulk/command/embulk_bundle.rb:30)
    at Users.bcipolli.code.insights.etl.embulk.bin.embulk.embulk.command.embulk_bundle.<main>(file:/Users/bcipolli/code/insights/etl/embulk/bin/embulk!/embulk/command/embulk_bundle.rb:30)
    at java.lang.invoke.MethodHandle.invokeWithArguments(java/lang/invoke/MethodHandle.java:627)
    at org.jruby.Ruby.runScript(org/jruby/Ruby.java:834)
    at org.jruby.Ruby.runNormally(org/jruby/Ruby.java:749)
    at org.jruby.Ruby.runNormally(org/jruby/Ruby.java:767)
    at org.jruby.Ruby.runFromMain(org/jruby/Ruby.java:580)
    at org.jruby.Main.doRunFromMain(org/jruby/Main.java:425)
    at org.jruby.Main.internalRun(org/jruby/Main.java:313)
    at org.jruby.Main.run(org/jruby/Main.java:242)
    at org.jruby.Main.main(org/jruby/Main.java:204)
    at org.embulk.cli.Main.main(org/embulk/cli/Main.java:23)
Caused by: org.jruby.exceptions.RaiseException: (TypeError) can't dup NilClass
    at org.jruby.RubyKernel.dup(org/jruby/RubyKernel.java:1884)
    at RUBY._strptime(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/date/format.rb:379)
    at RUBY.strptimeUsec(uri:classloader:/embulk/java/time_helper.rb:27)
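
The failure pattern reported here, a date parser being handed a null instead of a string, can be illustrated with a small Python sketch. It is not the plugin's or Embulk's code, and the function name parse_date_or_none is hypothetical. It only shows the kind of null guard that would let empty date fields pass through as nulls rather than raising.

```python
from datetime import datetime
from typing import Optional


def parse_date_or_none(value: Optional[str], fmt: str = "%Y-%m-%d") -> Optional[datetime]:
    """Parse a date string, passing null or empty values through as None."""
    if value is None or value == "":
        # Guard first: the parser itself cannot handle a missing value.
        return None
    return datetime.strptime(value, fmt)


print(parse_date_or_none("2017-03-01"))  # a datetime for 1 March 2017
print(parse_date_or_none(None))          # None, instead of an exception
```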

embulk failing using http proxy server in linux using -J-Dhttps.proxyHost

Hello,
I am trying to run embulk through a proxy to extract data from Salesforce, but it looks like the proxy settings are not being used by embulk. Can you please take a look and advise?

I checked with embulk core... they advised this is a plugin issue.
Issue submitted to embulk core

Just a note: this config works well on my laptop. I am trying to make it work on our production server.

Command:
embulk -J-Dhttps.proxyHost=xx.xxx.xx.xx -J-Dhttps.proxyPort=80 preview salesforce.yml

Output:
2022-06-20 14:27:34.410 -0500: Embulk v0.9.24
2022-06-20 14:27:35.638 -0500 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2022-06-20 14:27:38.113 -0500 [INFO] (main): Gem's home and path are set by default: "/home/abcd/.embulk/lib/gems"
2022-06-20 14:27:39.030 -0500 [INFO] (main): Started Embulk v0.9.24
2022-06-20 14:27:39.153 -0500 [INFO] (0001:transaction): Loaded plugin embulk-input-salesforce_bulk (0.2.3)
2022-06-20 14:27:39.190 -0500 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2022-06-20 14:27:39.196 -0500 [INFO] (0001:transaction): {done: 0 / 1, running: 0}
2022-06-20 14:27:39.272 -0500 [INFO] (0076:task-0000): Try login to 'https://xxxxx.my.salesforce.com/services/Soap/u/39.0'.
2022-06-20 14:27:39.867 -0500 [ERROR] (0076:task-0000): class com.sforce.ws.ConnectionException
com.sforce.ws.ConnectionException: Failed to send request to https://xxxxx.my.salesforce.com/services/Soap/u/39.0
at com.sforce.ws.transport.SoapConnection.send(SoapConnection.java:121) ~[na:na]

Not working on Embulk v0.8.15?

I am receiving an error:
Caused by: java.lang.IllegalStateException: Optional.get() cannot be called on an absent value

My yaml file is as follows (values get replaced in a string.format() call). I am confident that the formatting of the YAML is good, and that the output connector works (I use the same code for mongodb/mysql inputs).

Any help would be appreciated!

in:
  type: salesforce_bulk
  userName: {SALESFORCE_USERNAME}
  password: "{SALESFORCE_PASSWORD}"
  authEndpointUrl: https://login.salesforce.com/services/Soap/u/34.0
  objectType: Account
  pollingIntervalMillisecond: 5000
  querySelectFrom: SELECT Id,Name,LastModifiedDate FROM Account
  queryWhere: Name like '%a%'
  queryOrder: Name desc
  columns:
  - {{type: string, name: Id}}
  - {{type: string, name: Name}}
  - {{type: timestamp, name: LastModifiedDate, format: '%FT%T.%L%Z'}}

out:
  type: redshift
  host: {REDSHIFT_HOST}
  port: {REDSHIFT_PORT}
  user: {REDSHIFT_USERNAME}
  password: "{REDSHIFT_PASSWORD}"
  database: {REDSHIFT_DATABASE}
  schema: {REDSHIFT_SCHEMA}
  table: {table_name}
  access_key_id: {AWS_ACCESS_KEY}
  secret_access_key: {AWS_SECRET_ACCESS_KEY}
  iam_user_name: {AWS_IAM_USERNAME}
  s3_bucket: insights-etl
  s3_key_prefix: temp/redshift
  mode: replace

Exit 0 when Salesforce API Error

Hello, we are considering using this plugin in our data infrastructure.

While trying it out, we found that our Embulk task continues to run even if we fail to log in to our Salesforce account.

As @irotoris mentioned in this PR, this is because the plugin catches the error and doesn't propagate it.
#13

I think the PR above is sufficient to fix this issue, and we would really appreciate it if you could take a look and respond. Thank you.
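
The difference between swallowing an error and propagating it can be sketched in Python as below. This is a generic illustration, not the plugin's code or the change made in the PR. The names login, run_swallowing and run_propagating are hypothetical.

```python
import logging

logger = logging.getLogger("sketch")


def login(ok: bool) -> None:
    """Stand-in for a Salesforce login call; fails when ok is False."""
    if not ok:
        raise ConnectionError("login failed")


def run_swallowing(ok: bool) -> str:
    """Catches the error and only logs it, so the caller sees success."""
    try:
        login(ok)
    except ConnectionError as e:
        logger.error("login error: %s", e)
    return "finished"


def run_propagating(ok: bool) -> str:
    """Logs the error and re-raises it, so the caller can fail the task."""
    try:
        login(ok)
    except ConnectionError as e:
        logger.error("login error: %s", e)
        raise
    return "finished"
```

With the first version the caller cannot tell that anything went wrong, which matches the behavior described in this issue. With the second, the exception reaches the caller, which can then report the failure.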

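Reduce the number of object creations inside ColumnVisitorImpl

I received the comment below, so I will fix this.

https://twitter.com/frsyuki/status/631540354051801088

Apparently there is no need to worry about calling new on ColumnVisitorImpl every time.
The problem is said to be the object creation inside ColumnVisitorImpl.
Perhaps I should imitate the following as an implementation that avoids new?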

https://github.com/embulk/embulk/blob/master/embulk-standards/src/main/java/org/embulk/standards/CsvParserPlugin.java#L265

Use spi.util.Timestamps for parsing Timestamp values

Fixed in response to the comment below.

FURUHASHI Sadayuki on Twitter: "@mikoto2000 Here: https://t.co/vOWwYzWzxI you can use spi.utill.Timestamps: https://t.co/OIq4K4eVyq"

With this change, the format string passed to ColumnConfig becomes a Ruby date format string, so don't forget to update the README as well.
It is also nice that this matches how other plugins specify the format.
