
datax's Issues

DataX / tdengine30writer BUG

Describe the bug

When importing multiple MongoDB collections into the same TDengine super table through DataX, the job fails with an OOM. The reproducible pattern: importing the first MongoDB collection succeeds, and every import from the second collection onward fails with an OOM.

Database and DataX versions:
MongoDB: 4.0.3
TDengine: 3.0.5.0
DataX: mongodbreader, tdengine30writer

To Reproduce
1. Create a new database in TDengine.
2. MongoDB contains N collections to migrate, each with hundreds of millions of documents.
3. While the target super table in TDengine is empty, migrating any single MongoDB collection succeeds.
4. Once the target super table already holds hundreds of millions of rows, migrating any MongoDB collection (including one that previously succeeded) makes DataX OOM.

Troubleshooting so far
1. Raising the DataX heap to 6 GB still results in an OOM.
2. DataX configuration problems have been ruled out.
3. During the OOM, TDengine CPU usage spikes while the source MongoDB shows no export traffic, so the OOM happens before DataX starts exporting from MongoDB.
(System monitoring screenshot)
Tracing the hprof dump points to the DataX source code at fault:
(Source code screenshot)
Executing the same SQL directly in TDengine reproduces the problem, so the conclusion is that DataX loads every tagid into its own memory, causing the OOM.
(SQL screenshot)

Expected behavior
It is not clear why DataX needs to execute the code below at all; it seems to add little value. Could it be disabled, or changed to fetch only each subtable's tagid instead of loading the full per-row tagid detail data?
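As a purely hypothetical illustration of the suggested fix (the class and method names below are invented, not DataX code): collecting only the distinct tag values while streaming keeps memory proportional to the number of subtables rather than the total row count.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: collect the distinct tag values of a super table
// while consuming rows one at a time, instead of materializing every tagid
// row in memory at once (the behavior suspected of causing the OOM).
public class DistinctTagCollector {

    // Consumes the row iterator lazily; the set grows with the number of
    // distinct tags (i.e. subtables), not with the total row count.
    public static Set<String> collectDistinctTags(Iterator<String> tagRows) {
        Set<String> distinct = new LinkedHashSet<>();
        while (tagRows.hasNext()) {
            distinct.add(tagRows.next());
        }
        return distinct;
    }

    public static void main(String[] args) {
        Iterator<String> rows = java.util.List.of(
                "tag-1", "tag-1", "tag-2", "tag-1", "tag-3").iterator();
        // 5 rows collapse to 3 distinct tags.
        System.out.println(collectDistinctTags(rows).size());
    }
}
```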

Dirty data causes sync failure when migrating between two local taos 3.0 instances with DataX 3.0

The corresponding error:

2023-06-08 17:45:35.245 [job-0] INFO StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2023-06-08 17:45:45.248 [job-0] INFO StandAloneJobContainerCommunicator - Total 10000 records, 256450 bytes | Speed 25.04KB/s, 1000 records/s | Error 2000 records, 51270 bytes | All Task WaitWriterTime 0.006s | All Task WaitReaderTime 0.338s | Percentage 0.00%
2023-06-08 17:45:45.249 [job-0] ERROR JobContainer - 运行scheduler 模式[standalone]出错.
2023-06-08 17:45:45.249 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[Framework-14], Description:[DataX传输脏数据超过用户预期,该错误通常是由于源端数据存在较多业务脏数据导致,请仔细检查DataX汇报的脏数据日志信息, 或者您可以适当调大脏数据阈值 .]. - 脏数据条数检查不通过,限制是[0]条,但实际上捕获了[2000]条.
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.util.ErrorRecordChecker.checkRecordLimit(ErrorRecordChecker.java:58) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.scheduler.AbstractScheduler.schedule(AbstractScheduler.java:89) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.schedule(JobContainer.java:535) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.start(Engine.java:93) [datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.entry(Engine.java:175) [datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.main(Engine.java:208) [datax-core-0.0.1-SNAPSHOT.jar:na]
2023-06-08 17:45:45.250 [job-0] INFO StandAloneJobContainerCommunicator - Total 10000 records, 256450 bytes | Speed 250.44KB/s, 10000 records/s | Error 2000 records, 51270 bytes | All Task WaitWriterTime 0.006s | All Task WaitReaderTime 0.338s | Percentage 0.00%
2023-06-08 17:45:45.252 [job-0] ERROR Engine -

经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-14], Description:[DataX传输脏数据超过用户预期,该错误通常是由于源端数据存在较多业务脏数据导致,请仔细检查DataX汇报的脏数据日志信息, 或者您可以适当调大脏数据阈值 .]. - 脏数据条数检查不通过,限制是[0]条,但实际上捕获了[2000]条.
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30)
at com.alibaba.datax.core.util.ErrorRecordChecker.checkRecordLimit(ErrorRecordChecker.java:58)
at com.alibaba.datax.core.job.scheduler.AbstractScheduler.schedule(AbstractScheduler.java:89)
at com.alibaba.datax.core.job.JobContainer.schedule(JobContainer.java:535)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119)
at com.alibaba.datax.core.Engine.start(Engine.java:93)
at com.alibaba.datax.core.Engine.entry(Engine.java:175)
at com.alibaba.datax.core.Engine.main(Engine.java:208)

The corresponding job.json:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "tdengine30reader",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "connection": [
                            {
                                "table": ["weather"],
                                "jdbcUrl": ["jdbc:TAOS-RS://127.0.0.1:6041/wanyanjun?timestampFormat=TIMESTAMP"]
                            }
                        ],
                        "column": ["ts", "temperature", "humidity", "location", "groupid"]
                    }
                },
                "writer": {
                    "name": "tdengine30writer",
                    "parameter": {
                        "username": "root",
                        "password": "taosdata",
                        "column": ["ts", "temperature", "humidity", "location", "groupid"],
                        "connection": [
                            {
                                "table": ["weather"],
                                "jdbcUrl": "jdbc:TAOS-RS://192.168.3.212:6041/wanyanjun"
                            }
                        ],
                        "encoding": "UTF-8",
                        "batchSize": 1000,
                        "ignoreTagsUnmatched": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            },
            "errorLimit": {
                "record": 0
            }
        }
    }
}
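The failure itself is the Framework-14 dirty-data check: `errorLimit.record` is 0, so the first dirty record aborts the job. The root cause of the 2000 dirty records still needs investigating, but if some dirty rows are acceptable, one workaround is to raise the threshold in the `setting` block (the value below is illustrative):

```json
"setting": {
    "speed": {
        "channel": 5
    },
    "errorLimit": {
        "record": 10000
    }
}
```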

tdenginewriter fails to write JSON strings read from Cassandra

The relevant error log:
ERROR convert nchar string to UCS4_LE failed:{"carNumber":"","image":"","outParkingSpace":1638927251000,"parkingSpaceStatus":0}
02/20 18:21:18.947000 00150464 TSC ERROR 0xf bind column 4: type mismatch or invalid
02/20 18:21:18.947000 00150464 JNI ERROR jobj:00000069469FE648, conn:000001CC512F7AA0, code:Invalid operation

Build error when compiling tdenginereader

[INFO] --------------------------------[ jar ]---------------------------------
Downloading from central: https://maven.aliyun.com/repository/central/com/alibaba/datax/tdenginewriter/tdenginewriter/0.0.1-SNAPSHOT/maven-metadata.xml
Downloading from central: https://maven.aliyun.com/repository/central/com/alibaba/datax/tdenginewriter/tdenginewriter/0.0.1-SNAPSHOT/tdenginewriter-0.0.1-SNAPSHOT.pom
[WARNING] The POM for com.alibaba.datax.tdenginewriter:tdenginewriter:jar:0.0.1-SNAPSHOT is missing, no dependency information available
Downloading from central: https://maven.aliyun.com/repository/central/com/alibaba/datax/tdenginewriter/tdenginewriter/0.0.1-SNAPSHOT/tdenginewriter-0.0.1-SNAPSHOT.jar
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Skipping datax-all
[INFO] This project has been banned from the build due to previous failures.
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] datax-all 0.0.1-SNAPSHOT ........................... SUCCESS [ 0.100 s]
[INFO] datax-common ....................................... SUCCESS [ 2.106 s]
[INFO] datax-transformer .................................. SUCCESS [ 0.886 s]
[INFO] datax-core ......................................... SUCCESS [ 2.224 s]
[INFO] plugin-rdbms-util .................................. SUCCESS [ 0.387 s]
[INFO] tdenginereader ..................................... FAILURE [ 0.861 s]
[INFO] plugin-unstructured-storage-util 0.0.1-SNAPSHOT .... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.752 s
[INFO] Finished at: 2022-07-21T15:28:25+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project tdenginereader: Could not resolve dependencies for project com.alibaba.datax:tdenginereader:jar:0.0.1-SNAPSHOT: Could not find artifact com.alibaba.datax.tdenginewriter:tdenginewriter:jar:0.0.1-SNAPSHOT in central (https://maven.aliyun.com/repository/central) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :tdenginereader
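The reactor log shows `tdenginereader` failing because its dependency `com.alibaba.datax.tdenginewriter:tdenginewriter:0.0.1-SNAPSHOT` is not in any repository. A plausible fix (commands are illustrative; run from the DataX source root) is to install the missing module into the local Maven repository first, building it together with whatever it depends on:

```shell
# Install tdenginewriter (and its required modules, via -am) locally first,
# so tdenginereader can resolve it on the next build.
mvn -pl tdenginewriter -am clean install -DskipTests
mvn -pl tdenginereader -am clean package -DskipTests
```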

Error when migrating from TDengine to TDengine

java.lang.UnsatisfiedLinkError: Native Library C:\Windows\System32\taos.dll already loaded in another classloader
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1900) ~[na:1.8.0_261]
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1850) ~[na:1.8.0_261]
at java.lang.Runtime.loadLibrary0(Runtime.java:871) ~[na:1.8.0_261]
at java.lang.System.loadLibrary(System.java:1122) ~[na:1.8.0_261]
at com.taosdata.jdbc.TSDBJNIConnector.(TSDBJNIConnector.java:28) ~[taos-jdbcdriver-2.0.42.jar:na]
at com.taosdata.jdbc.TSDBDriver.connect(TSDBDriver.java:162) ~[taos-jdbcdriver-2.0.42.jar:na]
at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[na:1.8.0_261]
at java.sql.DriverManager.getConnection(DriverManager.java:247) ~[na:1.8.0_261]
at com.alibaba.datax.plugin.writer.tdengine20writer.DefaultDataHandler.handle(DefaultDataHandler.java:77) ~[tdengine20writer-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdengine20writer.TDengineWriter$Task.startWrite(TDengineWriter.java:109) ~[tdengine20writer-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:56) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
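The `UnsatisfiedLinkError` means two plugin classloaders (reader and writer) both tried to load the native `taos.dll` through the JNI driver, which the JVM forbids. A common workaround is to switch at least one side to the RESTful driver, which needs no native library: use a `jdbc:TAOS-RS://` URL on port 6041. A sketch of the connection block (host and database are illustrative):

```json
"connection": [
    {
        "table": ["weather"],
        "jdbcUrl": "jdbc:TAOS-RS://127.0.0.1:6041/db_name"
    }
]
```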

taos-jdbcdriver-2.0.39 fails parsing timestamp types

Exception in thread "main" java.lang.NoSuchMethodError: com.alibaba.fastjson.JSONArray.getTimestamp(I)Ljava/lang/Object;
at com.taosdata.jdbc.rs.RestfulResultSet.parseTimestampColumnData(RestfulResultSet.java:255)
at com.taosdata.jdbc.rs.RestfulResultSet.parseColumnData(RestfulResultSet.java:183)
at com.taosdata.jdbc.rs.RestfulResultSet.(RestfulResultSet.java:98)
at com.taosdata.jdbc.rs.RestfulStatement.execute(RestfulStatement.java:88)
at com.taosdata.jdbc.rs.RestfulStatement.executeQuery(RestfulStatement.java:37)
at Test.main(Test.java:16)
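The `NoSuchMethodError` on `com.alibaba.fastjson.JSONArray.getTimestamp` indicates the fastjson on the classpath is older than the one taos-jdbcdriver was built against. A plausible fix is to align the fastjson dependency with what the driver expects (the version below is illustrative; check the driver's own POM):

```xml
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
</dependency>
```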

Hit an error during synchronization and could not find the cause.

com.alibaba.datax.common.exception.DataXException: Code:[TDengineWriter-02], Description:[runtime exception]. - cannot find col: ts in columns: [ts, i_a, i_b, i_c, i_sum, elc, u_a, u_b, u_c, power, corp_id, equipid, line_id]
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.indexOf(DefaultDataHandler.java:552) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.writeBatchToSupTableBySchemaless(DefaultDataHandler.java:317) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.writeBatch(DefaultDataHandler.java:158) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.writeEachRow(DefaultDataHandler.java:129) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.DefaultDataHandler.handle(DefaultDataHandler.java:96) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.tdenginewriter.TDengineWriter$Task.startWrite(TDengineWriter.java:109) [tdenginewriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:56) [datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:750) [na:1.8.0_345]
The job script:
{
    "content": [
        {
            "reader": {
                "name": "tdenginereader",
                "parameter": {
                    "beginDateTime": "2022-10-25 17:47:31",
                    "column": ["ts", "i_a", "i_b", "i_c", "i_sum", "elc", "u_a", "u_b", "u_c", "power", "corpid", "equipid", "lineid"],
                    "connection": [
                        {
                            "jdbcUrl": ["jdbc:TAOS-RS://xxxxxxx:6041/gkjk?timestampFormat=TIMESTAMP"],
                            "table": ["equip_min"]
                        }
                    ],
                    "endDateTime": "2022-10-25 17:50:31",
                    "password": "",
                    "splitInterval": "5m",
                    "username": "hdec"
                }
            },
            "writer": {
                "name": "tdenginewriter",
                "parameter": {
                    "batchSize": 1000,
                    "column": ["ts", "i_a", "i_b", "i_c", "i_sum", "elc", "u_a", "u_b", "u_c", "power", "corp_id", "equipid", "line_id"],
                    "connection": [
                        {
                            "jdbcUrl": "jdbc:TAOS://xxxxxxx:6030/test?timestampFormat=TIMESTAMP",
                            "table": ["abandoned_water_elc_data"]
                        }
                    ],
                    "ignoreTagsUnmatched": true,
                    "password": "",
                    "username": "root"
                }
            }
        }
    ],
    "setting": {
        "speed": {
            "channel": 1
        }
    }
}
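One thing worth checking (an observation, not a confirmed fix): the reader and writer column lists disagree (`corpid`/`corp_id`, `lineid`/`line_id`), and the failing call is in `writeBatchToSupTableBySchemaless`, which matches the writer's column list against the target super table's schema. The writer column names plausibly need to match the target table's actual column and tag names exactly, e.g.:

```json
"writer": {
    "parameter": {
        "column": ["ts", "i_a", "i_b", "i_c", "i_sum", "elc", "u_a", "u_b", "u_c", "power", "corpid", "equipid", "lineid"]
    }
}
```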

Schemaless write error when using DataX to migrate from TDengine 3.0.3.2 to TDengine 3.2.1.0

2024-01-23 13:40:27.865 [0-0-0-writer] ERROR StdoutPluginCollector - 脏数据:
{"exception":"TDengine ERROR (0x80003002): Invalid data format","record":[{"byteSize":8,"index":0,"rawData":1705770149000,"type":"DATE"},{"byteSize":1,"index":1,"rawData":2,"type":"LONG"},{"byteSize":10,"index":2,"rawData":1705770318,"type":"LONG"},{"byteSize":3,"index":3,"rawData":169,"type":"LONG"},{"byteSize":9,"index":4,"rawData":"4.7264E-4","type":"DOUBLE"},{"byteSize":10,"index":5,"rawData":"0.00571764","type":"DOUBLE"},{"byteSize":23,"index":6,"rawData":"id_15","type":"STRING"},{"byteSize":5,"index":7,"rawData":"15","type":"STRING"}],"type":"writer"}
行数据:[balanced_state,tname=id_15,device_id=15 battery_state=3,end_time=1705770742,duration=424,energy=6.4E-7f64,capacity=6.4E-7f64 1705770318000]
2024-01-23 13:40:27.865 [0-0-0-writer] ERROR DefaultDataHandler - TDengine ERROR (0x80003002): Invalid data format

Reading works fine, but writing reports a data-format error. The super table was already created in the 3.2.1.0 database, and during schemaless insert a value like battery_state=3 denotes a double, whereas what should actually be inserted is the battery_state=3u8 form; the mismatch ultimately causes the format error.

Is there any solution? It seems DataX converts the data to a Long type but cannot convert it back to the old-version taos database column types.
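For reference, TDengine's schemaless line protocol distinguishes numeric types by suffix, so a bare value is parsed as a double and conflicts with a pre-created integer column. The rows below are adapted from the dirty-data log above; the exact suffix must match the pre-created column's type:

```text
# bare value -> parsed as DOUBLE, conflicts with a pre-created integer column
balanced_state,tname=id_15,device_id=15 battery_state=3 1705770318000
# suffixed value -> parsed as the intended unsigned tinyint
balanced_state,tname=id_15,device_id=15 battery_state=3u8 1705770318000
```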
