weiye-jing / datax-web

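A visual web UI integrated with DataX: select data sources to generate a data synchronization job in one click. Supports data sources such as RDBMS, Hive, HBase, ClickHouse, and MongoDB, and batch creation of RDBMS sync jobs. Integrates an open-source scheduling system, with support for distributed execution, incremental data sync, real-time viewing of run logs, executor resource monitoring, killing running processes, and encryption of data source information.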

Home Page: https://segmentfault.com/u/weiye_jing/articles

License: MIT License

Java 95.57% HTML 0.59% CSS 0.02% Shell 3.83%

datax-web's Issues

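Add flexibility to Oracle database configuration

For example: the dedicated user used for incremental sync has been granted privileges on another user's objects. The sync process should run as the dedicated user, and the UI should allow selecting tables under a specified user that the dedicated user has privileges on.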

Wanted: Who is using DataX Web

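  • Sincere thanks to everyone who keeps following and using DataX Web. We will keep investing in DataX Web to make it better and to help the data integration community and ecosystem thrive.

Purpose of this issue

  1. Listen to the community so we can make DataX Web better
  2. Attract more partners to contribute
  3. Learn more about how DataX Web is used in practice, to help plan next steps

What we hope you can provide

Please submit a comment here that includes:

  1. Your company, school, or organization, and its home page
  2. Your city and country
  3. Your contact information: Weibo, email, or WeChat (at least one)
  4. The business scenarios in which you use DataX Web

You can use the sample below as a reference:

* Organization: individual, https://github.com/WeiYe-Jing/datax-web
* Location: **Suzhou
* Contact: [email protected]
* Scenario: using DataX to synchronize heterogeneous data sources; being able to build the job JSON just by selecting data sources in the project greatly improves sync efficiency.

Thank you again for participating! Your support is what drives us forward!
The DataX Web community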

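Does the reader plugin not support preSql and postSql?

Example requirement: before reading the data, update a status field on the source table; after the sync finishes, update the status again.
The steps are:

1. update @table set status=1 where status=0 -- source table
2. run select * from @table where status=1 -- insert into the target table
3. update @table set status=2 where status=1 -- postSql, source table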

Specified key was too long; max key length is 767 bytes

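The job_registry table uses the utf8mb4 character set, so the composite index i_g_k_v exceeds the byte limit. Suggest changing the CREATE TABLE statement to set the length of registry_key and registry_value to 191: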

DROP TABLE IF EXISTS `job_registry`;
CREATE TABLE `job_registry`  (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `registry_group` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `registry_key` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `registry_value` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `update_time` datetime(0) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `i_g_k_v`(`registry_group`, `registry_key`, `registry_value`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 26 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
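
The value 191 follows from simple arithmetic, assuming the 767-byte limit in the error applies to each indexed column and that utf8mb4 reserves up to 4 bytes per character. A minimal sketch of that calculation (the class and method names are illustrative, not part of datax-web):

```java
public class IndexPrefixLimit {
    // Largest VARCHAR length (in characters) whose worst-case byte size
    // still fits within the given per-column index key limit.
    public static int maxIndexableChars(int keyLimitBytes, int maxBytesPerChar) {
        return keyLimitBytes / maxBytesPerChar;
    }

    public static void main(String[] args) {
        // utf8mb4: up to 4 bytes per character
        System.out.println(maxIndexableChars(767, 4));
        // 3-byte utf8: up to 3 bytes per character
        System.out.println(maxIndexableChars(767, 3));
    }
}
```

Under these assumptions a varchar(255) column needs up to 1020 bytes in utf8mb4, which is over the limit, while varchar(191) needs at most 764 bytes.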

Executor reports an error on startup: Unparseable date: "callbacklog"

java.text.ParseException: Unparseable date: "callbacklog"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
13:21:47.780 exe [datax-web, executor JobLogFileCleanThread] ERROR c.w.d.c.t.JobLogFileCleanThread - Unparseable date: "datax-executor.log"
java.text.ParseException: Unparseable date: "datax-executor.log"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
13:21:47.781 exe [datax-web, executor JobLogFileCleanThread] ERROR c.w.d.c.t.JobLogFileCleanThread - Unparseable date: "gluesource"
java.text.ParseException: Unparseable date: "gluesource"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
13:21:47.781 exe [datax-web, executor JobLogFileCleanThread] ERROR c.w.d.c.t.JobLogFileCleanThread - Unparseable date: "processcallbacklog"
java.text.ParseException: Unparseable date: "processcallbacklog"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
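
The trace suggests the cleanup thread tries to parse every entry name in the log directory as a date, including entries such as callbacklog and gluesource. One possible guard is sketched below, assuming dated log directories are named in yyyy-MM-dd form. The class, the method names, and the retention parameter are hypothetical and are not taken from the datax-web source.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

public class LogDirNames {
    // Assumption: dated log directories are named like "2020-03-25".
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Returns the date encoded in an entry name, or empty when the name
    // is not a date (for example "callbacklog" or "gluesource").
    public static Optional<LocalDate> parseLogDay(String name) {
        try {
            return Optional.of(LocalDate.parse(name, DAY));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // True only for dated entries older than the retention window;
    // non-date entries are never treated as expired.
    public static boolean isExpired(String name, LocalDate today, int retentionDays) {
        return parseLogDay(name)
                .map(day -> day.isBefore(today.minusDays(retentionDays)))
                .orElse(false);
    }
}
```

With this shape, non-date entries are skipped silently instead of producing an ERROR log line on every cleanup pass.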

Backend throws a NullPointerException when running a non-DataX job

JobInfo(id=7, jobGroup=1, jobCron=0 0 2 1/1 * ? , jobDesc=hive-sql-test, addTime=Wed Mar 25 14:30:24 CST 2020, updateTime=Wed Mar 25 14:54:49 CST 2020, author=qijun, alarmEmail=, executorRouteStrategy=ROUND, executorHandler=, executorParam=, executorBlockStrategy=SERIAL_EXECUTION, executorTimeout=0, executorFailRetryCount=0, glueType=GLUE_SHELL, glueSource=yesterday=date -d '-1 day' '+%Y-%m-%d'
echo $yesterday, glueRemark=null, glueUpdatetime=Wed Mar 25 14:54:49 CST 2020, childJobId=, triggerStatus=0, triggerLastTime=0, triggerNextTime=0, jobJson=null, replaceParam=, jvmParam=, incStartTime=null, partitionInfo=, lastHandleCode=0)
Fetching the job by job ID shows that the jobJson field is empty.

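Improve template selection in step 4 of the JSON builder

Currently the user has to click the text "Select template (steps: Build -> Select template -> Next)". It would be better to prompt the user with a button plus a description.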

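Incorrect escaping of the \ character in the JSON builder

In the DataX JSON builder, entering \t makes the final JSON come out with \t, and the job fails when it runs; entering just t leaves the final JSON without the "".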

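API for fetching user information

User management: clicking "Edit" should make the front end query the user's details from the back end by userId. At the moment the front end appears to take the user information directly from pageList.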

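Executor server resource usage

Consider writing a custom RPC gateway and load balancer. If a single DataX job's executor is not split across multiple execution servers, the remaining memory on each executor server needs to be monitored; otherwise jobs already running on that machine may run out of memory (OOM). One option is to replace xxl's RPC with Ant's open-source sofa-rpc and add a custom gateway and load balancer: jobs are pushed to executors, and once an executor's memory reaches the threshold no more jobs are pushed to it and they are deferred.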
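
A rough sketch of the memory-threshold routing described above. Everything here is hypothetical: the ExecutorStatus type, the way free memory is reported, and the choice of the executor with the most free memory are assumptions for illustration, not the xxl-rpc or sofa-rpc API.

```java
import java.util.List;
import java.util.Optional;

public class MemoryAwareRouter {
    // Hypothetical snapshot of what an executor reports about itself.
    public static final class ExecutorStatus {
        final String address;
        final long freeMemoryMb;

        public ExecutorStatus(String address, long freeMemoryMb) {
            this.address = address;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    // Picks the executor with the most free memory among those at or above
    // the threshold. An empty result means every executor is below the
    // threshold and the job should be deferred rather than pushed.
    public static Optional<String> route(List<ExecutorStatus> executors, long minFreeMb) {
        ExecutorStatus best = null;
        for (ExecutorStatus e : executors) {
            if (e.freeMemoryMb < minFreeMb) {
                continue; // too little headroom: do not push more jobs here
            }
            if (best == null || e.freeMemoryMb > best.freeMemoryMb) {
                best = e;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.address);
    }
}
```

A scheduler using this would re-queue the job when route returns empty and retry after the next status report.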

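About the incremental field

Hello. Some tables have no timestamp column, so incremental sync needs to substitute script parameters based on an auto-increment primary key instead. I plan to implement this and submit a PR to the repository. What do you think? It may involve changes to some table structures and related code.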
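
To make the proposal concrete, here is a minimal sketch of key-based incremental reads: each run selects rows whose key is greater than the last recorded maximum, then advances that checkpoint. The class, the method names, and the checkpoint handling are assumptions for illustration and do not describe the PR itself.

```java
public class IncrementalById {
    // Builds a query that reads only rows added since the last run.
    // Table and column names are concatenated directly, so they must come
    // from trusted job configuration, never from end-user input.
    public static String buildQuery(String table, String idColumn, long lastMaxId) {
        return "select * from " + table + " where " + idColumn + " > " + lastMaxId;
    }

    // The next checkpoint is the largest key read in this run, or the
    // previous checkpoint when the run returned no rows.
    public static long nextCheckpoint(long lastMaxId, long[] idsRead) {
        long max = lastMaxId;
        for (long id : idsRead) {
            max = Math.max(max, id);
        }
        return max;
    }
}
```

Unlike a timestamp window, this approach only picks up inserted rows; updates to existing rows do not change the key and would be missed.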

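Encrypt the MySQL credentials of the metadata database

Data sources and job JSON store a large number of credentials, so the credentials of the metadata database itself should not be kept in plain text.
Encryption/decryption class: JasyptUtil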

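Problem with the DataX JSON builder

Step 3 of the DataX JSON builder asks for field mappings, but the values entered there are not used when the JSON is built; the columns are mapped in the order they were checked in steps 1 and 2 instead.
For example: in step 1 I checked (name, pk, age) and in step 2 I checked (pk, username, userage); then in step 3 I filled in the mapping:
[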
{
"src": {
"name": "name",
"name": "pk",
"name": "age"
},
"des": {
"name": "username",
"name": "pk",
"name": "userage"
}
}
]
The column correspondence in the generated JSON is wrong as well.
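
A small sketch of the behaviour the reporter seems to expect: the explicit source-to-target pairs from step 3 drive two column lists that line up by position, independent of the order in which columns were checked in steps 1 and 2. It assumes the reader and writer column lists are matched by position. The class name and the use of an ordered map for the step 3 input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ColumnMapping {
    // Splits an ordered source -> target mapping into two aligned lists:
    // index 0 holds reader (source) columns, index 1 holds writer (target)
    // columns, so position i on one side corresponds to position i on the other.
    public static List<List<String>> align(Map<String, String> srcToDes) {
        List<String> reader = new ArrayList<>();
        List<String> writer = new ArrayList<>();
        for (Map.Entry<String, String> pair : srcToDes.entrySet()) {
            reader.add(pair.getKey());
            writer.add(pair.getValue());
        }
        List<List<String>> result = new ArrayList<>();
        result.add(reader);
        result.add(writer);
        return result;
    }
}
```

Passing a LinkedHashMap with name -> username, pk -> pk, age -> userage yields reader columns (name, pk, age) and writer columns (username, pk, userage), whatever order the columns were checked in.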

Intermittent error: XxlRpcException: xxl-rpc, request timeout at

Frequency: intermittent (below 1%)
Error message from the alert email:
msg:com.wugui.datax.rpc.util.XxlRpcException: xxl-rpc, request timeout at:1585273504252, request:XxlRpcRequest{requestId='761c6e38-1401-4a1c-b345-01ca66684054', createMillisTime=1585273501247, accessToken='', className='com.wugui.datatx.core.biz.ExecutorBiz', methodName='run', parameterTypes=[class com.wugui.datatx.core.biz.model.TriggerParam], parameters=[TriggerParam{jobId=152, executorHandler='executorJobHandler', executorParams='', executorBlockStrategy='DISCARD_LATER', executorTimeout=0, logId=2551329, logDateTime=1585273501000, glueType='BEAN', glueSource='null', glueUpdatetime=1582114646000, broadcastIndex=0, broadcastTotal=1, jobJson={ "job": { "setting": { "speed": { "channel": 16 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "ods_readonly", "password": "IEOIXNo3QCKDbI4n1VYdYpSlGqGfmR3d", "connection": [ { "jdbcUrl": [ "jdbc:mysql://rm-2zen18ooiq4wsn0rj.mysql.rds.aliyuncs.com:3306/trade_hebei_prod?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8" ], "querySql": [ "select id,created_by_id,created_time,updated_by_id,updated_time,data_flag,goods_47_kind_id,replace_kind_id,hxProductId,medicalAreaId,pack_convert_factor,replace_goods_id,replace_type,stand_spec_mg,superviseId,medicalConfigId,batch_type from p_goods_47_kind_replace_relation where (created_time >= FROM_UNIXTIME(${lastTime}-60) and created_time < FROM_UNIXTIME(${currentTime})) or (updated_time >= FROM_UNIXTIME(${lastTime}-60) and updated_time < FROM_UNIXTIME(${currentTime}))" ] } ] } }, "writer": { "name": "kafkawriter", "parameter": { "ack": "all", "batchSize": 2000000, "bootstrapServers": "192.168.1.246:9092,192.168.1.249:9092,192.168.1.248:9092", "fieldDelimiter": "#", "keySerializer": "org.apache.kafka.common.serialization.StringSerializer", "retries": 0, "transactionalId": "hb_trd_p_goods_47_kind_replace_relation_prod_trans_id", "topic": 
"hb_trd_p_goods_47_kind_replace_relation_prod_1", "writeType": "csv", "valueSerializer": "org.apache.kafka.common.serialization.StringSerializer", "columns": [ { "name": "id", "type": "Long" } ] } } } ] } }, processId=null, replaceParam=-DlastTime='%s' -DcurrentTime='%s', jvmParam=, startTime=Fri Mar 27 09:30:01 CST 2020, triggerTime=Fri Mar 27 09:45:01 CST 2020}], version='null'} at com.wugui.datax.rpc.remoting.net.params.XxlRpcFutureResponse.get(XxlRpcFutureResponse.java:120) at com.wugui.datax.rpc.remoting.invoker.reference.XxlRpcReferenceBean$1.invoke(XxlRpcReferenceBean.java:242) at com.sun.proxy.$Proxy164.run(Unknown Source) at com.wugui.datax.admin.core.trigger.JobTrigger.runExecutor(JobTrigger.java:204) at com.wugui.datax.admin.core.trigger.JobTrigger.processTrigger(JobTrigger.java:156) at com.wugui.datax.admin.core.trigger.JobTrigger.trigger(JobTrigger.java:73) at com.wugui.datax.admin.core.thread.JobTriggerPoolHelper.lambda$addTrigger$0(JobTriggerPoolHelper.java:88) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

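Add a job timeout setting

Add a job timeout setting that kills DataX job processes running longer than the configured time. Combined with the retry mechanism, this can avoid cases where the reader cannot read data because of network problems.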
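
One way such a timeout could look, using only the standard java.lang.Process API. This is a sketch under assumptions: the class and method names are made up, and how datax-web actually launches and tracks DataX processes is not shown in this issue.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutKill {
    // Waits up to timeoutMillis for the process to exit. If it is still
    // running after that, kills it. Returns true when the process finished
    // on its own and false when it had to be killed.
    public static boolean waitOrKill(Process process, long timeoutMillis)
            throws InterruptedException {
        if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        process.destroyForcibly();
        process.waitFor(); // wait until the killed process has actually exited
        return false;
    }
}
```

A false return could then be reported as a failed run so the existing retry count applies. Note that destroyForcibly targets only the launched process; any child processes it started may need separate handling.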

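Question about DataX transformers

Official documentation: dx_groovy can only be called once, not multiple times. ([https://github.com/alibaba/DataX/blob/master/transformer/doc/transformer.md])
Since it can only be called once, what happens when the job is run by the scheduler? How did you solve this?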

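Front-end forms do not retain entered data

For example, I first click the "datax-json build" module,
then click another module,
then click the "datax-json" module again: the progress I had already configured is gone, which is not reasonable.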

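Allow reordering fields

When mapping table fields, the order of the source and target fields may differ. Suggest adding buttons to adjust the field order to make mapping easier.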

where and postSql parameters

The where and postSql parameters are used often during sync. Suggest adding input options for the where and postSql parameters.

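Should the edit-user logic be changed?

1. "Password" shows the ciphertext, and values produced by BCryptPasswordEncoder cannot be decrypted. Showing plain text would require switching to a different encryption method.
2. Alternatively, change the logic so that "Password" is left blank and the user enters the new plain-text password.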
