Comments (3)
This is the result of my local test, reading 311 files from the specified directory. To keep the output short, similar log lines have been truncated.
2024-03-13 13:32:58.801 [ main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-03-13 13:32:58.810 [ main] INFO Engine -
{
  "setting": {
    "speed": {
      "byte": -1,
      "channel": 8
    },
    "errorLimit": {
      "record": 0,
      "percentage": 0.02
    }
  },
  "content": {
    "reader": {
      "name": "ftpreader",
      "parameter": {
        "column": ["*"],
        "protocol": "ftp",
        "host": "127.0.0.1",
        "port": "21",
        "username": "wgzhao",
        "password": "*****",
        "skipDelimiter": true,
        "path": "/home/wgzhao/ftptest/ftpreader"
      }
    },
    "writer": {
      "name": "ftpwriter",
      "parameter": {
        "column": ["*"],
        "protocol": "ftp",
        "host": "127.0.0.1",
        "port": "21",
        "username": "wgzhao",
        "password": "*****",
        "path": "/home/wgzhao/ftptest/ftpwriter",
        "fileName": "101-测试",
        "writeMode": "truncate",
        "compress": "gz",
        "skipDelimiter": true
      }
    }
  }
}
2024-03-13 13:32:58.823 [ main] INFO JobContainer - The jobContainer begins to process the job.
2024-03-13 13:32:58.856 [ job-0] WARN StorageWriterUtil - The item encoding is empty, uses [UTF-8] as default.
2024-03-13 13:32:58.856 [ job-0] WARN StorageWriterUtil - The item delimiter is empty, uses [,] as default.
2024-03-13 13:32:58.874 [ job-0] INFO JobContainer - The Reader.Job [ftpreader] perform prepare work .
2024-03-13 13:32:58.916 [ job-0] INFO FtpReader$Job - 您即将读取的文件数为: [311]
2024-03-13 13:32:58.916 [ job-0] INFO JobContainer - The Writer.Job [ftpwriter] perform prepare work .
2024-03-13 13:32:58.917 [ job-0] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:58.938 [ job-0] INFO FtpWriter$Job - The current writeMode is truncate, begin to cleanup all files with prefix [101-测试] under [/home/wgzhao/ftptest/ftpwriter].
2024-03-13 13:32:58.938 [ job-0] INFO FtpWriter$Job - The following file(s) will be deleted: [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_793_tdwnz8ut.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_783_4rmxunsq.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_817_44veuzg4.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_byy3pc11.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_gcfcw8vm.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_790_3xvm4f94.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_5t4agz8d.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_817_gu145tv1.txt].
2024-03-13 13:32:58.938 [ job-0] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:58.940 [ job-0] INFO JobContainer - Job set Channel-Number to 8 channel(s).
2024-03-13 13:32:58.956 [ job-0] INFO JobContainer - The Reader.Job [ftpreader] is divided into [311] task(s).
2024-03-13 13:32:58.956 [ job-0] INFO StorageWriterUtil - Begin to split...
2024-03-13 13:32:58.963 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_962_x27b8yt1]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_963_2n52sntt]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_e623a1s8]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_hgva8zyv]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_68uqewte]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_xwreawz9]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_6f3q3fyb]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_g3xqcuz4]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_u4ux9ywb]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_cdqacs9e]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_hb98t8ag]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_nshwrf8v]
2024-03-13 13:32:58.966 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_966_ct0fr1vx]
......
2024-03-13 13:32:59.000 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_000_ayxeax1h]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_upb2bgwv]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_9v2acmmw]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_bfmfes78]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_efg8bsz3]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_w8yu94xs]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_wtv1v4qp]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_utvqbwwa]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_uen8emwr]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_fexsrc16]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_5rb40g5w]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_9bsxftnm]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_1pqh18dv]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_fpppg6ep]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_n4vx5rq6]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_huh199rz]
2024-03-13 13:32:59.010 [ job-0] INFO StorageWriterUtil - Finished split.
2024-03-13 13:32:59.010 [ job-0] INFO JobContainer - The Writer.Job [ftpwriter] is divided into [311] task(s).
2024-03-13 13:32:59.079 [ job-0] INFO JobContainer - The Scheduler launches [1] taskGroup(s).
2024-03-13 13:32:59.092 [ taskGroup-0] INFO TaskGroupContainer - The taskGroupId=[0] started [8] channels for [311] tasks.
2024-03-13 13:32:59.095 [ taskGroup-0] INFO Channel - The Channel set byte_speed_limit to -1, No bps activated.
2024-03-13 13:32:59.095 [ taskGroup-0] INFO Channel - The Channel set record_speed_limit to -1, No tps activated.
2024-03-13 13:32:59.116 [writer-0-284] INFO FtpWriter$Task - begin do write...
2024-03-13 13:32:59.116 [writer-0-284] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133259_007_g1c7ysae.txt]
2024-03-13 13:32:59.116 [reader-0-284] INFO FtpReader$Task - reading file : [/home/wgzhao/ftptest/ftpreader/000851.SZ-300660.SZ.csv]
2024-03-13 13:32:59.116 [writer-0-284] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:32:59.117 [writer-0-284] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:59.118 [writer-0-279] INFO FtpWriter$Task - begin do write...
2024-03-13 13:32:59.118 [writer-0-182] INFO FtpWriter$Task - begin do write...
2024-03-13 13:32:59.119 [writer-0-182] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_995_uqn2751q.txt]
2024-03-13 13:32:59.119 [writer-0-279] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133259_007_25dn2gmv.txt]
2024-03-13 13:32:59.120 [writer-0-182] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
.....
2024-03-13 13:33:03.463 [writer-0-205] INFO FtpWriter$Task - begin do write...
2024-03-13 13:33:03.463 [reader-0-173] WARN StorageReaderUtil - Uses [,] as delimiter by default
2024-03-13 13:33:03.463 [writer-0-205] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_998_f4c0mn89.txt]
2024-03-13 13:33:03.463 [writer-0-205] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:33:03.463 [writer-0-205] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:33:03.463 [writer-0-171] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.464 [writer-0-173] INFO FtpWriter$Task - begin do write...
2024-03-13 13:33:03.464 [writer-0-173] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_994_s8z8tsdr.txt]
2024-03-13 13:33:03.464 [writer-0-173] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:33:03.464 [writer-0-173] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:33:03.465 [writer-0-173] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.465 [reader-0-205] INFO FtpReader$Task - reading file : [/home/wgzhao/ftptest/ftpreader/600575.SH-000677.SZ.csv]
2024-03-13 13:33:03.466 [reader-0-205] WARN StorageReaderUtil - Uses [,] as delimiter by default
2024-03-13 13:33:03.466 [writer-0-205] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.467 [ reader-0-42] INFO FtpReader$Task - reading file : [/home/wgzhao/ftptest/ftpreader/688018.SH-000632.SZ.csv]
2024-03-13 13:33:03.467 [ reader-0-42] WARN StorageReaderUtil - Uses [,] as delimiter by default
2024-03-13 13:33:03.468 [ writer-0-42] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.499 [writer-0-158] INFO FtpWriter$Task - end do write
2024-03-13 13:33:05.096 [ job-0] INFO AbstractScheduler - The scheduler has completed all tasks.
2024-03-13 13:33:05.096 [ job-0] INFO JobContainer - The Writer.Job [ftpwriter] perform post work.
2024-03-13 13:33:05.096 [ job-0] INFO JobContainer - The Reader.Job [ftpreader] perform post work.
2024-03-13 13:33:05.103 [ job-0] INFO StandAloneJobContainerCommunicator - Total 225262 records, 23043447 bytes | Speed 3.66MB/s, 37543 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.974s | All Task WaitReaderTime 0.156s | Percentage 100.00%
2024-03-13 13:33:05.103 [ job-0] INFO JobContainer -
Job start at : 2024-03-13 13:32:58
Job end at : 2024-03-13 13:33:05
Job took secs : 6s
Average bps : 3.66MB/s
Average rps : 37543rec/s
Number of rec : 225262
Failed record : 0
from addax.
Basically, when a folder holds around 1,000 to 2,000 files there is no problem. But in my situation, 80,000 to 200,000 new files arrive every day; the files are small, only a few hundred bytes each, organized into one folder per day.
Does this support syncing in batches? For example, splitting 200,000 files into 200 runs of 1,000 files each?
In the current model, each file maps to one task, which means one thread. Reading tens of thousands, or even over a hundred thousand, files at once means spawning that many threads; when I simulated reading 150,000 files locally, the process exited immediately. If your file names follow a pattern, you can use a wildcard in the `path` option to specify one batch of files per run, which should work around the problem for now, like this:
{
  "parameter": {
    "path": "/home/wgzhao/ftptest/ftpreader/100*.csv"
  }
}
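The batching idea above (splitting a large file set into fixed-size runs, each driven by its own job config) can be sketched in Python. This is a minimal illustration, not Addax functionality: the `make_batch_jobs` helper is hypothetical, and whether ftpreader's `path` option accepts an explicit list of files depends on your Addax version, so verify against its documentation before relying on it.

```python
import json

def make_batch_jobs(files, batch_size=1000, template=None):
    """Split a file list into batches and build one job fragment per batch.

    Hypothetical helper: assumes the reader's "path" option can take a list
    of concrete file paths (verify this against your Addax version).
    """
    template = template or {"parameter": {"path": []}}
    jobs = []
    for start in range(0, len(files), batch_size):
        # Deep-copy the template so each batch gets its own config object.
        job = json.loads(json.dumps(template))
        job["parameter"]["path"] = files[start:start + batch_size]
        jobs.append(job)
    return jobs

# Example: 2,500 small files split into runs of 1,000 -> 3 job fragments
# (1,000 + 1,000 + 500 files).
files = [f"/home/wgzhao/ftptest/ftpreader/file_{i:06d}.csv" for i in range(2500)]
jobs = make_batch_jobs(files, batch_size=1000)
print(len(jobs))  # → 3
```

Each fragment could then be merged into a full job config and submitted as a separate run, keeping the per-run task (and thread) count bounded regardless of how many files arrive per day.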