
alibaba / alink

Stars: 3.5K · Watchers: 141 · Forks: 797 · Size: 18.77 MB

Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of the Alibaba Computing Platform.

License: Apache License 2.0

Java 50.70% Scala 0.08% Jupyter Notebook 0.14% Shell 0.06% Python 7.66% CMake 0.28% C++ 4.58% C 0.01% Dockerfile 0.01% TypeScript 1.56% Less 0.09% Batchfile 0.01% HTML 34.85% Makefile 0.01%
machine-learning flink classification clustering regression graph-algorithms xgboost recommender recommender-system feature-engineering

alink's People

Contributors

5917549999, alinkgit, bai0335, biyuhao, cainingnk, chengscu, fanoid, hapsunday, liulfy, lqb11, queyuexzy, shaomengwang, stenicholas, weibozhao, xuyang1706, zhangzq94


alink's Issues

Double-Checked Locking

if (singleton == null) {
    synchronized (WordDictionary.class) {
        if (singleton == null) {
            singleton = new WordDictionary();
        }
    }
}
return singleton;
Double-checked locking is widely cited and used as an efficient method for implementing lazy initialization in a multithreaded environment.
Unfortunately, when implemented in Java without additional synchronization, it does not work reliably in a platform-independent way.
Declaring the singleton field volatile offers a much more elegant solution.
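
For reference, a minimal sketch of the volatile-based fix, reusing the field and class names from the snippet above (the rest of WordDictionary is omitted):

public class WordDictionary {
    private static volatile WordDictionary singleton;  // volatile is the fix

    private WordDictionary() {
    }

    public static WordDictionary getInstance() {
        if (singleton == null) {                       // first check, lock-free
            synchronized (WordDictionary.class) {
                if (singleton == null) {               // second check, under the lock
                    singleton = new WordDictionary();
                }
            }
        }
        return singleton;
    }
}

Under the Java Memory Model (JSR-133 and later), the volatile write/read pair prevents a thread from observing a partially constructed instance.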

Why does the easy_install installation fail?

I have never learned Python and am a complete beginner with it. I set up a Python 3.5 environment and downloaded the package, but running easy_install as instructed reports an error. The official instruction is easy_install [path]/pyalink-0.0.1-py3.*.egg; mine is easy_install /usr/apps/pyalink-1.0_flink_1.9.0_scala_2.11-py3.5.egg. Why does it hang for so long at:

Couldn't find index page for 'pyalink' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/

It hung for over half an hour and finally reported that pyalink-1.0_flink_1.9.0_scala_2.11-py3.5 could not be found. Has anyone installed it successfully? I have tried many times without luck. Does anyone understand this? Waiting online, it is rather urgent.

pmml model

How can these algorithms be exported to PMML models?

Error in ALSExample

I packaged the example from the official site, uploaded it to Flink, and ran ALSExample; the following error occurred:

2019-12-27 14:56:29
java.lang.Exception: The user defined 'open()' method caused an exception: null
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:499)
at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:157)
at org.apache.flink.runtime.iterative.task.IterationIntermediateTask.run(IterationIntermediateTask.java:107)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NegativeArraySizeException
at com.alibaba.alink.operator.common.recommendation.AlsTrain$9.open(AlsTrain.java:308)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.api.java.operators.translation.WrappingFunction.open(WrappingFunction.java:45)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:495)
... 6 more

After reading a CSV as a stream, the printed content does not update when the CSV changes

Hi everyone, I have a question: after reading a CSV with CsvSourceStreamOp, when the CSV content changes, the printed output does not update. My code is below; please help me sort this out, thanks!
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from pyalink.alink import *
import numpy as np
import pandas as pd

useLocalEnv(1)

schemaStr = "A double, B double, C double, D double, label string"
path = "D:/program/Alink_script/data/iris2_1_stream.csv"
csvSource = CsvSourceStreamOp().setFilePath(path).setSchemaStr(schemaStr).setFieldDelimiter(",")
csvSource.print()
sample = SampleStreamOp().setRatio(0.01).linkFrom(csvSource)
sample.print(key="csvSource", refreshInterval=1)
StreamOperator.execute()

Adapted to Flink 1.10.0

  • Change the Flink version to 1.10.0
  • Change the dependency from commons-lang to commons-lang3 in icq
  • Optimize the unit tests

Please upload the package to PyPI

Most Python packages (currently more than 200 thousand) can be easily installed with pip, making it easier to integrate new packages like yours into large existing Python development environments.

This however would require uploading your package to PyPI, which I think will be easy with this package, given that it's already bundled into an .egg format.

TypeError at runtime: 'JavaPackage' object is not callable

After importing the package, calling useLocalEnv(2, flinkHome=None, config=None) reports:
JVM listening on 127.0.0.1:6327

TypeError Traceback (most recent call last)
in
----> 1 useLocalEnv(2, flinkHome=None, config=None)

D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\env.pyc in useLocalEnv(parallelism, flinkHome, config)

D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\env.pyc in make_configuration(config)

D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\py4j_util.pyc in get_java_gateway()

D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\py4j_util.pyc in inst(cls)

D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\py4j_util.pyc in init(self)

TypeError: 'JavaPackage' object is not callable

Allow injecting an existing Flink env

For now, Alink creates the environment by itself, which makes it difficult to integrate with other tools/frameworks. It would be nice to allow Alink to use an environment created by others.
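
A minimal sketch of the kind of injection being requested, reusing the MLEnvironment / MLEnvironmentFactory pattern visible in the "Error during remote execution" issue below (the package paths are assumptions):

import com.alibaba.alink.common.MLEnvironment;         // package path assumed
import com.alibaba.alink.common.MLEnvironmentFactory;  // package path assumed
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;

public class InjectExistingEnv {
    public static void main(String[] args) {
        // Environments created by some other tool or framework.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Hand them to Alink instead of letting it create its own.
        MLEnvironmentFactory.setDefault(new MLEnvironment(env, tEnv));
    }
}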

Online learning with FTRL

Online training currently supports FTRL. Will other algorithms be supported later? And a beginner's question: why does online learning only support linear models, and can that be addressed?

Improvement of UDF/UDTF operators

Hi,
I found the current implementation of UDF/UDTF operators has the following limitations:

  • Not aligned with the corresponding Flink feature: only one selectedCol is used and no joinType is supported in UDTF, and identifier case-sensitivity is broken in UDTFBatchOp by MappingColNameIgnoreCase.
  • No MLEnvironmentId parameter is provided.
  • Incomplete JavaDoc and unit tests.
  • Redundant code.

Moreover, in the current code, if the same column name is used in selectedCol(s), reservedCols, and outputCols, the behavior is not very clear.
Maybe we can rephrase the code as three rules, and the implementation should align with these rules (a toy sketch follows the list):

  1. If reservedCols is not set, or is set to null, then all columns of the input table are automatically reserved, except for the case in rule 3. This way, the behavior of reservedCols in UDF/UDTF is consistent with other operators.
  2. For UDTF, using the same name in selectedCol(s) and outputCols should be allowed. The conflict arises from the cross join or left outer join clause; since the implementation details are hidden from users, we should handle such conflicts inside the operators.
  3. If the same name is used in reservedCols and outputCols, then the one referring to a column of the input table will not appear in the output table.
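
The promised toy sketch of the column resolution described by rules 1 and 3, in plain Java with no Alink types (purely illustrative):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class UdtfColumnRules {

    // Compute output columns per the rules above:
    // rule 1: reservedCols == null means "reserve all input columns";
    // rule 3: a reserved input column that shares its name with an
    //         output column is dropped from the result.
    static List<String> resolveOutput(String[] inputCols,
                                      String[] reservedCols,
                                      String[] outputCols) {
        String[] reserved = (reservedCols == null) ? inputCols : reservedCols;  // rule 1
        LinkedHashSet<String> kept = new LinkedHashSet<>(Arrays.asList(reserved));
        kept.removeAll(Arrays.asList(outputCols));  // rule 3
        List<String> out = new ArrayList<>(kept);
        out.addAll(Arrays.asList(outputCols));
        return out;
    }

    public static void main(String[] args) {
        // Input column "b" collides with output column "b" and is dropped.
        System.out.println(resolveOutput(
                new String[] {"a", "b", "c"}, null, new String[] {"b", "d"}));
        // prints: [a, c, b, d]
    }
}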

Is online deep learning supported?

After browsing the examples and documentation, I found that online learning is only applied to shallow models. Not supporting deep learning inevitably leaves a gap.

Something is wrong

When trying to make Alink interact with Flink, a table-environment mismatch error occurs:
Exception in thread "main" org.apache.flink.table.api.ValidationException: Only tables from the same TableEnvironment can be joined.

package com.alibaba.alink;

import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.dataproc.SplitBatchOp;
import com.alibaba.alink.operator.batch.recommendation.AlsPredictBatchOp;
import com.alibaba.alink.operator.batch.recommendation.AlsTrainBatchOp;
import com.alibaba.alink.operator.batch.source.CsvSourceBatchOp;
import com.alibaba.alink.operator.batch.source.TableSourceBatchOp;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

public class WrongDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
        String url = "https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/movielens_ratings.csv";
        String schema = "userCol bigint, itemCol bigint, rateCol double, timestamp string";

        Tuple3[] fake = new Tuple3[] {
                new Tuple3<Long, Long, Double>(1L, 1L, 0.6),
                new Tuple3<Long, Long, Double>(2L, 2L, 0.8),
                new Tuple3<Long, Long, Double>(2L, 3L, 0.6),
                new Tuple3<Long, Long, Double>(4L, 1L, 0.6),
                new Tuple3<Long, Long, Double>(4L, 2L, 0.3),
        };

        DataSet ds = env.fromElements(fake);
        Table table = tEnv.fromDataSet(ds);

        BatchOperator data = BatchOperator.fromTable(table);
        BatchOperator data3 = new TableSourceBatchOp(table);
        SplitBatchOp spliter = new SplitBatchOp().setFraction(0.8);
        BatchOperator data2 = new CsvSourceBatchOp()
                .setFilePath(url).setSchemaStr(schema);

        System.out.println(data.getDataSet().getExecutionEnvironment());
        System.out.println(BatchOperator.getExecutionEnvironmentFromDataSets(ds));
        System.out.println(BatchOperator.getExecutionEnvironmentFromOps(data2));
        System.out.println(BatchOperator.getExecutionEnvironmentFromOps(data3));
        System.out.println(BatchOperator.getExecutionEnvironmentFromOps(spliter.linkFrom(data2)));
        spliter.linkFrom(data);

        BatchOperator trainData = spliter;
        BatchOperator testData = spliter.getSideOutput(0);
        AlsTrainBatchOp als = new AlsTrainBatchOp()
                .setUserCol("userid").setItemCol("movieid").setRateCol("rating")
                .setNumIter(10).setRank(10).setLambda(0.1);
        BatchOperator model = als.linkFrom(trainData);

        AlsPredictBatchOp predictor = new AlsPredictBatchOp()
                .setUserCol("userid").setItemCol("movieid").setPredictionCol("prediction_result");
        BatchOperator predictionResult;

        predictionResult = predictor.linkFrom(model, testData).select("rating, prediction_result");

        predictionResult.print();
    }
}

GBDTExample error: setFeatureCols requires String

First, I couldn't obtain the alink_core jar, so I imported the alink_open-0.1...jar and alink_python-0.1...jar from the pyalink lib folder into my project.
When I run GBDTExample in my project, I get the error described in the title.

How to turn a single-column Operator into a multi-column Operator

Hi everyone, I'd like to ask a question. I read data from Kafka via Kafka011SourceStreamOp, but the message ends up in a single-column Operator. I want to parse the message (for example, split it on commas) into a new multi-column Operator. How can I do that? Part of my code is below; thanks for the help!
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from pyalink.alink import *
import numpy as np
import pandas as pd

useLocalEnv(1)

bootstrapServer = "10.124.165.91:9092"
topic = "test_alink1"

data = Kafka011SourceStreamOp().setBootstrapServers(bootstrapServer).setTopic(topic).setStartupMode("EARLIEST").setGroupId("")

# This retrieves the message from Kafka, but I want to parse it into a new
# Operator without converting to a DataFrame and back, since the readme says
# frequent switching between DataFrame and Operator hurts efficiency.
data.select("message").print()
StreamOperator.execute()

Error during remote execution

val env = ExecutionEnvironment.createLocalEnvironment()
val tEnv = BatchTableEnvironment.create(env)
val mle = new MLEnvironment(env, tEnv)
MLEnvironmentFactory.setDefault(mle)

After adding the local environment, running it reports:
Error:(18, 42) Static methods in interface require -target:jvm-1.8
val tEnv = BatchTableEnvironment.create(env)

Bug of HasVectorSize alias

In the HasVectorSize class, the ParamInfo of VECTOR_SIZE uses vectorSize as its name and both vectorSize and inputDim as aliases. This causes an exception when the flink-ml-api dependency is used, because the alias vectorSize duplicates the name. vectorSize should be removed from the aliases of VECTOR_SIZE.
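
A sketch of the proposed fix using the flink-ml-api ParamInfoFactory builder (the description string is illustrative, not Alink's actual wording; only the alias change matters):

import org.apache.flink.ml.api.misc.param.ParamInfo;
import org.apache.flink.ml.api.misc.param.ParamInfoFactory;

public interface HasVectorSizeFixed {
    // "vectorSize" is already the name, so it must not also be an alias;
    // only "inputDim" remains in the alias list.
    ParamInfo<Integer> VECTOR_SIZE = ParamInfoFactory
            .createParamInfo("vectorSize", Integer.class)
            .setDescription("size of the vector")  // illustrative text
            .setAlias(new String[] {"inputDim"})
            .setRequired()
            .build();
}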

Kafka as the sink: something went wrong

When I test Kafka as the sink, something goes wrong.
The code is as follows:

public class writeKafka {
    private static void write() throws Exception {
        String URL = "data/iris.csv";
        String SCHEMA_STR
                = "sepal_length double, sepal_width double, petal_length double, petal_width double, category string";
        CsvSourceStreamOp data = new CsvSourceStreamOp().setFilePath(URL).setSchemaStr(SCHEMA_STR);
        Kafka011SinkStreamOp sink = new Kafka011SinkStreamOp()
                .setBootstrapServers("127.0.0.1:9092")
                .setDataFormat("json")
                .setTopic("iris");

        data.link(sink);

        StreamOperator.execute();
    }

    public static void main(String[] args) throws Exception {
        write();

    }
}

It fails like this:

09:54:15,518 WARN  org.apache.kafka.common.utils.AppInfoParser                   - Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=kafka_011_-6885371916521496791
	at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
	at org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:58)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:335)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:191)
	at com.alibaba.alink.operator.common.io.kafka.KafkaBaseOutputFormat.getKafkaProducer(KafkaBaseOutputFormat.java:160)
	at com.alibaba.alink.operator.common.io.kafka.KafkaBaseOutputFormat.open(KafkaBaseOutputFormat.java:166)
	at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:65)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)

I don't know what caused it. That said, I found that the messages were actually produced normally.

Why does Maven packaging report a missing dependency?

I'm using the cluster installation method. Why does Maven packaging keep failing with:
[ERROR] Failed to execute goal on project alink_core: Could not resolve dependencies for project com.alibaba.alink:alink_core:jar:0.1-SNAPSHOT: Could not transfer artifact net.sf.json-lib:json-lib:jar:2.2.3 from/to alimaven (http://maven.aliyun.com/nexus/content/groups/public): Transfer failed for http://maven.aliyun.com/nexus/content/groups/public/net/sf/json-lib/json-lib/2.2.3/json-lib-2.2.3.jar: Unknown host maven.aliyun.com: Temporary failure in name resolution -> [Help 1]
The dependency cannot be found, and I still get this error after adding the dependency to the pom file. Does anyone know what to do? Please let me know; it's rather urgent.

Documentation lacks a way to save a model trained with an Alink trainer

While integrating with Alink, I could not find in Alink's documentation how to save a model trained with an Alink trainer. In my application I save the trained model via PipelineModel as follows, but no document describes this approach, and I also don't know how to save an online-learning model.

Pipeline pipeline = new Pipeline().add(vectorAssembler).add(kMeans);
PipelineModel pipelineModel = pipeline.fit(data);
pipelineModel.save("/tmp/feature_pipe_model.csv");
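
For context, a sketch of the full save-and-reload pattern; the execute() call follows the usage visible in the ftrl_demo traceback below, and PipelineModel.load with this exact signature is an assumption to verify against your Alink version:

import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.pipeline.Pipeline;       // package path assumed
import com.alibaba.alink.pipeline.PipelineModel;  // package path assumed

// Assumes vectorAssembler, kMeans, and data are defined as in the snippet above.
Pipeline pipeline = new Pipeline().add(vectorAssembler).add(kMeans);
PipelineModel pipelineModel = pipeline.fit(data);
pipelineModel.save("/tmp/feature_pipe_model.csv");
BatchOperator.execute();  // save is lazy; execute() performs the actual write

// In a later job, reload the saved model (load signature assumed).
PipelineModel reloaded = PipelineModel.load("/tmp/feature_pipe_model.csv");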

ftrl_demo.ipynb in pyalink does not run

I'd like to ask: has the example at https://github.com/alibaba/Alink/blob/master/pyalink/ftrl_demo.ipynb actually been run successfully?

When it reaches the statement below, the following error is reported. What is the cause?

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-3-af625ccf78c2> in <module>
     13 feature_pipeline.fit(trainBatchData).save(FEATURE_PIPELINE_MODEL_FILE);
     14 
---> 15 BatchOperator.execute();
     16 
     17 # load pipeline model

~/anaconda3/lib/python3.7/site-packages/pyalink-1.0.1_flink_1.9.0_scala_2.11-py3.7.egg/pyalink/alink/batch/base.pyc in execute(cls)

~/anaconda3/lib/python3.7/site-packages/py4j-0.10.8.1-py3.7.egg/py4j/java_gateway.py in __call__(self, *args)
   1284         answer = self.gateway_client.send_command(command)
   1285         return_value = get_return_value(
-> 1286             answer, self.gateway_client, self.target_id, self.name)
   1287 
   1288         for temp_arg in temp_args:

~/anaconda3/lib/python3.7/site-packages/py4j-0.10.8.1-py3.7.egg/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:com.alibaba.alink.operator.batch.BatchOperator.execute.
: org.apache.flink.runtime.client.JobExecutionException: Could not retrieve JobResult.
	at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:623)
	at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:222)
	at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
	at com.alibaba.alink.operator.batch.BatchOperator.execute(BatchOperator.java:248)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$2(Dispatcher.java:333)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	... 6 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152)
	at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83)
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375)
	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
	... 7 more
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: java.net.SocketTimeoutException: connect timed out
	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:270)
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:907)
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:230)
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:106)
	at org.apache.flink.runtime.scheduler.LegacyScheduler.createExecutionGraph(LegacyScheduler.java:207)
	at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:184)
	at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176)
	at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70)
	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275)
	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265)
	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146)
	... 10 more
Caused by: java.lang.RuntimeException: java.net.SocketTimeoutException: connect timed out
	at com.alibaba.alink.operator.common.io.reader.HttpFileSplitReader.getFileLength(HttpFileSplitReader.java:79)
	at com.alibaba.alink.operator.common.io.csv.GenericCsvInputFormat.createInputSplits(GenericCsvInputFormat.java:401)
	at com.alibaba.alink.operator.common.io.csv.GenericCsvInputFormat.createInputSplits(GenericCsvInputFormat.java:39)
	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:256)
	... 22 more
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
	at com.alibaba.alink.operator.common.io.reader.HttpFileSplitReader.getFileLength(HttpFileSplitReader.java:61)
	... 25 more
