datahub-project / datahub
The Metadata Platform for your Data Stack
Home Page: https://datahubproject.io
License: Apache License 2.0
Currently, WhereHows can only be configured through the API. So if I want to change one job's configuration or add a new job, I have to run the service and make 'curl' calls from the command line, following the documented steps. This is a pain for an admin who needs to manage and operate the WhereHows backend jobs. What I usually do is use a database tool with a UI (Aqua Data Studio) so I can open a table and change the values there directly. But this is definitely not a standard way.
We should have several 'admin' pages in the frontend that can configure WhereHows backend jobs. Users who log in with 'admin' permission can enter the admin pages.
These admin pages all map one-to-one to database configuration tables.
They contain at least these pages:
The lineage ETL also has several configuration tables that would benefit from a frontend:
[info] Loading project definition from /Users/jvy234/Documents/workspace/WhereHows/web/project
[info] Set current project to wherehows (in build file:/Users/jvy234/Documents/workspace/WhereHows/web/)
[info] Updating {file:/Users/jvy234/Documents/workspace/WhereHows/web/}web...
[info] Packaging /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/wherehows_2.10-1.0-SNAPSHOT-sources.jar ...
[info] Wrote /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/wherehows_2.10-1.0-SNAPSHOT.pom
[info] Resolving com.typesafe.play#play-jdbc_2.10;2.2.4 ...
[info] Done packaging.
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 7 Scala sources and 71 Java sources to /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/classes...
[info] Main Scala API documentation to /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/api...
[error]
[error] while compiling: /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/src_managed/main/views/html/index.template.scala
[error] during phase: typer
[error] library version: version 2.10.3
[error] compiler version: version 2.10.3
[error] reconstructed args: -bootclasspath /Library/Java/JavaVirtualMachines/jdk1.8.0_72.jdk/Contents/Home/jre/lib/resources.jar....
Later on I get:
[error]
[error] last tree to typer: Literal(Constant({))
[error] symbol: null
[error] symbol definition: null
[error] tpe: String("{")
[error] symbol owners:
[error] context owners: method apply -> object index -> package html
[error]
[error] == Enclosing template or block ==
[error]
[error] Apply(
[error] "format"."raw"
[error] "{"
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] ConstantType(value = Constant({))
[error]
[error] uncaught exception during compilation: java.lang.StackOverflowError
[error]
[error] while compiling: /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/src_managed/main/views/html/index.template.scala
[error] during phase: typer
[error] library version: version 2.10.3
[error] compiler version: version 2.10.3
[error] reconstructed args: -bootclasspath /Library/Java/JavaVirtualMachines/jdk1.8.0_72.jdk/Contents/Home/jre/lib/resources.jar....
And finally I get:
[error]
[error] last tree to typer: Literal(Constant(}))
[error] symbol: null
[error] symbol definition: null
[error] tpe: String("}")
[error] symbol owners:
[error] context owners: method apply -> object index -> package html
[error]
[error] == Enclosing template or block ==
[error]
[error] Apply(
[error] "format"."raw"
[error] "}"
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] ConstantType(value = Constant(}))
[error]
[error] uncaught exception during compilation: java.lang.StackOverflowError
java.lang.StackOverflowError
at scala.reflect.internal.Constants$Constant.tpe(Constants.scala:74)
Has anyone else run into a similar error, and if so, how did you resolve it?
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/WhereHows/backend-service/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/play-2.2.4/repository/local/ch.qos.logback/logback-classic/1.0.13/jars/logback-classic.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
java.lang.AbstractMethodError: akka.event.slf4j.Slf4jLogger.aroundReceive with backend server
version:1.0-SNAPSHOT
[ERROR] [08/24/2016 16:12:36.543] [WhereHowsETLService-akka.actor.default-dispatcher-4] [ActorSystem(WhereHowsETLService)] Uncaught error from thread [WhereHowsETLService-akka.actor.default-dispatcher-4] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled
java.lang.AbstractMethodError: akka.event.slf4j.Slf4jLogger.aroundReceive(Lscala/PartialFunction;Ljava/lang/Object;)V
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Uncaught error from thread [WhereHowsETLService-akka.actor.default-dispatcher-4] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[WhereHowsETLService]
java.lang.AbstractMethodError: akka.event.slf4j.Slf4jLogger.aroundReceive(Lscala/PartialFunction;Ljava/lang/Object;)V
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
Are there any group communication channels? I could not find any while looking through the wiki, blog posts, etc.
Make it possible to import, edit, and add business or conceptual object definitions and then link them to environments, systems, databases, subject areas, entities, and fields. Support bi-directional navigation within the UI.
It seems the link below to download the WhereHows VM is inaccessible:
https://linkedin.app.box.com/wherehows-demo-in-cloudera-vm
Could anyone help to provide a valid link?
The cfg_object_name_map table schema is missing from the data-model/DDL scripts, but it is used in web/app/dao/LineageDAO.java.
Hi,
I go back to the WhereHows root directory and start the metadata ETL and API service:
cd backend-service ; $PLAY_HOME/play run
Then it throws an error:
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
Oops, cannot start the server.
Configuration error: Configuration error[database.conf: 5: Could not resolve substitution to a value: ${WHZ_DB_PASSWORD}]
at play.api.Configuration$.play$api$Configuration$$configError(Configuration.scala:92)
at play.api.Configuration$.load(Configuration.scala:58)
at play.api.WithDefaultConfiguration$$anonfun$initialConfiguration$1.apply(Application.scala:72)
at play.api.WithDefaultConfiguration$$anonfun$initialConfiguration$1.apply(Application.scala:72)
at play.utils.Threads$.withContextClassLoader(Threads.scala:18)
at play.api.WithDefaultConfiguration$class.initialConfiguration(Application.scala:71)
at play.api.DefaultApplication.initialConfiguration$lzycompute(Application.scala:399)
at play.api.DefaultApplication.initialConfiguration(Application.scala:399)
at play.api.WithDefaultGlobal$class.play$api$WithDefaultGlobal$$globalClass(Application.scala:22)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalClass$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalClass(Application.scala:399)
at play.api.WithDefaultGlobal$class.play$api$WithDefaultGlobal$$javaGlobal(Application.scala:28)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$javaGlobal$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$javaGlobal(Application.scala:399)
at play.api.WithDefaultGlobal$$anonfun$play$api$WithDefaultGlobal$$globalInstance$1.apply(Application.scala:50)
at play.api.WithDefaultGlobal$$anonfun$play$api$WithDefaultGlobal$$globalInstance$1.apply(Application.scala:49)
at play.utils.Threads$.withContextClassLoader(Threads.scala:18)
at play.api.WithDefaultGlobal$class.play$api$WithDefaultGlobal$$globalInstance(Application.scala:48)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalInstance$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalInstance(Application.scala:399)
at play.api.WithDefaultGlobal$class.global(Application.scala:64)
at play.api.DefaultApplication.global(Application.scala:399)
at play.api.WithDefaultConfiguration$class.play$api$WithDefaultConfiguration$$fullConfiguration(Application.scala:78)
at play.api.DefaultApplication.play$api$WithDefaultConfiguration$$fullConfiguration$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultConfiguration$$fullConfiguration(Application.scala:399)
at play.api.WithDefaultConfiguration$class.configuration(Application.scala:80)
at play.api.DefaultApplication.configuration(Application.scala:399)
at play.api.Application$class.$init$(Application.scala:272)
at play.api.DefaultApplication.(Application.scala:399)
at play.core.StaticApplication.(ApplicationProvider.scala:50)
at play.core.server.NettyServer$.createServer(NettyServer.scala:280)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:316)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:311)
at scala.Option.map(Option.scala:145)
at play.core.server.NettyServer$.main(NettyServer.scala:311)
at play.core.server.NettyServer.main(NettyServer.scala)
Please give me some help. Thanks!
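The error says the ${WHZ_DB_PASSWORD} substitution in database.conf could not be resolved, so a first thing to try is exporting that environment variable before starting the backend. A minimal sketch, assuming database.conf takes the password from the environment (the value is a placeholder):
# export the variable the config substitutes, then start the service as before
export WHZ_DB_PASSWORD='your-wherehows-db-password'
cd backend-service ; $PLAY_HOME/play run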
When unzipping with 7zip, I get an append data error: cloudera-quickstart-vm-5.4.2-0-vmware-s010.vmdk
SQL syntax error in file: WhereHows/data-model/DDL/ETL_DDL/kafka_tracking.sql
line 148, missing comma:
PRIMARY KEY(dataset_id,db_id,data_time_epoch,partition_grain,partition_expr)
Did you ever try to execute the DDL after modifying the SQL?
I'm receiving a 'driver not found' error from the Hive Jython script when running the Hive metadata ETL.
Based on an afternoon's worth of research, I'm guessing it's not dynamically loading the driver. The MySQL jar file is in the lib directory.
I'm running a very recent fork. Any help is appreciated. A detailed log file and the properties file are attached.
Thank you!
2016-07-30 18:13:48 INFO HiveMetadataEtl:39 - In Hive metadata ETL, launch extract jython scripts
2016-07-30 18:13:49 ERROR Job Launcher:80 - Traceback (most recent call last):
File "", line 302, in
zxJDBC.DatabaseError: driver ["com.mysql.jdbc.Driver"] not found
at org.python.core.PyException.doRaise(PyException.java:198)
at org.python.core.Py.makeException(Py.java:1337)
at org.python.core.Py.makeException(Py.java:1341)
at com.ziclix.python.sql.zxJDBC.makeException(zxJDBC.java:329)
at com.ziclix.python.sql.connect.Connect.__call__(Connect.java:79)
at org.python.core.PyObject.__call__(PyObject.java:515)
at org.python.core.PyObject.__call__(PyObject.java:521)
at org.python.pycode._pyx0.f$0(<iostream>:308)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:167)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1386)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
at metadata.etl.dataset.hive.HiveMetadataEtl.extract(HiveMetadataEtl.java:42)
at metadata.etl.EtlJob.run(EtlJob.java:181)
at metadata.etl.Launcher.main(Launcher.java:75)
bug-driver-not-found.txt
hive-test.properties.txt
go-hive.sh.txt
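For what it's worth, zxJDBC resolves "com.mysql.jdbc.Driver" from the JVM classpath, so a quick check is whether the driver class is actually inside the jar sitting in lib/ and whether that directory ends up on the classpath of the process launching the Jython script. A small diagnostic sketch (the jar file name is an assumption):
# list the connector jar contents and confirm the driver class is present
jar tf lib/mysql-connector-java-*.jar | grep 'com/mysql/jdbc/Driver.class'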
POST /etl/ HTTP/1.1
Host: localhost:9001
Content-Type: application/json
Cache-Control: no-cache
Postman-Token: 687fc192-7ceb-34f3-06a0-be1d7110e46b
{
"wh_etl_job_name": "HADOOP_DATASET_METADATA_ETL",
"ref_id": 102,
"cron_expr": "20 * * * * ?",
"properties": {"hdfs.cluster": "hdp24ma",
"hdfs.remote.machine": "hdp24ma",
"hdfs.private_key_location": "/home/conf/.ssh/id_rsa",
"hdfs.remote.jar": "/home/conf/schemaFetch.jar",
"hdfs.remote.user": "conf",
"hdfs.remote.raw_metadata": "",
"hdfs.remote.sample": "",
"hdfs.local.field_metadata": "",
"hdfs.local.metadata": "",
"hdfs.local.raw_metadata": "/home/conf",
"hdfs.local.sample": "/home/conf",
"hdfs.white_list": "/home/conf/while",
"hdfs.file_path_regex_source_map": ""
},
"comments": "hdfs metadata etl"
}
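For reference, the raw request above corresponds to a curl call like the following sketch (assuming the backend listens on localhost:9001, as in the Host header, and that the JSON body is saved to a local file):
# replay the ETL job creation request shown above
curl -X POST http://localhost:9001/etl/ \
  -H "Content-Type: application/json" \
  -d @etl_job.json   # etl_job.json holds the JSON body from the request above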
I cannot see any error printed in the console or in application.log.
I want to submit an ETL job to scan HDFS and perform discovery for me, but I'm having a hard time.
Do you have any API or other way to force a job to run? The only thing I can see is to update the cron_expr, but that may cause the job to kick off too many times during testing. I just want to force it to run independently of the cron schedule.
Do you support Kerberos? I can't seem to find any documentation on using something like a keytab to allow the HDFS crawler to run as a superuser that can read any file, or something of that nature.
I have a few questions about the hdfs.remote.* properties; the documentation is a bit unclear about them.
Hi,
I am getting this while running gradle build:
Caused by: org.gradle.api.internal.MissingMethodException: Could not find method module() for arguments [org.slf4j:slf4j-log4j12] on org.gradle.api.internal.artifacts.ivyservice.resolutionstrategy.DefaultResolutionStrategy_Decorated@7a1a3468.
at org.gradle.api.internal.AbstractDynamicObject.methodMissingException(AbstractDynamicObject.java:68)
at org.gradle.api.internal.AbstractDynamicObject.invokeMethod(AbstractDynamicObject.java:56)
at org.gradle.api.internal.CompositeDynamicObject.invokeMethod(CompositeDynamicObject.java:175)
at org.gradle.api.internal.artifacts.ivyservice.resolutionstrategy.DefaultResolutionStrategy_Decorated.invokeMethod(Unknown Source)
at build_a37ns0gqj6tuhjeqoghvih0lr$_run_closure2_closure16_closure17.doCall(/home/ec2-user/WhereHows/backend-service/build.gradle:43)
at org.gradle.api.internal.ClosureBackedAction.execute(ClosureBackedAction.java:67)
Hope you can help me, thanks.
VM Player just hangs at the step:
Starting Wherehows..
loginLoggerObjName is null, make sure there is a logger with name azkaban.webapp.servelet.LoginAbstractAzkabanServlet
Any idea how to get around this issue?
thanks
I have the backend service and web running, but when I go to http://localhost:9000 it doesn't do anything. I'm running on an Ubuntu Linux server with a private IP.
I'm new here, but I have MySQL set up and am not sure where to add the connection information in the application.conf file. If anyone has a mocked-up example I would really appreciate it.
Start the web application server
Make sure your web/conf/application.conf file has the correct connection info for your MySQL database. You can either edit it in your code or run with a specific configuration file location using the '-Dconfig.file=$YOUR_CONFIG_FILE' option.
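For example, a minimal sketch of pointing the web app at an external configuration file at startup (the path is a placeholder, and this assumes the Play 2.2 launcher forwards -D system properties to the JVM):
cd web ; $PLAY_HOME/play -Dconfig.file=/path/to/my-application.conf run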
After adding and scheduling the metadata ETL job for Hadoop, I get the following error in the logs when it executes:
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 - 16/03/31 10:11:40 INFO wherehows.SchemaFetch: -- scanPath(/test)
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 -
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 - 16/03/31 10:11:40 INFO wherehows.SchemaFetch: trace table : /test
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 - SchemaFetch exit
2016-03-31 10:11:41 INFO HdfsMetadataEtl:133 - ExecChannel exit-status: 0
2016-03-31 10:11:41 INFO HdfsMetadataEtl:141 - extract finished
2016-03-31 10:11:41 INFO HdfsMetadataEtl:158 - hdfs metadata transform
2016-03-31 10:11:42 INFO HdfsMetadataEtl:168 - hdfs metadata load
2016-03-31 10:11:42 ERROR Job Launcher:83 - Traceback (most recent call last):
File "", line 340, in
KeyError: wherehows.db.username
at org.python.core.Py.KeyError(Py.java:249)
at org.python.core.PyObject.__getitem__(PyObject.java:738)
at org.python.pycode._pyx1.f$0(<iostream>:356)
at org.python.pycode._pyx1.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:167)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1386)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
at metadata.etl.dataset.hdfs.HdfsMetadataEtl.load(HdfsMetadataEtl.java:171)
at metadata.etl.EtlJob.run(EtlJob.java:183)
at metadata.etl.Launcher.main(Launcher.java:77)
2016-03-31 10:11:42 ERROR application:419 - *** Process + 6363 failed, status: 1
2016-03-31 10:11:42 ERROR application:419 - Error Details:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/atshukla/backend-service-1.0-SNAPSHOT/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/atshukla/backend-service-1.0-SNAPSHOT/lib/ch.qos.logback.logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLogg
erBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-03-31 10:11:42 ERROR application:429 - ETL job jobtype:HADOOP_DATASET_METADATA_ETL refId:10001 refIdType:DB whEtlJobId:1 whEtlExecId37 got a problem
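Since the load step fails with KeyError: wherehows.db.username, one thing worth checking is whether the wherehows.db.* rows exist in the wh_property table (shown in a later issue on this page). This is an assumption about where the script reads them from; it may also take them from the job properties. A diagnostic sketch (credentials are placeholders):
# list the wherehows.db.* properties the ETL load step expects
mysql -u wherehows -p wherehows -e "SELECT property_name, property_value FROM wh_property WHERE property_name LIKE 'wherehows.db.%';"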
Hello,
First, thanks for open sourcing this project :) I'm now testing it to see what it can do. I'm creating the database structure, but I'm using MySQL 5.7 (this MySQL instance is outsourced). I get errors like this when executing the create_all_tables_wrapper.sql script:
ERROR 1171 (42000) at line 19 in file: 'ETL_DDL/dataset_metadata.sql': All parts of a PRIMARY KEY must be NOT NULL; if you need NULL in a key, use UNIQUE instead
I've removed all the problematic DEFAULT NULLs; the patch file is here:
data-model-ddl_patch.txt
My question is: is this safe, or are the DEFAULT NULLs on primary key columns required somewhere in the code?
Hi,
I am trying to build it on my Mac and am running into the issue below. Any idea how to fix it?
EN-RajivC:metadata-etl rajiv.chodisetti$ gradle build
Defining custom 'build' task when using the standard Gradle lifecycle plugins has been deprecated and is scheduled to be removed in Gradle 3.0
:wherehows-common:compileJava
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: ')' expected
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: ')' expected
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: illegal start of expression
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: illegal start of expression
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: ';' expected
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
5 errors
:wherehows-common:compileJava FAILED
FAILURE: Build failed with an exception.
What went wrong:
Execution failed for task ':wherehows-common:compileJava'.
Compilation failed; see the compiler error output for details.
Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 4.9 secs
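The errors all point at the lambda expression in PartitionPatternMatcher.java, which is Java 8 syntax, so a first check (an assumption, not a confirmed diagnosis) is which JDK Gradle is compiling with:
# the "JVM" line shows the Java version Gradle itself runs on
gradle -version
java -version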
scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found
:backend-service:playCompile
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
[info] Loading project definition from /home/conflux/WhereHows/backend-service/project
[info] Set current project to backend-service (in build file:/home/conflux/WhereHows/backend-service/)
[info] Updating {file:/home/conflux/WhereHows/backend-service/}backend-service...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 4 Scala sources and 37 Java sources to /home/conflux/WhereHows/backend-service/target/scala-2.10/classes...
[error] error while loading , error in opening zip file
scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
My Env:
CentOS 6
JDK 1.8
User: usera (with sudo access). However, I ran it with just "gradle build". Should I do "sudo gradle build"?
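The "error in opening zip file" message usually points at a corrupted jar in the local dependency cache rather than at the project itself. A hedged workaround sketch (an assumption, not a confirmed fix): find any unreadable jars under the ivy/gradle caches, delete them, and re-run the build so they are downloaded again.
# report jars whose zip structure cannot be read
find ~/.ivy2 ~/.gradle -name '*.jar' -exec sh -c 'unzip -t "$1" >/dev/null 2>&1 || echo "corrupt: $1"' _ {} \;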
We want to choose WhereHows as the metadata tool for a data warehouse project. Could you please tell us whether WhereHows can support relational databases, such as DB2/Postgres, as data sources?
Thanks a lot.
Hi, I'd like to ask: can a custom job scheduling system be plugged into WhereHows so that lineage analysis can be done on the upstream and downstream data it produces?
If so, what issues should we pay attention to? Thanks.
Sorry for writing in Chinese directly; there is a lot to say and English would be slower, but the authors are all **, so they should be able to read it.
I'm very glad to see this project; we currently lack metadata management. Back at JD.com there was a similar project, but since moving to my new company I have been looking for an open source project of this kind and had not found one.
One part is discovering and syncing Hive metadata to build a metadata knowledge base, including change history, field descriptions, Q&A, and global search. The other is linking tables together through the data warehouse's scheduling system to build lineage relationships between metadata. This is exactly what we need.
Our biggest pain point right now is that analysts don't know which table or which field to use, and after data changes it is unclear which related tables will be affected.
We use Cloudera's full CDH stack with Oozie, Hue, Hive, and Sqoop, and we have long studied building table dependencies from Oozie input paths. For a Sqoop action, we build the mapping to the relational database, then monitor the relational database tables and alert when something changes.
For a Hive action, we build the mapping between data warehouse tables.
For our data-push datachange action, we build the mapping between warehouse tables and the target systems.
If these relationships were clearly visible it would really help the whole system. Looking at WhereHows, I found many of the same ideas.
Thanks again to the authors; I very much hope to get involved!
Hi,
I'm using the VM (downloaded on 07/03/2016) to get to know the tool.
However, I'm having some problems visualizing the data lineage in the front end when I select a flow or dataset.
If I select a dataset, only the dataset box appears, and if I select a job (for instance the EmployeeAnalyze one shown in the "Front User Guide"), it shows nothing but the job's name at the top right.
Can anyone help?
Thank you.
Joao
I have the same issue on the master branch at commit 753de7d.
Has the problem been solved yet?
Hello,
we have a question about the endpoint GET /dataset. When querying with the urn parameter, it is validated against the storage types hardcoded at: https://github.com/linkedin/WhereHows/blob/master/backend-service/app/models/utils/Urn.java#L31.
Is there a particular reason why it is only teradata and hdfs?
I'm asking because we are adding our own storage type (Hermes, http://hermes-pubsub.readthedocs.io/en/latest/) and we are trying to feed in lineage information. We want to do it via the API, and when asking about a dataset we currently get "Urn format wrong!" ;-(
I want to move that into a configuration file, but before that I want to make sure it is OK with you :)
I would appreciate any thoughts and remarks.
Kind regards,
Rafal Kluszczynski
Thanks for open-sourcing this! I'm trying to get the quickstart examples running in the VM, but I'm having problems logging into the web client. I can get the frontend and backend running, but when I go to log in to the web client running on 9008 it spins for a while and then I get "Invalid username or password". The users table in MySQL has an entry for wherehows with a stored hash that matches sha1("wherehows"), so I don't think it's actually an auth issue. The backend console shows the following:
[error] application - Could not get JDBC Connection; nested exception is java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://172.21.98.60/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull, username = wherehows.
I've run /root/bin/enable-all-hadoop-service.sh and tried starting the frontend before the backend and vice versa. This seems to be a pretty straightforward "I can't find mysql" error, but I'm kind of at a loss for how to fix it. Any help greatly appreciated!
Full trace from the frontend:
[cloudera@quickstart wherehows]$ ./runfrontend
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
Play server process ID is 15205
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0.0.0.0:9008
[error] application - Authentication failed for user wherehows
[error] application - Could not get JDBC Connection; nested exception is java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://172.21.98.60/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull, username = wherehows. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2408)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2445)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2230)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:399)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:334)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:363)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:627)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:692)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:724)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:734)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:784)
at org.springframework.jdbc.core.JdbcTemplate.queryForList(JdbcTemplate.java:889)
at dao.UserDAO.authenticate(UserDAO.java:109)
at security.AuthenticationManager.authenticateUser(AuthenticationManager.java:42)
at controllers.Application.authenticate(Application.java:130)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:341)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:341)
at play.core.Router$HandlerInvoker$$anon$7$$anon$2.invocation(Router.scala:183)
at play.core.Router$Routes$$anon$1.invocation(Router.scala:377)
at play.core.j.JavaAction$$anon$1.call(JavaAction.scala:56)
at play.GlobalSettings$1.call(GlobalSettings.java:64)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:91)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:90)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:251)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:37)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:244)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300)
... 48 more
------
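A quick connectivity check from inside the VM (a sketch, not from the original report): try reaching the same MySQL host and database that the JDBC URL uses before digging further into the app configuration. If this also hangs, the problem is network/bind-address/firewall rather than WhereHows authentication.
# the host, user, and database below are taken from the JDBC URL in the error
mysql -h 172.21.98.60 -u wherehows -p wherehows -e 'SELECT 1;'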
{
"return_code": 404,
"error_message": "PreparedStatementCallback; SQL [INSERT INTO cfg_application (app_id, app_code, description, uri, short_connection_string, parent_app_id, app_status, is_logical) VALUES (?, ?, ?, ?, ?, ?, ?, ?)]; Field 'tech_matrix_id' doesn't have a default value; nested exception is java.sql.SQLException: Field 'tech_matrix_id' doesn't have a default value"
}
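A hedged workaround for the error above (an assumption, not an official fix): give tech_matrix_id a default value so the cfg_application insert can succeed without supplying it. The default of 0 is a guess at a sensible placeholder value.
mysql -u wherehows -p wherehows -e "ALTER TABLE cfg_application ALTER COLUMN tech_matrix_id SET DEFAULT 0;"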
When I build WhereHows, I encounter a problem:
[error] error while loading , error in opening zip file
scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
.....
Could you help me solve this problem in detail?
The wiki page for the "Database POST/PUT API" mentions nothing about db_type_id, but it is required for the call. In the MySQL database there is a column for it as well.
I went to install WhereHows on a new server following the same instructions I used for an old server, and I'm now getting errors like those below when I do a gradle build. It looks like there were some code updates in GitHub in the last week, but I don't see how they relate to these errors. Any thoughts?
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:18: object metadata is not a member of package
[error] import metadata.etl.models.EtlJobName;
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:35: not found: type EtlJobName
[error] public EtlJobMessage(EtlJobName etlJobName, EtlType etlType, Integer whEtlJobId, Integer refId, RefIdType refIdType, String cmdParam) {
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:17: object metadata is not a member of package
[error] import metadata.etl.models.EtlType;
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:35: not found: type EtlType
[error] public EtlJobMessage(EtlJobName etlJobName, EtlType etlType, Integer whEtlJobId, Integer refId, RefIdType refIdType, String cmdParam) {
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:19: object metadata is not a member of package
[error] import metadata.etl.models.RefIdType;
Executing the dataset POST API from the backend API tutorial:
curl -H "Content-Type: application/json" -X POST -d '{"name" : "DUMMY4","urn" : "teradata:///DWH_TMP/DUMMY4","schema" : "{"name": "DUMMY", "fields": [{"accessCount": null, "lastAccessTime": null, "nullable": "Y", "format": null, "type": "INT", "maxByteLength": 4, "name": "DUMMY", "doc": ""}]}","properties" : "{"storage_type": "View", "accessCount": 2670, "lastAccessTime": null, "sizeInMbytes": null, "referenceTables": ["DWH_DIM.DUMMY"], "viewSqlText": "REPLACE VIEW DWH_STG.DUMMY AS\nLOCKING ROW FOR ACCESS\n SELECT * \n FROM DWH_DIM.DUMMY;", "createTime": "2015-03-06 10:43:58", "lastAlterTime": "2015-03-10 20:57:16"}","schema_type" : "JSON","fields" : "{"DUMMY": {"type": "INT", "maxByteLength": 4}}","source" : "Teradata","source_created_time" : null,"location_prefix" : "DWH_STG","ref_dataset_urn" : null,"is_partitioned" : "Y","sample_partition_full_path": null,"parent_name" : null,"storage_type" : null,"dataset_type" : null,"hive_serdes_class" : null}' http://localhost:19001/dataset
The backend web service responds with the error message below:
{"return_code":404,"error_message":"Incorrect result size: expected 1, actual 0"}
The stack trace is copied below:
org.springframework.dao.EmptyResultDataAccessException: Incorrect result size: expected 1, actual 0
at org.springframework.dao.support.DataAccessUtils.requiredSingleResult(DataAccessUtils.java:71)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForObject(NamedParameterJdbcTemplate.java:212)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForObject(NamedParameterJdbcTemplate.java:219)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForMap(NamedParameterJdbcTemplate.java:243)
at models.daos.DatasetDao.getDatasetByUrn(DatasetDao.java:48)
at models.daos.DatasetDao.insertDataset(DatasetDao.java:57)
at controllers.DatasetController.addDataset(DatasetController.java:81)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:205)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:205)
at play.core.Router$HandlerInvoker$$anon$7$$anon$2.invocation(Router.scala:183)
at play.core.Router$Routes$$anon$1.invocation(Router.scala:377)
at play.core.j.JavaAction$$anon$1.call(JavaAction.scala:56)
at play.GlobalSettings$1.call(GlobalSettings.java:64)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:91)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:90)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:278)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:274)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:29)
at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:37)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
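One thing to note about the command above (an observation, not a confirmed root cause): the nested JSON documents in the -d payload, such as the schema and properties values, contain unescaped double quotes inside double-quoted strings, so the body is not valid JSON as written. A safer sketch is to put a properly escaped body in a file and post it from there:
# dataset.json holds the request body with inner quotes escaped as \"
curl -H "Content-Type: application/json" -X POST -d @dataset.json http://localhost:19001/dataset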
In the properties template https://github.com/linkedin/WhereHows/blob/master/metadata-etl/src/main/resources/local_test.properties.template there are a few fields that are not in the HDFS Dataset wiki page https://github.com/LinkedIn/Wherehows/wiki/Hdfs-Dataset or in the VM table, such as:
hdfs.remote.working.dir=
hdfs.schema_location=
Are those deprecated?
I hope to get field-level lineage relationships between tables. I've only just started with this, so I'm asking in advance. Thanks.
The build has failed several times at different points, but the main one is below (Execution failed for task ':backend-service:playCompile'). I've installed this on an identical Ubuntu Linux server without issue. Any ideas?
[error] Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f79c5000, 22691840, 0) failed; error='Cannot allocate memory' (errno=12)
[info] #
[info] # There is insufficient memory for the Java Runtime Environment to continue.
[info] # Native memory allocation (mmap) failed to map 22691840 bytes for committing reserved memory.
[info] # An error report file with more information is saved as:
[info] # /home/ubuntu/WhereHows/backend-service/hs_err_pid12621.log
error javac returned nonzero exit code
[error] Total time: 25 s, completed Jun 8, 2016 7:13:03 PM
:backend-service:playCompile FAILED
FAILURE: Build failed with an exception.
Process 'command '/home/ubuntu/play-2.2.4/play'' finished with non-zero exit value 1
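Given that the same build works on an identical server, a first check (an assumption, not a confirmed diagnosis) is simply how much free memory and swap this machine has before the JVM tries to reserve its heap:
# compare available memory and swap with the server where the build succeeded
free -m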
Administrator@Win7 MINGW64 ~/repo/WhereHows (master)
$ gradle build
FAILURE: Build failed with an exception.
What went wrong:
A problem occurred configuring root project 'WhereHows'.
Could not resolve all dependencies for configuration ':classpath'.
Could not resolve gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.12.0.
Required by:
:WhereHows:unspecified
Could not resolve gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.12.0.
> Could not get resource 'https://plugins.gradle.org/m2/gradle/plugin/nl/javadude/gradle/plugins/license-gradle-plugin/0.12.0/license-gradle-plugin-0.12.0.pom'.
> Could not GET 'https://plugins.gradle.org/m2/gradle/plugin/nl/javadude/gradle/plugins/license-gradle-plugin/0.12.0/license-gradle-plugin-0.12.0.pom'.
> Connect to gradleware-plugins.s3.amazonaws.com:443 [gradleware-plugins.s3.amazonaws.com/54.231.12.161] failed: Connection timed out: connect
Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 29.256 secs
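The "Connection timed out" while fetching the plugin POM suggests this machine cannot reach plugins.gradle.org directly. If an HTTP proxy is required on this network (an assumption), a sketch of the standard Gradle proxy settings follows; host and port are placeholders.
# append proxy properties to the user-level gradle.properties (run from Git Bash / MINGW64)
cat >> ~/.gradle/gradle.properties <<'EOF'
systemProp.http.proxyHost=proxy.example.com
systemProp.http.proxyPort=8080
systemProp.https.proxyHost=proxy.example.com
systemProp.https.proxyPort=8080
EOF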
Hi guys,
I have set up WhereHows on my local VM and created the HDFS metadata ETL job successfully, but I still have some problems. The dataset/flow tree view in the UI's left-side panel doesn't display; it's always in the "Loading" status.
I have configured wherehows.ui.tree.dataset.file and wherehows.ui.tree.flow.file in the wh_property table, but the issue still reproduces.
And my question is: how are the files dataset.json and flow.json generated?
mysql> select * from wh_property;
+--------------------------------+------------------------------------------+--------------+------------+
| property_name | property_value | is_encrypted | group_name |
+--------------------------------+------------------------------------------+--------------+------------+
| wherehows.app_folder | /tmp/wherehows | N | NULL |
| wherehows.db.driver | com.mysql.jdbc.Driver | N | NULL |
| wherehows.db.jdbc.url | jdbc:mysql://localhost/wherehows | N | NULL |
| wherehows.db.password | wherehows | N | NULL |
| wherehows.db.username | wherehows | N | NULL |
| wherehows.ui.tree.dataset.file | /var/tmp/wherehows/resource/dataset.json | N | NULL |
| wherehows.ui.tree.flow.file | /var/tmp/wherehows/resource/flow.json | N | NULL |
+--------------------------------+------------------------------------------+--------------+------------+
Any help is highly appreciated.
thanks,
Jack
Hello,
I have updated the code to the latest version today and there are missing DDLs in data-model for the new tables added by the dataset dependency API (#146). I mean the tables used in the Hive ETL job:
Could you provide them, please?
At the moment I get this error:
ERROR jython script : HiveLoad:-2 - Table 'wherehows.stg_dict_dataset_instance' doesn't exist [SQLCode: 1146], [SQLState: 42S02]
Kind regards,
Rafal Kluszczynski
The tests for WhereHows are a joke. (I know it started with me...)
The standard way is to create a lot of mocks for all the dependent calls, e.g. the database connection calls, the REST API calls, and so on, then use those mocks in the tests.
It would be very time consuming :( But it is a long-term goal to achieve, so that WhereHows becomes a more mature and standard system.
Which branch is the latest stable release version?
Where is db_type_id coming from?
.MySQLIntegrityConstraintViolationException: Column 'db_type_id' cannot be null"
{
"db_id": 10001,
"db_code": "TD",
"description": "TERADATA VM",
"cluster_size": 22,
"associated_data_centers": 1,
"replication_role": "MASTER",
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD"
}
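Tying this to the earlier note that db_type_id is required for the Database POST/PUT API but undocumented: a sketch of the same payload with a db_type_id added. The value 1 and the $DATABASE_ENDPOINT URL are placeholders for illustration, not taken from the docs.
curl -H "Content-Type: application/json" -X POST "$DATABASE_ENDPOINT" -d '{
  "db_id": 10001,
  "db_code": "TD",
  "db_type_id": 1,
  "description": "TERADATA VM",
  "cluster_size": 22,
  "associated_data_centers": 1,
  "replication_role": "MASTER",
  "uri": "Teradata://sample-td",
  "short_connection_string": "SAMPLE-TD"
}'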
Can we please have a sample "dataset.json" file shipped with the bundle for demo purposes? And can you please also provide some curl-based examples for the "Dataset POST API" at the wiki below: https://github.com/linkedin/WhereHows/wiki/Backend-API#dataset-post
I've noticed that the HDFS crawler handles standard text, Avro, and ORC, but I did not see anything for Parquet. Are there any plans to include the ability to scan Parquet? Also, are you deriving the schema from the raw_metadata file? If so, why not derive the schema for Avro/Parquet from the schema embedded in the file?
Hi,
Does this tool support pulling database metadata from a Redshift/Postgres DB?
Hi,
Do you have any examples or detailed descriptions of the Lineage POST API?
The JSON presented in the documentation (https://github.com/linkedin/WhereHows/wiki/Backend-API#lineage-post) isn't self-descriptive :)
There are three sections, of which two have source_target_type described as "source" and the last one as "destination". That's unclear, because it could mean either a lineage source or a data source.
Thanks in advance :)