datahub-project / datahub
The Metadata Platform for your Data Stack
Home Page: https://datahubproject.io
License: Apache License 2.0
Currently, WhereHows can only be configured through the API. So if I want to change one job's configuration or add a new job, I have to run the service and make 'curl' calls from the command line, following the documented steps. This is a pain for an admin who needs to manage and operate the WhereHows backend jobs. What I usually do is use a database tool with a UI (Aqua Data Studio) so I can open a table and change the values there directly. But this is definitely not a standard way.
We should have several 'admin' pages in the frontend that can configure WhereHows backend jobs. Users who log in with 'admin' permission can enter the admin pages.
These admin pages all map one-to-one to database configuration tables.
They contain at least these pages:
The lineage ETL also has several configuration tables that would benefit from a frontend:
[info] Loading project definition from /Users/jvy234/Documents/workspace/WhereHows/web/project
[info] Set current project to wherehows (in build file:/Users/jvy234/Documents/workspace/WhereHows/web/)
[info] Updating {file:/Users/jvy234/Documents/workspace/WhereHows/web/}web...
[info] Packaging /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/wherehows_2.10-1.0-SNAPSHOT-sources.jar ...
[info] Wrote /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/wherehows_2.10-1.0-SNAPSHOT.pom
[info] Resolving com.typesafe.play#play-jdbc_2.10;2.2.4 ...
[info] Done packaging.
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 7 Scala sources and 71 Java sources to /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/classes...
[info] Main Scala API documentation to /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/api...
[error]
[error] while compiling: /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/src_managed/main/views/html/index.template.scala
[error] during phase: typer
[error] library version: version 2.10.3
[error] compiler version: version 2.10.3
[error] reconstructed args: -bootclasspath /Library/Java/JavaVirtualMachines/jdk1.8.0_72.jdk/Contents/Home/jre/lib/resources.jar....
Later on I get:
[error]
[error] last tree to typer: Literal(Constant({))
[error] symbol: null
[error] symbol definition: null
[error] tpe: String("{")
[error] symbol owners:
[error] context owners: method apply -> object index -> package html
[error]
[error] == Enclosing template or block ==
[error]
[error] Apply(
[error] "format"."raw"
[error] "{"
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] ConstantType(value = Constant({))
[error]
[error] uncaught exception during compilation: java.lang.StackOverflowError
[error]
[error] while compiling: /Users/jvy234/Documents/workspace/WhereHows/web/target/scala-2.10/src_managed/main/views/html/index.template.scala
[error] during phase: typer
[error] library version: version 2.10.3
[error] compiler version: version 2.10.3
[error] reconstructed args: -bootclasspath /Library/Java/JavaVirtualMachines/jdk1.8.0_72.jdk/Contents/Home/jre/lib/resources.jar....
And finally I get:
[error]
[error] last tree to typer: Literal(Constant(}))
[error] symbol: null
[error] symbol definition: null
[error] tpe: String("}")
[error] symbol owners:
[error] context owners: method apply -> object index -> package html
[error]
[error] == Enclosing template or block ==
[error]
[error] Apply(
[error] "format"."raw"
[error] "}"
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] ConstantType(value = Constant(}))
[error]
[error] uncaught exception during compilation: java.lang.StackOverflowError
java.lang.StackOverflowError
at scala.reflect.internal.Constants$Constant.tpe(Constants.scala:74)
Has anyone else run into a similar error, and if so, how did you resolve it?
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/WhereHows/backend-service/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/play-2.2.4/repository/local/ch.qos.logback/logback-classic/1.0.13/jars/logback-classic.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
java.lang.AbstractMethodError: akka.event.slf4j.Slf4jLogger.aroundReceive with backend server
version:1.0-SNAPSHOT
[ERROR] [08/24/2016 16:12:36.543] [WhereHowsETLService-akka.actor.default-dispatcher-4] [ActorSystem(WhereHowsETLService)] Uncaught error from thread [WhereHowsETLService-akka.actor.default-dispatcher-4] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled
java.lang.AbstractMethodError: akka.event.slf4j.Slf4jLogger.aroundReceive(Lscala/PartialFunction;Ljava/lang/Object;)V
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Uncaught error from thread [WhereHowsETLService-akka.actor.default-dispatcher-4] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[WhereHowsETLService]
java.lang.AbstractMethodError: akka.event.slf4j.Slf4jLogger.aroundReceive(Lscala/PartialFunction;Ljava/lang/Object;)V
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
Are there any group communication channels? I could not find any while looking through the wiki, blog posts, etc.
Make it possible to import, edit, and add business or conceptual object definitions and then link them to environments, systems, databases, subject areas, entities, and fields. Support bi-directional navigation within the UI.
It seems the link below to download the WhereHows VM is inaccessible:
https://linkedin.app.box.com/wherehows-demo-in-cloudera-vm
Could anyone help to provide a valid link?
The cfg_object_name_map table schema is missing from the data-model/DDL scripts, but it is used in web/app/dao/LineageDAO.java.
Hi,
I go back to the WhereHows root directory and start the metadata ETL and API service:
cd backend-service ; $PLAY_HOME/play run
Then it throws an error:
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
Oops, cannot start the server.
Configuration error: Configuration error[database.conf: 5: Could not resolve substitution to a value: ${WHZ_DB_PASSWORD}]
at play.api.Configuration$.play$api$Configuration$$configError(Configuration.scala:92)
at play.api.Configuration$.load(Configuration.scala:58)
at play.api.WithDefaultConfiguration$$anonfun$initialConfiguration$1.apply(Application.scala:72)
at play.api.WithDefaultConfiguration$$anonfun$initialConfiguration$1.apply(Application.scala:72)
at play.utils.Threads$.withContextClassLoader(Threads.scala:18)
at play.api.WithDefaultConfiguration$class.initialConfiguration(Application.scala:71)
at play.api.DefaultApplication.initialConfiguration$lzycompute(Application.scala:399)
at play.api.DefaultApplication.initialConfiguration(Application.scala:399)
at play.api.WithDefaultGlobal$class.play$api$WithDefaultGlobal$$globalClass(Application.scala:22)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalClass$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalClass(Application.scala:399)
at play.api.WithDefaultGlobal$class.play$api$WithDefaultGlobal$$javaGlobal(Application.scala:28)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$javaGlobal$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$javaGlobal(Application.scala:399)
at play.api.WithDefaultGlobal$$anonfun$play$api$WithDefaultGlobal$$globalInstance$1.apply(Application.scala:50)
at play.api.WithDefaultGlobal$$anonfun$play$api$WithDefaultGlobal$$globalInstance$1.apply(Application.scala:49)
at play.utils.Threads$.withContextClassLoader(Threads.scala:18)
at play.api.WithDefaultGlobal$class.play$api$WithDefaultGlobal$$globalInstance(Application.scala:48)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalInstance$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultGlobal$$globalInstance(Application.scala:399)
at play.api.WithDefaultGlobal$class.global(Application.scala:64)
at play.api.DefaultApplication.global(Application.scala:399)
at play.api.WithDefaultConfiguration$class.play$api$WithDefaultConfiguration$$fullConfiguration(Application.scala:78)
at play.api.DefaultApplication.play$api$WithDefaultConfiguration$$fullConfiguration$lzycompute(Application.scala:399)
at play.api.DefaultApplication.play$api$WithDefaultConfiguration$$fullConfiguration(Application.scala:399)
at play.api.WithDefaultConfiguration$class.configuration(Application.scala:80)
at play.api.DefaultApplication.configuration(Application.scala:399)
at play.api.Application$class.$init$(Application.scala:272)
at play.api.DefaultApplication.(Application.scala:399)
at play.core.StaticApplication.(ApplicationProvider.scala:50)
at play.core.server.NettyServer$.createServer(NettyServer.scala:280)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:316)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:311)
at scala.Option.map(Option.scala:145)
at play.core.server.NettyServer$.main(NettyServer.scala:311)
at play.core.server.NettyServer.main(NettyServer.scala)
Please give me some help. Thanks!
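The error says the ${WHZ_DB_PASSWORD} substitution in database.conf could not be resolved, so a first thing to try is exporting that environment variable before starting the backend. A minimal sketch, assuming database.conf takes the password from the environment (the value is a placeholder):
# export the variable the config substitutes, then start the service as before
export WHZ_DB_PASSWORD='your-wherehows-db-password'
cd backend-service ; $PLAY_HOME/play run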
When unzipping with 7zip, I get an append data error: cloudera-quickstart-vm-5.4.2-0-vmware-s010.vmdk
SQL syntax error in file: WhereHows/data-model/DDL/ETL_DDL/kafka_tracking.sql
line 148, missing comma:
PRIMARY KEY(dataset_id,db_id,data_time_epoch,partition_grain,partition_expr)
Did you ever try to execute the DDL after modifying the SQL?
I'm receiving a 'driver not found' error from the Hive Jython script when running the Hive metadata ETL.
Based on an afternoon's worth of research, I'm guessing it's not dynamically loading the driver. The MySQL jar file is in the lib directory.
I'm running a very recent fork. Any help is appreciated. A detailed log file and the properties file are attached.
Thank you!
2016-07-30 18:13:48 INFO HiveMetadataEtl:39 - In Hive metadata ETL, launch extract jython scripts
2016-07-30 18:13:49 ERROR Job Launcher:80 - Traceback (most recent call last):
File "", line 302, in
zxJDBC.DatabaseError: driver ["com.mysql.jdbc.Driver"] not found
at org.python.core.PyException.doRaise(PyException.java:198)
at org.python.core.Py.makeException(Py.java:1337)
at org.python.core.Py.makeException(Py.java:1341)
at com.ziclix.python.sql.zxJDBC.makeException(zxJDBC.java:329)
at com.ziclix.python.sql.connect.Connect.__call__(Connect.java:79)
at org.python.core.PyObject.__call__(PyObject.java:515)
at org.python.core.PyObject.__call__(PyObject.java:521)
at org.python.pycode._pyx0.f$0(<iostream>:308)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:167)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1386)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
at metadata.etl.dataset.hive.HiveMetadataEtl.extract(HiveMetadataEtl.java:42)
at metadata.etl.EtlJob.run(EtlJob.java:181)
at metadata.etl.Launcher.main(Launcher.java:75)
bug-driver-not-found.txt
hive-test.properties.txt
go-hive.sh.txt
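For what it's worth, zxJDBC resolves "com.mysql.jdbc.Driver" from the JVM classpath, so a quick check is whether the driver class is actually inside the jar sitting in lib/ and whether that directory ends up on the classpath of the process launching the Jython script. A small diagnostic sketch (the jar file name is an assumption):
# list the connector jar contents and confirm the driver class is present
jar tf lib/mysql-connector-java-*.jar | grep 'com/mysql/jdbc/Driver.class'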
POST /etl/ HTTP/1.1
Host: localhost:9001
Content-Type: application/json
Cache-Control: no-cache
Postman-Token: 687fc192-7ceb-34f3-06a0-be1d7110e46b
{
"wh_etl_job_name": "HADOOP_DATASET_METADATA_ETL",
"ref_id": 102,
"cron_expr": "20 * * * * ?",
"properties": {"hdfs.cluster": "hdp24ma",
"hdfs.remote.machine": "hdp24ma",
"hdfs.private_key_location": "/home/conf/.ssh/id_rsa",
"hdfs.remote.jar": "/home/conf/schemaFetch.jar",
"hdfs.remote.user": "conf",
"hdfs.remote.raw_metadata": "",
"hdfs.remote.sample": "",
"hdfs.local.field_metadata": "",
"hdfs.local.metadata": "",
"hdfs.local.raw_metadata": "/home/conf",
"hdfs.local.sample": "/home/conf",
"hdfs.white_list": "/home/conf/while",
"hdfs.file_path_regex_source_map": ""
},
"comments": "hdfs metadata etl"
}
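For reference, the raw request above corresponds to a curl call like the following sketch (assuming the backend listens on localhost:9001, as in the Host header, and that the JSON body is saved to a local file):
# replay the ETL job creation request shown above
curl -X POST http://localhost:9001/etl/ \
  -H "Content-Type: application/json" \
  -d @etl_job.json   # etl_job.json holds the JSON body from the request above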
I cannot see any error printed in the console or in application.log.
I want to submit an ETL job to scan HDFS and perform discovery for me, but I'm having a hard time.
Do you have any API or other way to force a job to run? The only thing I can see is to update the cron_expr, but that may cause the job to kick off too many times during testing. I just want to force it to run independently of the cron schedule.
Do you support Kerberos? I can't seem to find any documentation on using something like a keytab to allow the HDFS crawler to run as a superuser that can read any file, or something of that nature.
I have a few questions about the hdfs.remote.* properties; the documentation is a bit unclear about them.
Hi,
I am getting this while running gradle build:
Caused by: org.gradle.api.internal.MissingMethodException: Could not find method module() for arguments [org.slf4j:slf4j-log4j12] on org.gradle.api.internal.artifacts.ivyservice.resolutionstrategy.DefaultResolutionStrategy_Decorated@7a1a3468.
at org.gradle.api.internal.AbstractDynamicObject.methodMissingException(AbstractDynamicObject.java:68)
at org.gradle.api.internal.AbstractDynamicObject.invokeMethod(AbstractDynamicObject.java:56)
at org.gradle.api.internal.CompositeDynamicObject.invokeMethod(CompositeDynamicObject.java:175)
at org.gradle.api.internal.artifacts.ivyservice.resolutionstrategy.DefaultResolutionStrategy_Decorated.invokeMethod(Unknown Source)
at build_a37ns0gqj6tuhjeqoghvih0lr$_run_closure2_closure16_closure17.doCall(/home/ec2-user/WhereHows/backend-service/build.gradle:43)
at org.gradle.api.internal.ClosureBackedAction.execute(ClosureBackedAction.java:67)
Hope you can help me, thanks.
VM Player just hangs at the step:
Starting Wherehows..
loginLoggerObjName is null, make sure there is a logger with name azkaban.webapp.servelet.LoginAbstractAzkabanServlet
Any idea how to get around this issue?
thanks
I have the backend service and web running, but when I go to http://localhost:9000 it doesn't do anything. I'm running on an Ubuntu Linux server with a private IP.
I'm new here, but I have MySQL set up and am not sure where to add the connection information in the application.conf file. If anyone has a mocked-up example I would really appreciate it.
Start the web application server
Make sure your web/conf/application.conf file has the correct connection info for your MySQL database. You can either edit it in your code or run with a specific configuration file location using the '-Dconfig.file=$YOUR_CONFIG_FILE' option.
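For example, a minimal sketch of pointing the web app at an external configuration file at startup (the path is a placeholder, and this assumes the Play 2.2 launcher forwards -D system properties to the JVM):
cd web ; $PLAY_HOME/play -Dconfig.file=/path/to/my-application.conf run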
After adding and scheduling the metadata ETL job for Hadoop, I get the following error in the logs when it executes:
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 - 16/03/31 10:11:40 INFO wherehows.SchemaFetch: -- scanPath(/test)
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 -
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 - 16/03/31 10:11:40 INFO wherehows.SchemaFetch: trace table : /test
2016-03-31 10:11:40 INFO Redirect schemaFetch output : :46 - SchemaFetch exit
2016-03-31 10:11:41 INFO HdfsMetadataEtl:133 - ExecChannel exit-status: 0
2016-03-31 10:11:41 INFO HdfsMetadataEtl:141 - extract finished
2016-03-31 10:11:41 INFO HdfsMetadataEtl:158 - hdfs metadata transform
2016-03-31 10:11:42 INFO HdfsMetadataEtl:168 - hdfs metadata load
2016-03-31 10:11:42 ERROR Job Launcher:83 - Traceback (most recent call last):
File "", line 340, in
KeyError: wherehows.db.username
at org.python.core.Py.KeyError(Py.java:249)
at org.python.core.PyObject.__getitem__(PyObject.java:738)
at org.python.pycode._pyx1.f$0(<iostream>:356)
at org.python.pycode._pyx1.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:167)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1386)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
at metadata.etl.dataset.hdfs.HdfsMetadataEtl.load(HdfsMetadataEtl.java:171)
at metadata.etl.EtlJob.run(EtlJob.java:183)
at metadata.etl.Launcher.main(Launcher.java:77)
2016-03-31 10:11:42 ERROR application:419 - *** Process + 6363 failed, status: 1
2016-03-31 10:11:42 ERROR application:419 - Error Details:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/atshukla/backend-service-1.0-SNAPSHOT/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/atshukla/backend-service-1.0-SNAPSHOT/lib/ch.qos.logback.logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLogg
erBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-03-31 10:11:42 ERROR application:429 - ETL job jobtype:HADOOP_DATASET_METADATA_ETL refId:10001 refIdType:DB whEtlJobId:1 whEtlExecId37 got a problem
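Since the load step fails with KeyError: wherehows.db.username, one thing worth checking is whether the wherehows.db.* rows exist in the wh_property table (shown in a later issue on this page). This is an assumption about where the script reads them from; it may also take them from the job properties. A diagnostic sketch (credentials are placeholders):
# list the wherehows.db.* properties the ETL load step expects
mysql -u wherehows -p wherehows -e "SELECT property_name, property_value FROM wh_property WHERE property_name LIKE 'wherehows.db.%';"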
Hello,
First, thanks for open sourcing this project :) I'm now testing it to see what it can do. I'm creating the database structure, but I'm using MySQL 5.7 (this MySQL instance is outsourced). I get errors like this when executing the create_all_tables_wrapper.sql script:
ERROR 1171 (42000) at line 19 in file: 'ETL_DDL/dataset_metadata.sql': All parts of a PRIMARY KEY must be NOT NULL; if you need NULL in a key, use UNIQUE instead
I've removed all the problematic DEFAULT NULLs; the patch file is here:
data-model-ddl_patch.txt
My question is: is this safe, or are the DEFAULT NULLs on primary key columns required somewhere in the code?
Hi,
I am trying to build it on my Mac and am running into the issue below. Any idea how to fix it?
EN-RajivC:metadata-etl rajiv.chodisetti$ gradle build
Defining custom 'build' task when using the standard Gradle lifecycle plugins has been deprecated and is scheduled to be removed in Gradle 3.0
:wherehows-common:compileJava
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: ')' expected
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: ')' expected
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: illegal start of expression
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: illegal start of expression
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
/Users/rajiv.chodisetti/WhereHows/wherehows-common/src/main/java/wherehows/common/utils/PartitionPatternMatcher.java:30: error: ';' expected
Collections.sort(layoutList, (PartitionLayout o1, PartitionLayout o2) -> o1.getSortId().compareTo(o2.getSortId()));
^
5 errors
:wherehows-common:compileJava FAILED
FAILURE: Build failed with an exception.
What went wrong:
Execution failed for task ':wherehows-common:compileJava'.
Compilation failed; see the compiler error output for details.
Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 4.9 secs
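The errors all point at the lambda expression in PartitionPatternMatcher.java, which is Java 8 syntax, so a first check (an assumption, not a confirmed diagnosis) is which JDK Gradle is compiling with:
# the "JVM" line shows the Java version Gradle itself runs on
gradle -version
java -version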
scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found
:backend-service:playCompile
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
[info] Loading project definition from /home/conflux/WhereHows/backend-service/project
[info] Set current project to backend-service (in build file:/home/conflux/WhereHows/backend-service/)
[info] Updating {file:/home/conflux/WhereHows/backend-service/}backend-service...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 4 Scala sources and 37 Java sources to /home/conflux/WhereHows/backend-service/target/scala-2.10/classes...
[error] error while loading , error in opening zip file
scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
My Env:
CentOS 6
JDK 1.8
User: usera (with sudo access). However, I ran it with just "gradle build". Should I do "sudo gradle build"?
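The "error in opening zip file" message usually points at a corrupted jar in the local dependency cache rather than at the project itself. A hedged workaround sketch (an assumption, not a confirmed fix): find any unreadable jars under the ivy/gradle caches, delete them, and re-run the build so they are downloaded again.
# report jars whose zip structure cannot be read
find ~/.ivy2 ~/.gradle -name '*.jar' -exec sh -c 'unzip -t "$1" >/dev/null 2>&1 || echo "corrupt: $1"' _ {} \;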
We want to choose WhereHows as the metadata tool for a data warehouse project. Could you please tell us whether WhereHows can support relational databases, such as DB2/Postgres, as data sources?
Thanks a lot.
Hi, I'd like to ask: can a custom job scheduling system be plugged into WhereHows so that lineage analysis can be done on the upstream and downstream data it produces?
If so, what issues should we pay attention to? Thanks.
Sorry for writing in Chinese directly; there is a lot to say and English would be slower, but the authors are all **, so they should be able to read it.
I'm very glad to see this project; we currently lack metadata management. Back at JD.com there was a similar project, but since moving to my new company I have been looking for an open source project of this kind and had not found one.
One part is discovering and syncing Hive metadata to build a metadata knowledge base, including change history, field descriptions, Q&A, and global search. The other is linking tables together through the data warehouse's scheduling system to build lineage relationships between metadata. This is exactly what we need.
Our biggest pain point right now is that analysts don't know which table or which field to use, and after data changes it is unclear which related tables will be affected.
We use Cloudera's full CDH stack with Oozie, Hue, Hive, and Sqoop, and we have long studied building table dependencies from Oozie input paths. For a Sqoop action, we build the mapping to the relational database, then monitor the relational database tables and alert when something changes.
For a Hive action, we build the mapping between data warehouse tables.
For our data-push datachange action, we build the mapping between warehouse tables and the target systems.
If these relationships were clearly visible it would really help the whole system. Looking at WhereHows, I found many of the same ideas.
Thanks again to the authors; I very much hope to get involved!
Hi,
I'm using the VM (downloaded on 07/03/2016) to get to know the tool.
However, I'm having some problems visualizing the data lineage in the front end when I select a flow or dataset.
If I select a dataset, only the dataset box appears, and if I select a job (for instance the EmployeeAnalyze one shown in the "Front User Guide"), it shows nothing but the job's name at the top right.
Can anyone help?
Thank you.
Joao
I have the same issue on the master branch at commit 753de7d.
Has the problem been solved yet?
Hello,
we have a question about the endpoint GET /dataset. When querying with the urn parameter, it is validated against the storage types hardcoded at: https://github.com/linkedin/WhereHows/blob/master/backend-service/app/models/utils/Urn.java#L31.
Is there a particular reason why it is only teradata and hdfs?
I'm asking because we are adding our own storage type (Hermes, http://hermes-pubsub.readthedocs.io/en/latest/) and we are trying to feed in lineage information. We want to do it via the API, and when asking about a dataset we currently get "Urn format wrong!" ;-(
I want to move that into a configuration file, but before that I want to make sure it is OK with you :)
I would appreciate any thoughts and remarks.
Kind regards,
Rafal Kluszczynski
Thanks for open-sourcing this! I'm trying to get the quickstart examples running in the VM, but I'm having problems logging into the web client. I can get the frontend and backend running, but when I go to log in to the web client running on 9008 it spins for a while and then I get "Invalid username or password". The users table in MySQL has an entry for wherehows with a stored hash that matches sha1("wherehows"), so I don't think it's actually an auth issue. The backend console shows the following:
[error] application - Could not get JDBC Connection; nested exception is java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://172.21.98.60/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull, username = wherehows.
I've run /root/bin/enable-all-hadoop-service.sh and tried starting the frontend before the backend and vice versa. This seems to be a pretty straightforward "I can't find mysql" error, but I'm kind of at a loss for how to fix it. Any help greatly appreciated!
Full trace from the frontend:
[cloudera@quickstart wherehows]$ ./runfrontend
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
Play server process ID is 15205
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0.0.0.0:9008
[error] application - Authentication failed for user wherehows
[error] application - Could not get JDBC Connection; nested exception is java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://172.21.98.60/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull, username = wherehows. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2408)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2445)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2230)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:399)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:334)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:363)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:627)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:692)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:724)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:734)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:784)
at org.springframework.jdbc.core.JdbcTemplate.queryForList(JdbcTemplate.java:889)
at dao.UserDAO.authenticate(UserDAO.java:109)
at security.AuthenticationManager.authenticateUser(AuthenticationManager.java:42)
at controllers.Application.authenticate(Application.java:130)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:341)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:341)
at play.core.Router$HandlerInvoker$$anon$7$$anon$2.invocation(Router.scala:183)
at play.core.Router$Routes$$anon$1.invocation(Router.scala:377)
at play.core.j.JavaAction$$anon$1.call(JavaAction.scala:56)
at play.GlobalSettings$1.call(GlobalSettings.java:64)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:91)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:90)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:251)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:37)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:244)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300)
... 48 more
------
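A quick connectivity check from inside the VM (a sketch, not from the original report): try reaching the same MySQL host and database that the JDBC URL uses before digging further into the app configuration. If this also hangs, the problem is network/bind-address/firewall rather than WhereHows authentication.
# the host, user, and database below are taken from the JDBC URL in the error
mysql -h 172.21.98.60 -u wherehows -p wherehows -e 'SELECT 1;'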
{
"return_code": 404,
"error_message": "PreparedStatementCallback; SQL [INSERT INTO cfg_application (app_id, app_code, description, uri, short_connection_string, parent_app_id, app_status, is_logical) VALUES (?, ?, ?, ?, ?, ?, ?, ?)]; Field 'tech_matrix_id' doesn't have a default value; nested exception is java.sql.SQLException: Field 'tech_matrix_id' doesn't have a default value"
}
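A hedged workaround for the error above (an assumption, not an official fix): give tech_matrix_id a default value so the cfg_application insert can succeed without supplying it. The default of 0 is a guess at a sensible placeholder value.
mysql -u wherehows -p wherehows -e "ALTER TABLE cfg_application ALTER COLUMN tech_matrix_id SET DEFAULT 0;"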
When I build WhereHows, I encounter a problem:
[error] error while loading , error in opening zip file
scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
.....
Could you help me solve this problem in detail?
The wiki page for the "Database POST/PUT API" mentions nothing about db_type_id, but it is required for the call. In the MySQL database there is a column for it as well.
I went to install WhereHows on a new server following the same instructions I used for an old server, and I'm now getting errors like those below when I do a gradle build. It looks like there were some code updates in GitHub in the last week, but I don't see how they relate to these errors. Any thoughts?
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:18: object metadata is not a member of package
[error] import metadata.etl.models.EtlJobName;
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:35: not found: type EtlJobName
[error] public EtlJobMessage(EtlJobName etlJobName, EtlType etlType, Integer whEtlJobId, Integer refId, RefIdType refIdType, String cmdParam) {
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:17: object metadata is not a member of package
[error] import metadata.etl.models.EtlType;
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:35: not found: type EtlType
[error] public EtlJobMessage(EtlJobName etlJobName, EtlType etlType, Integer whEtlJobId, Integer refId, RefIdType refIdType, String cmdParam) {
[error] ^
[error] /home/ubuntu/WhereHows/backend-service/app/msgs/EtlJobMessage.java:19: object metadata is not a member of package
[error] import metadata.etl.models.RefIdType;
Executing the dataset POST API from the backend API tutorial:
curl -H "Content-Type: application/json" -X POST -d '{"name" : "DUMMY4","urn" : "teradata:///DWH_TMP/DUMMY4","schema" : "{"name": "DUMMY", "fields": [{"accessCount": null, "lastAccessTime": null, "nullable": "Y", "format": null, "type": "INT", "maxByteLength": 4, "name": "DUMMY", "doc": ""}]}","properties" : "{"storage_type": "View", "accessCount": 2670, "lastAccessTime": null, "sizeInMbytes": null, "referenceTables": ["DWH_DIM.DUMMY"], "viewSqlText": "REPLACE VIEW DWH_STG.DUMMY AS\nLOCKING ROW FOR ACCESS\n SELECT * \n FROM DWH_DIM.DUMMY;", "createTime": "2015-03-06 10:43:58", "lastAlterTime": "2015-03-10 20:57:16"}","schema_type" : "JSON","fields" : "{"DUMMY": {"type": "INT", "maxByteLength": 4}}","source" : "Teradata","source_created_time" : null,"location_prefix" : "DWH_STG","ref_dataset_urn" : null,"is_partitioned" : "Y","sample_partition_full_path": null,"parent_name" : null,"storage_type" : null,"dataset_type" : null,"hive_serdes_class" : null}' http://localhost:19001/dataset
The backend web service responds with the error message below:
{"return_code":404,"error_message":"Incorrect result size: expected 1, actual 0"}
The stack trace is copied below:
org.springframework.dao.EmptyResultDataAccessException: Incorrect result size: expected 1, actual 0
at org.springframework.dao.support.DataAccessUtils.requiredSingleResult(DataAccessUtils.java:71)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForObject(NamedParameterJdbcTemplate.java:212)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForObject(NamedParameterJdbcTemplate.java:219)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForMap(NamedParameterJdbcTemplate.java:243)
at models.daos.DatasetDao.getDatasetByUrn(DatasetDao.java:48)
at models.daos.DatasetDao.insertDataset(DatasetDao.java:57)
at controllers.DatasetController.addDataset(DatasetController.java:81)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:205)
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(routes_routing.scala:205)
at play.core.Router$HandlerInvoker$$anon$7$$anon$2.invocation(Router.scala:183)
at play.core.Router$Routes$$anon$1.invocation(Router.scala:377)
at play.core.j.JavaAction$$anon$1.call(JavaAction.scala:56)
at play.GlobalSettings$1.call(GlobalSettings.java:64)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:91)
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:90)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:278)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:274)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:29)
at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:37)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
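One thing to note about the command above (an observation, not a confirmed root cause): the nested JSON documents in the -d payload, such as the schema and properties values, contain unescaped double quotes inside double-quoted strings, so the body is not valid JSON as written. A safer sketch is to put a properly escaped body in a file and post it from there:
# dataset.json holds the request body with inner quotes escaped as \"
curl -H "Content-Type: application/json" -X POST -d @dataset.json http://localhost:19001/dataset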
In the properties template https://github.com/linkedin/WhereHows/blob/master/metadata-etl/src/main/resources/local_test.properties.template there are a few fields that are not in the HDFS Dataset wiki page https://github.com/LinkedIn/Wherehows/wiki/Hdfs-Dataset or in the VM table, such as:
hdfs.remote.working.dir=
hdfs.schema_location=
Are those deprecated?
I hope to get field-level lineage relationships between tables. I've only just started with this, so I'm asking in advance. Thanks.
The build has failed several times at different points, but the main one is below (Execution failed for task ':backend-service:playCompile'). I've installed this on an identical Ubuntu Linux server without issue. Any ideas?
[error] Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f79c5000, 22691840, 0) failed; error='Cannot allocate memory' (errno=12)
[info] #
[info] # There is insufficient memory for the Java Runtime Environment to continue.
[info] # Native memory allocation (mmap) failed to map 22691840 bytes for committing reserved memory.
[info] # An error report file with more information is saved as:
[info] # /home/ubuntu/WhereHows/backend-service/hs_err_pid12621.log
error javac returned nonzero exit code
[error] Total time: 25 s, completed Jun 8, 2016 7:13:03 PM
:backend-service:playCompile FAILED
FAILURE: Build failed with an exception.
Process 'command '/home/ubuntu/play-2.2.4/play'' finished with non-zero exit value 1
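Given that the same build works on an identical server, a first check (an assumption, not a confirmed diagnosis) is simply how much free memory and swap this machine has before the JVM tries to reserve its heap:
# compare available memory and swap with the server where the build succeeded
free -m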
Administrator@Win7 MINGW64 ~/repo/WhereHows (master)
$ gradle build
FAILURE: Build failed with an exception.
What went wrong:
A problem occurred configuring root project 'WhereHows'.
Could not resolve all dependencies for configuration ':classpath'.
Could not resolve gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.12.0.
Required by:
:WhereHows:unspecified
Could not resolve gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.12.0.
> Could not get resource 'https://plugins.gradle.org/m2/gradle/plugin/nl/javadude/gradle/plugins/license-gradle-plugin/0.12.0/license-gradle-plugin-0.12.0.pom'.
> Could not GET 'https://plugins.gradle.org/m2/gradle/plugin/nl/javadude/gradle/plugins/license-gradle-plugin/0.12.0/license-gradle-plugin-0.12.0.pom'.
> Connect to gradleware-plugins.s3.amazonaws.com:443 [gradleware-plugins.s3.amazonaws.com/54.231.12.161] failed: Connection timed out: connect
Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 29.256 secs
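The "Connection timed out" while fetching the plugin POM suggests this machine cannot reach plugins.gradle.org directly. If an HTTP proxy is required on this network (an assumption), a sketch of the standard Gradle proxy settings follows; host and port are placeholders.
# append proxy properties to the user-level gradle.properties (run from Git Bash / MINGW64)
cat >> ~/.gradle/gradle.properties <<'EOF'
systemProp.http.proxyHost=proxy.example.com
systemProp.http.proxyPort=8080
systemProp.https.proxyHost=proxy.example.com
systemProp.https.proxyPort=8080
EOF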
Hi guys,
I have set up WhereHows on my local VM and created the HDFS metadata ETL job successfully, but I still have some problems. The dataset/flow tree view in the UI's left-side panel doesn't display; it's always in the "Loading" status.
I have configured wherehows.ui.tree.dataset.file and wherehows.ui.tree.flow.file in the wh_property table, but the issue still reproduces.
And my question is: how are the files dataset.json and flow.json generated?
mysql> select * from wh_property;
+--------------------------------+------------------------------------------+--------------+------------+
| property_name | property_value | is_encrypted | group_name |
+--------------------------------+------------------------------------------+--------------+------------+
| wherehows.app_folder | /tmp/wherehows | N | NULL |
| wherehows.db.driver | com.mysql.jdbc.Driver | N | NULL |
| wherehows.db.jdbc.url | jdbc:mysql://localhost/wherehows | N | NULL |
| wherehows.db.password | wherehows | N | NULL |
| wherehows.db.username | wherehows | N | NULL |
| wherehows.ui.tree.dataset.file | /var/tmp/wherehows/resource/dataset.json | N | NULL |
| wherehows.ui.tree.flow.file | /var/tmp/wherehows/resource/flow.json | N | NULL |
+--------------------------------+------------------------------------------+--------------+------------+
Any help is highly appreciated.
thanks,
Jack
Hello,
I have updated the code to the latest version today and there are missing DDLs in data-model for the new tables added by the dataset dependency API (#146). I mean the tables used in the Hive ETL job:
Could you provide them, please?
At the moment I get this error:
ERROR jython script : HiveLoad:-2 - Table 'wherehows.stg_dict_dataset_instance' doesn't exist [SQLCode: 1146], [SQLState: 42S02]
Kind regards,
Rafal Kluszczynski
The tests for WhereHows are a joke. (I know it started with me...)
The standard way is to create a lot of mocks for all the dependent calls, e.g. the database connection calls, the REST API calls, and so on, then use those mocks in the tests.
It would be very time consuming :( But it is a long-term goal to achieve, so that WhereHows becomes a more mature and standard system.
Which branch is the latest stable release version?
Where is db_type_id coming from?
.MySQLIntegrityConstraintViolationException: Column 'db_type_id' cannot be null"
{
"db_id": 10001,
"db_code": "TD",
"description": "TERADATA VM",
"cluster_size": 22,
"associated_data_centers": 1,
"replication_role": "MASTER",
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD"
}
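Tying this to the earlier note that db_type_id is required for the Database POST/PUT API but undocumented: a sketch of the same payload with a db_type_id added. The value 1 and the $DATABASE_ENDPOINT URL are placeholders for illustration, not taken from the docs.
curl -H "Content-Type: application/json" -X POST "$DATABASE_ENDPOINT" -d '{
  "db_id": 10001,
  "db_code": "TD",
  "db_type_id": 1,
  "description": "TERADATA VM",
  "cluster_size": 22,
  "associated_data_centers": 1,
  "replication_role": "MASTER",
  "uri": "Teradata://sample-td",
  "short_connection_string": "SAMPLE-TD"
}'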
Can we please have a sample "dataset.json" file shipped with the bundle for demo purposes? And can you please also provide some curl-based examples for the "Dataset POST API" at the wiki below: https://github.com/linkedin/WhereHows/wiki/Backend-API#dataset-post
I've noticed that the HDFS crawler handles standard text, Avro, and ORC, but I did not see anything for Parquet. Are there any plans to include the ability to scan Parquet? Also, are you deriving the schema from the raw_metadata file? If so, why not derive the schema for Avro/Parquet from the schema embedded in the file?
Hi,
Does this tool support pulling database metadata from a Redshift/Postgres DB?
Hi,
Do you have any examples or detailed descriptions of the Lineage POST API?
The JSON presented in the documentation (https://github.com/linkedin/WhereHows/wiki/Backend-API#lineage-post) isn't self-descriptive :)
There are three sections, of which two have source_target_type described as "source" and the last one as "destination". That's unclear, because it could mean either a lineage source or a data source.
Thanks in advance :)