
Comments (12)

maoxingda commented on July 17, 2024
  1. Could you give an example?
  2. Could you also describe the behavior you expect, or what you would suggest?

from sqllineage.

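LiuWeidongK commented on July 17, 2024

Thank you very much for your reply. I have run into some problems while building column-level lineage for our big-data platform, and after reading your article I would like to understand the implementation details of your approach.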

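The column-lineage DAG cannot exist independently of table-level lineage. Ideally, only one unified lineage graph is maintained. There are two ways to implement this:

  1. Build the DAG at column granularity and derive the table-level lineage DAG from it through a transformation. As an analogy from relational databases, this is like first building a detail table and then aggregating it to produce a summary table.
  2. Model it as a property graph; see the JanusGraph documentation. Tables and columns are two types of vertices, and there are two types of edges: one is the belongs-to relationship from a column to its table, the other is the lineage relationship between columns and between tables.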
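A minimal sketch of option 1, assuming column lineage is stored as a set of (source column, target column) edges where each column is a hypothetical (table, column) pair. The table-level DAG is obtained by projecting every column edge onto its two tables, much like aggregating a detail table into a summary table. The function name `table_lineage` and the sample data are illustrative only, not part of sqllineage's API.

```python
from typing import Set, Tuple

Column = Tuple[str, str]            # (table name, column name); hypothetical representation
ColumnEdge = Tuple[Column, Column]  # (source column, target column)
TableEdge = Tuple[str, str]         # (source table, target table)


def table_lineage(column_edges: Set[ColumnEdge]) -> Set[TableEdge]:
    """Aggregate column-level lineage edges into table-level lineage edges."""
    return {
        (src[0], dst[0])
        for src, dst in column_edges
        if src[0] != dst[0]  # skip edges between columns of the same table
    }


if __name__ == "__main__":
    # Hypothetical column lineage for Hive TB1 -> Hive TB2
    edges = {
        (("hive.tb1", "c1"), ("hive.tb2", "h2_c1")),
        (("hive.tb1", "c2"), ("hive.tb2", "h2_c2")),
        (("hive.tb1", "c2"), ("hive.tb2", "h2_c1")),
    }
    print(table_lineage(edges))  # {('hive.tb1', 'hive.tb2')}
```

Because table edges are derived rather than stored, the two levels cannot drift apart, which fits the "one unified lineage graph" goal. The trade-off is that a table-to-table hop with no column edges at all cannot be represented under this option.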

Could you explain option 2 in more detail? Here is a simple example:
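[Screenshot: example lineage graph, Hive TB1 → Hive TB2 → Kafka (via hive2kafka) → ClickHouse (via kafka2clickhouse)]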

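In the figure above, the table-level and column-level lineage between Hive TB1 and Hive TB2 can be built normally. However, the Kafka entity produced by the downstream hive2kafka task may have no column-level lineage, so when you analyze all downstream dependents of a column, the traversal may break at this entity that lacks column lineage.
I'm not sure how your option 2 handles this case, so I'd like to ask.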
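A minimal sketch of option 2 applied to this example, using a plain in-memory structure in place of JanusGraph. The class `LineageGraph`, the vertex ids (`hive.tb2`, `kafka.topic`, and the Hive TB1 column `H1_C1`) and the edge labels are all hypothetical. Tables and columns are typed vertices, `belongs_to` edges attach columns to their table, and `lineage` edges connect column to column or table to table. Because the sync tasks only contribute table-to-table edges, a column-level downstream query from a Hive TB1 column stops at Hive TB2, while a table-level query still reaches ClickHouse.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


class LineageGraph:
    """Hypothetical in-memory property graph with typed vertices and edges."""

    def __init__(self) -> None:
        self.vertex_type: Dict[str, str] = {}        # vertex id -> "table" or "column"
        self.edges: List[Tuple[str, str, str]] = []  # (source, label, target)

    def add_table(self, table: str, columns: List[str]) -> None:
        # Vertex ids are assumed to be globally unique.
        self.vertex_type[table] = "table"
        for col in columns:
            self.vertex_type[col] = "column"
            self.edges.append((col, "belongs_to", table))  # column -> owning table

    def add_lineage(self, src: str, dst: str) -> None:
        # Lineage links column to column or table to table, never across types.
        if self.vertex_type[src] != self.vertex_type[dst]:
            raise ValueError("lineage edges must connect vertices of the same type")
        self.edges.append((src, "lineage", dst))

    def downstream(self, start: str) -> Set[str]:
        """All vertices reachable from `start` via lineage edges (breadth-first)."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for src, label, dst in self.edges:
                if src == node and label == "lineage" and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return seen


if __name__ == "__main__":
    g = LineageGraph()
    g.add_table("hive.tb1", ["H1_C1"])
    g.add_table("hive.tb2", ["H2_C1", "H2_C2"])
    g.add_table("kafka.topic", [])  # config-driven sync: no known columns
    g.add_table("clickhouse.tb", ["CK_C1", "CK_C2"])
    # Hive TB1 -> Hive TB2 has both table and column lineage
    g.add_lineage("hive.tb1", "hive.tb2")
    g.add_lineage("H1_C1", "H2_C1")
    # hive2kafka and kafka2clickhouse only add table-level lineage
    g.add_lineage("hive.tb2", "kafka.topic")
    g.add_lineage("kafka.topic", "clickhouse.tb")
    print(sorted(g.downstream("hive.tb1")))  # ['clickhouse.tb', 'hive.tb2', 'kafka.topic']
    print(sorted(g.downstream("H1_C1")))     # ['H2_C1']
```

Whether a column-level query should fall back to the table-level edges when it reaches such a gap is a design choice; this sketch deliberately does not.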

maoxingda commented on July 17, 2024

What exactly do hive2kafka and kafka2clickhouse in your screenshot do? Are they SQL? If so, why would there be no column lineage?

LiuWeidongK commented on July 17, 2024

They are configuration-driven data sync tasks. I'm using these two task types as an example; assume the Kafka entity has no column lineage.

maoxingda commented on July 17, 2024

Is there column lineage between the entities upstream and downstream of the Kafka entity?

LiuWeidongK commented on July 17, 2024

Yes. The Hive table upstream of Kafka has column lineage, and the ClickHouse table downstream has column lineage too; only Kafka does not.

maoxingda commented on July 17, 2024

What I mean is: is there lineage between H2_C2 and CK_C1?

LiuWeidongK commented on July 17, 2024

No, there isn't.

maoxingda commented on July 17, 2024

Then, as I understand it, your lineage should stop upstream of the Kafka entity. That is the expected behavior, isn't it?

maoxingda commented on July 17, 2024

Column lineage

koftcl commented on July 17, 2024

CK_C1 should be a descendant of H2_C1, and CK_C2 a descendant of H2_C2.

maoxingda commented on July 17, 2024

As I understand it, at the column level your graph does not contain the relationship you describe.
