zhiting-wang / e-commerce-datawarehouse5.0

e-commerce-datawarehouse5.0's Introduction

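System data flow design

(Note: this project does not include the real-time data warehouse part.)

E-commerce business tables

Architecture design

Cluster service planning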

Service	Sub-service	node1	node2	node3
HDFS	NameNode	✓
	DataNode	✓	✓	✓
	SecondaryNameNode			✓
YARN	ResourceManager		✓
	NodeManager	✓	✓	✓
Zookeeper	Zookeeper Server	✓	✓	✓
Flume (log collection)	Flume	✓	✓
Kafka	Kafka	✓	✓	✓
Flume (log consumption)	Flume			✓
Flume (business data consumption)	Flume			✓
MySQL	MySQL	✓
DataX	DataX	✓	✓	✓
Maxwell	Maxwell	✓
Hive	-	✓	✓	✓
Spark	-	✓	✓	✓
DolphinScheduler	MasterServer	✓
	WorkerServer	✓	✓	✓

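Data collection

User behavior log collection

Mock data generation: use applog to write mock data to disk (set the date in application.yml, then run getlog.sh to generate the data).
Log sync: start zookeeper, hadoop, kafka, flume1, and flume2; changes to the files on disk are detected automatically and synced to HDFS.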

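Business data collection

Mock data generation: use dblog to write mock data to MySQL (set the date in application.properties, then run the jar to generate the data).
Full-table import: start hadoop, run gen_import_config.sh to generate the JSON files used by DataX (this only needs to run once), then sync the data with mysql_to_hdfs_full.sh all <date> (make sure it is the latest date).
Incremental table sync: start mysql, zookeeper, hadoop, kafka, flume3, and maxwell, then run mysql_to_kafka_inc_init.sh all to sync the first day's data (this only needs to run once). To generate data for another date later, first change the date in the Maxwell configuration file and restart Maxwell, then use dblog (with the same configuration change as above) to generate the new day's data, which simulates incremental sync.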
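The business-data steps above can be sketched as a dry-run helper that only prints what would be run. The script names come from this README; the function names, the "first run" flag, the YYYY-MM-DD date format, and the sample dates are assumptions made for illustration, and the config edits are printed as reminders rather than automated.

```shell
#!/usr/bin/env bash
# Dry-run sketch: every step is printed, nothing is executed.

# Succeed only if the argument looks like YYYY-MM-DD (assumed date format).
is_valid_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Full-table import. Pass "yes" as the second argument on the first run,
# when the DataX JSON files still have to be generated.
plan_full_import() {
  local do_date="$1" first_run="${2:-no}"
  if ! is_valid_date "$do_date"; then
    echo "usage: plan_full_import YYYY-MM-DD [yes|no]" >&2
    return 1
  fi
  if [ "$first_run" = "yes" ]; then
    echo "gen_import_config.sh"
  fi
  echo "mysql_to_hdfs_full.sh all ${do_date}"
}

# Incremental sync. "first-day" is the one-time initialization;
# "next-day" lists the manual steps for simulating a later date.
plan_incremental_sync() {
  local mode="$1" do_date="$2"
  case "$mode" in
    first-day)
      echo "mysql_to_kafka_inc_init.sh all"
      ;;
    next-day)
      if ! is_valid_date "$do_date"; then
        echo "usage: plan_incremental_sync next-day YYYY-MM-DD" >&2
        return 1
      fi
      echo "set the date in the Maxwell config to ${do_date}"
      echo "restart Maxwell"
      echo "set the date in dblog's application.properties to ${do_date}"
      echo "run the dblog jar"
      ;;
    *)
      echo "usage: plan_incremental_sync first-day|next-day [YYYY-MM-DD]" >&2
      return 1
      ;;
  esac
}

# Example with arbitrary dates.
plan_full_import 2022-06-08 yes
plan_incremental_sync first-day
plan_incremental_sync next-day 2022-06-09
```

Printing the plan instead of running it keeps the sketch safe to try without a cluster; replacing each echo with the real command would be the next step once the services are up.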

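Data warehouse architecture diagram

Dimensional modeling is used.

Hive tables

Table creation and data loading use Hive on Spark as the engine.

hdfs -> ods, covering behavior logs (log) and business data (db)
ods -> dwd
ods -> dim
dwd -> dws, including dws_1d, dws_nd, and dws_td
dws -> ads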
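The layer arrows above can be written down as a small lookup, which makes the load order explicit. This is a minimal sketch: the function names are hypothetical, and the mapping encodes only the dependencies listed here, not any additional ones the actual load scripts may have.

```shell
#!/usr/bin/env bash

# Print the upstream layer that feeds the given layer,
# following only the arrows listed in this README.
upstream_of() {
  case "$1" in
    ods)     echo "hdfs" ;;
    dwd|dim) echo "ods" ;;
    dws)     echo "dwd" ;;
    ads)     echo "dws" ;;
    *)       return 1 ;;
  esac
}

# Print the layers in an order where each upstream is loaded first.
load_order() {
  local layer
  for layer in ods dwd dim dws ads; do
    echo "$(upstream_of "$layer") -> ${layer}"
  done
}

load_order
```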

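Scheduler

DolphinScheduler is used to schedule the scripts.

The mysql <-> hdfs scripts at the head and tail of the workflow call DataX.
The data warehouse scripts in the middle call Hive on Spark.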
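As a rough illustration of that split, the sketch below tags each stage of a scheduled run with the engine it would call. The stage names are hypothetical labels, not the repository's actual script names, and the listing is a dry run that only prints.

```shell
#!/usr/bin/env bash

# Map a stage label to its engine: the mysql <-> hdfs transfers at both
# ends use DataX, everything in between uses Hive on Spark.
engine_for() {
  case "$1" in
    mysql_to_hdfs|hdfs_to_mysql) echo "DataX" ;;
    *)                           echo "Hive on Spark" ;;
  esac
}

# Print the stages in workflow order with their engine.
plan_workflow() {
  local stage
  for stage in mysql_to_hdfs hdfs_to_ods ods_to_dwd ods_to_dim \
               dwd_to_dws dws_to_ads hdfs_to_mysql; do
    printf '%-14s %s\n' "$stage" "$(engine_for "$stage")"
  done
}

plan_workflow
```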

Metrics dashboard

Finally, the offline metrics are presented as a dashboard in Superset.
