c225274 / demo_11.11_storm-spark-hadoop

This project forked from liguozhong/demo_11.11_storm-spark-hadoop

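An example project combining Hadoop, Storm, and Spark. It simulates Taobao's Double 11 (November 11) shopping festival: from detailed order records it computes the total sales volume and a sales ranking by province, followed by SQL analysis, data analysis, and data mining. The overall flow has four stages: stage 1 (real-time reports with Storm), stage 2 (offline reports), stage 3 (ad hoc and multi-dimensional queries over large-scale order data), and stage 4 (data mining and graph computation).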

Java 100.00%

demo_11.11_storm-spark-hadoop's Introduction

-_11.11_storm-spark-hadoop

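An example project combining Hadoop, Storm, and Spark. It simulates Taobao's Double 11 (November 11) shopping festival: from detailed order records it computes the total sales volume and a sales ranking by province, followed by SQL analysis, data analysis, and data mining.

-------- Overall flow --------

Stage 1 (real-time reports with Storm)
(1) User orders are written to a Kafka queue.
(2) Storm computes the total sales volume and the sales volume of each province in real time.
(3) The results are saved to HBase.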
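The running aggregation in stage 1 can be sketched in plain Java. This is only an illustration of the counting logic, not the project's topology: the class name `SalesAggregator`, the message format `orderId,province,amountInCents`, and the in-memory map are all assumptions, and no Kafka, Storm, or HBase API is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the running aggregation in stage 1.
// No Kafka, Storm, or HBase API is used; everything lives in memory.
final class SalesAggregator {
    private long totalCents = 0;
    private final Map<String, Long> centsByProvince = new HashMap<>();

    // Assumed message format (not from the project): "orderId,province,amountInCents".
    void accept(String orderLine) {
        String[] fields = orderLine.split(",");
        if (fields.length != 3) {
            return; // ignore malformed messages
        }
        long cents;
        try {
            cents = Long.parseLong(fields[2].trim());
        } catch (NumberFormatException e) {
            return; // ignore non-numeric amounts
        }
        totalCents += cents;
        centsByProvince.merge(fields[1].trim(), cents, Long::sum);
    }

    long total() {
        return totalCents;
    }

    // Provinces sorted by sales, highest first.
    List<Map.Entry<String, Long>> ranking() {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(centsByProvince.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        SalesAggregator agg = new SalesAggregator();
        agg.accept("1,Zhejiang,1000");
        agg.accept("2,Guangdong,2500");
        agg.accept("3,Zhejiang,500");
        System.out.println("total=" + agg.total());
        for (Map.Entry<String, Long> e : agg.ranking()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

In a real topology, the `accept` step would presumably run once per incoming message and the two results would be written out periodically rather than held in memory.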

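Stage 2 (offline reports)
(1) User orders are written to an Oracle database.
(2) Sqoop imports the data into Hadoop.
(3) MapReduce jobs and Spark RDDs run ETL cleaning on the raw orders in Hadoop.
(4) Hive tables and Spark SQL in-memory tables are created as the basis for later analysis.
(5) HQL implements the business-metric analysis and the user-profile analysis; the results are stored in MySQL for the web front end.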
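The ETL cleaning step in stage 2 might look roughly like the following. This is a minimal sketch in plain Java rather than a MapReduce or RDD job; the record format `orderId,province,amountInCents` and the cleaning rules (drop malformed rows, non-positive amounts, and duplicate order IDs) are assumptions chosen for illustration, since the README does not state them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical cleaning rules; the project's actual ETL rules are not documented.
final class OrderCleaner {
    // Assumed record format: "orderId,province,amountInCents".
    static List<String> clean(List<String> rawLines) {
        Set<String> seenIds = new HashSet<>();
        List<String> cleaned = new ArrayList<>();
        for (String line : rawLines) {
            String[] f = line.split(",", -1); // -1 keeps trailing empty fields
            if (f.length != 3) {
                continue; // wrong number of fields
            }
            String id = f[0].trim();
            String province = f[1].trim();
            if (id.isEmpty() || province.isEmpty()) {
                continue; // missing key fields
            }
            long cents;
            try {
                cents = Long.parseLong(f[2].trim());
            } catch (NumberFormatException e) {
                continue; // non-numeric amount
            }
            if (cents <= 0) {
                continue; // non-positive amount
            }
            if (!seenIds.add(id)) {
                continue; // duplicate order ID, keep the first occurrence
            }
            cleaned.add(id + "," + province + "," + cents);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1, Zhejiang ,1000", "1,Zhejiang,1000", "2,,300");
        System.out.println(clean(raw));
    }
}
```

Normalizing whitespace at this point means the later Hive and Spark SQL tables can group by province without extra trimming.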

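Stage 3 (ad hoc and multi-dimensional queries over large-scale order data)
(1) User orders are written to an Oracle database.
(2) Sqoop imports the data into Hadoop.
(3) A MapReduce job loads the data from Hadoop into HBase.
(4) The HBase Java API implements ad hoc order queries.
(5) Solr is integrated with HBase for multi-dimensional conditional queries.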
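Ad hoc lookups in HBase depend on the row key, because rows are stored sorted by key. The sketch below illustrates that idea with a `TreeMap` standing in for an HBase table; it does not call the HBase client. The composite key `userId_orderId`, the class `OrderStore`, and the restriction that user IDs contain no underscore are assumptions, not the project's schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// A TreeMap keeps keys sorted, which mimics how HBase orders rows by row key.
final class OrderStore {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Hypothetical composite row key; assumes userId contains no '_'.
    static String rowKey(String userId, String orderId) {
        return userId + "_" + orderId;
    }

    void put(String userId, String orderId, String detail) {
        rows.put(rowKey(userId, orderId), detail);
    }

    // Point lookup by full row key.
    String get(String userId, String orderId) {
        return rows.get(rowKey(userId, orderId));
    }

    // Prefix scan: every row whose key starts with "userId_".
    List<String> ordersOf(String userId) {
        String start = userId + "_";
        String stop = userId + "`"; // '`' is the character right after '_'
        return new ArrayList<>(rows.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        store.put("u1", "o2", "B");
        store.put("u1", "o1", "A");
        store.put("u10", "o1", "C");
        System.out.println(store.ordersOf("u1"));
    }
}
```

With this key layout, all orders of one user sit next to each other, so a range scan answers "orders of user X" without touching other users' rows. Queries on other dimensions (province, amount, date) are not served by the key, which is the gap that step (5) fills with Solr.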

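Stage 4 (data mining and graph computation)
(1) User orders are written to an Oracle database.
(2) Sqoop imports the data into Hadoop.
(3) MapReduce jobs and Spark RDDs run ETL cleaning on the raw orders in Hadoop.
(4.1) Mahout's association rules drive bundle recommendations. MLlib's ALS algorithm provides collaborative-filtering recommendations. MLlib's LR (logistic regression) algorithm serves as the will-buy/won't-buy classifier in the recommendation module. MLlib's k-means clusters users to find high-value customers.
(4.2) GraphX finds the hottest products, using graph features such as random-walk algorithms on the graph.

-------- Overall flow --------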
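For the association-rule step in stage 4, the two standard measures used to rank a rule "A implies B" are support (the share of orders containing both items) and confidence (the share of orders containing A that also contain B). The sketch below computes both in plain Java for a single item pair. It is not Mahout's implementation, and the class `PairRule`, the helper `basket`, and the sample items are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Support and confidence for a single rule "a implies b" over a list of orders.
final class PairRule {
    // Convenience helper to build one order's item set.
    static Set<String> basket(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    private static long countBoth(List<Set<String>> orders, String a, String b) {
        return orders.stream().filter(o -> o.contains(a) && o.contains(b)).count();
    }

    // support = (orders containing both a and b) / (all orders)
    static double support(List<Set<String>> orders, String a, String b) {
        if (orders.isEmpty()) {
            return 0.0;
        }
        return (double) countBoth(orders, a, b) / orders.size();
    }

    // confidence = (orders containing both a and b) / (orders containing a)
    static double confidence(List<Set<String>> orders, String a, String b) {
        long withA = orders.stream().filter(o -> o.contains(a)).count();
        if (withA == 0) {
            return 0.0;
        }
        return (double) countBoth(orders, a, b) / withA;
    }

    public static void main(String[] args) {
        List<Set<String>> orders = Arrays.asList(
            basket("phone", "case"),
            basket("phone", "case", "charger"),
            basket("phone"),
            basket("charger"));
        System.out.println("support=" + support(orders, "phone", "case"));
        System.out.println("confidence=" + confidence(orders, "phone", "case"));
    }
}
```

A bundle recommendation would then keep only the rules whose support and confidence both exceed chosen thresholds; the thresholds are a tuning decision the README does not specify.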

demo_11.11_storm-spark-hadoop's People

Contributors

liguozhong
