tooltest's Introduction

Cumulative Data Project: Weeks 1-5 of Internship (First Half)

Resources used: Apache Kafka, Apache Hadoop, Apache Flume, Hadoop DFS, Java, Maven, Log4J, CentOS, Apache Zookeeper.

If you are just practicing with local files, Steps 1-5 can be reduced to copying the data straight onto HDFS with the command

"hadoop fs -copyFromLocal 'file/address/in/linux' 'hdfs/location/' "

Step 1:

Use collected data on 1 Hive table (Hue/Company HDFS) and store it onto Personal HDFS

insert overwrite directory '/user/hue/sample_test' row format delimited fields terminated by '|' select device_idfa,device_mac,device_manufacturer,device_screen_pixel_metric,device_model from adcocoa_device where device_idfa is not null and device_idfa != 'null'

The command above stores the data as small delimited files in /user/hue/sample_test. From there the files can be downloaded and imported onto the local file system.

Step 2:

Write a Java program to read files from the local environment, parse them line by line, and send the data to Kafka using the Log4j and Kafka packages provided on Maven.
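A rough sketch of what that parser might look like. The five-field, pipe-delimited layout matches the Hive export from Step 1; the Kafka send itself is left as comments because the topic name, broker settings, and the kafka-clients dependency are assumptions, not details from this project.

```java
// Hypothetical sketch of /src/main/java/parser.java.
// Assumes the pipe-delimited, five-field format produced by the Step 1 export.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class parser {

    // Split one exported line into its five fields (idfa, mac, manufacturer,
    // screen metric, model). Returns null for malformed rows.
    static String[] parseLine(String line) {
        String[] fields = line.split("\\|", -1);
        return fields.length == 5 ? fields : null;
    }

    public static void main(String[] args) throws IOException {
        // In the real program a KafkaProducer would be created here, e.g.:
        // KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = parseLine(line);
                if (fields == null) continue; // skip malformed rows
                // producer.send(new ProducerRecord<>("device_topic", line));
            }
        }
    }
}
```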

/src/main/java/parser.java

Step 3:

Wrap the package into a .jar file and export it to HDFS for Kafka/Flume processing.
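One way to build a runnable .jar with Maven is the shade plugin; the fragment below is a sketch, not this project's actual pom.xml, and the main class name is an assumption.

```xml
<!-- Hypothetical pom.xml fragment: bundle dependencies into one runnable jar. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <!-- main class name is an assumption -->
                <mainClass>parser</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Running `mvn package` then produces the tooltest-VERSION-SNAPSHOT.jar used in Step 5.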

Step 4:

Start up all needed resources on CentOS (the Linux distribution I'm using; yours may differ): NameNode/DataNodes, Zookeeper, Kafka, and Hadoop. Run -> jps <- to make sure all of them are online.

Step 5:

Run Flume as the receiver after setting up a flume.conf file.
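A minimal flume.conf for this pipeline might look like the sketch below (Kafka source, memory channel, HDFS sink). The broker address, topic name, and NameNode URI are assumptions; adjust them to your cluster. The agent name matches the -n flume1 flag used in the command that follows.

```properties
# Hypothetical flume.conf: Kafka source -> memory channel -> HDFS sink.
flume1.sources  = kafka-source
flume1.channels = mem-channel
flume1.sinks    = hdfs-sink

flume1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
flume1.sources.kafka-source.kafka.topics = device_topic
flume1.sources.kafka-source.channels = mem-channel

flume1.channels.mem-channel.type = memory
flume1.channels.mem-channel.capacity = 10000

flume1.sinks.hdfs-sink.type = hdfs
flume1.sinks.hdfs-sink.channel = mem-channel
flume1.sinks.hdfs-sink.hdfs.path = hdfs://localhost:9000/user/kafka/database/%{topic}/%y-%m-%d
flume1.sinks.hdfs-sink.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
```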

flume-ng agent -n flume1 -c conf -f flume.conf -Dflume.root.logger=INFO,console

Run the .jar file with java -jar to start the Kafka Producer

java -jar tooltest-VERSION-SNAPSHOT.jar

With both the Consumer and Producer running, the files from the folder will now be read into the Hadoop Distributed File System (HDFS) and stored under '/user/kafka/database/%{topic}/%y-%m-%d'

Step 6:

Install and configure Hive. Start up Hive.

'$HIVE_HOME/bin/hive'

Create a table in Hive delimited by whatever delimiter your data uses; in this case it's the pipe character |

create table tablename(a int, b string, c string, d string, e string)

row format delimited

fields terminated by '|';

Load the data from HDFS into the Hive table

'load data inpath 'filepath/path' into table tester2;'

Step 7:

Create a sorted table (here named sortorder, with the same two columns as the select below) that sorts by phone brand; we'll use this data to create a visual after sending it to MySQL.

'insert into table sortorder select phone,count(phone) as phoneCount from tester2 group by phone order by phoneCount desc;'

This is a sorted table with two columns: phone brand and the number of times that people using that brand have accessed our app.

Step 8:

Use Sqoop (ver 1.4.6 compatible with Hadoop 2.8.0) to export data from hive warehouse to MySQL for web visual integration.

./sqoop export --connect jdbc:mysql://localhost/test --username root -P --table test --input-fields-terminated-by '|' --lines-terminated-by '\n' --export-dir /user/hive/warehouse/tester2

Step 9:

See the other project for continued development, including serving the SQL data to a webpage using Java and Spring.
