bjzu / logprocessing
This project is forked from cloudian/logprocessing
Log processing system using Flume and Cassandra
Home Page: http://www.geminimobile.com
CDR Logprocessing plugin for Flume
==================================

Source organization

flume-plugin - source of the Flume plugin that writes CDR logs to Cassandra.
scripts - simple Perl script that generates sample CDR logs for testing.

Getting Flume & Thrift
======================

https://github.com/cloudera/flume (master was used to test the sample)
http://incubator.apache.org/thrift/download/

Thrift was compiled with the following options:

    ./configure --enable-gen-java=yes --enable-gen-cpp=yes --enable-gen-erlang=no --enable-gen-perl=no --enable-gen-py=no --enable-gen-php=no --with-boost=no; make

Assuming Flume is installed under $HOME/flume, create a symlink to
thrift-0.5.0/compiler/cpp/thrift under $HOME/flume.

Under $HOME/flume, run:

    ant

flume-plugin
============

This plugin allows you to use Cassandra as a Flume sink for CDR logs.

Getting Started
---------------

1) This plugin was built using flume-0.9.3-core.jar, which is delivered as part of the package.

2) Build the release:

    cd cassandra; ant release;

3) Copy cdr_logprocessing-0.1.tar.gz to the $HOME/flume directory and uncompress it.

4) Add the following to your .bashrc file:

    export FLUME_HOME=$HOME/flume
    export FLUME_LOG_DIR=/tmp
    export FLUME_PID_DIR=/tmp
    export FLUME_CONF_DIR=$HOME/flume/conf
    export FLUME_CLASSPATH=$HOME/flume/cdrplugin/lib/apache-cassandra-0.7.0.jar:$HOME/flume/cdrplugin/lib/avro-1.4.0-rc4.jar:$HOME/flume/cdrplugin/lib/cdr_logprocessing-0.1.jar:$HOME/flume/cdrplugin/lib/commons-lang-2.4.jar:$HOME/flume/cdrplugin/lib/hector-core-0.7.0-22.jar:$HOME/flume/cdrplugin/lib/high-scale-lib-1.1.1.jar:$HOME/flume/cdrplugin/lib/jug-asl-2.0.0.jar:$HOME/flume/cdrplugin/lib/log4j-1.2.14.jar:$HOME/flume/cdrplugin/lib/perf4j-0.9.13.jar:$HOME/flume/cdrplugin/lib/slf4j-api-1.5.11.jar:$HOME/flume/cdrplugin/lib/slf4j-log4j12-1.5.8.jar

5) Modify flume-site.xml (you may start out by copying flume-site.xml.template and removing the body of the file) to include:

    <configuration>
      <property>
        <name>flume.plugin.classes</name>
        <value>com.gemini.logprocessing.cassandra.CDRCassandraSink</value>
        <description>Comma separated list of plugin classes</description>
      </property>
    </configuration>

scripts
=======

loggen.pl writes sample CDR entries to /tmp/cdr.log. Use this script to test your setup.

Usage
-----

This plugin primarily targets CDR log storage right now.

1) Create the following keyspace and column families in Cassandra using the CLI:

    connect <hostname>/9160;
    create keyspace CDRLogs with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
    use CDRLogs;
    create column family MSISDNTimeline with column_type = 'Standard' and comparator = 'BytesType';
    create column family CDREntry with column_type = 'Standard' and comparator = 'BytesType';
    create column family HourlyTimeline with column_type = 'Standard' and comparator = 'BytesType';

2) In the Flume config, call this sink as

    CDRCassandraSink("cassandra_host:cassandra_port",ColumnFamilyForRawCDR);

   where
   cassandra_host:cassandra_port - Cassandra host/port combination
   ColumnFamilyForRawCDR - CF where raw CDR entries for this market are to be stored.

3) In our test environment, we had NodeM running the Flume master, NodeA running a Flume agent, and NodeC running a Flume collector and cassandra-0.7.2.

  3.1) On NodeM
    3.1.1) Export all environment variables.
    3.1.2) cd $FLUME_HOME; bin/flume master
    3.1.3) http://NodeM:35871/flumemaster.jsp will display all active nodes and their configuration.

  3.2) On NodeA
    3.2.1) Edit flume-site.xml and add NodeM as master
    3.2.2) cd $FLUME_HOME; bin/flume node_nowatch
    3.2.3) http://NodeA:35862/flumeagent.jsp will display statistics.

  3.3) On NodeC
    3.3.1) Edit flume-site.xml and add NodeM as master
    3.3.2) cd $FLUME_HOME; bin/flume node_nowatch -n collector
    3.3.3) http://NodeC:35862/flumeagent.jsp will display statistics.

4) Go to http://NodeM:35871/flumeconfig.jsp and configure the nodes.
  4.1) For NodeA - Source is tail("/tmp/cdr.log") and Sink is agentSink("NodeC",35853)
  4.2) For NodeC - Source is collectorSource(35853) and Sink is CDRCassandraSink("NodeC:9160", "CDRRaw_market1")

5) Go to http://NodeM:35871/flumemaster.jsp. If the nodes were configured correctly, all of them should show up as 'ACTIVE'.

6) On NodeA, run the script: perl loggen.pl (NOTE: This script writes to the log file in a for(;;) loop, so it runs until interrupted.)

7) Verify the data in Cassandra using cassandra-cli.

Issues
------

1) The CDR format currently supported is of the form

    operatorId,operatorMarket,transactionId,cdrType,messageTimestamp,moIMSI,moIP,mtIP,PTN,msgType,moDomain,mtDomain
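
To make the CDR format listed under Issues concrete, here is a minimal Python sketch that splits one log line into its twelve named fields. It is not the plugin's own parsing code, which is not shown here. It assumes the fields are plain comma-separated values with no quoting or escaping. The function name parse_cdr_line and every value in the sample line are hypothetical.

```python
# Field names in the order given by the CDR format under Issues.
CDR_FIELDS = [
    "operatorId", "operatorMarket", "transactionId", "cdrType",
    "messageTimestamp", "moIMSI", "moIP", "mtIP", "PTN",
    "msgType", "moDomain", "mtDomain",
]


def parse_cdr_line(line):
    """Split one comma-separated CDR line into a dict keyed by field name.

    Assumption: values never contain commas, so a plain split is enough.
    Raises ValueError when the number of fields does not match the format.
    """
    values = line.rstrip("\r\n").split(",")
    if len(values) != len(CDR_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(CDR_FIELDS), len(values))
        )
    return dict(zip(CDR_FIELDS, values))


# Hypothetical sample line; every value is made up for illustration.
sample = (
    "op1,market1,tx0001,MO,1300000000,310150123456789,"
    "10.0.0.1,10.0.0.2,15551234567,SMS,a.example.com,b.example.com\n"
)
record = parse_cdr_line(sample)
print(record["transactionId"], record["msgType"])
```

A check like this can be run against a few lines of /tmp/cdr.log before starting the agent, to confirm that the generated entries have the expected number of fields.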