d4m's Introduction

NOTE: This is the Accumulo 1.6.0+ version.
*This build will not work against Accumulo 1.5 and previous.*

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% D4M: Dynamic Distributed Dimensional Data Model
% Architect: Dr. Jeremy Kepner ([email protected])
% MIT Lincoln Laboratory
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% (c) <2010> Massachusetts Institute of Technology
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1. INTRODUCTION

  D4M is a library that allows unstructured data to be represented
as triples in sparse matrices (Associative Arrays) and manipulated
using standard linear algebra operations.
Using D4M, it is possible to construct advanced analytics
with just a few lines of code.
  D4M also supports parallel computing and connections to
high performance databases (e.g., Accumulo).
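To make the idea of an associative array concrete, here is a minimal sketch in base MATLAB/Octave. It does not use D4M, and the variable names are illustrative: string-keyed triples are mapped onto a sparse numeric matrix through their sorted unique row and column keys, after which ordinary matrix operations apply. D4M's Assoc class handles this bookkeeping itself; this sketch is not its implementation.

```matlab
% Illustrative sketch only: mimics the idea behind an associative array
% with base MATLAB/Octave functions; it is not D4M's Assoc class.

% Three (row, column, value) triples with string keys.
rows = {'alice', 'bob', 'alice'};
cols = {'cat',   'dog', 'dog'};
vals = [1,        2,     3];

% Map each distinct key to an integer index; unique() returns sorted keys.
[rowKeys, ~, rowIdx] = unique(rows);
[colKeys, ~, colIdx] = unique(cols);

% Store the values in a sparse matrix addressed by those indices.
A = sparse(rowIdx(:), colIdx(:), vals(:), numel(rowKeys), numel(colKeys));

% Linear algebra on the sparse matrix: entry (r1, r2) of A*A' sums, over
% the columns the two rows share, the product of their values.
rowCorrelation = full(A * A');
```

Here rowCorrelation(1,2) is 6 because alice and bob share only the dog column, with values 3 and 2.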

2. DOCUMENTATION

  For installation, please read this short (~5 page) document.
  For usage, please see the eight-lecture course in the d4m_api/docs directory.
  For examples, please see the numerous example files (ending in TEST.m) in the d4m_api/examples directory.
  When citing D4M in publications, please use:

    [Kepner et al, ICASSP 2012] Dynamic Distributed Dimensional Data Model (D4M) 
    Database and Computation System, J. Kepner, W. Arcand, W. Bergeron, N. Bliss, 
    R. Bond, C. Byun, G. Condon, K. Gregson, M. Hubbell, J. Kurz, A. McCabe, P. Michaleas,
    A. Prout, A. Reuther, A. Rosa & C. Yee, ICASSP (International Conference on Acoustics,
    Speech, and Signal Processing), Special session on Signal and Information Processing
    for "Big Data" (organizers: Bliss & Wolfe), March 25-30, 2012, Kyoto, Japan 

3. REQUIREMENTS

  D4M (standalone)
  -Requires Matlab (www.mathworks.com/matlab) or GNU Octave 3.2+ (www.octave.org)

  D4M Parallel
  -Requires pMatlab (www.ll.mit.edu/pMatlab)

  D4M Database
  -Requires D4M database connector jar (see d4m_api/lib)
  -Requires various 3rd party jars (see d4m_api/libext)
  -Requires a running database
    -D4M provides full support for Accumulo (accumulo.apache.org)
    -D4M provides query support for SQL databases via jTDS (jtds.sourceforge.net)
  -GNU Octave < 3.6 requires the Java package

4. LICENSE

  D4M follows the highly successful FFTW MIT licensing model (see fftw.org) and
is available via a number of licenses: Free (GPL), U.S. Gov't Agency,
U.S. Gov't Contractor, and Commercial.  See additional documentation in the distribution.

5. INSTALLATION

  Extract d4m_api.X.X.X.zip in your local directory.
If you want to connect to a database, also download the external libraries file
libext.X.X.X.zip and extract it into the d4m_api/ directory.  This should result
in a distribution containing:

          d4m_api-X.X.X
            docs/
            examples/
            lib/
            libext/
            matlab_src/
            TEST/

 From here on, we will refer to the full path to d4m_api-X.X.X as D4M_HOME,
and ">>" denotes the Matlab (or GNU Octave) prompt.

6. QUICKSTART

  (1) Start Matlab (or GNU Octave)
  (2) Add the D4M library to your path by typing

  >> addpath('D4M_HOME/matlab_src')

  (3)  Done.

  Display the function reference by typing:

  >> help D4M

  Run the first example by typing:

  >> cd D4M_HOME/examples/1Intro/1AssocIntro
  >> AI1_SetupTEST

7. ADDING PARALLEL AND DATABASE CAPABILITIES

  It is recommended that the D4M setup be placed in the Matlab ~/matlab/startup.m or GNU Octave ~/.octaverc file.
[Note: Windows users should consult their Matlab/Octave documentation to determine where this should exist.]
Below is a fully commented example of what this file might look like:

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  D4M_HOME = '/Users/kepner/SVN/d4m_api';       % SET TO LOCATION OF D4M.

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  addpath([D4M_HOME '/matlab_src']);            % Add the D4M library.

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  Assoc('','','');                              % Initialize library.

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % Uncomment the following line to enable the D4M database connector.
  %DBinit;    % This requires that the libext/ directory is in place.

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % Uncomment and modify the following four lines for parallel D4M.
  %PMATLAB_HOME = '/Users/kepner/SVN/pMatlab';   % SET location of pMatlab.
  %addpath([PMATLAB_HOME '/MatlabMPI/src']);    % Add MatlabMPI.
  %addpath([PMATLAB_HOME '/src']);              % Add pMatlab.
  %pMatlabGlobalsInit;
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

8. TESTING

  To run all the examples, start Matlab (or GNU Octave), change to the examples/ directory, and type:

  >>  cd D4M_HOME/examples
  >>  d4mTestAllExamples             

  NOTE: Some of the programs in examples/3Scaling/2ParallelDatabase require a valid database connection.
  To run in parallel these programs also require pMatlab (www.ll.mit.edu/pMatlab/).

  To configure the database, uncomment and modify the DB = DBserver(...) command in
  examples/3Scaling/2ParallelDatabase/DBsetup.m


9. RUNNING IN PARALLEL

  Several parallel examples can be found in examples/3Scaling/2ParallelDatabase.
To run in parallel, edit an example (e.g., pDB02_FileTEST.m) and uncomment the lines
marked "% PARALLEL".  To run on 4 processors on your local machine, type:

  >> cd D4M_HOME/examples/3Scaling/2ParallelDatabase
  >> eval(pRUN('pDB02_FileTEST',4,{}))


10. DATABASE CONNECTION

10.1 Setting up an Accumulo connection
 To establish an Accumulo connection in D4M, use the DBserver object.

  >> DB = DBserver(host, db_type, instance_name, [username],[password])

DBserver takes the following parameters:
  host name:      ZooKeeper host name
  database type:  always use 'Accumulo' as the parameter value
  instance name:  Accumulo instance name
  user name:      user name on the database (optional)
  password:       password for the user (optional)


  If you do not include them, you will be prompted for a username and password.
As you type the password, nothing will be displayed, so type carefully and press Return.

    >> hostname = 'localhost';
    >> db_type = 'Accumulo';
    >> instance_name = 'Accumulo';
    >> DB = DBserver(hostname,db_type,instance_name);

      Enter a username:
          JoeUser <return>
      Enter a password.
              <return>

10.2 Create a table or get an existing table in Accumulo
  D4M has two flavors of database table objects: DBtable and DBtablePair.
These objects provide access to the data.
Once you have a DBserver object, you can create a new table or get an existing one
by passing the table name to the DBserver object, which returns a DBtable object:

  >> T = DB('MyTableName');

To create a DBtablePair object, pass the names of the table and its transpose:

  >> TT = DB('MyTable','MyTableTranspose');

10.3 Querying for data
  You can query data through a DBtable or DBtablePair object.
The syntax is:

  >> A =  T(row_key,column_key)

The results of the query are returned as an associative array (an Assoc object).

  >> A = T(:,:)

This query returns all the data in T as an Assoc object.

The row_key and column_key must follow a particular format:

 :  a colon selects all keys.

 'cat,fat,hat,'  queries for cat, fat, and hat.
  Note: the trailing comma is a required delimiter in the query string.

 'cat,:,pat,'  queries for the range from cat through pat.
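The key-string rules above can be illustrated with a small sketch in base MATLAB/Octave; it is not D4M's own parser. It assumes that the trailing character of the string is the delimiter (a comma here) and that a ':' element between two keys denotes an inclusive range compared in ASCII order. The list of table keys is made up for the example.

```matlab
% Sketch of how a D4M-style key string could be interpreted (not D4M code).
% Assumption: the trailing character is the delimiter, e.g. 'cat,fat,hat,'.
query = 'cat,:,pat,';
sep   = query(end);

% Drop the trailing delimiter, then split the remainder on it.
parts = strsplit(query(1:end-1), sep);

% Hypothetical row keys, standing in for the rows stored in a table.
tableKeys = {'bat', 'cat', 'hat', 'pat', 'rat'};

% lexLeq(a, b) is true when string a sorts at or before string b.
first  = @(c) c{1};
lexLeq = @(a, b) strcmp(first(sort({a, b})), a);

if numel(parts) == 3 && strcmp(parts{2}, ':')
  % Range form 'lo,:,hi,': keep keys with lo <= key <= hi.
  lo = parts{1};
  hi = parts{3};
  inRange  = cellfun(@(k) lexLeq(lo, k) && lexLeq(k, hi), tableKeys);
  selected = tableKeys(inRange);
else
  % List form 'a,b,c,': keep keys that appear in the list.
  selected = tableKeys(ismember(tableKeys, parts));
end
```

With query = 'cat,fat,hat,' instead, the list branch keeps only cat and hat, since fat is not among the table keys.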


10.4 Examples:

  This query returns the rows cat, hat, and sat, with all columns:

  >> A = T('cat,hat,sat,',:)

  This query returns the range of rows from cat through sat, with all columns:

  >> A = T('cat,:,sat,',:);

  This query returns the columns cat, fat, and what, for all rows:

  >> A = T(:,'cat,fat,what,');

  The same query is much faster on a table pair, because the column query can be
  answered with a row query on the transpose table:

  >> A = TT(:,'cat,fat,what,');
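The speed-up from a table pair can be illustrated with a toy model in base MATLAB/Octave, in which two containers.Map objects stand in for a table and its transpose. This is a sketch under stated assumptions, not how D4M or Accumulo store data: it only assumes, as with Accumulo, that lookups are efficient by row key, so a column query on the plain table must scan every row while the transpose answers it with one keyed lookup.

```matlab
% Toy model (not D4M or Accumulo): why a transposed copy speeds up
% column queries when storage is looked up efficiently only by row key.
rows = {'r1', 'r2', 'r3'};
cols = {'cat', 'dog', 'cat'};

T  = containers.Map();   % "table": row key -> cell array of column keys
Tt = containers.Map();   % "transpose": column key -> cell array of row keys

% Write every (row, column) pair to both the table and its transpose.
for n = 1:numel(rows)
  r = rows{n};
  c = cols{n};
  if isKey(T, r),  T(r)  = [T(r),  {c}]; else, T(r)  = {c}; end
  if isKey(Tt, c), Tt(c) = [Tt(c), {r}]; else, Tt(c) = {r}; end
end

% Column query on the plain table: every row has to be examined.
hits = {};
for r = keys(T)
  if any(strcmp(T(r{1}), 'cat'))
    hits{end+1} = r{1};
  end
end

% Same column query using the transpose: a single keyed lookup.
hitsPair = Tt('cat');
```

Both queries return the rows r1 and r3; the cost of the pair is that every write goes to two tables.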

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% D4M: Dynamic Distributed Dimensional Data Model
% Architect: Dr. Jeremy Kepner ([email protected])
% Software Engineer: Dr. Jeremy Kepner ([email protected])
% MIT Lincoln Laboratory
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% (c) <2010> Massachusetts Institute of Technology
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

d4m's Issues

NoDiag on large arrays

The NoDiag function throws an error in Octave when passed a large Associative Array. The culprit is line 15: 'ij = sub2ind(size(A),i,j);'. I ran into this error while processing a SCALE 18 power-law graph.

Replacing this line with AA = A - diag(A); solves the problem. Anyone have an objection to using this implementation?
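The two approaches can be compared on a plain MATLAB/Octave sparse matrix. This sketch does not use D4M's Assoc class, whose diag behavior may differ from the built-in one. For a plain matrix, diag() returns a vector, so it has to be applied twice to build a diagonal matrix, and that variant requires a square matrix. The second variant, which filters the nonzero triples returned by find(), avoids both sub2ind and the square-matrix requirement.

```matlab
% Sketch on a plain sparse matrix (not a D4M Assoc): remove the diagonal
% without forming linear indices via sub2ind.
A = sparse([1 2 3 1], [1 2 1 3], [5 6 7 8], 4, 4);

% Option 1: subtract a diagonal matrix built from the diagonal.
% diag(A) is a vector; diag(diag(A)) is the matching diagonal matrix.
% This requires A to be square.
noDiag1 = A - diag(diag(A));

% Option 2: keep only the off-diagonal (row, column, value) triples.
[i, j, v] = find(A);
offDiag = i ~= j;
noDiag2 = sparse(i(offDiag), j(offDiag), v(offDiag), size(A, 1), size(A, 2));
```

Both results keep only the off-diagonal entries 7 at (3,1) and 8 at (1,3).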

Possible bug in ReadCSV.

When reading a CSV file with mixed value types (e.g., some columns are strings while others are floats), an error occurs:

ERROR: MethodError: Cannot convert an object of type SubString{String} to an object of type Float64

The problem appears to be at Assoc.jl:116. If any value is a float, it attempts to convert all values to floats.

Lacks clarity on separators in Str2mat.m

Are separators supposed to be included in the computation of string lengths, or in the output? The output in matrix form includes the separator in each row. Are empty strings allowed, e.g., in a string 'aa' where 'a' is the separator? Should the output for the string 'aa' be a 2x1 matrix ['a'; 'a']?

In the attached file you can see that the output for 'aaa' is a 2x1 matrix, while for the input 'abc|abdfg|aewq|waqrt|weerewqtyui||' the output is wrong in the 5th row, where '|', the separator, incorrectly replaces 'w'.

[Screenshot attached: 2017-05-23 15:18:00]
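The questions above hinge on how separators and empty entries are treated. As one possible interpretation, and not necessarily the intended behavior of Str2mat.m, base MATLAB/Octave functions can split a separator-terminated string into a space-padded character matrix with the separators excluded and empty entries kept as blank rows. The sketch assumes the last character of the string is the separator.

```matlab
% One possible interpretation (not Str2mat.m itself): split a
% separator-terminated string into a space-padded char matrix.
s   = 'abc|abdfg|aewq|waqrt|weerewqtyui||';
sep = s(end);   % assumption: the last character is the separator

% Drop the trailing separator, then split without collapsing repeated
% separators, so the empty entry before the final '|' is kept.
pieces = strsplit(s(1:end-1), sep, 'CollapseDelimiters', false);

% char() pads shorter rows with spaces and keeps empty strings as blank rows.
M = char(pieces);
```

Under this interpretation, M has 6 rows and 11 columns, its 5th row is 'weerewqtyui' with no separator, and its 6th row is blank.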
