
esfbdata

Installation

You do not have to install esfbdata to use it. Clone the repository, change into the clone directory, and run:

python -m esfbdata --help

esfbdata relies on the following Python modules. It has been tested on Python 2.7.11 and should also work on 3.2+.

elasticsearch
beautifulsoup4
dateutil

Description

esfbdata is a small command-line program that parses Facebook data archives and ingests them into an Elasticsearch cluster. It currently parses the html/events.htm, html/messages.htm, and html/timeline.htm files. It requires a Python interpreter, version 2.7 or 3.2+, and is not platform-specific. It is released under the terms of the GPLv3; a copy of the license is included in LICENSE.

Usage

esfbdata [-h] [--version] -n NODE [NODE ...] [-i INDEX]
         [--ignore STATUS_CODE [STATUS_CODE ...]]
         [--parser {html.parser,lxml,html5lib}]
         [--ingest {events,messenger,timeline} [{events,messenger,timeline} ...]]
         [-v] [-d] [--log-format LOG_FORMAT] [-s]
         FILE [FILE ...]

Options

positional arguments:
  FILE                  The Facebook archives to ingest

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -n NODE [NODE ...], --nodes NODE [NODE ...]
                        Elasticsearch nodes to connect to
  -i INDEX, --index INDEX
                        Elasticsearch index to ingest data into (default:
                        facebook)
  --ignore STATUS_CODE [STATUS_CODE ...]
                        Elasticsearch errors to ignore (default: [400])
  --parser {html.parser,lxml,html5lib}
                        HTML parser to use (default: html.parser)
  --ingest {events,messenger,timeline} [{events,messenger,timeline} ...]
                        Set archives to ingest (default: ['events',
                        'messenger', 'timeline'])
  -v, --verbose         Set log level to INFO
  -d, --debug           Set log level to DEBUG (supersedes --verbose)
  --log-format LOG_FORMAT
                        Set the format of logs (default: %(asctime)s -
                        %(levelname)s - %(message)s)
  -s, --simulate        Skip indexing of data
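
As a rough illustration of what the default --log-format value produces, the sketch below renders a single record with Python's standard logging module. The format string is copied from the help text above; the render helper and the sample messages are hypothetical and are not part of esfbdata.

```python
import logging

# Default value of --log-format, copied from the help text.
DEFAULT_LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"


def render(message, level=logging.INFO, fmt=DEFAULT_LOG_FORMAT):
    """Format one record the way the standard logging module would.

    Hypothetical helper for illustration only; esfbdata does not ship it.
    """
    record = logging.LogRecord(
        name="esfbdata",
        level=level,
        pathname="example.py",  # placeholder source location
        lineno=0,
        msg=message,
        args=(),
        exc_info=None,
    )
    return logging.Formatter(fmt).format(record)


# Default format: timestamp, level name, then the message.
print(render("parsed timeline"))

# A custom value, as could be passed to --log-format.
print(render("parsed timeline", fmt="%(levelname)s:%(name)s:%(message)s"))
```

Any attribute that the logging module defines on a record (for example %(name)s or %(lineno)d) can be used in the format string.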

Example

Spin up an instance of Elasticsearch

docker run -d elasticsearch

Get the name of the Docker container

docker ps

Spin up an instance of Kibana and link it to Elasticsearch (optional, but you will probably want it)

docker run --link some_elasticsearch:elasticsearch -d kibana

Get your Elasticsearch IP

docker inspect some_elasticsearch_id

Run esfbdata on the ZIP archive downloaded from Facebook

esfbdata -n some_elasticsearch_ip -v /path/to/facebook-username.zip

This processes the data and ingests it into Elasticsearch with the default options. If you have lxml installed, you will likely want to add --parser lxml to the command arguments. Beware of the --debug option: it generates an extreme amount of output and should only be used for narrowly targeted debugging.
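
The docker inspect step above prints a JSON document rather than a bare address. As a minimal sketch, assuming the usual layout of that JSON (a list of containers, each with a NetworkSettings object), the address to pass to -n could be pulled out as follows. The container_ips helper and the sample data are made up for illustration.

```python
import json


def container_ips(inspect_output):
    """Return the IP addresses found in `docker inspect` JSON output.

    Hypothetical helper; not part of esfbdata or Docker.
    """
    ips = []
    for container in json.loads(inspect_output):
        settings = container.get("NetworkSettings") or {}
        # Top-level address, when Docker reports one.
        if settings.get("IPAddress"):
            ips.append(settings["IPAddress"])
        # One address per attached network.
        for network in (settings.get("Networks") or {}).values():
            ip = network.get("IPAddress")
            if ip and ip not in ips:
                ips.append(ip)
    return ips


# Trimmed, made-up sample of `docker inspect` output.
SAMPLE = """
[{"NetworkSettings": {
    "IPAddress": "172.17.0.2",
    "Networks": {"bridge": {"IPAddress": "172.17.0.2"}}
}}]
"""

print(container_ips(SAMPLE))
```

In practice the input would be the captured output of docker inspect for the Elasticsearch container.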

Developers

Developers adding a new parser will likely want to subclass FacebookIngester, or use one of the existing classes (FacebookEventsIngester, FacebookMessengerIngester, and FacebookTimelineIngester).

esfbdata uses Python's standard logging framework with a logger named esfbdata.
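
Because the logger name is known, code that embeds esfbdata can tune its output through the standard logging API. The following is a minimal sketch: only the logger name esfbdata comes from this README, while the ListHandler class and the child logger name esfbdata.custom are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted records in a list."""

    def __init__(self):
        logging.Handler.__init__(self)  # works on Python 2.7 and 3.x
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


# Logger name taken from the README.
logger = logging.getLogger("esfbdata")

handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # same level that --verbose selects

logger.debug("dropped")  # below INFO, so it is filtered out
logger.info("kept")

# Records from child loggers propagate up to the esfbdata logger.
logging.getLogger("esfbdata.custom").warning("from child")

print(handler.lines)
```

A custom ingester could log through a child logger in the same way, so its messages follow whatever level and handlers are set on esfbdata.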

Contributors

thisita
