afgexplorer

This is the Django source code that runs http://wardiaries.wikileaks.org, a tool which enables rich browsing and searching of the WikiLeaks Afghan War Diaries document archive.

You may be interested in the DocuDig fork, which generalizes this application to any structured data set.

Installation

1. Dependencies

The latest version of afgexplorer uses Solr as its search backend; previous versions used only the database. The latest version should work with any Django-compatible database (the previous version depended on postgresql); however, the management command to import data assumes postgresql for efficiency's sake.

python and Django

It is recommended that you install using pip and virtualenv. To install dependencies:

pip install -r requirements.txt -E /path/to/your/virtualenv

If you use postgresql (recommended), you will need to install egenix-mx-base, which cannot be installed using pip. To install it, first activate your virtualenv, and then:

easy_install -i http://downloads.egenix.com/python/index/ucs4/ egenix-mx-base

Solr

Install Solr. For the purposes of testing and development, the example server should be adequate, though you will need to add the schema.xml file as described below.

Stylesheets

Stylesheets are compiled using Compass. If you wish to modify the stylesheets, you will need to install that as well. After Compass is installed, stylesheets can be compiled as you modify the .sass files as follows:

cd media/css/sass/
compass watch

2. Settings

Copy the file example.settings.py to settings.py, and add your database settings.
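
As a rough illustration, the database block in settings.py might look like the following for a local postgresql setup. This is only a sketch: the engine string, database name, user, and host are assumptions rather than values taken from example.settings.py, and the exact engine name depends on your Django version.

```python
# settings.py (fragment): a hypothetical postgresql configuration.
# Every value below is a placeholder; replace it with your own.
DATABASES = {
    "default": {
        # Older Django releases use "django.db.backends.postgresql_psycopg2";
        # newer ones use "django.db.backends.postgresql".
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "afgexplorer",   # assumed database name
        "USER": "afgexplorer",   # assumed database user
        "PASSWORD": "",          # fill in locally; do not commit it
        "HOST": "localhost",
        "PORT": "",              # empty string selects the default port
    }
}
```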

3. Data

Importing data

This project contains only the code to run the site, and not the documents themselves. The documents themselves must be separately obtained at: http://wikileaks.org/wiki/Afghan_War_Diary,_2004-2010

To import the documents, download the CSV format file. Then, start the process as follows:

python manage.py import_wikileaks path/to/file.csv "2010 July 25"

The first argument is the path to the data file, and the second argument is the release label for that file (used as an additional facet to allow viewers to search within particular document releases). If there are multiple document releases to import at once, add additional filename and label pairs as subsequent arguments.
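
To make the argument layout concrete, here is a minimal sketch of how a flat list of arguments can be grouped into filename and label pairs. The function name pair_release_args and the second file and label are hypothetical; this is not the actual code of the import_wikileaks command.

```python
def pair_release_args(args):
    """Group a flat argument list into (csv_path, release_label) pairs."""
    if not args or len(args) % 2 != 0:
        raise ValueError("expected one or more <file.csv> <release label> pairs")
    # Even positions hold file paths, odd positions hold their labels.
    return list(zip(args[0::2], args[1::2]))


pairs = pair_release_args([
    "path/to/file.csv", "2010 July 25",
    "path/to/other.csv", "hypothetical later release",  # made-up second pair
])
for path, label in pairs:
    print(path, "->", label)
```

An odd number of arguments is rejected up front, since a file without a label (or a label without a file) is almost certainly a quoting mistake on the command line.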

The script will first collate the entries and extract phrases that the documents have in common. Then, it will construct a new CSV file which contains the cleaned database fields for efficient bulk importing with postgresql. Following this collation, you will need to enter the database password to execute the bulk import.
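
The idea of extracting phrases shared between documents can be sketched as follows. This is an illustrative approach based on word n-grams, not necessarily the algorithm the import script uses; the function name common_phrases, its parameters, and the sample texts are all made up for the example.

```python
import re
from collections import defaultdict


def common_phrases(docs, n=2, min_docs=2):
    """Return word n-grams that appear in at least `min_docs` documents."""
    seen_in = defaultdict(set)  # phrase -> ids of documents containing it
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - n + 1):
            seen_in[" ".join(words[i:i + n])].add(doc_id)
    return {p: sorted(ids) for p, ids in seen_in.items() if len(ids) >= min_docs}


# Made-up sample texts, keyed by a hypothetical document id.
docs = {
    "r1": "Patrol found weapons cache near the river",
    "r2": "Weapons cache destroyed by patrol",
    "r3": "No activity reported",
}
shared = common_phrases(docs)
print(shared)
```

Tracking the set of document ids per phrase, rather than a raw count, keeps a phrase that repeats within a single document from being mistaken for one shared across documents.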

Indexing with Solr

To generate the Solr schema, run the following management command:

python manage.py build_solr_schema > schema.xml

Copy or link this file to the Solr conf directory (if you're using the example Solr server, this will be apache-solr-1.4.1/example/solr/conf), replacing any schema.xml file that is already there, and then restart Solr. After restarting Solr, the following management command will rebuild the index:

python manage.py rebuild_index

License

Granted to the public domain. If you need other licensing, please file an issue.
