old's Introduction

DativeBase is the entrypoint for Dative/OLD: software for collaborative language documentation and analysis.

This repository is the hub for deployment strategies for Dative and the OLD. It contains two Docker Compose configurations for local deployments.

  1. HTTP Deploy (recommended)
  2. HTTPS Deploy (error-prone but more like production)

old's People

Contributors

jrwdunham

old's Issues

Grammaticality being set to NULL instead of empty

I do not think it is currently possible to set the grammaticality of a form to the empty string. The Pylons validators are filtering out the empty string and returning None instead. It may be desirable to change this.

New delimiter addition results in error

Add a new morpheme delimiter to an OLD's system settings. Any forms that contained that delimiter before the system recognized it as such will now have out-of-date morphemeBreakIDs and morphemeGlossIDs values, which will cause an IndexError in the morpheme auto-linking functionality. Make sure this doesn't happen in the new system.
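
A minimal sketch of a guard against this, assuming the stored ID values hold one entry per morpheme, grouped by word. The function names and the exact shape of the stored values are hypothetical and not taken from the OLD source.

```python
import re


def split_morphemes(word, delimiters):
    """Split one word of a morpheme break line on any configured delimiter."""
    if not delimiters:
        return [word]
    pattern = "|".join(re.escape(delimiter) for delimiter in delimiters)
    return re.split(pattern, word)


def ids_are_stale(morpheme_break, stored_ids, delimiters):
    """Return True if the stored per-morpheme IDs no longer line up with the
    morphemes obtained under the current delimiters.

    Assumption: stored_ids is a list with one item per word, and each item
    is a list with one entry per morpheme of that word.
    """
    words = morpheme_break.split()
    if len(words) != len(stored_ids):
        return True
    return any(
        len(split_morphemes(word, delimiters)) != len(word_ids)
        for word, word_ids in zip(words, stored_ids)
    )
```

A check like this could run before auto-linking indexes into the stored lists, so that stale forms are recomputed or skipped rather than raising an IndexError.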

Implement file export

Allow users to export one or more files as an archive. The input to this request would be a JSON array containing a list of one or more file ids or perhaps even a JSON search filter.
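
As a rough sketch of the archive-building step only, assuming the requested file ids have already been resolved to paths on disk: the function name is hypothetical, and how ids or search filters map to paths is left out.

```python
import io
import os
import zipfile


def build_archive(paths):
    """Return the bytes of a zip archive containing the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its base name only, so the archive
            # does not expose the server's directory layout.
            archive.write(path, arcname=os.path.basename(path))
    return buffer.getvalue()
```

Two files with the same base name would collide inside the archive, so a real implementation would need a naming scheme (for example, prefixing the file id).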

Absolute `parent_directory` paths should be relative

The parent_directory values of the morphemelanguagemodel, morphologicalparser, morphology, and phonology tables are absolute paths. They should be paths relative to the directory holding the Pylons config file. The absolute paths make it difficult to move the databases.
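
One possible shape for the conversion, sketched with os.path. The helper names are hypothetical, and config_dir stands for the directory holding the Pylons config file.

```python
import os


def to_relative(parent_directory, config_dir):
    """Express an absolute parent_directory relative to config_dir."""
    return os.path.relpath(parent_directory, start=config_dir)


def to_absolute(relative_directory, config_dir):
    """Resolve a stored relative path against the current config_dir."""
    return os.path.normpath(os.path.join(config_dir, relative_directory))
```

Storing the result of to_relative and calling to_absolute at read time would let a database be moved together with its config directory.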

What to do about audio file sub-interval reference?

We need to reconsider the nature of OLD file resources/models/objects in view of the possibility of referring to subintervals of audio files in order to sidestep expensive manual editing.

Is programmatic editing of audio files feasible in a web app like this? If so, it might be part of a solution.

Mac MySQL 5.6 MyISAM tests in test_forms_search fail

The inconsistency of MySQL with respect to datetime precision across OSes and engines is very frustrating, and there doesn't seem to be very clear documentation about it. Resolving this bug may mean exploring the edge cases or switching over to MariaDB or PostgreSQL.

Implement logging

Decide what to log. All db queries, all requests, a subset of the union of those?

Enhance SQLAQueryBuilder: filter by attribute count/length

Filtering by a non-relational scalar's length can already be accomplished via regex, e.g., ['Form', 'transcription', 'regex', '^.{15}$'] will return all Forms whose transcription has 15 characters.
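
The regex approach can be sketched as a small helper that builds such a filter expression. The helper names are hypothetical, and the local check uses Python's re module, whose semantics may differ in detail from the database's own regex engine.

```python
import re


def length_filter(model_name, attribute, length):
    """Build a filter expression matching values exactly `length` characters long."""
    return [model_name, attribute, "regex", "^.{%d}$" % length]


def matches(filter_expression, value):
    """Evaluate the regex of a filter expression locally with Python's re."""
    return re.search(filter_expression[3], value) is not None
```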

Filtering by the count of a relational collection, i.e., getting all forms with 2 glosses or all collections with more than 50 forms is a bit more complicated: you have to use subqueries and joins. Below I show how to do it in SQLAlchemy.

Implementing this in SQLAQueryBuilder in an open-ended way may be tricky. I was thinking of allowing attribute names to have a 'dot' syntax so that a filter expression like ['Collection', 'forms.count', '>', '50'] would return all collections with more than 50 associated forms. The tricky part would be keeping track of subqueries and joins. Anyway, no time for this now ...
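
A sketch of just the parsing half of the proposed 'dot' syntax, assuming "count" is the only aggregate supported at first. The function names and the returned dict layout are hypothetical; building the actual subqueries and joins is not attempted here.

```python
SUPPORTED_AGGREGATES = {"count"}


def parse_attribute(attribute):
    """Split 'glosses.count' into ('glosses', 'count').

    Plain attribute names such as 'transcription' give (name, None).
    """
    name, _, aggregate = attribute.partition(".")
    return name, (aggregate or None)


def parse_filter(expression):
    """Turn a 4-item filter expression into a dict the query builder could use."""
    model_name, attribute, operator, value = expression
    name, aggregate = parse_attribute(attribute)
    if aggregate is not None and aggregate not in SUPPORTED_AGGREGATES:
        raise ValueError("unsupported aggregate: %s" % aggregate)
    return {
        "model": model_name,
        "attribute": name,
        "aggregate": aggregate,
        "operator": operator,
        "value": value,
        # A count over a relational collection needs a grouped subquery.
        "needs_subquery": aggregate is not None,
    }
```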

How to get all Forms with 2 glosses

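    from sqlalchemy import func

    # Subquery: number of glosses per form
    stmt = (Session.query(model.Gloss.form_id,
                          func.count('*').label('gloss_count'))
            .group_by(model.Gloss.form_id)
            .subquery())
    # Join forms to their gloss counts and keep those with exactly 2
    r = (Session.query(model.Form)
         .outerjoin(stmt, model.Form.id == stmt.c.form_id)
         .filter(stmt.c.gloss_count == 2)
         .all())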

How to get all Forms with 2 files

    # Subquery: number of files per form
    stmt = (Session.query(model.FormFile.form_id,
                          func.count('*').label('file_count'))
            .group_by(model.FormFile.form_id)
            .subquery())
    # Join forms to their file counts and keep those with exactly 2
    r = (Session.query(model.Form)
         .outerjoin(stmt, model.Form.id == stmt.c.form_id)
         .filter(stmt.c.file_count == 2)
         .all())
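
The count-then-join pattern used in the two queries above can also be tried out in plain SQL. The following is a self-contained sketch using an in-memory SQLite database; the table layout is a simplified, hypothetical stand-in for the OLD schema.

```python
import sqlite3


def demo_connection():
    """Create a tiny in-memory database with three forms and three glosses."""
    connection = sqlite3.connect(":memory:")
    connection.executescript("""
        CREATE TABLE form (id INTEGER PRIMARY KEY, transcription TEXT);
        CREATE TABLE gloss (id INTEGER PRIMARY KEY, form_id INTEGER, gloss TEXT);
        INSERT INTO form (id, transcription)
            VALUES (1, 'chien'), (2, 'chat'), (3, 'oiseau');
        INSERT INTO gloss (form_id, gloss)
            VALUES (1, 'dog'), (1, 'hound'), (2, 'cat');
    """)
    return connection


def forms_with_gloss_count(connection, count):
    """Return the ids of forms having exactly `count` glosses."""
    rows = connection.execute(
        """
        SELECT form.id
        FROM form
        JOIN (SELECT form_id, COUNT(*) AS gloss_count
              FROM gloss
              GROUP BY form_id) AS counts
          ON form.id = counts.form_id
        WHERE counts.gloss_count = ?
        ORDER BY form.id
        """,
        (count,),
    ).fetchall()
    return [row[0] for row in rows]
```

Forms with no glosses never appear in the grouped subquery, so asking for a count of 0 returns nothing with this pattern. Covering that case would need a left join plus something like COALESCE(gloss_count, 0).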

Suggest morphological analyses based on similarities

The system should suggest morphological analyses based on similarities. I.e., if you enter a morpheme in your analysis that is similar to something already present, the system should recognize this and alert you to what you might want to use.
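
One cheap way to prototype this is string similarity over the morphemes already in the database. The sketch below uses difflib from the standard library; the function name and the default cutoff are assumptions, and a real implementation might prefer a phonologically informed distance.

```python
import difflib


def suggest_morphemes(candidate, known_morphemes, limit=3, cutoff=0.6):
    """Return up to `limit` known morphemes that resemble `candidate`,
    best matches first."""
    return difflib.get_close_matches(
        candidate, known_morphemes, n=limit, cutoff=cutoff)
```

The cutoff (between 0 and 1) controls how similar a known morpheme must be before it is suggested; lowering it gives more, noisier suggestions.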

Convert .wav files to .ogg (Vorbis)

Ogg Vorbis is a lossy format that's easy to convert to and has pretty good browser coverage. mp3 and webm might be good too, but the former is hard to convert to (on Linux at least) and ffmpeg doesn't have obvious support for webm ...
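
A sketch of how the conversion could be delegated to ffmpeg from Python, assuming an ffmpeg build with the libvorbis encoder is installed on the server. The function names are hypothetical.

```python
import subprocess


def vorbis_command(wav_path, ogg_path):
    """Build an ffmpeg command line that encodes wav_path as Ogg Vorbis."""
    # -y overwrites an existing output file; -c:a selects the audio encoder.
    return ["ffmpeg", "-y", "-i", wav_path, "-c:a", "libvorbis", ogg_path]


def convert_to_ogg(wav_path, ogg_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.check_call(vorbis_command(wav_path, ogg_path))
```

Passing the arguments as a list avoids shell quoting problems with user-supplied file names.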

Make phonologies testable against a corpus

Consider writing a controller action to test a phonology against a corpus. This could be done client-side with multiple requests or using the JS-foma thingy, but server-side would be more efficient than the former at least ...
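
Server-side, the core of such an action might look like the following sketch. It assumes the corpus can be reduced to (underlying, attested surface) pairs and that applying the phonology is available as a callable returning candidate surface forms. Both the function name and that interface are hypothetical.

```python
def check_phonology(apply_phonology, corpus):
    """Return the corpus items whose attested surface form is not predicted.

    apply_phonology: callable mapping an underlying form to an iterable
        of predicted surface forms (hypothetical interface).
    corpus: iterable of (underlying, attested_surface) pairs.
    """
    failures = []
    for underlying, attested in corpus:
        predicted = set(apply_phonology(underlying))
        if attested not in predicted:
            failures.append((underlying, attested, sorted(predicted)))
    return failures
```

Returning the failures together with what was predicted would give the client enough to display where the phonology and the corpus disagree.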

Debug worker thread behaviour vis-a-vis foma & mitlm

Request that a foma object be compiled twice in succession. flookup will be invoked twice in near succession, spawning two subprocesses. This should probably be prevented, i.e., the same foma-based object should not be compiled in parallel.
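
One way to prevent this within a single server process is a non-blocking lock per object. This is only a sketch: the names are hypothetical, and it does not cover deployments with several worker processes, which would need something like a lock file or a database flag.

```python
import threading

_locks = {}
_locks_guard = threading.Lock()


def _lock_for(key):
    """Return the lock for `key`, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())


def compile_once(key, compile_function):
    """Run compile_function() unless a compilation for `key` is in progress.

    Returns True if the compilation ran, False if it was skipped.
    """
    lock = _lock_for(key)
    if not lock.acquire(blocking=False):
        return False
    try:
        compile_function()
        return True
    finally:
        lock.release()
```

A key such as ("phonology", 1) would identify the object being compiled; a second request arriving while the first is still running would then be rejected instead of spawning another subprocess.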

Create normal files from subinterval-referencing ones

Create a normal file from subinterval-referencing files using the wave module. I have written a simple script showing how this can be done which essentially copies the instructions at http://stackoverflow.com/questions/2881012/how-to-get-the-contents-of-the-wav-file-into-array-so-as-to-cut-the-required-seg?rq=1.
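
The following is not the script mentioned above, only a minimal sketch of the same idea with the standard-library wave module (Python 3 syntax). The function name and the use of seconds for the interval bounds are assumptions.

```python
import wave


def extract_interval(source_path, target_path, start, end):
    """Copy the audio between `start` and `end` (in seconds) of a .wav file
    into a new .wav file with the same channels, sample width and rate."""
    with wave.open(source_path, "rb") as source:
        params = source.getparams()
        rate = source.getframerate()
        total = source.getnframes()
        # Clamp the interval to the frames that actually exist.
        first = min(max(int(start * rate), 0), total)
        last = min(max(int(end * rate), first), total)
        source.setpos(first)
        frames = source.readframes(last - first)
    with wave.open(target_path, "wb") as target:
        target.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        target.writeframes(frames)
```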

The challenge is the UI for this. Do it automatically? If so, do we make that a config setting? Or allow users to create these sub-files via the interface?

Test email_reset_password() without Gmail

Write tests to ensure that email_reset_password() in the login controller is able to send email from the server and not just via Gmail. This requires a server that can send email (which I haven't configured yet).

File creation without base64 encoding

Allow files to be created without needing to first convert them to base64 encoding client-side and then send them as JSON, i.e., use multipart/form-data. This is certainly more efficient; the con is that it is contrary to the "always send JSON" way of doing things currently established in the OLD. Note that WebTest has a self.app.encode_multipart method.
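
To put a number on the efficiency argument: base64 turns every 3 bytes into 4 characters, so the JSON body is at least a third larger than the raw file. A small sketch, in which the JSON key names are hypothetical:

```python
import base64
import json


def base64_overhead(data):
    """Ratio of base64-encoded size to raw size for a bytes object."""
    return len(base64.b64encode(data)) / len(data)


def json_payload_size(filename, data):
    """Size in bytes of a JSON request body carrying `data` as base64."""
    body = json.dumps({
        "filename": filename,
        "base64_encoded_file": base64.b64encode(data).decode("ascii"),
    })
    return len(body.encode("utf-8"))
```

On top of the larger transfer, the client has to encode and the server has to decode the whole file in memory, which multipart/form-data avoids.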

Allow user-defined form attributes

Clearly this is easier in a document store/NoSQL datastore, but it can be done in a relational database.

The simplest solution would seem to be storing UDFs in a BLOB column of the form table, e.g., form.udfs. Then form.udfs could contain a pickled Python dict or a JSON object. Indexing attributes of form.udfs poses some challenges, but there are methods for doing it, cf. the friendfeed approach in the link below.
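
A sketch of the JSON variant of this idea, using an in-memory SQLite table as a stand-in for the form table. The helper names are hypothetical, a TEXT column is used here instead of a BLOB, and indexing is not addressed.

```python
import json
import sqlite3


def make_demo_db():
    """Create a minimal stand-in for the form table with a udfs column."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE form (id INTEGER PRIMARY KEY, transcription TEXT, udfs TEXT)")
    connection.execute("INSERT INTO form (id, transcription) VALUES (1, 'chien')")
    return connection


def save_udfs(connection, form_id, udfs):
    """Serialize a dict of user-defined attributes into form.udfs."""
    connection.execute(
        "UPDATE form SET udfs = ? WHERE id = ?", (json.dumps(udfs), form_id))


def load_udfs(connection, form_id):
    """Return the user-defined attributes of a form as a dict."""
    row = connection.execute(
        "SELECT udfs FROM form WHERE id = ?", (form_id,)).fetchone()
    return json.loads(row[0]) if row and row[0] else {}
```

JSON has the advantage over pickle that the stored values stay readable from other languages and are safe to load.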

Searches implying joins entail a non-null value for the attribute on which the join is based ...

Update: this is not actually the case. lib/SQLAQueryBuilder.py uses a left (outer) join, so forms without syntactic categories will show up in a search like the one described below.

Original description: if searches implying joins required a non-null value for the joined attribute, then a search for forms that have category 'S' or whose transcription contains a space would not return any of the space-containing forms with a NULL value for syntacticCategory_id. This is probably not the desired behaviour, and we should probably warn users about it or provide a way of achieving the desired behaviour. Maybe a different type of join?
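
The difference between the two join types for the search described above can be reproduced in a small in-memory SQLite sketch. The table names and sample data are hypothetical; only the syntacticCategory_id column name comes from the description.

```python
import sqlite3

JOINS = {"inner": "JOIN", "left": "LEFT JOIN"}

QUERY = """
SELECT form.id
FROM form
{join} syntacticcategory AS cat ON form.syntacticCategory_id = cat.id
WHERE cat.name = 'S' OR form.transcription LIKE '% %'
ORDER BY form.id
"""


def demo_connection():
    """Three forms: a sentence of category S, a noun, and an uncategorized phrase."""
    connection = sqlite3.connect(":memory:")
    connection.executescript("""
        CREATE TABLE syntacticcategory (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE form (id INTEGER PRIMARY KEY, transcription TEXT,
                           syntacticCategory_id INTEGER);
        INSERT INTO syntacticcategory (id, name) VALUES (1, 'S'), (2, 'N');
        INSERT INTO form (id, transcription, syntacticCategory_id)
            VALUES (1, 'le chien court', 1), (2, 'chien', 2), (3, 'le chat', NULL);
    """)
    return connection


def search(connection, join_kind):
    """Run the 'category S or contains a space' search with the given join."""
    sql = QUERY.format(join=JOINS[join_kind])
    return [row[0] for row in connection.execute(sql)]
```

With the inner join, the uncategorized form 3 is dropped even though its transcription contains a space; with the left join it is kept.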
