head-tracker

Head and gaze tracking of unrestrained marmosets using a convolutional neural network. This repository contains the implementation of the method described in Head and gaze tracking of unrestrained marmosets by Turesson, Ribeiro Conceicao and Ribeiro (2016).

What it is

Current sophisticated technologies for manipulating and recording the nervous system allow us to perform historically unprecedented experiments. However, any influence that our experimental manipulations might have on psychological processes must be inferred from its effects on behavior. Today, quantifying behavior has become the bottleneck for large-scale, high-throughput experiments.

head-tracker addresses this bottleneck with a modern deep learning algorithm for video-based animal tracking. It provides methods for tracking head position and direction (and thus, indirectly, gaze) from simple video recordings of the common marmoset (Callithrix jacchus).

What it does

The routines in this repository allow researchers to estimate where a subject is, and roughly where it is looking, using only a webcam. Given a video of a marmoset filmed from above, the head is localized and its direction in the horizontal plane is estimated. The common marmoset has a relatively small, light head and thus low rotational inertia, so marmosets rely more on head movements to direct their gaze than do larger primates such as macaques and humans. Head direction therefore gives a much better estimate of gaze than it would in, for example, humans, who direct their gaze largely with head-independent eye movements.

How it works

To generate the training data, video frames are manually labeled with head direction.

Video frames are resized (from 640 × 480 to 160 × 120 pixels) and whitened.
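
As a rough illustration, the resize-and-whiten step could look like the sketch below (OpenCV/NumPy; the function name and the per-frame whitening scheme are assumptions, the repository's own routines live in data_preparation.py):

import cv2
import numpy as np

def preprocess_frame(frame):
    """Resize a 640 x 480 frame to 160 x 120 and whiten it (illustrative)."""
    small = cv2.resize(frame, (160, 120)).astype(np.float32)  # dsize is (width, height)
    # Whitening here means zero mean and unit variance per frame (an assumption).
    return (small - small.mean()) / max(small.std(), 1e-8)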

The network is a CNN with six convolutional layers and two max-pooling layers. Filter sizes in the convolutional layers decrease from 11 × 11 to 3 × 3. The final layer is a global average pooling layer of size 1024, from which head position is read out via a class activation map (see: Learning Deep Features for Discriminative Localization). The global average pooling layer is connected to a softmax output layer. We optimized the cross-entropy loss, regularized by the Euclidean (L2) norm of the weights with a regularization weight of 0.0001.
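
The sketch below shows a comparable network in tf.keras. It is not the repository's implementation; the description above only fixes the number of layers, the kernel-size range, the 1024-unit global average pooling layer and the L2 penalty, so the per-layer filter counts here are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_model(Nclass, l2=1e-4):
    reg = regularizers.l2(l2)
    def conv(filters, kernel):
        return layers.Conv2D(filters, kernel, padding='same',
                             activation='relu', kernel_regularizer=reg)
    return models.Sequential([
        tf.keras.Input(shape=(120, 160, 3)),  # whitened 160 x 120 frames
        conv(64, 11),
        conv(128, 9),
        layers.MaxPooling2D(2),
        conv(256, 7),
        conv(512, 5),
        layers.MaxPooling2D(2),
        conv(512, 3),
        conv(1024, 3),                    # 1024 feature maps feed the GAP layer
        layers.GlobalAveragePooling2D(),  # the size-1024 layer described above
        layers.Dense(Nclass, activation='softmax', kernel_regularizer=reg),
    ])

Head position is then recovered from the class activation map obtained by weighting the final feature maps with the softmax weights, as in the paper cited above.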

Example usage

Label training data

import data_preparation as dp
video_fname = '/path/to/some/video/file.mkv' # doesn't have to be .mkv
lf = dp.LabelFrames(video_fname)

lf.run_batch(t0=2.3)   # time of 1st frame to label in seconds

Extract training data and store in tfrecords format

log_dir = '/path/to/log/files/'
video_dir = '/path/to/video/files/'
out_dir = '/path/to/directory/where/training/and/dev/data/will/be/stored'

dp.labeledDataToStorage(log_dir, video_dir, out_dir, Ndev=3000)  # Ndev frames are held out as dev data

Train the model

from head_tracker import TrainModel
data_dir = '/path/to/directory/containing/tfrecords'
model_dir = '/path/to/directory/where/models/will/be/saved'
Nepoch = 80
Nclass = 29
tm = TrainModel(Nclass=Nclass, data_dir=data_dir, model_dir=model_dir)
valid_acc, train_acc = tm.train(Nepoch)  # train for Nepoch epochs over the training data

Be patient.

Test tracking performance

from head_tracker import HeadTracker
log_fname = '/path/to/log/file.txt'
video_dir = '/path/to/video/files/'
video_fname = '/path/to/video/file.mkv' # doesn't have to be .mkv
video_save_fname = '/path/to/annotated/video.mp4'
ht = HeadTracker(Nclass=Nclass, model_dir=model_dir)  # Nclass and model_dir as defined above
est_track, true_track, error = ht.test_track(log_fname, video_dir)
# write a video annotated with head direction and position.
ht.track2video(video_fname, video_save_fname, est_track)
ht.close()

Watch the resulting video.

Performance

A subset of the video frames was annotated by two investigators, and the disagreement between them serves as a reference for the model's performance. The mean head direction error was 10.9° (median 6.2°), very close to the inter-human disagreement with a mean of 9.2° (median 7.2°). The mean position error was 33 pixels (median 23 pixels), compared to an inter-human disagreement with a mean of 10 pixels (median 9 pixels).
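
Note that head direction errors wrap around at 360°. A minimal sketch of how such a circular error could be computed (this helper is illustrative, not part of the repository):

import numpy as np

def direction_error(est_deg, true_deg):
    """Absolute angular error in degrees, wrapped to [0, 180]."""
    diff = np.abs(np.asarray(est_deg) - np.asarray(true_deg)) % 360.0
    return np.minimum(diff, 360.0 - diff)

err = direction_error([350.0, 10.0], [10.0, 350.0])  # both errors are 20 degrees
print(err.mean(), np.median(err))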
