

Scientific Research Tricks

Research is awesome! But today, for several reasons, it lacks some things, and mistakes are made as it is carried out.

My objective with this project is to gather some tips and tools to improve how research is done.

Copyright 2014 DPPG@ON Organization.

License: Creative Commons Attribution 4.0 International License.

http://creativecommons.org/licenses/by/4.0/

Version Control System

At some point in your research, did you make a bad choice or a mistake and have to go back a few steps? Of course you did. Research is not done by following a recipe from a cookbook. Mistakes happen. You should try to avoid them, but do not dwell on them. Learn from them.

The problem I want to address here is what you can do when you make a mistake. You will probably want to take a few steps back. If you are a very methodical person, you will have every step written down in a logbook or something similar. But I guess not everybody is like this.

Or maybe you keep track of changes like this:

(Image: "A Story Told in File Names")

This is not a good way of tracking your files. I can see only one possible future for this, and it is a total mess.

So, why not try a Version Control System (VCS)?

Version control "is the management of changes to documents, computer programs, large web sites, and other collections of information." With a VCS you can keep track of every change you make to your research and fall back to any earlier point you want.

One VCS widely used today is git. It was created by Linus Torvalds, the creator of the Linux kernel. Fun fact: like Linux, git is named after Linus himself. "Git" is British English slang for a stupid or unpleasant person, and Linus said: "I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'." Git is free and open source.

You can try the basics of git here, and this book by Scott Chacon or this other book by Richard E. Silverman can also help.

Backup and Sharing

Do you want to back up your research and/or work on multiple computers? You could use a USB stick or a portable hard drive. That is one way to do it, but maybe not the best. A cloud storage service like Dropbox or Google Drive is probably a better way. Dropbox even has a rudimentary VCS built in. Do you want to share your work with your collaborators? You could create a shared folder on Dropbox.

But may I suggest something better? Something stronger? GitHub. It "is a web-based hosting service for software development projects that use the Git revision control system." You can store all your git projects on GitHub and share them with your collaborators. You can browse your own and others' research in your preferred browser. Today you hardly need to know git to use it.

Regular expressions

How to find a pattern in a text.
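As a small illustration in Python (the language recommended in the Script programming section), this sketch uses the standard `re` module to find dates written as YYYY-MM-DD in a made-up sample sentence, then rewrites them. The pattern and the text are only examples; adapt them to your own data.

```python
import re

# Made-up sample text, for illustration only.
text = "Samples collected on 2014-03-21 and 2014-04-02; re-run planned for 2014-05-10."

# A date pattern: four digits, dash, two digits, dash, two digits.
# \b marks word boundaries so we do not match inside longer numbers.
date_pattern = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

# findall returns one tuple of captured groups per match.
dates = date_pattern.findall(text)
print(dates)  # [('2014', '03', '21'), ('2014', '04', '02'), ('2014', '05', '10')]

# sub rewrites every match; \3, \2, \1 refer to the captured groups (DD/MM/YYYY).
print(date_pattern.sub(r"\3/\2/\1", text))
# Samples collected on 21/03/2014 and 02/04/2014; re-run planned for 10/05/2014.
```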

Useful links:

Shell

"A Unix shell is a command-line interpreter or shell that provides a traditional user interface for the Unix operating system and for Unix-like systems." Some people are afraid of the shell. Don't be. The shell is one of your most powerful allies. If you know it well and master it, what you can do with it is almost magic.

If you are running Linux, you will probably have Bash as your Unix shell. If you are running Windows, you can try Cygwin.

I highly recommend the Software Carpentry lessons on the shell. They are very good and will give you all the basics.

Below are some commands that can speed up your workflow.

  • List the files in a folder and save the list to a file: ls path/to/folder > list_of_files.txt
  • Find files by name: find path/to/folder -name filename
  • Find files whose path matches a RegEx: find path/to/folder -type f | grep 'add-your-regex-here'
  • Find a RegEx pattern inside files: find path/to/folder -type f | xargs grep 'add-your-regex-here'
  • Remove the files whose path matches a RegEx (run it once without | xargs rm to check the list first): find path/to/folder -type f | grep 'add-your-regex-here' | xargs rm
  • Compare directories: diff -rq dirA dirB
  • Convert all figures in a folder from one format to another (requires ImageMagick): for f in *.jpg; do convert "./$f" "./${f%.jpg}.png"; done
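If you prefer to stay in Python, the two RegEx searches above can be approximated with the standard `re` and `pathlib` modules. This is a minimal sketch, not a drop-in replacement for find and grep: the function names `find_files` and `grep_files` are hypothetical, and it assumes your text files are UTF-8.

```python
import re
from pathlib import Path


def find_files(folder, path_regex):
    """Like `find folder -type f | grep 'regex'`: yield files whose path matches."""
    pattern = re.compile(path_regex)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and pattern.search(str(path)):
            yield path


def grep_files(folder, content_regex):
    """Like `find folder -type f | xargs grep 'regex'`: yield (file, line number, line)."""
    pattern = re.compile(content_regex)
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" skips undecodable bytes instead of crashing on binary files.
        with path.open(encoding="utf-8", errors="ignore") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    yield path, number, line.rstrip("\n")
```

For example, `list(grep_files("path/to/folder", r"add-your-regex-here"))` returns every matching line together with its file and line number, much like grep's `file:line:text` output.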

Bonus tip: Bash git prompt shows basic git information about the current repository directly in your prompt.

Script programming

You can learn the basics of Python at Codecademy. I have tried it and it is very nice.

Another interactive way to learn Python is learnpython.org. Give it a try.

There is a tutorial for non-programmers here.

There is also the now famous Learn Python the Hard Way. It focuses on exercises and repetition as learning tools.

This link and this one contain compilations of free books.

If you want to know which Python packages are installed on your system, just type in your shell:

pip freeze

Of course, this assumes that you have pip installed on your system.
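From inside Python, a rough equivalent uses the standard `importlib.metadata` module (Python 3.8+). It is a sketch relying only on the standard library, and its output will not match `pip freeze` exactly; for example, `pip freeze` leaves out pip itself unless you pass `--all`.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings, similar to what `pip freeze` prints."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the listing is stable and easy to scan.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for pin in installed_packages():
        print(pin)
```

Saving such a list (or the output of `pip freeze > requirements.txt`) next to your results records which package versions produced them.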

Installing Python packages system-wide can cause headaches. For example, I once spent an afternoon trying to figure out why my package would not run after I reinstalled it. The reason was a conflict between the new and the old versions. The solution was to use virtualenv and virtualenvwrapper.

As the name says, virtualenv creates a virtual Python environment. This allows you to install any Python package without worrying about messing up the system packages. You can also create any number of virtual environments, for example, one for each project.
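A similar per-project setup can be sketched with Python's built-in `venv` module (Python 3.3+). Note that this swaps in the standard-library tool; it is not virtualenv itself. The helper name `make_project_env` is hypothetical: it creates the environment and returns the path of the environment's own interpreter.

```python
import sys
import venv
from pathlib import Path


def make_project_env(env_dir, with_pip=True):
    """Create an isolated Python environment in env_dir (for example, one per project)."""
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    # The environment's private interpreter lives in bin/ (POSIX) or Scripts/ (Windows).
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
```

For example, `make_project_env("envs/my_project")` followed by `envs/my_project/bin/python -m pip install numpy` installs numpy for that project only, leaving the system packages untouched.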

Good editor

It is very important to have a good editor.

If you are the nerd-geek-kind-of-awesome type, you should probably try Vim or Emacs.

Automation

How to automate things using Python. This is great when you make a mistake and have to run several things again.
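One way to start, sketched here under stated assumptions: describe each step of your analysis as a command plus its input and output files, and rerun only the steps whose output is missing or older than its inputs (the same timestamp idea that `make` uses). The `PIPELINE` entries, including `clean.py`, `plot.py` and the file names, are hypothetical placeholders, as are the helper names.

```python
import subprocess
from pathlib import Path

# Hypothetical pipeline, listed in dependency order. Replace with your own steps.
PIPELINE = [
    {"inputs": ["raw_data.csv"], "output": "clean_data.csv",
     "command": ["python", "clean.py", "raw_data.csv", "clean_data.csv"]},
    {"inputs": ["clean_data.csv"], "output": "figure.png",
     "command": ["python", "plot.py", "clean_data.csv", "figure.png"]},
]


def is_stale(inputs, output):
    """A step must run if its output is missing or older than any of its inputs."""
    out = Path(output)
    if not out.exists():
        return True
    out_time = out.stat().st_mtime
    return any(Path(name).stat().st_mtime > out_time for name in inputs)


def run_pipeline(steps):
    """Run, in order, only the out-of-date steps; return the outputs that were rebuilt."""
    rebuilt = []
    for step in steps:
        if is_stale(step["inputs"], step["output"]):
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(step["command"], check=True)
            rebuilt.append(step["output"])
    return rebuilt
```

After fixing a mistake in `raw_data.csv`, calling `run_pipeline(PIPELINE)` would rerun the cleaning step and, because its output is now newer, the plotting step as well. For larger projects, a real build tool such as make does the same job more robustly.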

Some links to check:

Negative results

Negative results are still results. Publish them. Why publish them? Just look at the following cartoon and I guess you will understand.

(Cartoon: "Negative data")

Got it?

A journal will probably not accept a paper showing negative results. This is where figshare comes in. This resource "allows users to upload any file format to be made visualisable in the browser so that figures, datasets, media, papers, posters, presentations and filesets can be disseminated in a way that the current scholarly publishing model does not allow." Each image, presentation or any other kind of data receives a DOI that uniquely identifies it and allows others to cite it.

Visualization

Visualizing your data is a very important step; I would even say crucial. It is very hard to obtain any results without it. Unless you are a statistics ninja, and even then, you would probably use visualization to convince your audience of your findings.

One of the fathers of this field is Edward Tufte. I have not read it yet, but his first book, The Visual Display of Quantitative Information, is highly recommended.

So, how can one visualize data? There are several tools available. You could use a spreadsheet editor, like LibreOffice Calc. Its features are limited, but you get a visualization quickly, and it is a good option when you do not have other skills. And if you want to make better plots from spreadsheet data, you will probably want to try RAW.

One can also draw plots by hand, or use GIMP or Inkscape, for example. But for that, it helps to have some artistic skills. Also, if something changes in your data, you would probably have to start from scratch.

Finally, one could write code to produce the visualization. The advantage of this method is that you will probably only have to write it once, i.e. the code is independent of the data. If the data changes, you only have to run it again. How well this works depends only on your coding skills.
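As a minimal illustration of "write once, rerun when the data changes", here is a sketch with Matplotlib (one of the Python toolkits listed in the next paragraph). It plots two columns of any CSV file that has a header row. The function name `plot_csv` and the file and column names in the usage line are hypothetical, and it assumes every value in the chosen columns is numeric.

```python
import csv

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt


def plot_csv(csv_path, x_column, y_column, png_path):
    """Plot y_column against x_column from any CSV file with a header row."""
    xs, ys = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            xs.append(float(row[x_column]))
            ys.append(float(row[y_column]))

    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(x_column)
    ax.set_ylabel(y_column)
    fig.savefig(png_path, dpi=150)
    plt.close(fig)  # free memory when plotting many files in a loop
```

For example, `plot_csv("results.csv", "time", "temperature", "results.png")`; when `results.csv` changes, running the same call again regenerates the figure.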

There are several good visualization toolkits in the wild, like D3 for JavaScript. If you want to learn D3, the book by Scott Murray, Interactive Data Visualization for the Web, is now available online for free. For Python, we have Matplotlib, Mayavi and ggplot. ggplot2 is a very well-known plotting system for R, and ggplot is especially good for those who are learning Python but have a solid background in R. And if you want D3 awesomeness in your Python code, try Vincent.

Bret Victor is also working on software that will let you draw your visualizations dynamically. If you want to know more about it, you can watch his talk and read more here. It is really impressive!

According to Nathan Yau from FlowingData, there is only one way to learn visualization: work with data.

I also recommend his first book as a starting point for visualization. He covers the types of visualization there are and the tools available, all in a very pleasant read. I have not read his second book yet, but I am very eager to get a copy.
