autolab / tango

Standalone RESTful autograding service

Home Page: http://www.autolabproject.com/

License: Apache License 2.0

Makefile 0.30% C 7.71% Shell 2.27% Python 88.58% Dockerfile 1.14%

autograding tango autolab python

tango's Introduction

Tango

Tango is a standalone RESTful Web service that runs and manages jobs. A job is a set of files that must satisfy the following constraints:

There must be exactly one Makefile that runs the job.
The output for the job should be printed to stdout.

Example jobs are provided for the user to peruse in clients/. Tango has a REST API which is used for job submission.

Upon receiving a job, Tango will copy all of the job's input files into a VM, run make, and copy the resulting output back to the host machine. Tango jobs are run in pre-configured VMs. Support for various Virtual Machine Management Systems (VMMSs) like KVM, Docker, or Amazon EC2 can be added by implementing a high level VMMS API that Tango provides.
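The copy-in, run, copy-out cycle described above can be sketched as a small interface. This is only an illustration: the class and method names (SketchVMMS, copy_in, run_job, copy_out, run_job_on) and the in-memory FakeVMMS are assumptions made for this example, not Tango's actual VMMS API.

```python
from abc import ABC, abstractmethod


class SketchVMMS(ABC):
    """Hypothetical stand-in for a VMMS backend (all names are assumptions)."""

    @abstractmethod
    def copy_in(self, vm, files):
        """Copy the job's input files into the VM."""

    @abstractmethod
    def run_job(self, vm):
        """Run the job inside the VM and return an exit status."""

    @abstractmethod
    def copy_out(self, vm):
        """Return the job's output."""


class FakeVMMS(SketchVMMS):
    """In-memory backend so the flow can be exercised without real VMs."""

    def __init__(self):
        self.vms = {}

    def copy_in(self, vm, files):
        self.vms[vm] = {"files": dict(files), "output": ""}

    def run_job(self, vm):
        state = self.vms[vm]
        # A job must contain a Makefile; a real backend would run `make` here.
        if "Makefile" not in state["files"]:
            return 1
        state["output"] = "make: ran %d file(s)" % len(state["files"])
        return 0

    def copy_out(self, vm):
        return self.vms[vm]["output"]


def run_job_on(vmms, vm, files):
    """Shepherd one job through copy-in, run, and copy-out."""
    vmms.copy_in(vm, files)
    status = vmms.run_job(vm)
    return status, vmms.copy_out(vm)
```

A real backend would replace FakeVMMS with calls into KVM, Docker, or EC2; the driver function would stay the same.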

A brief overview of the Tango repository:

tango.py - Main tango server
jobQueue.py - Manages the job queue
jobManager.py - Assigns jobs to free VMs
worker.py - Shepherds a job through its execution
preallocator.py - Manages pools of VMs
vmms/ - VMMS library implementations
restful_tango/ - HTTP server layer on the main Tango

Tango was developed as a distributed grading system for Autolab at Carnegie Mellon University and has been extensively used for autograding programming assignments in CMU courses.

Using Tango

Please feel free to use Tango at your school/organization. If you run into any problems with the steps below, you can reach the core developers at [email protected] and we would be happy to help.

Python 2 Support

Tango now runs on Python 3. A legacy branch, master-python2, preserves a snapshot of the last Python 2 Tango commit. If you are still on the Python 2 version, you are strongly encouraged to upgrade to the current Python 3 version of Tango, as future enhancements and bug fixes will be focused on the current master.

We will not be backporting new features from master to master-python2.

Contributing to Tango

Fork the Tango repository.
Create a local clone of the forked repo.
Install pre-commit from pip, and run pre-commit install to set up Git pre-commit linting scripts.
Make a branch for your feature and start committing changes.
Create a pull request (PR).
Address any comments by updating the PR and wait for it to be accepted.
Once your PR is accepted, a reviewer will ask you to squash the commits on your branch into one well-worded commit.
Squash your commits into one and push to your branch on your forked repo.
A reviewer will fetch from your repo, rebase your commit, and push to Tango.

Please see the git linear development guide for a more in-depth explanation of the version control model that we use.

License

Tango is released under the Apache License 2.0.

tango's People

Contributors


yrkumar icanb ifjorissen cg2v yunfanye ubautograding jkim-ru s-wallace fh-salzburg xyzisinus chriswailes hexacyanide wsuv-autolab lonerz bstriner chillinghsu thotypous arnulfoperez codecrawl fanpu arberx loicgelle practischool mtoupsuno leileimiao travers-rhodes mucsci s2t2 mrufaruqui cmastudios vondowntown franzke-usc-couseware linnil1 cool00geek mojojojo99 cccodes akhilnadigatla ugogon amritaahead hecatephy ub-cse-it epicseven-cup

tango's Issues

Spacing behaves strangely in python files

The entire directory needs to be re-spaced.

Losing permission bits

The permission bits of files are lost when they are uploaded to Tango (by /upload). As a result, scripts that need +x cannot run. We should either find a way to preserve these bits during upload or tell the front-end not to rely on them and to run bash hello.sh in the Makefile instead of ./hello.sh.
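One way to preserve the bits, sketched under the assumption that the client can send a file's mode alongside its contents: record the mode before upload and reapply it afterwards. The helper names record_mode and restore_mode are hypothetical and are not part of Tango's /upload handling.

```python
import os
import stat


def record_mode(path):
    """Return only the permission bits (e.g. 0o755) of a file."""
    return stat.S_IMODE(os.stat(path).st_mode)


def restore_mode(path, mode):
    """Reapply previously recorded permission bits to an uploaded copy."""
    os.chmod(path, mode)
```

The client would call record_mode on the original file and the server would call restore_mode on the stored copy, so ./hello.sh keeps its +x bit.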

Autodriver has a hardcoded path /bin/find

autodriver.c has a hard-coded path to /bin/find, which causes it to break on Ubuntu systems, where find is located at /usr/bin/find.

Local autograding in Docker

Implement a "local-docker" option that autogrades on the localhost in a docker container.

Error codes should be documented

It'd be great if developers connecting to Tango could understand the error codes returned from Tango. (e.g. -2)

Volumes directory not automatically created

If volumes directory is not created and config is *, then it should be automatically created.

Verbose ssh output

Instrument tango so that it can be configured to dump ssh verbose output to the logs. This would be a huge help in debugging ssh/scp issues.

213 labs are not autograding (high priority)

With the exception of proxylab, 213 uses legacy .rb files that overload functions like autogradeInputFiles and parseAutoresult. None of the labs are autograding. Tango returns a status code of -3 to the front-end with no other feedback.

Kosbie uses similar legacy .rb files, so he's going to be running into this issue as well.

Redis implementation

We need to confirm that the Redis version is working.

Tango stalls on destroyVM indefinitely

If the VMs on a backend machine become wedged, Tango stalls indefinitely during the retry, waiting for destroyVM to finish. Instead, Tango should time out on waitVM, create a new instance, and retry waitVM on that instance. Eventually, the wedged machine will fill up with VMs, but this will allow Tango to degrade gracefully while we are waiting to restart the backend.
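The proposed behavior can be sketched as a bounded loop. The callables create_vm and wait_vm and the parameters max_attempts and timeout_secs are hypothetical placeholders for whatever the VMMS provides; wait_vm is assumed to return False when it times out.

```python
def wait_with_replacement(create_vm, wait_vm, max_attempts=3, timeout_secs=30):
    """Try fresh VMs until one becomes ready, never blocking on a wedged one.

    create_vm() returns a new VM handle; wait_vm(vm, timeout_secs) returns
    True if the VM became reachable within the timeout. Both are assumptions.
    """
    for _ in range(max_attempts):
        vm = create_vm()
        if wait_vm(vm, timeout_secs):
            return vm
        # The VM is treated as wedged and abandoned; we deliberately do not
        # wait for it to be destroyed before trying a new instance.
    return None
```

Returning None after max_attempts lets the caller fail the job instead of stalling forever.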

Redis implementation should reuse established Redis connections

Tango3 jobs are taking too long

The 213 datalab is taking about 7-8 seconds to run, with up to 3-4 seconds required for the scp-based copyin step, which should be almost instantaneous. In the old system, datalab took 2-3 seconds. In the past, this kind of slow-down has been due to authentication issues that required the ssh client to retry using different authentication schemes.

It would be easy to diagnose if there were a way to tell Tango to call ssh with the verbose option (-vvvv) and then dump the ssh output to the tango log.

`elapsed_secs` field in `info` endpoint is wrong

When I hit /info endpoint in the API, the elapsed_secs variable is shown as the current epoch time (e.g. 1429372581) instead of the actual number of seconds elapsed.

Maybe you forgot to save the time when Tango was started?
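A minimal sketch of the expected behavior: save the start time once and report the difference. The class name ServerInfo and the injectable clock parameter are assumptions for illustration, not Tango's code.

```python
import time


class ServerInfo:
    """Tracks server start time so elapsed seconds can be reported."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._start = clock()  # saved once, when the server starts

    def info(self):
        # Elapsed time is "now minus start", not the raw epoch time.
        return {"elapsed_secs": self._clock() - self._start}
```

If the start time were never saved (that is, treated as 0), the subtraction would yield the current epoch time, which matches the value reported in this issue.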

Crash Vulnerability in Job Manager

Currently, jobs are moved from the liveJobs queue to the deadJobs queue. If the JobManager goes down after a job has been removed from liveJobs but before it has been added to deadJobs, the job will be lost.
Possible solution: only remove a job from the liveJobs queue once adding it to the deadJobs queue has succeeded without error.
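The ordering in the possible solution can be sketched with plain dictionaries standing in for the two queues. The function retire_job and the FailingStore class are hypothetical and exist only for this example.

```python
def retire_job(job_id, live_jobs, dead_jobs):
    """Move a job from live to dead with no window where it is in neither."""
    job = live_jobs[job_id]   # read, but do not remove yet
    dead_jobs[job_id] = job   # if this raises, the job is still live
    del live_jobs[job_id]     # remove only after the add succeeded


class FailingStore(dict):
    """Simulates a dead-job store that errors on every write."""

    def __setitem__(self, key, value):
        raise IOError("store unavailable")
```

With this ordering, a crash between the add and the delete leaves the job in both queues rather than in neither, so recovery code would need to tolerate duplicates.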

Wiki for setting up Redis

This would be nice for devs and anyone who wants to use Redis.

Reset Tango should be on consumer and should reset jobs as well

Need new VMMS that supports one job at a time on a particular machine

To help reduce jitter on performance-sensitive assignments like malloc, we need a new docker vmms that will allow us to specify a list of physical machines, and that schedules at most one job per physical machine.

Distributed autograding in Docker containers

Write a "distributed-docker" module that allows autograding on a set of distributed machines, one job at a time on each machine, in a Docker container.

Multiplexing VMMS per job

Jobs can pick which VMMS they want to run on. Some can run on Tashi and some on Distributed Docker.

Tango jobs are not dispatched in any sort of pseudo ordering

Jobs should be dispatched roughly in the order in which they arrived. The only exceptions are jobs that are not schedulable.
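One possible reading of this policy, as a sketch: scan the queue from oldest to newest and dispatch the first job that can currently be scheduled. The function next_job and the predicate is_schedulable are hypothetical names.

```python
def next_job(queue, is_schedulable):
    """Pop and return the oldest schedulable job, or None if there is none.

    `queue` is a list ordered by arrival time (oldest first).
    """
    for index, job in enumerate(queue):
        if is_schedulable(job):
            return queue.pop(index)
    return None
```

Unschedulable jobs keep their position, so they are considered first again on the next call.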

json output is hard to parse

The JSON output of getInfo and getPool uses JSON Arrays of ["a=X", "b=Y"] instead of Objects {"a": X, "b": Y}. This makes them nontrivial to parse, since the client has to dig into the strings rather than letting a JSON library do all the work.

Is there a reason for doing things this way?
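To illustrate the extra work this puts on clients, here is a sketch of the conversion a client currently has to write by hand. The helper name pairs_to_dict and the sample payload are made up for this example.

```python
import json


def pairs_to_dict(items):
    """Convert ["a=X", "b=Y"] into {"a": "X", "b": "Y"}.

    Values stay strings, because the original types are lost in the
    "key=value" encoding.
    """
    result = {}
    for item in items:
        key, _, value = item.partition("=")  # split on the first "=" only
        result[key] = value
    return result


parsed = pairs_to_dict(json.loads('["a=1", "b=two"]'))
```

With a JSON Object in the response, json.loads alone would return the dictionary, with numeric values already typed.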

Writing unit tests for Preallocator

I'm not entirely sure about the Preallocator implementation so I wasn't able to do it, but we definitely need this.

I already created a boilerplate at tests/testPreallocator.py

UTC regression

It's baaaaack! When I autograded this program the local time was 6:54, but Tango is reporting UTC instead:

_begin_
Autograder [Mon Aug 24 22:54:03 2015]: Received job [email protected]:82
Autograder [Mon Aug 24 22:54:11 2015]: Success: Autodriver returned normally

Autograder [Mon Aug 24 22:54:11 2015]: Here is the output from the autograder:

Autodriver: Job exited with status 0
...
_end_

However, the times in the job trace are correctly reported using local time:
_begin_
Runtime Trace
2015-08-24 18:54:03 | Added job [email protected]:82 to queue
2015-08-24 18:54:03 | Dispatched job [email protected]:82 [try 0]
2015-08-24 18:54:03 | Assigned job [email protected]:82 existing VM
...
_end_

Change response for default route in RESTful Tango API

Make the response interesting.

Additional function in VMM library: getImages

This will return a list of images that the VMM can boot its VMs with.

Distributed autograding on physical host

Write a "distributed" option that allows autograding on a set of distributed hosts.

prealloc breaks when given a JSON body

This issue makes the command-line client unusable for prealloc.

Tango should report local times in the job trace and the logfiles

In the feedback that Tango returns to the client, it should be using local times rather than UTC. For example, I submitted this job at 11:07am EST:

Autograder [Thu Jan 15 16:07:46 2015]: Received...
Autograder [Thu Jan 15 16:07:50 2015]: Success: Autodriver returned normally
Autograder [Thu Jan 15 16:07:50 2015]: Here is the output from the autograder:
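The difference between the two renderings can be reproduced with Python's datetime module. This is a sketch only: the function name stamp is hypothetical, and the fixed UTC-5 offset stands in for EST so the example does not depend on the machine's timezone.

```python
from datetime import datetime, timedelta, timezone


def stamp(epoch_secs, tz=None):
    """Format an epoch timestamp in the style used by the feedback header.

    With tz=None the system's local timezone is used.
    """
    moment = datetime.fromtimestamp(epoch_secs, tz=timezone.utc).astimezone(tz)
    return moment.strftime("%a %b %d %H:%M:%S %Y")


received = datetime(2015, 1, 15, 16, 7, 46, tzinfo=timezone.utc).timestamp()
est = timezone(timedelta(hours=-5))

utc_text = stamp(received, timezone.utc)  # what the feedback shows
local_text = stamp(received, est)         # what the submitter expects
```

Calling stamp(received) with no timezone argument formats the time in the server's local timezone, which is the behavior requested here.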

Increase memory-resident queue from 512 to 1024

Recent change in MD5 hash function seems to break Autolab integration

Hi there,

I came across an issue where if I deleted an assessment in Autolab and reuploaded it, Tango would grade the latest file submitted to the previously deleted assessment when a new file was submitted to the reuploaded assessment (even if the files differ). Since there was an update in commit 050e5fc which had to do with MD5 checking, I updated Tango to see if this solved my issue. Unfortunately, this led me to this error in Autolab:

After doing some digging in the source, it seems like the output from Tango's open() function in tangoREST.py is incompatible with what Autolab expects since the commit mentioned above. As you can see from the screenshot, the error finally occurs in tango_upload in Autolab's autograde.rb, but is due to the existing_files variable which is obtained from a TangoClient.open(..) call.

Thanks for looking into this.

resetTango should validate and clean up the machines dictionary

I have encountered a couple of situations in testing where the preallocator's machines data structure got out of sync. There were machine ids in the list that no longer had an entry in the relevant queue, which prevented any jobs from being scheduled on that machine. If all machines get into that state, job scheduling would halt. #77 fixes some of these, but it probably makes sense to either reset the preallocator state or validate it when Tango is reset.

Figure out a better, more consistent tab spacing

The tab spacings are well weird on greatwhite, github and my local machine. Let's make this more consistent, somehow.

Updating wiki to use command-line client instead of cURL

Essentially, this section: https://github.com/autolab/Tango/wiki/Setting-up-Tango-server-and-VMs#test

Jobs that timeout are not handled correctly

Job2 should time out and runJob should return -1. Instead, the return code is 3, which corresponds to an OS error and results in a catastrophic retry.
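The expected mapping from a timeout to a return value of -1 can be sketched with the standard subprocess module. The function name run_with_timeout and the constant TIMEOUT_STATUS are hypothetical; this is not how Tango's runJob is implemented.

```python
import subprocess
import sys

TIMEOUT_STATUS = -1  # the value this issue expects on timeout


def run_with_timeout(cmd, timeout_secs):
    """Run a command; return its exit status, or TIMEOUT_STATUS on timeout."""
    try:
        return subprocess.run(cmd, timeout=timeout_secs).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return TIMEOUT_STATUS


# A job that sleeps longer than its limit is reported as a timeout.
slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
status = run_with_timeout(slow_job, timeout_secs=0.5)
```

Keeping the timeout status distinct from OS error codes lets the caller avoid retrying a job that simply ran too long.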

LocalSSH VMMS should be removed

The localSSH VMMS is basically deprecated in favor of localDocker, so we should remove it.

Regression: feedback file times are UTC again

The times in the feedback file returned by Tango are in UTC instead of local time. This was working a week ago and seems to be a regression caused by having multiple Tango repos.

Nasty 15-381 bug

There is a corner case (experienced by the 15-381 TA) that causes Tango to improperly use a previously cached version of a submission file.

Dependencies on 3rd party modules should be listed in requirements.txt

Issue with scp failing

Initially mentioned here: autolab/Autolab#169

/pool has unexpected behavior when there are no VM pools

When there are no VM pools (for example, when Tango has just started), the /pool endpoint returns {"pools": {}, "statusMsg": "Pool not found", "statusId": -1}, complaining that the image name is invalid. This causes unexpected behavior on the frontend Autolab job status page.