
rosettacodedata's Introduction

RosettaCode Data Project

This git repository contains (almost) all of the code samples available on http://rosettacode.org organized by Language and Task.

Getting the Data

All of the data is in this repository, so you can just run:

git clone https://github.com/acmeism/RosettaCodeData

However...

It's a lot of data!

If you just want the latest data, the quickest thing to do is:

git clone https://github.com/acmeism/RosettaCodeData --single-branch --depth=1

Tools

This repository's data content is created by a Perl program called rosettacode.

You can install it with this command:

cpanm RosettaCode

You can rebuild the data with:

make build

This repository has a bin directory with various tools for working with the data.

  • rcd-api-list-all-langs

    List all the programming language names directly from rosettacode.org

  • rcd-api-list-all-tasks

    List all the programming task names directly from rosettacode.org

  • rcd-new-langs

    List the RosettaCode languages not yet added to Conf

  • rcd-new-tasks

    List the RosettaCode tasks not yet added to Conf

  • rcd-samples-per-lang

    Show the number of code samples per language

  • rcd-samples-per-task

    Show the number of code samples per task

  • rcd-tasks-per-lang

    Show the number of tasks with code samples per language

  • rcd-langs-per-task

    Show the number of languages with code samples per task
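As a rough illustration of the kind of counting these tools do, here is a minimal Python sketch. It is not the implementation of rcd-samples-per-lang; it only assumes that sample files live under paths shaped like Task/<task>/<language>/<file>, as in the paths quoted in the issues on this page. The function name samples_per_lang is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def samples_per_lang(paths):
    """Count sample files per language.

    Assumes each path looks like Task/<task>/<language>/<file>;
    anything else is ignored.
    """
    counts = Counter()
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) == 4 and parts[0] == "Task":
            counts[parts[2]] += 1
    return counts


# Paths taken from the issue reports on this page.
example = [
    "Task/100-doors/MoonScript/100-doors.moon",
    "Task/FizzBuzz/MoonScript/fizzbuzz.moon",
    "Task/Sorting-algorithms-Heapsort/Pascal/sorting-algorithms-heapsort.pascal",
    "Lang/MoonScript/README",  # not under Task/, so it is skipped
]
counts = samples_per_lang(example)
for lang, n in sorted(counts.items()):
    print(lang, n)
```

To run it against a real checkout, the list of paths could come from walking the Task directory instead of a hard-coded list.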

To Do

Pull requests welcome!

This project is not yet a perfect representation of RosettaCode. It has a few Unicode issues. It also has to deal with various formatting mistakes in the MediaWiki source pages.

  • Fix bugs

  • Correct the 100s of guessed file extensions in Conf/lang.yaml

  • Ability to fetch only the cache pages changed since the last pushed data update

  • Support names with non-ASCII characters

  • Add more bin tools

  • Address errors reported in rosettacode.log after running make build

rosettacodedata's Issues

Source code for task solutions written in Fōrmulæ

Hi! Interesting repo.

This is not an issue, it's just a comment (I could not find the discussion section).

As you have surely already seen, solutions for the Fōrmulæ language on Rosetta Code are provided as images, because programs are created and edited structurally, not as text. However, there is "textual" source code in files with the ".formulae" extension. These files are ultimately XML, which is neither suitable nor useful for Rosetta Code.

If you want (or plan) to include the source code for Fōrmulæ solutions in your repository, you can either extract them or reference them from its official repo. The name of each file is the name of the Rosetta Code task (with the .formulae extension added).

The following could be useful:

  • The directory contains several examples other than Rosetta Code tasks, so you may have to filter first, or scan from your task list.
  • Every file is a "script" or "notebook" containing one or many programs and their results. Because of this, some files are big, e.g. programs that create images, see for example: Cistercian numerals.

Cheers

-Match vs -Eq

Several lines from line 32 down use -match instead of -eq. I believe this is an error, because -match also succeeds on a substring: it may work for their test set, but it incorrectly reports a match whenever the value is only a substring. I didn't want to update Rosetta Code without having someone verify, and yours seems to be the most complete code backup of it, so if this is the wrong place, I'm sorry.
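The difference being reported can be sketched in Python, used here only as an analogue. PowerShell's -match is a regular-expression match that succeeds when the pattern occurs anywhere in the value, while -eq tests equality. The names loose_match and exact_match and the sample words are hypothetical and are not taken from the script in question.

```python
import re


def loose_match(candidate, wanted):
    # Analogue of -match: true when `wanted` occurs anywhere in `candidate`.
    # re.escape treats `wanted` as literal text rather than as a pattern.
    return re.search(re.escape(wanted), candidate) is not None


def exact_match(candidate, wanted):
    # Analogue of -eq: true only when the two values are equal.
    return candidate == wanted


# Note: the PowerShell operators are case-insensitive by default;
# this sketch does not model that.
names = ["cat", "concatenate", "dog"]
loose = [n for n in names if loose_match(n, "cat")]
exact = [n for n in names if exact_match(n, "cat")]
print(loose)
print(exact)
```

With these sample words, the loose comparison also accepts "concatenate", which is the kind of false positive the report describes.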

Different Countings?

Hello all,

On the rosettacode.org page, 870 tasks are listed and 206 are in draft. In this repo I count 758. Is this intentional? If yes, what were the filter criteria?

BR, CF

Add licence to repository

Hello there,

It's implicit that the scraped code is licensed under the GNU Free Documentation License 1.3, but the code, tools, and tasks should also be licensed so that users can clone or fork this project in some environments and contexts.

Could you state which license this repository is under?

[Request] Autoit

Is it possible you can create an equivalent AutoIt script for this?

Cannot clone repo on Windows

Hi Acmeism (@acmeism) / Ingy (@ingydotnet),

This is an excellent project! I am sure that many developers will find it useful. However, there appears to be a problem cloning the repo on Windows.

As per the "Getting the Data" section in the ReadMe, I tried the following command:

git clone https://github.com/acmeism/RosettaCodeData --single-branch --depth=1

The initial portion of the clone command worked, but it then failed with an invalid path error:

Cloning into 'RosettaCodeData'...
remote: Enumerating objects: 283863, done.
remote: Counting objects: 100% (283863/283863), done.
remote: Compressing objects: 100% (168129/168129), done.
remote: Total 283863 (delta 5568), reused 280218 (delta 5540), pack-reused 0
Receiving objects: 100% (283863/283863), 52.34 MiB | 3.61 MiB/s, done.

Resolving deltas: 100% (5568/5568), done.
error: invalid path 'Lang/11l/Sequence:-smallest-number-greater-than-previous-term-with-exactly-n-divisors'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

My guess is that the failure occurs because ":" is an invalid character in Windows file names. Might it be possible to replace the colon in such filenames with a valid character such as "-"?

Kind Regards,
Liam
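A check along the lines suggested in this report could look like the following Python sketch. It assumes the usual set of characters that Windows rejects in file names (< > : " \ | ? *) and does not cover other Windows restrictions such as reserved device names or trailing dots and spaces. The function names and the "-" replacement are illustrative only, not something the project provides.

```python
# Characters Windows does not allow inside a file or directory name.
# "/" is left out because it is the separator in repository paths.
WINDOWS_RESERVED = set('<>:"\\|?*')


def invalid_on_windows(path):
    """Return True if any character in the path is reserved on Windows."""
    return any(ch in WINDOWS_RESERVED for ch in path)


def sanitize(path, replacement="-"):
    """Replace each reserved character with `replacement`."""
    return "".join(replacement if ch in WINDOWS_RESERVED else ch for ch in path)


# The path from the error message above.
bad = ("Lang/11l/Sequence:-smallest-number-greater-than-"
       "previous-term-with-exactly-n-divisors")
print(invalid_on_windows(bad))
print(sanitize(bad))
```

A real rename scheme would also have to make sure that two different task names do not collapse into the same sanitized name.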

Broken sort

re: RosettaCodeData/Task/Sorting-algorithms-Heapsort/Pascal/sorting-algorithms-heapsort.pascal

Initialise the array to these values:
data: TIntArray = (11, 9, 4, 7, 13, 10, 6, 6, 12, 10, 7, 7);
and this is the result:

The data before sorting:
11 9 4 7 13 10 6 6 12 10 7 7
The data after sorting:
4 6 6 7 7 7 9 10 12 10 11 13

Does that look right to you?!
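The reported output can be checked mechanically. The Python sketch below only tests the two lists quoted above; it does not reproduce or repair the Pascal program. The helper name is_sorted is hypothetical.

```python
def is_sorted(values):
    """True when every element is <= its successor."""
    return all(a <= b for a, b in zip(values, values[1:]))


before = [11, 9, 4, 7, 13, 10, 6, 6, 12, 10, 7, 7]
reported = [4, 6, 6, 7, 7, 7, 9, 10, 12, 10, 11, 13]

# The reported output holds the same values as the input ...
same_values = sorted(reported) == sorted(before)
# ... but it is not in non-decreasing order.
ordered = is_sorted(reported)
# First position where an element is larger than the one after it.
first_bad = next(i for i, (a, b) in enumerate(zip(reported, reported[1:])) if a > b)

print(same_values, ordered, first_bad)
```

The check confirms that no values were lost, and that the order breaks where 12 precedes 10.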

reduced-row-echelon-form.java returns wrong result

Running reduced-row-echelon-form.java on this example:

  double[][] matrix_0 = {
    {1,0,-1,0},
    {0,1,0,-1},
    {1,-2,-1,0},
    {-1,0,3,1}
  };

  Matrix x = new Matrix(matrix_0);
  System.out.println("before\n" + x.toString() + "\n");
  x.RREF();
  System.out.println("after\n" + x.toString() + "\n");

should return the 4x4 identity matrix.

See wolframalpha for the input:

RowReduce({{1,0,-1,0},{0,1,0,-1},{1,-2,-1,0},{-1,0,3,1}})
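The expected result can also be cross-checked independently. The following Python sketch is a plain Gauss-Jordan elimination over exact fractions; it is not a port of, or a fix for, the Java file in question, and the function name rref is chosen here for illustration.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row echelon form using exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next(
            (r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Scale the pivot row so that the pivot entry becomes 1.
        pivot_value = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot_value for x in rows[pivot_row]]
        # Clear this column in every other row.
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


matrix_0 = [
    [1, 0, -1, 0],
    [0, 1, 0, -1],
    [1, -2, -1, 0],
    [-1, 0, 3, 1],
]
result = rref(matrix_0)
identity = [[Fraction(int(i == j)) for j in range(4)] for i in range(4)]
print(result == identity)
```

Note that the third column has no usable pivot in the third row at that stage, so a row swap is needed.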

Combine fragment files into one file?

In the current repository, there exist files that are not, by themselves, valid programs for a given language. For example, in the Ada subdirectory, for the "15 Puzzle Game" task, the first file has a package specification, the second file has the package body, the third has the final executable program, and the fourth has the package instantiation (though I'm not sure why this exists at all). Would it be possible to merge things like this into single files, and eliminate unnecessary ones?

The LCS Implementation in CoffeeScript has a bug

Calling the function, as defined, with the strings:

  • "Chocolate frosted sugarbombs:"
  • ": Now w/ extra nicotine!"

for parameters s1, s2 gives "oa ote", which is not a subsequence in either of the argued strings.

Same code committed as upper and lower case

When doing a git checkout under macOS I get the following warning. This would normally be fatal for the integrity of the repository checkout, but I think in this case it is simply duplicate code committed by mistake. You might want to clean up the duplicate folders.

warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'Lang/MoonScript/00DESCRIPTION'
  'Lang/Moonscript/00DESCRIPTION'
  'Lang/MoonScript/100-doors'
  'Lang/Moonscript/100-doors'
  'Lang/MoonScript/99-Bottles-of-Beer'
  'Lang/Moonscript/99-Bottles-of-Beer'
  'Lang/MoonScript/A+B'
  'Lang/Moonscript/A+B'
  'Lang/MoonScript/FizzBuzz'
  'Lang/Moonscript/FizzBuzz'
  'Lang/MoonScript/Read-a-specific-line-from-a-file'
  'Lang/Moonscript/Read-a-specific-line-from-a-file'
  'Lang/MoonScript/README'
  'Lang/Moonscript/README'
  'Lang/MSX-BASIC/00DESCRIPTION'
  'Lang/MSX-Basic/00DESCRIPTION'
  'Lang/MSX-BASIC/README'
  'Lang/MSX-Basic/README'
  'Task/100-doors/MoonScript/100-doors.moon'
  'Task/100-doors/Moonscript/100-doors.moon'
  'Task/99-Bottles-of-Beer/MoonScript/99-bottles-of-beer.moon'
  'Task/99-Bottles-of-Beer/Moonscript/99-bottles-of-beer.moon'
  'Task/A+B/MoonScript/a+b.moon'
  'Task/A+B/Moonscript/a+b.moon'
  'Task/FizzBuzz/MoonScript/fizzbuzz.moon'
  'Task/FizzBuzz/Moonscript/fizzbuzz.moon'
  'Task/Read-a-specific-line-from-a-file/MoonScript/read-a-specific-line-from-a-file.moon'
  'Task/Read-a-specific-line-from-a-file/Moonscript/read-a-specific-line-from-a-file.moon'
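Collisions like these can be found before committing with a check such as the Python sketch below. It groups paths that differ only by letter case, using str.casefold as an approximation of how a case-insensitive file system compares names; real file systems differ in details such as Unicode normalization. The function name case_collisions is hypothetical.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of distinct paths that are equal when case is ignored."""
    groups = defaultdict(list)
    for p in paths:
        groups[p.casefold()].append(p)
    return [sorted(set(g)) for g in groups.values() if len(set(g)) > 1]


# A few of the paths from the warning above, plus one that does not collide.
paths = [
    "Lang/MoonScript/README",
    "Lang/Moonscript/README",
    "Lang/MSX-BASIC/README",
    "Lang/MSX-Basic/README",
    "Task/A+B/MoonScript/a+b.moon",
]
collisions = case_collisions(paths)
for group in collisions:
    print(group)
```

In a real checkout, the input could be the output of git ls-files rather than a hard-coded list.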
