github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!

License: MIT License

Ruby 67.36% Shell 1.24% C 22.95% Lex 1.64% Go 6.52% Dockerfile 0.30%

syntax-highlighting language-grammars language-statistics linguistic

linguist's Introduction

Linguist

This library is used on GitHub.com to detect blob languages, ignore binary or vendored files, suppress generated files in diffs, and generate language breakdown graphs.

Documentation

Installation

Install the gem:

gem install github-linguist

Dependencies

Linguist is a Ruby library, so you will need a recent version of Ruby installed. The version of Ruby supplied with macOS/Xcode has known problems that prevent some of the dependencies from installing. Accordingly, we highly recommend you install a version of Ruby using Homebrew, rbenv, rvm, ruby-build, asdf or another packaging system before attempting to install Linguist and its dependencies.

Linguist uses charlock_holmes for character encoding and rugged for libgit2 bindings for Ruby. These components have their own dependencies.

charlock_holmes
- cmake
- pkg-config
- ICU
- zlib
rugged
- libcurl
- OpenSSL

You may need to install missing dependencies before you can install Linguist. For example, on macOS with Homebrew:

brew install cmake pkg-config icu4c

On Ubuntu:

sudo apt-get install build-essential cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libssl-dev ruby-dev

Usage

Application usage

Linguist can be used in your application as follows:

require 'rugged'
require 'linguist'

repo = Rugged::Repository.new('.')
project = Linguist::Repository.new(repo, repo.head.target_id)
project.language       #=> "Ruby"
project.languages      #=> { "Ruby" => 119387 }
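
The hash returned by project.languages maps each language to a byte count. As a rough sketch of how those counts relate to percentages, the helper below (language_percentages is a hypothetical name, not part of Linguist) converts such a hash into two-decimal percentage strings. The sample sizes are copied from the command-line output shown in this README; the rounding rule is an assumption.

```ruby
# Hypothetical helper, not part of Linguist: convert a {language => bytes}
# hash into {language => "NN.NN"} percentage strings, largest first.
def language_percentages(sizes)
  total = sizes.values.sum.to_f
  return {} if total.zero?

  sizes
    .sort_by { |_name, bytes| -bytes }
    .to_h { |name, bytes| [name, format('%.2f', bytes * 100 / total)] }
end

# Byte counts taken from the sample github-linguist output in this README.
SAMPLE_SIZES = {
  'Ruby' => 264_519, 'C' => 97_685, 'Go' => 25_999,
  'Lex' => 5_098, 'Shell' => 1_257, 'Dockerfile' => 1_212
}.freeze

language_percentages(SAMPLE_SIZES).each do |name, pct|
  puts format('%-8s %s', "#{pct}%", name)
end
```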

Command line usage

Git Repository

A repository's language stats can also be assessed from the command line using the github-linguist executable. Without any options, github-linguist will output the language breakdown by percentage and file size.

cd /path-to-repository
github-linguist

You can try running github-linguist on the root directory in this repository itself:

$ github-linguist
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Additional options

`--rev REV`

The --rev REV flag will change the git revision being analyzed to any gitrevisions(1) compatible revision you specify.

This is useful to analyze the makeup of a repo as of a certain tag, or in a certain branch.

For example, here is the popular Jekyll open source project.

$ github-linguist jekyll

70.64%  709959     Ruby
23.04%  231555     Gherkin
3.80%   38178      JavaScript
1.19%   11943      HTML
0.79%   7900       Shell
0.23%   2279       Dockerfile
0.13%   1344       Earthly
0.10%   1019       CSS
0.06%   606        SCSS
0.02%   234        CoffeeScript
0.01%   90         Hack

And here is Jekyll's published website, from the gh-pages branch inside their repository.

$ github-linguist jekyll --rev origin/gh-pages
100.00% 2568354    HTML

`--breakdown`

The --breakdown or -b flag will additionally show the breakdown of files by language.

You can try running github-linguist on the root directory in this repository itself:

$ github-linguist --breakdown
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Ruby:
Gemfile
Rakefile
bin/git-linguist
bin/github-linguist
ext/linguist/extconf.rb
github-linguist.gemspec
lib/linguist.rb
…

`--json`

The --json or -j flag outputs the data in JSON format.

$ github-linguist --json
{"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}

This option can be used in conjunction with --breakdown to get a full list of files along with the size and percentage data.

$ github-linguist --breakdown --json
{"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},"Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb",...]}}
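
The JSON form is convenient for scripts. A minimal sketch of consuming it from Ruby follows; the sample string is copied from the --json example above, and largest_language is a hypothetical helper name used only for illustration.

```ruby
require 'json'

# Sample output copied from the `github-linguist --json` example in this README.
SAMPLE_JSON = '{"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}'

# Hypothetical helper: return the name of the language with the most bytes.
def largest_language(json_text)
  stats = JSON.parse(json_text)
  name, _info = stats.max_by { |_lang, info| info['size'] }
  name
end

puts largest_language(SAMPLE_JSON)
```

In a real script the string would come from the command's standard output rather than a constant.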

Single file

Alternatively you can find stats for a single file using the github-linguist executable.

You can try running github-linguist on files in this repository itself:

$ github-linguist grammars.yml
grammars.yml: 884 lines (884 sloc)
  type:      Text
  mime type: text/x-yaml
  language:  YAML

Docker

If you have Docker installed you can build an image and run Linguist within a container:

$ docker build -t linguist .
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist github-linguist --breakdown
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Ruby:
Gemfile
Rakefile
bin/git-linguist
bin/github-linguist
ext/linguist/extconf.rb
github-linguist.gemspec
lib/linguist.rb
…

Contributing

Please check out our contributing guidelines.

License

The language grammars included in this gem are covered by their repositories' respective licenses. vendor/README.md lists the repository for each grammar.

All other files are covered by the MIT license, see LICENSE.

linguist's People

Contributors

toothrot cpatni ananthrk carlosgaldino sen lethaldose bakkdoor rkh dals earl epictetus bratish kkaefer jettero tautologico sarahhodne softprops zlender cybershadow devinus mrorbita visgean txdywy lkuper bkerley flaviusb leto nibalizer gosu-tools kprevas pombredanne beckje01 kevinsawicki grosser simonoff mgdm robsimmons jwilkins ruelbargo maieul eeue56 gitlabhq kepstin btd open-turing-project sunaku stof sbisbee kennknowles dom96 sj26 mcobrien ahrokib timurb brynary utkarshkukreti alokmenghrajani robnewman leafo sylvestre abevoelker meirkriheli rankida valscion mleinart zhensydow geekontheway ab5tract jaxzin jpcs frostman pmoura derekv jstrachan keikubo abarrachina caerbannog lparenteau pao svenefftinge skoon igrigorik nolta pablof7z doubleotoo pwaller ealliaume dineshkummarc jongalloway burningtyger johan wildmichael ngarneau rlsosborne chiraag andyklock chochos sdressler protohub borgified

linguist's Issues

Support highlighting Twig templates

The syntax of Twig templates is equivalent to the Jinja one (but for PHP projects instead of Python ones), so it could probably be done by reusing the Jinja lexer.
Twig is the default templating engine for Symfony2 (which uses Github) so it would help a lot to have proper highlighting for .twig files.

Binary *.n files are Neko (haXe) applications, not Nemerle code

Linguist is flagging any file with a *.n extension as Nemerle, but the extension is used by Neko binary code.

Since this is compiled code, I don't think it should be counted towards any source code total -- but it should not be flagged as Nemerle!

For example, I have a project which includes haXe source code, that compiles to a Neko application for processing Javascript, building JS projects, etc. 68% of the file total is the compiled *.n application, while the rest is the haXe source code.

Ruby 1.9.2: file content encoding causes file blobs to fail

The creation of file blobs can fail on creation because the file contents might be encoded. This issue should only be present in Ruby 1.9+ as Ruby 1.8 did not care for encoded files.

A temporary solution is to do this in file_blob.rb:

    # Public: Read file contents.
    #
    # Returns a String.
    def data
      File.read(@path).encoding.to_s
    end

Only thing is the test cases fail now.

Note: If this project is intended to work only with Ruby 1.8, then disregard this.
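
For context, one common way to sidestep encoding errors in Ruby 1.9+ is to read the raw bytes first and only tag them as text when they are valid. The sketch below illustrates that idea; read_blob is a hypothetical helper, not Linguist's actual implementation, and it assumes UTF-8 is the only text encoding of interest.

```ruby
require 'tempfile'

# Hypothetical helper, not part of Linguist: read raw bytes, then tag them
# as UTF-8 only if they form a valid UTF-8 sequence.
def read_blob(path)
  bytes = File.binread(path)                        # returned as ASCII-8BIT (binary)
  text = bytes.dup.force_encoding(Encoding::UTF_8)  # relabel the bytes, no conversion
  text.valid_encoding? ? text : bytes
end

Tempfile.create('blob') do |file|
  file.binmode
  file.write("caf\xC3\xA9".b)   # valid UTF-8 bytes
  file.flush
  puts read_blob(file.path).encoding
end
```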

Upgrade Pygments to 1.5

http://pygments.org/download/ -- Release 1.5 "Zeitdilatation" is out!

Doesn't Detect Languages

My Github repo is not getting any graph data. This is built on approx 98% PHP and a little bit of Javascript. Not sure why I am not getting stats anymore (I used to).

-Chris

Ship public gem

There's already a linguist gem, so we'll take github-linguist.

Shipped Classifier cannot be trained

It's not clear whether this limitation is intentional or if this is a side effect of the YAML loading, but it's not possible to update the Classifier instance with a new language.

I'm trying to teach new languages to the existing classifier at the smallest possible cost, following this work plan:

- Add new languages to the classifier and train it with an "adequate" volume of data.
- Reduce the number of tokens for the new languages so that the number of classified tokens stays low and performance is preserved (according to the rdoc, #gc should be the method to call, but according to the source it does not do anything. Is this something you plan to implement?)

Do you think this is an acceptable use of your library?

Right now, I'm duck typing Language to feed Classifier#train, which seems to be enough for it to work. Because the Classifier does not depend on Language at all, maybe #train could simply take a String as a parameter (and #classify could return Strings too). This would greatly simplify interop with your lib :-)

Following, a simple test-case and patch that allows the test-case to pass.

Cheers,
Pierre.

Nimrod .nim files no longer are recognized after some change to linguist

ugh, anyone with Ruby experience want to figure out why github's linguist does not consider .nim files to be the Nimrod language anymore?

I'm quite sure it fails on my comp because I have the latest Ruby version and it doesn't support it.

I don't know what I need, all I want is to get linguist to run.

I also noticed that linguist fails with an error:

custom_require.rb:36:in `require': cannot load such file -- pygments (LoadError)

But I can't find any "gem install pygments"

Do people really HAVE to use bundler in order to try linguist? I don't like bundler
at all, it messes up things in ways I don't want to. :(

All we have to find out is why linguist no longer recognizes .nim files

    Nimrod:
      type: programming
      color: "#37775b"
      primary_extension: .nim
      extensions:
      - .nimrod

It should work but it does not.

(.nim are default extensions for nimrod files)

.t far too generic for perl

We need to check these files contents. See this repo's tests. They're not perl.

Deep content inspection tweaking

I found the place where #! files are analyzed for the right language, but I don't see anywhere a way to extend it. In our case, the simplest way to identify a Racket file would be to look for a #lang line (see example here). A less precise but possibly more broadly useful heuristic is to look for an exec foo line near the top of the file.

Either way, it's not clear whether this is intended to be customizable, and if so, how to do it.
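
To make the proposal concrete, here is a minimal sketch of the #lang check described above. The method name and the five-line window are assumptions for illustration; this is not an existing Linguist extension point.

```ruby
# Hypothetical heuristic: report whether a `#lang` line appears within the
# first few lines of a file, as Racket sources commonly have.
def racket_lang_line?(source, max_lines: 5)
  source.each_line.first(max_lines).any? { |line| line.match?(/\A#lang\s+\S+/) }
end

puts racket_lang_line?("#!/bin/sh\n#lang racket\n(display 1)\n")
puts racket_lang_line?("puts 1\n")
```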

Scores sent back by the lib are curious

Hello,

The documentation states that it should return floats. On my installation, it returns negative numbers:

[[#<Linguist::Language name=PHP>, -66.98989614319586],
 [#<Linguist::Language name=JavaScript>, -68.77510897386178],
 [#<Linguist::Language name=Ruby>, -70.7837674453772],
 [#<Linguist::Language name=Perl>, -71.16156437444059],
 [#<Linguist::Language name=Gosu>, -72.90117504252562],
 [#<Linguist::Language name=Python>, -73.0532406574862],
 [#<Linguist::Language name=Objective-C>, -74.10993364147689],
 [#<Linguist::Language name=TeX>, -77.81775680913668],
 [#<Linguist::Language name=Java>, -78.66295010514327],
 [#<Linguist::Language name=Kotlin>, -79.19112391377584],
 [#<Linguist::Language name=Scala>, -79.596874273976],
 [#<Linguist::Language name=C++>, -80.16597822216151],
 [#<Linguist::Language name=CoffeeScript>, -83.44077180874064],
 [#<Linguist::Language name=Apex>, -83.80881093343098],
 [#<Linguist::Language name=C>, -85.47097078986161],
 [#<Linguist::Language name=AppleScript>, -85.68956917025051],
 [#<Linguist::Language name=SCSS>, -86.60214237229394],
 [#<Linguist::Language name=Groovy>, -86.89541966825266],
 [#<Linguist::Language name=Shell>, -87.43588353355483],
 [#<Linguist::Language name=Dart>, -87.459050333217],
 [#<Linguist::Language name=Coq>, -88.6740351917743],
 [#<Linguist::Language name=Rust>, -93.09294395196528],
 [#<Linguist::Language name=Nemerle>, -93.21419319559817],
 [#<Linguist::Language name=PowerShell>, -93.51902834727619],
 [#<Linguist::Language name=Arduino>, -93.5392310545937],
 [#<Linguist::Language name=Opa>, -93.78609113252523],
 [#<Linguist::Language name=XQuery>, -93.83645881136175],
 [#<Linguist::Language name=R>, -94.21217552783614],
 [#<Linguist::Language name=Delphi>, -94.35016127081002],
 [#<Linguist::Language name=SuperCollider>, -94.40855958019455],
 [#<Linguist::Language name=Verilog>, -94.8229388269385],
 [#<Linguist::Language name=OpenCL>, -96.50244013644215],
 [#<Linguist::Language name=Groovy Server Pages>, -96.56948552051941],
 [#<Linguist::Language name=Racket>, -97.8652823987905],
 [#<Linguist::Language name=OCaml>, -99.6352432871025],
 [#<Linguist::Language name=Matlab>, -101.76930665936734],
 [#<Linguist::Language name=XML>, -101.8170795450655],
 [#<Linguist::Language name=Haml>, -102.25666430330622],
 [#<Linguist::Language name=Scilab>, -102.64814316943966],
 [#<Linguist::Language name=INI>, -102.66212941141441],
 [#<Linguist::Language name=Logtalk>, -103.5329577692118],
 [#<Linguist::Language name=GAS>, -103.96895960118005],
 [#<Linguist::Language name=Sass>, -104.20257445236155],
 [#<Linguist::Language name=Turing>, -104.82161366076778],
 [#<Linguist::Language name=OpenEdge ABL>, -105.1428606897919],
 [#<Linguist::Language name=VimL>, -112.11353183520714],
 [#<Linguist::Language name=Standard ML>, -112.11353183520714],
 [#<Linguist::Language name=Nu>, -112.80667901576709],
 [#<Linguist::Language name=Parrot Assembly>, -112.80667901576709],
 [#<Linguist::Language name=Scheme>, -112.80667901576709],
 [#<Linguist::Language name=Julia>, -112.80667901576709],
 [#<Linguist::Language name=Ioke>, -112.80667901576709],
 [#<Linguist::Language name=Rebol>, -112.80667901576709],
 [#<Linguist::Language name=Parrot Internal Representation>,  -112.80667901576709],
 [#<Linguist::Language name=Emacs Lisp>, -112.80667901576709],
 [#<Linguist::Language name=Tea>, -112.80667901576709],
 [#<Linguist::Language name=Nimrod>, -112.80667901576709],
 [#<Linguist::Language name=VHDL>, -112.80667901576709],
 [#<Linguist::Language name=Diff>, -112.80667901576709],
 [#<Linguist::Language name=Markdown>, -112.80667901576709],
 [#<Linguist::Language name=Visual Basic>, -112.80667901576709],
 [#<Linguist::Language name=Prolog>, -112.80667901576709],
 [#<Linguist::Language name=AutoHotkey>, -112.80667901576709],
 [#<Linguist::Language name=XSLT>, -112.80667901576709],
 [#<Linguist::Language name=YAML>, -112.80667901576709]]

Still the results are in the correct order...

ruby --version
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]

The same behavior on x86_64 linux.
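
A possible explanation, assuming the classifier sums logarithms of token probabilities (as naive Bayes classifiers commonly do): each probability is below 1, so every logarithm is negative, the totals are negative floats, and the ordering is still meaningful. The sketch below uses made-up probabilities and hypothetical names purely to illustrate that effect.

```ruby
# Hypothetical per-language token probabilities; real values would come
# from training data.
TOKEN_PROBS = {
  'PHP'  => { '$x' => 0.20, 'echo' => 0.10 },
  'Ruby' => { '$x' => 0.01, 'echo' => 0.02 }
}.freeze

# Sum of log-probabilities; unseen tokens get a small assumed floor value.
def log_score(probs, tokens)
  tokens.sum { |token| Math.log(probs.fetch(token, 1e-6)) }
end

# Rank languages from most to least likely for the given tokens.
def rank(tokens)
  TOKEN_PROBS.map { |lang, probs| [lang, log_score(probs, tokens)] }
             .sort_by { |_lang, score| -score }
end

rank(['$x', 'echo']).each { |lang, score| puts "#{lang}: #{score}" }
```

Under this assumption, a score closer to zero means a better match, which is consistent with the list above being in the correct order.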

foundation detected as PHP

foundation detected as ~75% php.

But the php files in foundation consist mostly of html, with only one to three php instructions each.

It should be detected as ~70% html and ~5% php

README

Write up a more complete README.

add .pl as Prolog extension

At the moment it is recognized as Perl.

edit: Spelling. Both English and Perl are not my native language ;-)

syntax highlighting for Coq .v files

Pygments now supports Coq .v files. See https://bitbucket.org/birkenfeld/pygments-main/issue/734/support-for-coq

Would it be possible to get this into Github?

Thanks.

language detection doesnt seem to update

1. Create a repo like mine: https://github.com/borgified/linguist-test
2. Populate it with a couple of Perl scripts.
3. Commit and push to GitHub.
4. Observe that linguist detects it as "perl".
5. Delete all the Perl scripts and replace them with a whole bunch of PHP scripts.
6. Commit and push to GitHub.
7. Observe that linguist still detects the repo as "perl".

Nimrods, the whole lotta them.

Shall we add highlighting for it? https://github.assistly.com/agent/case/2839

do not process files in .linguist-ignore

It would be nice if linguist could read a .linguist-ignore file (or any other name) at the root of the project and skip the files listed there. These files (which can be either auto-generated or imported) are usually not in the same language as the project itself and may eventually become quite big, making the statistics completely wrong.

If you think that feature is useful, I'm happy to propose a patch.

Crucial invalid detection on Play!

play framework is a Java framework and I believe has a sloc ≥90% of Java. However it shows 76% of it is Python. What could be possibly wrong?

Git commit

It would be nice to have a highlighter for git commits, so I could paste the output of git show inside "```commit" and it would look nice.

(p.s., I know about the diff highlighter, I'm mainly talking about making the message and the metadata before it look nice)

Blog Post

Draft Blog Post.

Coq / Verilog Misdetections

Linguist is getting Verilog and Coq confused (see Verilog projects
included in https://github.com/languages/Coq and Coq projects included
in https://github.com/languages/Verilog). Both use .v files. I've gone
through the commit history and the first place that I can get it to
fail is at 4484011, however it may be
failing one commit before that at
c114d71. I can't tell for the latter
commit as that fails the Matlab / obj-c case first. Everything passes
if you go one commit earlier.

I'm using some of my Verilog files to test it, specifically, the files
sitting in https://github.com/seldridge/verilog, and linguist just
isn't having it. Linguist continues to pass for the one test file
(sha-256-functions.v) currently in use. I'm no Ruby guy, so I haven't
attempted to look into this in any significant depth beyond the regex
in blob_helper.rb. This doesn't seem to be the issue as it's picking
up the important matches in my testcases, namely comment structure and
the "module" keyword.

Ruby library marked as Javascript

I started a Ruby project that's a Rails generator. In GitHub search, it's categorized as a JavaScript project. GitHub support said linguist is behind the project categorization, so I thought I'd file an issue.

Project: https://github.com/joshcrews/flexible_admin

https://github.com/search?type=Repositories&language=&q=flexible+admin

Upgrade to Pygments 1.5

Depends on pygments/pygments.rb#15, unless linguist is still using github/albino in production.
#129 depends on this issue.

python wsgi

There should be support for .wsgi files. They contain Python code, so it's just another Python file extension.
links: http://en.wikipedia.org/wiki/Wsgi , http://www.python.org/dev/peps/pep-3333/

Add Lasso programming language

We've submitted our pull request to Pygments to add Lasso as a programming language, and it's been accepted! Lasso now has a lexer:

https://bitbucket.org/birkenfeld/pygments-main/pull-request/95/new-lexer-for-the-lasso-language

What do I need to do next to get Lasso added into Linguist? I need to know which files I should edit for my pull request. Thank you!

JS suppression false positives

https://github.com/mishoo/UglifyJS/pull/172/files

Some JS files with just a couple long lines are getting marked as minified.
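
One way such a false positive could arise, assuming minification is guessed from average line length: a short file with a couple of very long lines has a high average even though it is hand-written. The sketch below is illustrative only; the method name and the threshold of 110 characters are assumptions, not confirmed details of Linguist's check.

```ruby
# Hypothetical heuristic with an assumed threshold: treat a file as
# minified when its average line length exceeds `threshold` characters.
def looks_minified?(source, threshold: 110)
  lines = source.lines
  return false if lines.empty?

  average = lines.sum(&:length).to_f / lines.size
  average > threshold
end

two_long_lines = ("var a = 1; " * 50) + "\n" + ("var b = 2; " * 50) + "\n"
many_short_lines = "var a = 1;\n" * 20 + ("var b = 2; " * 50) + "\n"

puts looks_minified?(two_long_lines)    # only long lines, so the average is high
puts looks_minified?(many_short_lines)  # short lines pull the average down
```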

Description of test-suite running

The last part of the README file talks about using some bundle thing, which I guess is some ruby utility. Maybe add some more exact description for the uninitiated masses?

Binary detection issues on extensionless files

Check out the md, txt, and zip files in this repo. They all contain the same content, but the zip file is presented as a binary would be. That's not right!

Add .elf extension

I think that all *.elf files should be marked as binary automatically (without reading the file)

Invalid gemspec (missing authors)

I'm receiving the following error when I try to install linguist via bundle:

linguist at /usr/lib64/ruby/gems/1.9.1/bundler/gems/linguist-d8903afc12b1 did not have a valid gemspec.
This prevents bundler from installing bins or native extensions, but that may not affect its functionality.
The validation message from Rubygems was:
authors may not be empty

If I clone linguist locally and add an authors line to the .gemspec file, it works fine.

I'm on ruby 1.9.1

Drop mime-types

Try to get our current mime-type extensions pushed upstream to the mime-types lib. Then try to decouple integration from Linguist. Language detection shouldn't be dependent on any sort of mime type.

Modelica everywhere!

Since the new language breakdown bar was introduced, I keep seeing the Modelica language in most of my repositories, even if I didn't even know such a language existed.

Example: http://i.imgur.com/akW7P.png

https://github.com/scribu/wp-pagenavi

MaxMSP files still not recognized

Hello,

A few weeks ago (remember? #208) we added MaxMSP samples in the JSON folder, but now the files are detected as JavaScript. A MaxMSP code/patcher is a graph of objects, dynamically loaded at runtime; it is saved as JSON but has nothing to do with JavaScript.

IMHO the only solution is to add extensions to "languages.yml": ".mxt" is the old format (Max 4); since Max 5 the extensions are ".maxpat" and ".maxhelp".

Objective-C wrong recognition

I can't understand why linguist detects my main project language as Objective-C. It's completely written in C++ (Qt). I don't know the Ruby language, so I can't find the problem. Can anyone help me?

P.S. My project does not have any *.mm or *.m files. It has only *.h, *.cpp, *.ui, *.qrc, *.css, *.png files.
P.P.S. Problem in GitHub "language color bar" (at the right top of repo page). It's OK with main language.

Pull Request Failure

Travisbot failed this request: #216

To be honest, fairly new to Github and while it looked like contributing to linguist would prove straightforward, something has clearly gone awry. Any idea what?

C code detected as Objective C

Hello,

I have a repository in Github, the Refu Library, which is a pure C project. For some reason the majority of the source files are identified as Objective C and so the project itself is tagged as Objective C. Here is the repository:
http://github.com/LefterisJP/Refu/

I have no knowledge of Ruby so I can't understand how the Linguist project works to find the problem. Any assistance with this matter will be appreciated.

Prolog files misclassified as Perl files

Prolog files are once again misclassified as Perl files. The disambiguation code seems to have been removed. The current spec for Prolog defines "primary_extension" as ".prolog", which nobody in the Prolog programming community uses or has ever used. The default extension for Prolog is ".pl" (and was long before Perl existed). How can we get the disambiguation functionality back?
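
As an illustration of what content-based disambiguation for .pl files might look like, here is a rough sketch. The patterns and the method name are hypothetical and deliberately simplistic; they are not the removed Linguist code.

```ruby
# Hypothetical content heuristic for ambiguous .pl files; the patterns are
# illustrative only and will not cover every real-world file.
def guess_pl_language(source)
  # Prolog directives (":- ...") or rules ("head(...) :- body").
  return 'Prolog' if source.match?(/^\s*:-\s/) || source.match?(/^[a-z]\w*\(.*\)\s*:-/)
  # Typical Perl markers: "use strict" or "my $var" declarations.
  return 'Perl' if source.match?(/^\s*use\s+strict\b/) || source.match?(/\bmy\s+[$@%]\w+/)

  nil # undecided: leave the decision to another strategy
end

puts guess_pl_language("parent(X, Y) :- father(X, Y).\n")
puts guess_pl_language("use strict;\nmy $x = 1;\n")
```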

Matlab extension .m

I've seen you consider Matlab's extension as .matlab, however it is popular to use .m (one of the standard extensions).

I know this conflicts with Objective-C's m files, but it would be interesting to have an option to make syntax checks to guess the extension in dubious cases.

This is confusing to me, as I have both Objective-C and Matlab repositories.

Lazy load repository blobs

https://github.com/github/linguist/blob/master/lib/linguist/repository.rb#L27

Repository requires all the repo blobs be allocated at once. We need to defer this for larger repos.
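
A minimal sketch of the deferral idea, assuming blobs can be described by a path plus a loader callback: contents are read only when first requested, and the collection itself is a lazy enumerator. LazyBlob and lazy_blobs are hypothetical names, not Linguist's classes.

```ruby
# Hypothetical stand-in for a repository blob; not Linguist's real class.
LazyBlob = Struct.new(:path, :loader) do
  def data
    @data ||= loader.call(path) # contents are read on first access only
  end
end

# Wrap paths in a lazy enumerator so blobs are built one at a time.
def lazy_blobs(paths, &loader)
  paths.lazy.map { |path| LazyBlob.new(path, loader) }
end

loaded = []
blobs = lazy_blobs(%w[a.rb b.rb c.rb]) do |path|
  loaded << path
  "contents of #{path}"
end

first = blobs.first
puts loaded.inspect # nothing has been read yet
puts first.data
puts loaded.inspect # only the requested blob was read
```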

`startinline` option for PHP highlighting

I didn't see a way to pass options to each lexer from languages.yml, but it would be great to have the startinline option in Pygments turned on for PHP. See "Lexers for web-related languages and markup" under PhpLexer:

startinline
If given and True the lexer starts highlighting with php code (i.e.: no starting <?php required).
The default is False.

Ideally, this sample snippet of PHP code from the Symfony2 project would be highlighted with ```php without having to include <?php:

/**
 * Client simulates a browser and makes requests to a Kernel object.
 *
 * @author Fabien Potencier <[email protected]>
 *
 * @api
 */
class Client extends BaseClient
{
    protected $kernel;

    /**
     * Constructor.
     *
     * @param HttpKernelInterface $kernel    An HttpKernel instance
     * @param array               $server    The server parameters (equivalent of $_SERVER)
     * @param History             $history   A History instance to store the browser history
     * @param CookieJar           $cookieJar A CookieJar instance to store the cookies
     */
    public function __construct(HttpKernelInterface $kernel, array $server = array(), History $history = null, CookieJar $cookieJar = null)
    {
        $this->kernel = $kernel;

        parent::__construct($server, $history, $cookieJar);

        $this->followRedirects = false;
    }
}

Allow specifying an ignore file for language statistics

Some repositories (like SignalR) have samples that include common JavaScript libraries like jQuery, and GitHub ends up classifying the project as JavaScript instead of C# (in this particular case). Nothing is wrong with this at a high level since jQuery is JavaScript, but project maintainers who want more control over statistics need a way to opt out of this behavior.

I see 2 options:

- Short-term hack: Exclude commonly used js files. This will handle some scenarios, but you'll have to exclude multiple versions of the library (unless you have wildcard support).
- Longer-term solution: Allow a repository to have a .lignore or equivalent (I suck at naming) that uses glob syntax to exclude files from being processed for language statistics.
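
The glob-based option could be sketched as follows. This is only an illustration of the proposal: the ignored? helper, the example patterns, and the file paths are all hypothetical, and no such ignore file is confirmed to be supported.

```ruby
# Hypothetical: glob patterns as they might appear in an ignore file,
# one per line. FNM_PATHNAME keeps `*` from crossing directory separators,
# while `**/` matches nested directories.
IGNORE_PATTERNS = ['samples/**/*.js', 'vendor/*'].freeze

def ignored?(path, patterns = IGNORE_PATTERNS)
  patterns.any? { |pattern| File.fnmatch?(pattern, path, File::FNM_PATHNAME) }
end

paths = ['samples/app/Scripts/jquery-1.6.4.js', 'src/Hub.cs', 'vendor/lib.js']
counted = paths.reject { |path| ignored?(path) }
puts counted.inspect
```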

Classifier#to_yaml fails with shipped Classifier

Hi,

I'm trying to train the Classifier and hence to serialize it to disk. I run into an issue while trying to serialize the default Classifier:


irb(main):006:0>  Linguist::Classifier.instance.to_yaml($STDOUT)
ArgumentError: comparison of Array with Array failed
        from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:172:in `sort'
        from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:172:in `block in to_yaml'
        from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:170:in `each'
        from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:170:in `to_yaml'
        from (irb):6
        from /home/oct/.rbenv/versions/1.9.3-p194/bin/irb:12:in `<main>'

Add .psd1 extension

Add .psd1 (module manifest) into the PowerShell syntax group

Erlang escript bundle is treated as JavaScript

An escript bundle is a compressed Erlang script. Linguist incorrectly detects it as JavaScript:

$ file ./rebar
./rebar: a escript script text executable
$ linguist ./rebar
./rebar: 0 lines (0 sloc)
  type:      Binary
  mime type: text/plain
  language:  JavaScript
$

...so many Erlang projects that ship with the rebar build tool script may be detected as JavaScript projects although they are pure Erlang!

Linguist::Blob does not exist

In the Readme, the example is:

Linguist::Blob.new("linguist.rb")

But that class does not exist.

pygments updated with improved version of autohotkey lexer

Could you please update your pygments. There is an updated version of the autohotkey lexer in it that is much better.
https://bitbucket.org/birkenfeld/pygments-main/changeset/1c549d7cb1db
Thanks

Move shebang script detection to classifier

The Classifier should be able to pick up on shebang scripts and detect them correctly.
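
For illustration, shebang detection can be reduced to parsing the first line and mapping the interpreter name to a language. The sketch below is a hypothetical standalone helper with an illustrative lookup table, not the Classifier's actual code.

```ruby
# Hypothetical interpreter-to-language table; entries are illustrative.
INTERPRETERS = {
  'ruby' => 'Ruby', 'python' => 'Python', 'perl' => 'Perl',
  'sh' => 'Shell', 'bash' => 'Shell'
}.freeze

# Return the language implied by a `#!` first line, or nil if there is none.
def shebang_language(source)
  first_line = source.lines.first.to_s
  return nil unless first_line.start_with?('#!')

  parts = first_line[2..-1].strip.split(/\s+/)
  script = File.basename(parts.first.to_s)
  script = parts[1].to_s if script == 'env' # e.g. "#!/usr/bin/env ruby"
  INTERPRETERS[script[/\A[a-z]+/]]          # strip version suffixes such as "python3"
end

puts shebang_language("#!/usr/bin/env ruby\nputs 1\n")
puts shebang_language("#!/usr/bin/python3\nprint(1)\n")
```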

Deprecate Pathname

Drop Linguist::Pathname.

Binary files detected as Perl

Compile in Linux this simple assembly program using ("as exit.s -o exit.o;ld exit.o -o exit;rm exit.o"):
.section .data
.section .text
.globl _start
_start:
movq $111, %rdi
movq $60, %rax
syscall
And run "bundle exec linguist folder" you will see this:
88% Perl
12% Assembly
