src-d / enry
A faster file programming language detector
Home Page: https://blog.sourced.tech/post/enry/
License: Apache License 2.0
In the django/django project, *.mo files are recognized as Modelica, but they are actually binary localization files (compiled gettext message catalogs).
"GCC Machine Description": [
"src/github.com/toqueteos/trie/README.md",
"src/gopkg.in/toqueteos/substring.v1/README.md"
],
It looks like only macOS and Linux binaries are included so far.
Right now enry works only on checked-out repositories. However, linguist, the project it was ported from, works on git repositories.
Supporting this would mean that files matching .gitignore are skipped and that bare repositories work too. The git driver could be go-git or shelling out to git. The latter option would require a git binary, however.
We cannot quickly see the version of the binary, and the version is critical for reporting bugs.
Please do not hardcode it; rather use something like https://gist.github.com/jaymecd/31574e7efcd159aecd01 or https://www.atatus.com/blog/golang-auto-build-versioning/
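A minimal sketch of the build-time approach requested here, assuming the version data lives in package-level string variables that the linker overwrites with -X. The variable names (version, build, commit) and the helper versionString are hypothetical and not taken from enry's source.

```go
package main

import "fmt"

// Defaults used when the binary is built without linker flags.
// They can be overwritten at build time, for example:
//
//	go build -ldflags "-X main.version=v1.2.3 -X main.build=$(date +%F) -X main.commit=$(git rev-parse --short HEAD)"
//
// The variable names are assumptions made for this sketch.
var (
	version = "undefined"
	build   = "undefined"
	commit  = "undefined"
)

// versionString formats the build metadata for display.
func versionString(name string) string {
	return fmt.Sprintf("%s %s build: %s commit: %s", name, version, build, commit)
}

func main() {
	fmt.Println(versionString("enry"))
}
```

Because the values are injected by the linker, nothing has to be edited in the source tree when a release is cut.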
When I try to process https://github.com/willfarrell/Browsers I get the following error:
panic: runtime error: index out of range
goroutine 1 [running]:
gopkg.in/src-d/enry%2ev1.getInterpreter(0xc42198d080, 0x81, 0x281, 0x12, 0xbec0a0)
/tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:289 +0x396
gopkg.in/src-d/enry%2ev1.GetLanguagesByShebang(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0xbeb730, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:270 +0x43
gopkg.in/src-d/enry%2ev1.GetLanguages(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0x0, 0x0, 0xc420e3cd60)
/tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:126 +0x127
gopkg.in/src-d/enry%2ev1.GetLanguage(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0x0, 0x0)
/tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:38 +0x53
main.main.func1(0xc4200ea230, 0x6c, 0xbbd740, 0xc421d17a00, 0x0, 0x0, 0x0, 0x0)
/tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/cli/enry/main.go:80 +0x664
path/filepath.walk(0xc4200ea230, 0x6c, 0xbbd740, 0xc421d17a00, 0xc4209915f0, 0x0, 0x0)
/usr/lib/go-1.8/src/path/filepath/path.go:351 +0x81
path/filepath.walk(0xc4215483c0, 0x59, 0xbbd740, 0xc421d17930, 0xc4209915f0, 0x0, 0x0)
/usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc4215482a0, 0x53, 0xbbd740, 0xc421d17860, 0xc4209915f0, 0x0, 0x0)
/usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc420bd5d60, 0x4a, 0xbbd740, 0xc421d17790, 0xc4209915f0, 0x0, 0x0)
/usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc420173680, 0x33, 0xbbd740, 0xc420954a90, 0xc4209915f0, 0x0, 0x30)
/usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.Walk(0xc420173680, 0x33, 0xc4209915f0, 0x0, 0xc4209915c0)
/usr/lib/go-1.8/src/path/filepath/path.go:398 +0x14c
main.main()
Here is the full log:
stderr.txt
If you have strange symlinks in your directory, enry cannot handle them and prints errors.
Assume we are in some directory that has no target subdirectory.
mkdir temp
cd temp
ln -s ../temp/ tmp
ln -s ../target tmp1
ln -s tmp2 tmp2
cd ..
ll temp
shows
total 24
lrwxr-xr-x 1 k staff 8B Jul 27 11:04 tmp -> ../temp/
lrwxr-xr-x 1 k staff 9B Jul 27 11:04 tmp1 -> ../target
lrwxr-xr-x 1 k staff 4B Jul 27 11:04 tmp2 -> tmp2
Then calling enry temp
shows
2017/07/27 11:05:34 read /Users/k/work/rep/ast2vec/temp/tmp: is a directory
2017/07/27 11:05:34 open /Users/k/work/rep/ast2vec/temp/tmp1: no such file or directory
2017/07/27 11:05:34 open /Users/k/work/rep/ast2vec/temp/tmp2: too many levels of symbolic links
So there are some problems with symlink handling.
Here is the error:
user@vm:~$ go get gopkg.in/src-d/enry.v1/...
# gopkg.in/src-d/enry.v1/cmd/enry
go/src/gopkg.in/src-d/enry.v1/cmd/enry/main.go:213: undefined: sort.Slice
The command was executed on Ubuntu 16.04.4 LTS, in a VM with Linux kernel 4.4.0-124-generic.
When you run go generate while .linguist is checked out at an old commit, it generates the code expected for that commit, but the commit hash it records is still the latest commit in .linguist's history.
To make enry easy to install and use, we should provide precompiled binaries for Linux and macOS.
The generated file commit.go hasn't been moved to the data package.
The CI is failing because it uses the master branch of linguist with code that was generated from a different linguist commit.
It would be nice to have proper --help output in the terminal.
Actual usage as of v1.2.0
go get github.com/src-d/simple-linguist/cli/slinguist
slinguist --help
Usage of slinguist:
Expected: some explanation of the flags and usage patterns (--breakdown, json, etc.).
Here is an example from the original github/linguist:
linguist --help
Linguist v5.0.8
Detect language type for a file, or, given a directory, determine language breakdown.
Usage: linguist <path>
linguist <path> [--breakdown] [--json]
linguist [--breakdown] [--json]
I run enry in the root of the following tree:
├── [4.0K] ast2vec
│ ├── [ 526] bblfsh_roles.py
│ ├── [2.2K] bigartm.py
│ ├── [5.1K] bow.py
│ ├── [8.9K] cloning.py
│ ├── [1.6K] coocc.py
│ ├── [2.1K] df.py
│ ├── [ 314] dump.py
│ ├── [3.8K] enry.py
│ ├── [1.9K] id2vec.py
│ ├── [ 11K] id_embedding.py
│ ├── [ 773] __init__.py
│ ├── [ 15K] __main__.py
│ ├── [4.0K] model2
│ │ ├── [4.9K] base.py
│ │ ├── [ 0] __init__.py
│ │ ├── [2.7K] join_bow.py
│ │ ├── [6.2K] proxbase.py
│ │ ├── [2.3K] prox.py
│ │ ├── [4.0K] __pycache__
│ │ │ ├── [5.1K] base.cpython-35.pyc
│ │ │ ├── [ 140] __init__.cpython-35.pyc
│ │ │ ├── [2.8K] join_bow.cpython-35.pyc
│ │ │ ├── [6.2K] proxbase.cpython-35.pyc
│ │ │ ├── [3.1K] prox.cpython-35.pyc
│ │ │ ├── [4.4K] source2bow.cpython-35.pyc
│ │ │ └── [3.4K] source2df.cpython-35.pyc
│ │ ├── [3.2K] source2bow.py
│ │ └── [2.4K] source2df.py
│ ├── [ 85] modelforgecfg.py
│ ├── [ 804] pickleable_logger.py
│ ├── [4.0K] __pycache__
│ │ ├── [ 654] bblfsh_roles.cpython-35.pyc
│ │ ├── [2.5K] bigartm.cpython-35.pyc
│ │ ├── [6.9K] bow.cpython-35.pyc
│ │ ├── [8.9K] cloning.cpython-35.pyc
│ │ ├── [2.3K] coocc.cpython-35.pyc
│ │ ├── [3.3K] df.cpython-35.pyc
│ │ ├── [ 461] dump.cpython-35.pyc
│ │ ├── [3.9K] enry.cpython-35.pyc
│ │ ├── [2.8K] id2vec.cpython-35.pyc
│ │ ├── [ 11K] id_embedding.cpython-35.pyc
│ │ ├── [1.2K] __init__.cpython-35.pyc
│ │ ├── [ 11K] __main__.cpython-35.pyc
│ │ ├── [1.8K] meta.cpython-35.pyc
│ │ ├── [7.7K] model.cpython-35.pyc
│ │ ├── [ 236] modelforgecfg.cpython-35.pyc
│ │ ├── [2.4K] nbow.cpython-35.pyc
│ │ ├── [1.4K] pickleable_logger.cpython-35.pyc
│ │ ├── [ 819] progress_bar.cpython-35.pyc
│ │ ├── [4.9K] publish.cpython-35.pyc
│ │ ├── [1.0K] resolve_symlink.cpython-35.pyc
│ │ ├── [2.6K] source.cpython-35.pyc
│ │ ├── [ 15K] swivel.cpython-35.pyc
│ │ ├── [2.1K] token_parser.cpython-35.pyc
│ │ ├── [4.6K] topics.cpython-35.pyc
│ │ ├── [4.4K] uast.cpython-35.pyc
│ │ ├── [1.9K] uast_ids_to_bag.cpython-35.pyc
│ │ ├── [1.9K] voccoocc.cpython-35.pyc
│ │ └── [1.3K] vw_dataset.cpython-35.pyc
│ ├── [4.0K] repo2
│ │ ├── [ 21K] base.py
│ │ ├── [3.3K] cooccbase.py
│ │ ├── [1.3K] coocc.py
│ │ ├── [ 0] __init__.py
│ │ ├── [2.1K] nbow.py
│ │ ├── [4.0K] __pycache__
│ │ │ ├── [ 20K] base.cpython-35.pyc
│ │ │ ├── [3.8K] cooccbase.cpython-35.pyc
│ │ │ ├── [2.6K] coocc.cpython-35.pyc
│ │ │ ├── [ 139] __init__.cpython-35.pyc
│ │ │ ├── [3.0K] nbow.cpython-35.pyc
│ │ │ ├── [2.6K] source.cpython-35.pyc
│ │ │ ├── [2.2K] uast.cpython-35.pyc
│ │ │ ├── [1.5K] voccoocc.cpython-35.pyc
│ │ │ └── [1.3K] xbow.cpython-35.pyc
│ │ ├── [1.9K] source.py
│ │ ├── [1.6K] uast.py
│ │ └── [1014] voccoocc.py
│ ├── [ 870] resolve_symlink.py
│ ├── [1.9K] source.py
│ ├── [ 20K] swivel.py
│ ├── [4.0K] tests
│ │ ├── [3.8M] bow_1000.asdf
│ │ ├── [4.0K] coocc
│ │ │ ├── [3.5M] astropy_coocc.asdf
│ │ │ ├── [4.8M] django_coocc.asdf
│ │ │ ├── [1.3K] empty_coocc.asdf
│ │ │ ├── [ 17] error.asdf -> ../nbow_1000.asdf
│ │ │ ├── [341K] flask_coocc.asdf
│ │ │ ├── [501K] jinja2_coocc.asdf
│ │ │ └── [6.6M] tensorflow_coocc.asdf
│ │ ├── [ 90K] coocc.asdf
│ │ ├── [6.0K] docfreq_1000.asdf
│ │ ├── [ 502] fake_requests.py
│ │ ├── [1.1M] id2vec_1000.asdf
│ │ ├── [ 458] __init__.py
│ │ ├── [4.0K] merge_bows
│ │ │ ├── [3.7K] nbow_github.com&src-d&ast2vec.asdf
│ │ │ ├── [3.2K] nbow_github.com&src-d&modelforge.asdf
│ │ │ └── [2.0K] nbow_github.com&src-d&vecino.asdf
│ │ ├── [ 538] models.py
│ │ ├── [3.8M] nbow_1000.asdf
│ │ ├── [4.0K] postproc
│ │ │ ├── [1.9M] col_embedding.tsv
│ │ │ ├── [798K] col_embedding.tsv.gz
│ │ │ ├── [1.9M] row_embedding.tsv
│ │ │ └── [797K] row_embedding.tsv.gz
│ │ ├── [4.0K] __pycache__
│ │ │ ├── [1.4K] fake_requests.cpython-35.pyc
│ │ │ ├── [ 676] __init__.cpython-35.pyc
│ │ │ ├── [ 707] models.cpython-35.pyc
│ │ │ ├── [2.2K] test_bow2vw.cpython-35.pyc
│ │ │ ├── [3.2K] test_bow.cpython-35.pyc
│ │ │ ├── [5.3K] test_cloning.cpython-35.pyc
│ │ │ ├── [1.6K] test_coocc.cpython-35.pyc
│ │ │ ├── [2.0K] test_df.cpython-35.pyc
│ │ │ ├── [6.9K] test_dump.cpython-35.pyc
│ │ │ ├── [1.7K] test_enry.cpython-35.pyc
│ │ │ ├── [1.5K] test_id2vec.cpython-35.pyc
│ │ │ ├── [ 10K] test_id_embedding.cpython-35.pyc
│ │ │ ├── [1.3K] test_join_bow.cpython-35.pyc
│ │ │ ├── [3.5K] test_main.cpython-35.pyc
│ │ │ ├── [4.1K] test_model2.cpython-35.pyc
│ │ │ ├── [1.1K] test_pickleable_logger.cpython-35.pyc
│ │ │ ├── [2.4K] test_repo2base.cpython-35.pyc
│ │ │ ├── [3.7K] test_repo2coocc.cpython-35.pyc
│ │ │ ├── [4.3K] test_repo2nbow.cpython-35.pyc
│ │ │ ├── [4.6K] test_repo2source.cpython-35.pyc
│ │ │ ├── [1.6K] test_repo2voccoocc.cpython-35.pyc
│ │ │ ├── [1.3K] test_resolve_symlink.cpython-35.pyc
│ │ │ ├── [1.1K] test_source2df.cpython-35.pyc
│ │ │ ├── [1.9K] test_source.cpython-35.pyc
│ │ │ ├── [1.9K] test_token_parser.cpython-35.pyc
│ │ │ └── [1.2K] test_voccoocc.cpython-35.pyc
│ │ ├── [4.0K] source
│ │ │ ├── [ 80K] [email protected]
│ │ │ ├── [ 71K] [email protected]
│ │ │ ├── [ 79K] [email protected]
│ │ │ ├── [ 78K] [email protected]
│ │ │ ├── [1.9K] test_example.asdf
│ │ │ └── [ 33] test_example.py
│ │ ├── [4.0K] swivel
│ │ │ ├── [ 15K] col_sums.txt
│ │ │ ├── [ 27K] col_vocab.txt
│ │ │ ├── [ 15K] row_sums.txt
│ │ │ ├── [ 27K] row_vocab.txt
│ │ │ ├── [ 10M] shard-000-000.pb
│ │ │ └── [3.2M] shard-000-000.pb.gz
│ │ ├── [ 959] test_bigartm.py
│ │ ├── [1.8K] test_bow2vw.py
│ │ ├── [2.0K] test_bow.py
│ │ ├── [5.2K] test_cloning.py
│ │ ├── [ 878] test_coocc.py
│ │ ├── [1.3K] test_df.py
│ │ ├── [6.0K] test_dump.py
│ │ ├── [1.1K] test_enry.py
│ │ ├── [1.0K] test_id2vec.py
│ │ ├── [9.1K] test_id_embedding.py
│ │ ├── [ 926] test_join_bow.py
│ │ ├── [3.2K] test_main.py
│ │ ├── [2.6K] test_model2.py
│ │ ├── [ 541] test_pickleable_logger.py
│ │ ├── [1.1K] test_prox.py
│ │ ├── [5.7K] test_repo2base.py
│ │ ├── [3.9K] test_repo2coocc.py
│ │ ├── [3.7K] test_repo2nbow.py
│ │ ├── [7.4K] test_repo2source.py
│ │ ├── [3.1K] test_repo2uast.py
│ │ ├── [ 983] test_repo2voccoocc.py
│ │ ├── [ 73] test_repos_list.txt
│ │ ├── [ 993] test_resolve_symlink.py
│ │ ├── [2.2K] test_source2bow.py
│ │ ├── [ 710] test_source2df.py
│ │ ├── [1.6K] test_source.py
│ │ ├── [1.4K] test_token_parser.py
│ │ ├── [2.1K] test_topics.py
│ │ ├── [ 909] test_uast.py
│ │ ├── [ 598] test_voccoocc.py
│ │ ├── [ 36K] topics.asdf
│ │ ├── [702K] topics_readable.txt
│ │ ├── [333K] uast.asdf
│ │ └── [ 88K] voccoocc.asdf
│ ├── [2.0K] token_parser.py
│ ├── [3.8K] topics.py
│ ├── [1.3K] uast_ids_to_bag.py
│ ├── [3.2K] uast.py
│ ├── [1.2K] voccoocc.py
│ └── [1.0K] vw_dataset.py
├── [4.0K] ast2vec.egg-info
│ ├── [ 1] dependency_links.txt
│ ├── [ 51] entry_points.txt
│ ├── [ 974] PKG-INFO
│ ├── [ 229] requires.txt
│ ├── [1.0K] SOURCES.txt
│ └── [ 8] top_level.txt
├── [ 24M] bow_matplotlib.asdf
├── [124M] decorr_readable.txt.gz
├── [4.0K] dist
│ ├── [ 21K] ast2vec-0.1.0a0.tar.gz
│ ├── [ 23K] ast2vec-0.1.1a0.tar.gz
│ ├── [ 23K] ast2vec-0.1.2a0.tar.gz
│ ├── [ 36K] ast2vec-0.2.0a0.tar.gz
│ ├── [ 36K] ast2vec-0.2.1a0.tar.gz
│ ├── [ 38K] ast2vec-0.2.2a0.tar.gz
│ ├── [ 38K] ast2vec-0.2.3a0.tar.gz
│ ├── [ 38K] ast2vec-0.2.4a0.tar.gz
│ └── [ 38K] ast2vec-0.2.5a0.tar.gz
├── [4.0K] doc
│ ├── [1.9K] ast2vec.rst
│ ├── [2.5K] ast2vec.tests.rst
│ ├── [4.0K] _build
│ │ ├── [4.0K] doctrees
│ │ │ ├── [263K] ast2vec.doctree
│ │ │ ├── [135K] ast2vec.tests.doctree
│ │ │ ├── [3.5M] environment.pickle
│ │ │ ├── [5.1K] index.doctree
│ │ │ └── [2.5K] modules.doctree
│ │ └── [4.0K] html
│ │ ├── [ 87K] ast2vec.html
│ │ ├── [ 52K] ast2vec.tests.html
│ │ ├── [ 36K] genindex.html
│ │ ├── [5.9K] index.html
│ │ ├── [4.0K] _modules
│ │ │ ├── [4.0K] ast2vec
│ │ │ │ ├── [8.0K] coocc.html
│ │ │ │ ├── [9.6K] df.html
│ │ │ │ ├── [9.8K] dump.html
│ │ │ │ ├── [ 14K] enry.html
│ │ │ │ ├── [8.6K] id2vec.html
│ │ │ │ ├── [ 57K] id_embedding.html
│ │ │ │ ├── [8.2K] meta.html
│ │ │ │ ├── [ 36K] model.html
│ │ │ │ ├── [ 13K] nbow.html
│ │ │ │ ├── [ 26K] publish.html
│ │ │ │ ├── [ 63K] repo2base.html
│ │ │ │ ├── [ 21K] repo2coocc.html
│ │ │ │ ├── [ 18K] repo2nbow.html
│ │ │ │ ├── [ 81K] swivel.html
│ │ │ │ ├── [4.0K] tests
│ │ │ │ │ ├── [6.5K] fake_requests.html
│ │ │ │ │ ├── [8.7K] test_coocc.html
│ │ │ │ │ ├── [10.0K] test_df.html
│ │ │ │ │ ├── [ 24K] test_dump.html
│ │ │ │ │ ├── [ 11K] test_enry.html
│ │ │ │ │ ├── [8.5K] test_id2vec.html
│ │ │ │ │ ├── [ 40K] test_id_embedding.html
│ │ │ │ │ ├── [ 12K] test_main.html
│ │ │ │ │ ├── [ 32K] test_model.html
│ │ │ │ │ ├── [9.6K] test_nbow.html
│ │ │ │ │ ├── [ 21K] test_publish.html
│ │ │ │ │ ├── [ 15K] test_repo2coocc.html
│ │ │ │ │ └── [ 13K] test_repo2nbow.html
│ │ │ │ └── [5.1K] tests.html
│ │ │ └── [4.6K] index.html
│ │ ├── [5.0K] modules.html
│ │ ├── [1.9K] objects.inv
│ │ ├── [9.2K] py-modindex.html
│ │ ├── [3.2K] search.html
│ │ ├── [ 14K] searchindex.js
│ │ ├── [4.0K] _sources
│ │ │ ├── [1.9K] ast2vec.rst.txt
│ │ │ ├── [2.5K] ast2vec.tests.rst.txt
│ │ │ ├── [ 375] index.rst.txt
│ │ │ └── [ 58] modules.rst.txt
│ │ └── [4.0K] _static
│ │ ├── [ 673] ajax-loader.gif
│ │ ├── [ 10K] alabaster.css
│ │ ├── [ 10K] basic.css
│ │ ├── [ 756] comment-bright.png
│ │ ├── [ 829] comment-close.png
│ │ ├── [ 641] comment.png
│ │ ├── [ 42] custom.css
│ │ ├── [8.0K] doctools.js
│ │ ├── [ 202] down.png
│ │ ├── [ 222] down-pressed.png
│ │ ├── [ 286] file.png
│ │ ├── [258K] jquery-3.1.0.js
│ │ ├── [ 84K] jquery.js
│ │ ├── [ 90] minus.png
│ │ ├── [ 90] plus.png
│ │ ├── [4.1K] pygments.css
│ │ ├── [ 25K] searchtools.js
│ │ ├── [ 34K] underscore-1.3.1.js
│ │ ├── [ 12K] underscore.js
│ │ ├── [ 203] up.png
│ │ ├── [ 214] up-pressed.png
│ │ └── [ 25K] websupport.js
│ ├── [5.0K] conf.py
│ ├── [4.0K] Doc
│ │ └── [3.2K] how_to_use_ast2vec.ipynb
│ ├── [ 375] index.rst
│ ├── [ 850] Makefile
│ ├── [ 58] modules.rst
│ └── [4.0K] _static
├── [ 23M] docfreq_1MM_serial.asdf
├── [1.2M] docfreq_matplotlib.asdf
├── [9.5M] enry
├── [2.3K] gcs.json
├── [ 18K] gimme
├── [1.0G] id2vec_1MM_serial.asdf
├── [ 33] index.json
├── [ 11K] LICENSE
├── [354M] nbow_1MM_serial.asdf
├── [4.0K] README.md
├── [ 214] requirements.txt
├── [1.9K] setup.py
├── [4.0K] source
├── [ 20M] token_docfreq.tsv.gz
└── [5.4K] topic_modeling.md
27 directories, 283 files
1,7G total
/usr/bin/time -v enry
reports:
User time (seconds): 1.08
System time (seconds): 2.06
Percent of CPU this job got: 72%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.31
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4385928
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 108
Minor (reclaiming a frame) page faults: 898379
Voluntary context switches: 3938
Involuntary context switches: 487
Swaps: 0
File system inputs: 3411248
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The RSS is over 4 gigs! Wow! And my system freezes a bit.
Running the tests against linguist/samples revealed the following issues:
With hello.ms, we can't detect the language (Unix Assembly) because contentMatchers (content.go) has no matcher for Unix Assembly. Linguist uses a regexp for this that we can't port.
All SQL files fall through to the classifier because we don't correctly parse the disambiguator expression for "*.sql" files. That expression doesn't follow the pattern used in the rest of the heuristics.rb file.
$ cd /path/to/go/git
$ enry
0.69% Shell
0.34% Markdown
0.34% Makefile
98.28% Go
0.34% Text
Compare with linguist:
$ linguist
99.74% Go
0.18% Shell
0.09% Makefile
I propose to sort the results by significance.
We would like to call enry from code in the JVM. In order to do that, we need enry compiled as a shared library.
Since linguist --json does not pretty-print JSON, I suggest that enry --json not pretty-print either.
In the same way that we have Java bindings for enry, which wrap a Go library built with -buildmode=c-shared, it would be nice to have bindings for Python using ctypes, cffi, or something similar.
A particular use case: one wants to use the https://github.com/bblfsh/sonar-checks/ API, which requires knowing the language of the file to be checked in order to choose the right checks.
pip install -e git+https://github.com/src-d/enry.git#egg=python
.whl (linux, macOS)
sourced@sourced-MacBookPro:/tmp/enry$ /home/sourced/Projects/ast2vec/enry -version
flag provided but not defined: -version
/home/sourced/Projects/ast2vec/enry v1.4.0 build: 09-07-2017_08_55_33 commit: 0fe0a97f67, based on linguist commit: 37979b2
/home/sourced/Projects/ast2vec/enry, A simple (and faster) implementation of github/linguist
usage: /home/sourced/Projects/ast2vec/enry <path>
/home/sourced/Projects/ast2vec/enry [-json] [-breakdown] <path>
/home/sourced/Projects/ast2vec/enry [-json] [-breakdown]
flag provided but not defined: -version
Update: I thought such a flag was defined (the version was printed), but I was wrong. Thus I propose adding -version to print the version.
Hi,
I'd like to use your library to detect the programming language from user input. So I won't have a file extension. I only have a string with source code. Can I achieve my goal using this project?
Thanks.
Nikolay
go get gopkg.in/src-d/enry.v1/...
package gopkg.in/src-d/enry.v1/internal/code-generator
imports gopkg.in/src-d/simple-linguist.v1/internal/code-generator/generator: use of internal package not allowed
go get gopkg.in/src-d/enry.v1/cli/enry
package gopkg.in/src-d/enry.v1/cli/enry: cannot find package "gopkg.in/src-d/enry.v1/cli/enry" in any of:
/usr/lib/go-1.8/src/gopkg.in/src-d/enry.v1/cli/enry (from $GOROOT)
/home/sourced/Projects/ast2vec/enry/src/gopkg.in/src-d/enry.v1/cli/enry (from $GOPATH)
This works after symlinking github.com/src-d/enry to gopkg.in/src-d/enry.v1
go get github.com/src-d/enry/cli/enry
The github/linguist CLI has a --json command-line argument to output the result as JSON; otherwise it prints in a human-readable format. We need to be consistent because enry must be a drop-in replacement.
It would be really useful to have codecov reports.
This might already be planned, but while reviewing the PR for the blog post I realised that the README in the GH repository has no documentation on the CLI commands (it could just be the help output). It would be great to add this before we announce (as with pygments, most people will initially use it as a CLI).
The last version available in maven is 1.6.3, but the latest one is 1.6.5
https://search.maven.org/search?q=g:tech.sourced%20AND%20a:enry-java&core=gav
Since we got a 2x performance improvement in the new version, we should update the README.md, or remove it.
enry /path --json does not output JSON, e.g.:
94.44% Python
5.56% Text
From the readme:
Why Enry?
In the movie My Fair Lady, Professor Henry Higgins is one of the main characters. Henry is a linguist and at the very beginning of the movie enjoys guessing the nationality of people based on their accent.
Enry Iggins is how Eliza Doolittle, pronounces the name of the Professor during the first half of the movie.
Is he really guessing nationality or more like neighborhood where the person was raised?
Higgins claims that his knowledge of ‘simple phonetics’ (a branch of linguistics concerned with the study of the nature, production, and perception of sounds of speech) allows him to deduce a person’s origins to within six miles. Within London, he says, he can place a man within two miles, ‘sometimes within two streets’. It’s this knowledge that he uses to transform Eliza Doolittle into a socially acceptable semblance of a ‘lady’. The character of Higgins is said to have been inspired by Henry Sweet (1845–1912), a great phonetician whose works, including his History of English
Source: http://blog.oxforddictionaries.com/2013/03/my-fair-lady/
I ran the same source{d} engine pipeline on the "100 repos" dataset (/data/siva on science-3, @bzz knows) nearly 50 times and it worked without errors. On the 51st run, I got the following stack trace:
unexpected fault address 0x0
fatal error: fault
panic: runtime error: slice bounds out of range
goroutine 17 [running, locked to thread]:
bytes.Count(0x7ff5ec077fe0, 0xa, 0x0, 0x1c42005cc20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7ff5ec077fe0, 0xa, 0x0, 0x66, 0x6, 0x7ff5d828aed8)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x7ff5a2524a73, 0xc, 0x1c42001cde0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x1c42005ce48, 0x1c420018500)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x1c420078098, 0x1c4200007e0)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
panic: runtime error: slice bounds out of range
goroutine 53 [running, locked to thread]:
bytes.Count(0x7ff5e4080020, 0xa, 0x0, 0x1c42005fc20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7ff5e4080020, 0xa, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x7ff5a2524a73, 0xc, 0x1c4200206e0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x1c42005fe48, 0x1c4200c4040)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x1c4200781d8, 0x1c420085680)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x7ff5a23fbbb1]
goroutine 51 [running, locked to thread]:
runtime.throw(0x7ff5a24caf58, 0x5)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/panic.go:596 +0x97 fp=0x1c42005dc60 sp=0x1c42005dc40
runtime.sigpanic()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/signal_unix.go:297 +0x290 fp=0x1c42005dcb0 sp=0x1c42005dc60
path/filepath.Base(0x7ff600003dc0, 0x7ff694011000, 0x2, 0x7ff600052360)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/path/filepath/path.go:431 +0x31 fp=0x1c42005dcc0 sp=0x1c42005dcb0
gopkg.in/src-d/enry%2ev1.GetLanguagesByFilename(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:264 +0x3b fp=0x1c42005dcf8 sp=0x1c42005dcc0
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x7ff5a2524a73, 0xc, 0x1c42001e0e0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129 fp=0x1c42005ddb0 sp=0x1c42005dcf8
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x1c42005de48, 0x1c4200c6040)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55 fp=0x1c42005de00 sp=0x1c42005ddb0
main.GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x1c420078138, 0x1c420084cc0)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55 fp=0x1c42005de48 sp=0x1c42005de00
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a fp=0x1c42005de90 sp=0x1c42005de48
runtime.call64(0x0, 0x7ff6197ea008, 0x7ff6197ea0a0, 0x38)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:515 +0x4a fp=0x1c42005dee0 sp=0x1c42005de90
runtime.cgocallbackg1(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:301 +0x1a1 fp=0x1c42005df58 sp=0x1c42005dee0
runtime.cgocallbackg(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86 fp=0x1c42005dfc0 sp=0x1c42005df58
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71 fp=0x1c42005dfe0 sp=0x1c42005dfc0
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1 fp=0x1c42005dfe8 sp=0x1c42005dfe0
goroutine 17 [running, locked to thread]:
goroutine running on other thread; stack unavailable
goroutine 50 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078017, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005bf08)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
goroutine 52 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078117, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005ef08)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
goroutine 53 [running, locked to thread]:
goroutine running on other thread; stack unavailable
goroutine 54 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078117, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c4207bbf08)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
goroutine 55 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078217, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c420058f08)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
goroutine 56 [syscall, locked to thread]:
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
goroutine 57 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078217, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005af08)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
I cannot reproduce this easily; it is very rare. Engine version is 0.1.7.
At the moment, when enry encounters something bad, it poisons the return code.
For example:
enry/bin/enry /home/sourced/Projects/ast2vec
2017/06/16 09:53:51 read /home/sourced/Projects/ast2vec/enry/src/gopkg.in/src-d/enry.v1: is a directory
{
"Go": [
"enry/src/github.com/src-d/enry/alias.go",
"enry/src/github.com/src-d/enry/classifier.go",
"enry/src/github.com/src-d/enry/cli/enry/main.go",
"enry/src/github.com/src-d/enry/common.go",
"enry/src/github.com/src-d/enry/common_test.go",
"enry/src/github.com/src-d/enry/content.go",
"enry/src/github.com/src-d/enry/documentation.go",
"enry/src/github.com/src-d/enry/extension.go",
"enry/src/github.com/src-d/enry/filename.go",
"enry/src/github.com/src-d/enry/frequencies.go",
"enry/src/github.com/src-d/enry/generate.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/aliases.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/documentation.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/extensions.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/filenames.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/generator.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/generator_test.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/heuristics.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/interpreters.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/langinfo.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/samplesfreq.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/types.go",
"enry/src/github.com/src-d/enry/internal/code-generator/generator/vendor.go",
"enry/src/github.com/src-d/enry/internal/code-generator/main.go",
"enry/src/github.com/src-d/enry/internal/tokenizer/tokenize.go",
"enry/src/github.com/src-d/enry/internal/tokenizer/tokenize_test.go",
"enry/src/github.com/src-d/enry/interpreter.go",
"enry/src/github.com/src-d/enry/modeline.go",
"enry/src/github.com/src-d/enry/shebang.go",
"enry/src/github.com/src-d/enry/type.go",
"enry/src/github.com/src-d/enry/utils.go",
"enry/src/github.com/src-d/enry/utils_test.go",
"enry/src/github.com/src-d/enry/vendor.go",
"enry/src/github.com/toqueteos/trie/example_test.go",
"enry/src/github.com/toqueteos/trie/trie.go",
"enry/src/gopkg.in/toqueteos/substring.v1/bytes.go",
"enry/src/gopkg.in/toqueteos/substring.v1/bytes_test.go",
"enry/src/gopkg.in/toqueteos/substring.v1/lib.go",
"enry/src/gopkg.in/toqueteos/substring.v1/lib_test.go",
"enry/src/gopkg.in/toqueteos/substring.v1/string.go",
"enry/src/gopkg.in/toqueteos/substring.v1/string_test.go"
],
"Makefile": [
"enry/src/github.com/src-d/enry/Makefile"
],
"Python": [
"ast2vec/__init__.py",
"ast2vec/__main__.py",
"ast2vec/dataset.py",
"ast2vec/df.py",
"ast2vec/glove_to_shards.py",
"ast2vec/id2vec.py",
"ast2vec/id_embedding.py",
"ast2vec/prep.py",
"ast2vec/repo2base.py",
"ast2vec/repo2coocc.py",
"ast2vec/repo2nbow.py",
"ast2vec/swivel.py"
],
"Ruby": [
"enry/src/github.com/src-d/enry/internal/code-generator/generator/test_files/heuristics.test.rb"
],
"Text": [
"requirements.txt"
]
}
Return code is 2.
This is bad because it breaks any Python subprocess call that checks the return code and raises an exception if it is non-zero. Yet all we got was a non-critical warning.
I propose to return 0 in case of non-critical warnings. This is a standard convention of UNIX apps: if you survived, do not break the errcode.
I decided to list all of the languages that appear in Public Git Archive and I got a couple of surprising results.
Some of the results:
Many of the results above are technically languages, but some are protocols, some are libraries, and some are ... something else completely!
Should we add a clarification regarding these? I think it'd be interesting to have it for Public Git Archive, currently I get over 455 languages, but I suspect some of them are not technically languages.
sourced@sourced-MacBookPro:/tmp$ git clone https://github.com/src-d/enry &>/dev/null
sourced@sourced-MacBookPro:/tmp$ cd enry
sourced@sourced-MacBookPro:/tmp/enry$ time linguist
99.25% Go
0.35% Shell
0.24% Java
0.08% Ruby
0.05% Makefile
0.01% Scala
0.01% Gnuplot
real 0m1.945s
user 0m1.848s
sys 0m0.056s
sourced@sourced-MacBookPro:/tmp/enry$ time /home/sourced/Projects/ast2vec/enry
3.28% Makefile
63.93% Go
9.84% CSV
6.56% Shell
1.64% Gnuplot
1.64% Text
3.28% Ruby
3.28% Scala
6.56% Java
real 0m0.084s
user 0m0.072s
sys 0m0.008s
Something happened and now enry works even slower than linguist. Seriously, testing on src-d/ast2vec:
time ./enry .
{
"Python": [
"ast2vec/__init__.py",
"ast2vec/__main__.py",
"ast2vec/df.py",
"ast2vec/dump.py",
"ast2vec/enry.py",
"ast2vec/id2vec.py",
"ast2vec/id_embedding.py",
"ast2vec/meta.py",
"ast2vec/model.py",
"ast2vec/nbow.py",
"ast2vec/publish.py",
"ast2vec/repo2base.py",
"ast2vec/repo2coocc.py",
"ast2vec/repo2nbow.py",
"ast2vec/swivel.py",
"ast2vec/tests/__init__.py",
"ast2vec/tests/models.py",
"ast2vec/tests/test_dump.py",
"ast2vec/tests/test_enry.py"
],
"Text": [
"requirements.txt"
]
}
real 0m14.339s
user 0m10.740s
sys 0m1.972s
time linguist
100.00% Python
real 0m2.192s
user 0m2.016s
sys 0m0.104s
Flags must be provided before the path
$ enry -h
enry, A simple (and faster) implementation of github/linguist
usage: enry <path>
enry <path> [-json] [-breakdown]
enry [-json] [-breakdown]
$ enry internal
100.00% Go
$ enry internal -json
100.00% Go
$ enry -json internal
{"Go":["code-generator/generator/aliases.go","code-generator/generator/documentation.go","code-generator/generator/extensions.go","code-generator/generator/filenames.go","code-generator/generator/generator.go","code-generator/generator/generator_test.go","code-generator/generator/heuristics.go","code-generator/generator/interpreters.go","code-generator/generator/langinfo.go","code-generator/generator/linguist-commit.go","code-generator/generator/samplesfreq.go","code-generator/generator/types.go","code-generator/generator/vendor.go","code-generator/main.go","tokenizer/tokenize.go","tokenizer/tokenize_test.go"]}%
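The transcript above matches how Go's standard flag package behaves: parsing stops just before the first non-flag argument, so a flag that follows the path is never parsed. A possible workaround, sketched here under assumptions, is to move flag-like arguments in front of the positional ones before parsing. `flagsFirst`, `parseArgs` and `options` are hypothetical names, not enry's code, and the reordering is only safe because -json and -breakdown are boolean flags that take no separate value.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command line (hypothetical type for this sketch).
type options struct {
	json      bool
	breakdown bool
	path      string
}

// flagsFirst moves arguments that look like flags in front of positional ones
// and separates the two groups with "--". This is only safe when every flag
// is boolean, i.e. never followed by a separate value argument.
func flagsFirst(args []string) []string {
	var flags, positional []string
	for i, a := range args {
		if a == "--" {
			// Everything after an explicit terminator is positional.
			positional = append(positional, args[i+1:]...)
			break
		}
		if len(a) > 1 && a[0] == '-' {
			flags = append(flags, a)
		} else {
			positional = append(positional, a)
		}
	}
	out := append(flags, "--")
	return append(out, positional...)
}

// parseArgs accepts the flags before or after the path.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("enry", flag.ContinueOnError)
	fs.BoolVar(&o.json, "json", false, "print output as JSON")
	fs.BoolVar(&o.breakdown, "breakdown", false, "print a per-language breakdown")
	if err := fs.Parse(flagsFirst(args)); err != nil {
		return o, err
	}
	o.path = "."
	if fs.NArg() > 0 {
		o.path = fs.Arg(0)
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("path=%s json=%v breakdown=%v\n", o.path, o.json, o.breakdown)
}
```

With this in place, `internal -json` and `-json internal` parse to the same options.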
When run with -json on a single file, the output is not JSON:
$ go get gopkg.in/src-d/enry.v1/...
$ enry -json ../../github.com/src-d/lookout/vendor/golang.org/x/crypto/cast5/cast5.go
cast5.go: 526 lines (493 sloc)
type: Text
mime_type: text/x-go
language: Go
The same flag works for a directory:
enry -json ../../github.com/src-d/lookout/vendor/golang.org/x/crypto/cast5
{"Go":["cast5.go"]}
Linguist can analyze a simple file, but enry cannot.
In GetLanguageByContent (https://github.com/src-d/enry/blob/master/common.go#L88) we can see that the filename passed is "". However, GetLanguagesByContent (https://github.com/src-d/enry/blob/master/common.go#L382), which is called by the aforementioned function, always returns nil if the extension is not matched. That is always the case here, because GetLanguageByContent explicitly passes an empty string.
We should either fix this or remove this exported function, because it does nothing.
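The call pattern described above can be illustrated with a simplified model. This is not enry's source: `contentMatchers`, `languagesByContent` and `languageByContentBroken` are hypothetical stand-ins, and the single ".h" heuristic is invented for the example. The only point is that a lookup keyed by file extension can never match when the caller passes an empty filename.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// contentMatchers is a hypothetical table of content heuristics keyed by
// file extension. The ".h" rule is made up for this sketch.
var contentMatchers = map[string]func(content []byte) []string{
	".h": func(content []byte) []string {
		if strings.Contains(string(content), "class ") {
			return []string{"C++"}
		}
		return []string{"C"}
	},
}

// languagesByContent returns nil when no heuristic is registered for the
// extension of filename.
func languagesByContent(filename string, content []byte) []string {
	ext := strings.ToLower(filepath.Ext(filename))
	matcher, ok := contentMatchers[ext]
	if !ok {
		return nil
	}
	return matcher(content)
}

// languageByContentBroken mirrors the reported pattern: it always passes an
// empty filename, so the extension is "" and the lookup never succeeds.
func languageByContentBroken(content []byte) []string {
	return languagesByContent("", content)
}

func main() {
	src := []byte("class Foo {};")
	fmt.Println(languageByContentBroken(src))     // prints []
	fmt.Println(languagesByContent("foo.h", src)) // prints [C++]
}
```

In this model the fix would be to let the caller supply the filename, or to drop the filename-less wrapper, which corresponds to the two options named in the report.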
I use Java bindings (maven version 1.6.2).
Here is the log:
panic: runtime error: slice bounds out of range
goroutine 26 [running, locked to thread]:
bytes.Count(0x7fc5dc00e3c0, 0x47, 0x0, 0x1c4206fbc20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7fc5dc00e3c0, 0x47, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:166 +0xae
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x7fc5861f7c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x7fc585723aae, 0xc, 0x1c420024de0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x1c4206fbe48, 0x1c4209cc040)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x1c42012e1d8, 0x1c420499340)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
Aborted (core dumped)
Any *.pb file in protobuf format is detected as PureBasic in the most recent release 1.4.
We synchronized with github/linguist in November 2017, an update is long overdue ;)
Latest enry v1.6.7 from Oct 24, 2018 is based on Linguist v5.2.0 commit 4cd558 from Sep 17, 2017.
This is an ☂️ issue with the goal of making enry use at least Linguist v7.1.3 from Dec 12, 2018:
heuristics.yml instead of heuristics.rb: github/linguist#4087. WIP in #189
When calling the GetLanguage method, we sometimes receive this error:
panic: runtime error: slice bounds out of range
goroutine 17 [running, locked to thread]:
bytes.Count(0x7f627818c0d0, 0x4b, 0x0, 0x1c420038c20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x7f62482e6c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x1c420038e48, 0x1c42001a500)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a