
enry's People

Contributors

abeaumont, ajnavarro, bzz, campoy, creachadair, darkowlzz, dennwc, dpaz, dpordomingo, dvrkps, eiso, erizocosmico, juanjux, lafriks, mcarmonaa, mcuadros, pratik97, silvia-odwyer, smola, suhaibmujahid, vmarkovtsev, zjvandeweg, zkry

enry's Issues

Support (bare) git repositories

Right now, enry works only on checked-out repositories. However, linguist, the project it was ported from, does work directly on git repositories.

Advantages of supporting this include that files matching .gitignore are skipped and that it works on bare repositories. The git driver could be go-git or shelling out to git. The latter option would require a git binary, however.
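As a rough illustration of the shell-out option, the sketch below lists the files of a revision with `git ls-tree`, which reads the tree object and therefore does not need a working copy. The helper names `listTrackedFiles` and `parseLsTree` are hypothetical and not part of enry, and paths with unusual characters (which git quotes unless `-z` is used) are not handled.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parseLsTree splits the output of `git ls-tree -r --name-only <rev>`
// into one path per entry, dropping empty lines.
func parseLsTree(out string) []string {
	var paths []string
	for _, line := range strings.Split(out, "\n") {
		if line != "" {
			paths = append(paths, line)
		}
	}
	return paths
}

// listTrackedFiles (hypothetical helper) shells out to the git binary.
// Only committed files are listed, so ignored files never show up, and
// a bare repository works because no checkout is involved.
func listTrackedFiles(gitDir, rev string) ([]string, error) {
	cmd := exec.Command("git", "--git-dir="+gitDir, "ls-tree", "-r", "--name-only", rev)
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parseLsTree(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: lsfiles <git-dir>")
		return
	}
	paths, err := listTrackedFiles(os.Args[1], "HEAD")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

File contents could then be fetched per path (for example with `git cat-file`), at the cost of one process per file; a library such as go-git would avoid that, but is not shown here.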

Runtime Error

When I try to process https://github.com/willfarrell/Browsers I get the following error:

panic: runtime error: index out of range

goroutine 1 [running]:
gopkg.in/src-d/enry%2ev1.getInterpreter(0xc42198d080, 0x81, 0x281, 0x12, 0xbec0a0)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:289 +0x396
gopkg.in/src-d/enry%2ev1.GetLanguagesByShebang(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0xbeb730, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:270 +0x43
gopkg.in/src-d/enry%2ev1.GetLanguages(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0x0, 0x0, 0xc420e3cd60)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:126 +0x127
gopkg.in/src-d/enry%2ev1.GetLanguage(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0x0, 0x0)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:38 +0x53
main.main.func1(0xc4200ea230, 0x6c, 0xbbd740, 0xc421d17a00, 0x0, 0x0, 0x0, 0x0)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/cli/enry/main.go:80 +0x664
path/filepath.walk(0xc4200ea230, 0x6c, 0xbbd740, 0xc421d17a00, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:351 +0x81
path/filepath.walk(0xc4215483c0, 0x59, 0xbbd740, 0xc421d17930, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc4215482a0, 0x53, 0xbbd740, 0xc421d17860, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc420bd5d60, 0x4a, 0xbbd740, 0xc421d17790, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc420173680, 0x33, 0xbbd740, 0xc420954a90, 0xc4209915f0, 0x0, 0x30)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.Walk(0xc420173680, 0x33, 0xc4209915f0, 0x0, 0xc4209915c0)
        /usr/lib/go-1.8/src/path/filepath/path.go:398 +0x14c
main.main()

Here is the full log:
stderr.txt
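The trace only shows that the panic happens inside getInterpreter while handling a shebang; the root cause is not stated. Assuming it is triggered by a malformed or empty shebang line, a defensive parser could look like the sketch below. `interpreterFromShebang` is a hypothetical helper written for illustration, not enry's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
)

// interpreterFromShebang (hypothetical) returns the interpreter name from
// the first line of data, or "" when there is no usable shebang. Every
// slice access is guarded by a length check, so it cannot index out of range.
func interpreterFromShebang(data []byte) string {
	if !bytes.HasPrefix(data, []byte("#!")) {
		return ""
	}
	line := data[2:]
	if i := bytes.IndexByte(line, '\n'); i >= 0 {
		line = line[:i]
	}
	fields := bytes.Fields(line)
	if len(fields) == 0 {
		return "" // "#!" followed by nothing
	}
	name := path.Base(string(fields[0]))
	if name == "env" {
		if len(fields) < 2 {
			return "" // "#!/usr/bin/env" with no argument
		}
		name = path.Base(string(fields[1]))
	}
	return name
}

func main() {
	samples := []string{"#!/usr/bin/env python\n", "#!/bin/sh\n", "#!\n", "#!/usr/bin/env\n"}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, interpreterFromShebang([]byte(s)))
	}
}
```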

Problem with symlinks to folders

Description

If you have unusual symlinks in your directory, enry cannot handle them and prints errors.

Examples

Assume we are in some directory and there is no target subdirectory.

mkdir temp
cd temp
ln -s ../temp/ tmp
ln -s ../target tmp1
ln -s tmp2 tmp2
cd ..

Running ll temp shows:

total 24
lrwxr-xr-x  1 k  staff     8B Jul 27 11:04 tmp -> ../temp/
lrwxr-xr-x  1 k  staff     9B Jul 27 11:04 tmp1 -> ../target
lrwxr-xr-x  1 k  staff     4B Jul 27 11:04 tmp2 -> tmp2

And calling enry temp shows:

2017/07/27 11:05:34 read /Users/k/work/rep/ast2vec/temp/tmp: is a directory
2017/07/27 11:05:34 open /Users/k/work/rep/ast2vec/temp/tmp1: no such file or directory
2017/07/27 11:05:34 open /Users/k/work/rep/ast2vec/temp/tmp2: too many levels of symbolic links

So there are some problems with symlink handling.

  • Can you fix it?
  • Also, sometimes it is better to just ignore symlinks entirely.
  • Maybe you can add a flag for that?

undefined: sort.Slice when using go get

Here is the error:

user@vm:~$ go get gopkg.in/src-d/enry.v1/...
# gopkg.in/src-d/enry.v1/cmd/enry
go/src/gopkg.in/src-d/enry.v1/cmd/enry/main.go:213: undefined: sort.Slice

The command was executed on Ubuntu 16.04.4 LTS, in a VM with Linux kernel 4.4.0-124-generic.
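For context, sort.Slice was added to the standard library in Go 1.8, so this error usually means the installed toolchain is older than that (probably the case for the distribution's packaged Go here, though the issue does not say which version is installed). Upgrading Go is the simplest fix. Where that is not possible, the same ordering can be expressed with sort.Interface, which older releases also support. The example below is a generic sketch and is not taken from enry's main.go.

```go
package main

import (
	"fmt"
	"sort"
)

// byLength implements sort.Interface, which predates sort.Slice.
type byLength []string

func (s byLength) Len() int           { return len(s) }
func (s byLength) Less(i, j int) bool { return len(s[i]) < len(s[j]) }
func (s byLength) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

// sortByLength sorts words in place from shortest to longest, keeping
// the original order of words with equal length.
func sortByLength(words []string) {
	sort.Stable(byLength(words))
}

func main() {
	words := []string{"Makefile", "Go", "Shell"}
	sortByLength(words)
	fmt.Println(words)
}
```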

Go generate commit name

When you run go generate while .linguist is checked out at an old commit, it generates the code expected for that commit, but the commit hash that gets extracted is still the latest commit in .linguist's history.

Print usage on `slinguist --help`

It would be nice to have proper usage output in the terminal.

Actual usage as of v1.2.0

go get github.com/src-d/simple-linguist/cli/slinguist
slinguist --help
Usage of slinguist:

Expected: some explanation of the flags and usage patterns (--breakdown, --json, etc.).

Here is an example from the original github/linguist:

linguist --help                                                                                       

  Linguist v5.0.8
  Detect language type for a file, or, given a directory, determine language breakdown.

  Usage: linguist <path>
         linguist <path> [--breakdown] [--json]
         linguist [--breakdown] [--json]

enry consumes an extreme amount of memory

I run enry in the root of the following tree:

├── [4.0K]  ast2vec
│   ├── [ 526]  bblfsh_roles.py
│   ├── [2.2K]  bigartm.py
│   ├── [5.1K]  bow.py
│   ├── [8.9K]  cloning.py
│   ├── [1.6K]  coocc.py
│   ├── [2.1K]  df.py
│   ├── [ 314]  dump.py
│   ├── [3.8K]  enry.py
│   ├── [1.9K]  id2vec.py
│   ├── [ 11K]  id_embedding.py
│   ├── [ 773]  __init__.py
│   ├── [ 15K]  __main__.py
│   ├── [4.0K]  model2
│   │   ├── [4.9K]  base.py
│   │   ├── [   0]  __init__.py
│   │   ├── [2.7K]  join_bow.py
│   │   ├── [6.2K]  proxbase.py
│   │   ├── [2.3K]  prox.py
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [5.1K]  base.cpython-35.pyc
│   │   │   ├── [ 140]  __init__.cpython-35.pyc
│   │   │   ├── [2.8K]  join_bow.cpython-35.pyc
│   │   │   ├── [6.2K]  proxbase.cpython-35.pyc
│   │   │   ├── [3.1K]  prox.cpython-35.pyc
│   │   │   ├── [4.4K]  source2bow.cpython-35.pyc
│   │   │   └── [3.4K]  source2df.cpython-35.pyc
│   │   ├── [3.2K]  source2bow.py
│   │   └── [2.4K]  source2df.py
│   ├── [  85]  modelforgecfg.py
│   ├── [ 804]  pickleable_logger.py
│   ├── [4.0K]  __pycache__
│   │   ├── [ 654]  bblfsh_roles.cpython-35.pyc
│   │   ├── [2.5K]  bigartm.cpython-35.pyc
│   │   ├── [6.9K]  bow.cpython-35.pyc
│   │   ├── [8.9K]  cloning.cpython-35.pyc
│   │   ├── [2.3K]  coocc.cpython-35.pyc
│   │   ├── [3.3K]  df.cpython-35.pyc
│   │   ├── [ 461]  dump.cpython-35.pyc
│   │   ├── [3.9K]  enry.cpython-35.pyc
│   │   ├── [2.8K]  id2vec.cpython-35.pyc
│   │   ├── [ 11K]  id_embedding.cpython-35.pyc
│   │   ├── [1.2K]  __init__.cpython-35.pyc
│   │   ├── [ 11K]  __main__.cpython-35.pyc
│   │   ├── [1.8K]  meta.cpython-35.pyc
│   │   ├── [7.7K]  model.cpython-35.pyc
│   │   ├── [ 236]  modelforgecfg.cpython-35.pyc
│   │   ├── [2.4K]  nbow.cpython-35.pyc
│   │   ├── [1.4K]  pickleable_logger.cpython-35.pyc
│   │   ├── [ 819]  progress_bar.cpython-35.pyc
│   │   ├── [4.9K]  publish.cpython-35.pyc
│   │   ├── [1.0K]  resolve_symlink.cpython-35.pyc
│   │   ├── [2.6K]  source.cpython-35.pyc
│   │   ├── [ 15K]  swivel.cpython-35.pyc
│   │   ├── [2.1K]  token_parser.cpython-35.pyc
│   │   ├── [4.6K]  topics.cpython-35.pyc
│   │   ├── [4.4K]  uast.cpython-35.pyc
│   │   ├── [1.9K]  uast_ids_to_bag.cpython-35.pyc
│   │   ├── [1.9K]  voccoocc.cpython-35.pyc
│   │   └── [1.3K]  vw_dataset.cpython-35.pyc
│   ├── [4.0K]  repo2
│   │   ├── [ 21K]  base.py
│   │   ├── [3.3K]  cooccbase.py
│   │   ├── [1.3K]  coocc.py
│   │   ├── [   0]  __init__.py
│   │   ├── [2.1K]  nbow.py
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [ 20K]  base.cpython-35.pyc
│   │   │   ├── [3.8K]  cooccbase.cpython-35.pyc
│   │   │   ├── [2.6K]  coocc.cpython-35.pyc
│   │   │   ├── [ 139]  __init__.cpython-35.pyc
│   │   │   ├── [3.0K]  nbow.cpython-35.pyc
│   │   │   ├── [2.6K]  source.cpython-35.pyc
│   │   │   ├── [2.2K]  uast.cpython-35.pyc
│   │   │   ├── [1.5K]  voccoocc.cpython-35.pyc
│   │   │   └── [1.3K]  xbow.cpython-35.pyc
│   │   ├── [1.9K]  source.py
│   │   ├── [1.6K]  uast.py
│   │   └── [1014]  voccoocc.py
│   ├── [ 870]  resolve_symlink.py
│   ├── [1.9K]  source.py
│   ├── [ 20K]  swivel.py
│   ├── [4.0K]  tests
│   │   ├── [3.8M]  bow_1000.asdf
│   │   ├── [4.0K]  coocc
│   │   │   ├── [3.5M]  astropy_coocc.asdf
│   │   │   ├── [4.8M]  django_coocc.asdf
│   │   │   ├── [1.3K]  empty_coocc.asdf
│   │   │   ├── [  17]  error.asdf -> ../nbow_1000.asdf
│   │   │   ├── [341K]  flask_coocc.asdf
│   │   │   ├── [501K]  jinja2_coocc.asdf
│   │   │   └── [6.6M]  tensorflow_coocc.asdf
│   │   ├── [ 90K]  coocc.asdf
│   │   ├── [6.0K]  docfreq_1000.asdf
│   │   ├── [ 502]  fake_requests.py
│   │   ├── [1.1M]  id2vec_1000.asdf
│   │   ├── [ 458]  __init__.py
│   │   ├── [4.0K]  merge_bows
│   │   │   ├── [3.7K]  nbow_github.com&src-d&ast2vec.asdf
│   │   │   ├── [3.2K]  nbow_github.com&src-d&modelforge.asdf
│   │   │   └── [2.0K]  nbow_github.com&src-d&vecino.asdf
│   │   ├── [ 538]  models.py
│   │   ├── [3.8M]  nbow_1000.asdf
│   │   ├── [4.0K]  postproc
│   │   │   ├── [1.9M]  col_embedding.tsv
│   │   │   ├── [798K]  col_embedding.tsv.gz
│   │   │   ├── [1.9M]  row_embedding.tsv
│   │   │   └── [797K]  row_embedding.tsv.gz
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [1.4K]  fake_requests.cpython-35.pyc
│   │   │   ├── [ 676]  __init__.cpython-35.pyc
│   │   │   ├── [ 707]  models.cpython-35.pyc
│   │   │   ├── [2.2K]  test_bow2vw.cpython-35.pyc
│   │   │   ├── [3.2K]  test_bow.cpython-35.pyc
│   │   │   ├── [5.3K]  test_cloning.cpython-35.pyc
│   │   │   ├── [1.6K]  test_coocc.cpython-35.pyc
│   │   │   ├── [2.0K]  test_df.cpython-35.pyc
│   │   │   ├── [6.9K]  test_dump.cpython-35.pyc
│   │   │   ├── [1.7K]  test_enry.cpython-35.pyc
│   │   │   ├── [1.5K]  test_id2vec.cpython-35.pyc
│   │   │   ├── [ 10K]  test_id_embedding.cpython-35.pyc
│   │   │   ├── [1.3K]  test_join_bow.cpython-35.pyc
│   │   │   ├── [3.5K]  test_main.cpython-35.pyc
│   │   │   ├── [4.1K]  test_model2.cpython-35.pyc
│   │   │   ├── [1.1K]  test_pickleable_logger.cpython-35.pyc
│   │   │   ├── [2.4K]  test_repo2base.cpython-35.pyc
│   │   │   ├── [3.7K]  test_repo2coocc.cpython-35.pyc
│   │   │   ├── [4.3K]  test_repo2nbow.cpython-35.pyc
│   │   │   ├── [4.6K]  test_repo2source.cpython-35.pyc
│   │   │   ├── [1.6K]  test_repo2voccoocc.cpython-35.pyc
│   │   │   ├── [1.3K]  test_resolve_symlink.cpython-35.pyc
│   │   │   ├── [1.1K]  test_source2df.cpython-35.pyc
│   │   │   ├── [1.9K]  test_source.cpython-35.pyc
│   │   │   ├── [1.9K]  test_token_parser.cpython-35.pyc
│   │   │   └── [1.2K]  test_voccoocc.cpython-35.pyc
│   │   ├── [4.0K]  source
│   │   │   ├── [ 80K]  [email protected]
│   │   │   ├── [ 71K]  [email protected]
│   │   │   ├── [ 79K]  [email protected]
│   │   │   ├── [ 78K]  [email protected]
│   │   │   ├── [1.9K]  test_example.asdf
│   │   │   └── [  33]  test_example.py
│   │   ├── [4.0K]  swivel
│   │   │   ├── [ 15K]  col_sums.txt
│   │   │   ├── [ 27K]  col_vocab.txt
│   │   │   ├── [ 15K]  row_sums.txt
│   │   │   ├── [ 27K]  row_vocab.txt
│   │   │   ├── [ 10M]  shard-000-000.pb
│   │   │   └── [3.2M]  shard-000-000.pb.gz
│   │   ├── [ 959]  test_bigartm.py
│   │   ├── [1.8K]  test_bow2vw.py
│   │   ├── [2.0K]  test_bow.py
│   │   ├── [5.2K]  test_cloning.py
│   │   ├── [ 878]  test_coocc.py
│   │   ├── [1.3K]  test_df.py
│   │   ├── [6.0K]  test_dump.py
│   │   ├── [1.1K]  test_enry.py
│   │   ├── [1.0K]  test_id2vec.py
│   │   ├── [9.1K]  test_id_embedding.py
│   │   ├── [ 926]  test_join_bow.py
│   │   ├── [3.2K]  test_main.py
│   │   ├── [2.6K]  test_model2.py
│   │   ├── [ 541]  test_pickleable_logger.py
│   │   ├── [1.1K]  test_prox.py
│   │   ├── [5.7K]  test_repo2base.py
│   │   ├── [3.9K]  test_repo2coocc.py
│   │   ├── [3.7K]  test_repo2nbow.py
│   │   ├── [7.4K]  test_repo2source.py
│   │   ├── [3.1K]  test_repo2uast.py
│   │   ├── [ 983]  test_repo2voccoocc.py
│   │   ├── [  73]  test_repos_list.txt
│   │   ├── [ 993]  test_resolve_symlink.py
│   │   ├── [2.2K]  test_source2bow.py
│   │   ├── [ 710]  test_source2df.py
│   │   ├── [1.6K]  test_source.py
│   │   ├── [1.4K]  test_token_parser.py
│   │   ├── [2.1K]  test_topics.py
│   │   ├── [ 909]  test_uast.py
│   │   ├── [ 598]  test_voccoocc.py
│   │   ├── [ 36K]  topics.asdf
│   │   ├── [702K]  topics_readable.txt
│   │   ├── [333K]  uast.asdf
│   │   └── [ 88K]  voccoocc.asdf
│   ├── [2.0K]  token_parser.py
│   ├── [3.8K]  topics.py
│   ├── [1.3K]  uast_ids_to_bag.py
│   ├── [3.2K]  uast.py
│   ├── [1.2K]  voccoocc.py
│   └── [1.0K]  vw_dataset.py
├── [4.0K]  ast2vec.egg-info
│   ├── [   1]  dependency_links.txt
│   ├── [  51]  entry_points.txt
│   ├── [ 974]  PKG-INFO
│   ├── [ 229]  requires.txt
│   ├── [1.0K]  SOURCES.txt
│   └── [   8]  top_level.txt
├── [ 24M]  bow_matplotlib.asdf
├── [124M]  decorr_readable.txt.gz
├── [4.0K]  dist
│   ├── [ 21K]  ast2vec-0.1.0a0.tar.gz
│   ├── [ 23K]  ast2vec-0.1.1a0.tar.gz
│   ├── [ 23K]  ast2vec-0.1.2a0.tar.gz
│   ├── [ 36K]  ast2vec-0.2.0a0.tar.gz
│   ├── [ 36K]  ast2vec-0.2.1a0.tar.gz
│   ├── [ 38K]  ast2vec-0.2.2a0.tar.gz
│   ├── [ 38K]  ast2vec-0.2.3a0.tar.gz
│   ├── [ 38K]  ast2vec-0.2.4a0.tar.gz
│   └── [ 38K]  ast2vec-0.2.5a0.tar.gz
├── [4.0K]  doc
│   ├── [1.9K]  ast2vec.rst
│   ├── [2.5K]  ast2vec.tests.rst
│   ├── [4.0K]  _build
│   │   ├── [4.0K]  doctrees
│   │   │   ├── [263K]  ast2vec.doctree
│   │   │   ├── [135K]  ast2vec.tests.doctree
│   │   │   ├── [3.5M]  environment.pickle
│   │   │   ├── [5.1K]  index.doctree
│   │   │   └── [2.5K]  modules.doctree
│   │   └── [4.0K]  html
│   │       ├── [ 87K]  ast2vec.html
│   │       ├── [ 52K]  ast2vec.tests.html
│   │       ├── [ 36K]  genindex.html
│   │       ├── [5.9K]  index.html
│   │       ├── [4.0K]  _modules
│   │       │   ├── [4.0K]  ast2vec
│   │       │   │   ├── [8.0K]  coocc.html
│   │       │   │   ├── [9.6K]  df.html
│   │       │   │   ├── [9.8K]  dump.html
│   │       │   │   ├── [ 14K]  enry.html
│   │       │   │   ├── [8.6K]  id2vec.html
│   │       │   │   ├── [ 57K]  id_embedding.html
│   │       │   │   ├── [8.2K]  meta.html
│   │       │   │   ├── [ 36K]  model.html
│   │       │   │   ├── [ 13K]  nbow.html
│   │       │   │   ├── [ 26K]  publish.html
│   │       │   │   ├── [ 63K]  repo2base.html
│   │       │   │   ├── [ 21K]  repo2coocc.html
│   │       │   │   ├── [ 18K]  repo2nbow.html
│   │       │   │   ├── [ 81K]  swivel.html
│   │       │   │   ├── [4.0K]  tests
│   │       │   │   │   ├── [6.5K]  fake_requests.html
│   │       │   │   │   ├── [8.7K]  test_coocc.html
│   │       │   │   │   ├── [10.0K]  test_df.html
│   │       │   │   │   ├── [ 24K]  test_dump.html
│   │       │   │   │   ├── [ 11K]  test_enry.html
│   │       │   │   │   ├── [8.5K]  test_id2vec.html
│   │       │   │   │   ├── [ 40K]  test_id_embedding.html
│   │       │   │   │   ├── [ 12K]  test_main.html
│   │       │   │   │   ├── [ 32K]  test_model.html
│   │       │   │   │   ├── [9.6K]  test_nbow.html
│   │       │   │   │   ├── [ 21K]  test_publish.html
│   │       │   │   │   ├── [ 15K]  test_repo2coocc.html
│   │       │   │   │   └── [ 13K]  test_repo2nbow.html
│   │       │   │   └── [5.1K]  tests.html
│   │       │   └── [4.6K]  index.html
│   │       ├── [5.0K]  modules.html
│   │       ├── [1.9K]  objects.inv
│   │       ├── [9.2K]  py-modindex.html
│   │       ├── [3.2K]  search.html
│   │       ├── [ 14K]  searchindex.js
│   │       ├── [4.0K]  _sources
│   │       │   ├── [1.9K]  ast2vec.rst.txt
│   │       │   ├── [2.5K]  ast2vec.tests.rst.txt
│   │       │   ├── [ 375]  index.rst.txt
│   │       │   └── [  58]  modules.rst.txt
│   │       └── [4.0K]  _static
│   │           ├── [ 673]  ajax-loader.gif
│   │           ├── [ 10K]  alabaster.css
│   │           ├── [ 10K]  basic.css
│   │           ├── [ 756]  comment-bright.png
│   │           ├── [ 829]  comment-close.png
│   │           ├── [ 641]  comment.png
│   │           ├── [  42]  custom.css
│   │           ├── [8.0K]  doctools.js
│   │           ├── [ 202]  down.png
│   │           ├── [ 222]  down-pressed.png
│   │           ├── [ 286]  file.png
│   │           ├── [258K]  jquery-3.1.0.js
│   │           ├── [ 84K]  jquery.js
│   │           ├── [  90]  minus.png
│   │           ├── [  90]  plus.png
│   │           ├── [4.1K]  pygments.css
│   │           ├── [ 25K]  searchtools.js
│   │           ├── [ 34K]  underscore-1.3.1.js
│   │           ├── [ 12K]  underscore.js
│   │           ├── [ 203]  up.png
│   │           ├── [ 214]  up-pressed.png
│   │           └── [ 25K]  websupport.js
│   ├── [5.0K]  conf.py
│   ├── [4.0K]  Doc
│   │   └── [3.2K]  how_to_use_ast2vec.ipynb
│   ├── [ 375]  index.rst
│   ├── [ 850]  Makefile
│   ├── [  58]  modules.rst
│   └── [4.0K]  _static
├── [ 23M]  docfreq_1MM_serial.asdf
├── [1.2M]  docfreq_matplotlib.asdf
├── [9.5M]  enry
├── [2.3K]  gcs.json
├── [ 18K]  gimme
├── [1.0G]  id2vec_1MM_serial.asdf
├── [  33]  index.json
├── [ 11K]  LICENSE
├── [354M]  nbow_1MM_serial.asdf
├── [4.0K]  README.md
├── [ 214]  requirements.txt
├── [1.9K]  setup.py
├── [4.0K]  source
├── [ 20M]  token_docfreq.tsv.gz
└── [5.4K]  topic_modeling.md

27 directories, 283 files
1,7G	total

/usr/bin/time -v enry reports:

	User time (seconds): 1.08
	System time (seconds): 2.06
	Percent of CPU this job got: 72%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.31
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 4385928
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 108
	Minor (reclaiming a frame) page faults: 898379
	Voluntary context switches: 3938
	Involuntary context switches: 487
	Swaps: 0
	File system inputs: 3411248
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The RSS is over 4 gigs! Wow! And my system freezes a bit.

language detection difference between enry and linguist

Running the tests against linguist/samples, the following issues were found:

  • For hello.ms, we can't detect the language (Unix Assembly) because there is no matcher for Unix Assembly in contentMatchers (content.go). Linguist uses this regexp, which we can't port.

  • All SQL files fall through to the classifier because we don't correctly parse this disambiguator expression for "*.sql" files. The expression doesn't follow the pattern used in the rest of the heuristics.rb file.

Results are not sorted

$ cd /path/to/go/git
$ enry
0.69%	Shell
0.34%	Markdown
0.34%	Makefile
98.28%	Go
0.34%	Text

Compare with linguist:

$ linguist
99.74%  Go
0.18%   Shell
0.09%   Makefile

I propose to sort the results by significance.
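A minimal sketch of such an ordering, using the numbers from the enry output above: entries are sorted by descending percentage, with ties broken alphabetically so the output is deterministic. The `share` type and `sortBySignificance` are hypothetical names, and sort.Slice requires Go 1.8 or newer.

```go
package main

import (
	"fmt"
	"sort"
)

// share pairs a language with its percentage of the repository.
type share struct {
	Language string
	Percent  float64
}

// sortBySignificance orders shares from largest to smallest percentage,
// breaking ties by language name.
func sortBySignificance(shares []share) {
	sort.Slice(shares, func(i, j int) bool {
		if shares[i].Percent != shares[j].Percent {
			return shares[i].Percent > shares[j].Percent
		}
		return shares[i].Language < shares[j].Language
	})
}

func main() {
	shares := []share{
		{"Shell", 0.69},
		{"Markdown", 0.34},
		{"Makefile", 0.34},
		{"Go", 98.28},
		{"Text", 0.34},
	}
	sortBySignificance(shares)
	for _, s := range shares {
		fmt.Printf("%.2f%%\t%s\n", s.Percent, s.Language)
	}
}
```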

Python bindings for enry

In the same way that we have Java bindings for enry, which wrap a Go library built with -buildmode=c-shared, it would be nice to have bindings for Python using ctypes FFI, cffi, or something similar.

Particular use case: one wants to use the https://github.com/bblfsh/sonar-checks/ API, which requires knowing the language the file to be checked is written in, in order to choose the right checks.

TODOs

  • Initial PoC: exposes 1-2 API functions (e.g. w/o slices) #245
  • Minimal: expose only high-level language detection API #250
    (usable from Jupyter, after a manual build to enable #246)
  • Installable: expose all existing API, with the documentation and setup script
    pip install -e git+https://github.com/src-d/enry.git#egg=python
  • Publish: add new release profile to CI for
    • building .whl (linux, macOS)
    • publishing on pypi

Feature request: add -version flag

sourced@sourced-MacBookPro:/tmp/enry$ /home/sourced/Projects/ast2vec/enry -version
flag provided but not defined: -version
 /home/sourced/Projects/ast2vec/enry v1.4.0 build: 09-07-2017_08_55_33 commit: 0fe0a97f67, based on linguist commit: 37979b2
 /home/sourced/Projects/ast2vec/enry, A simple (and faster) implementation of github/linguist
 usage: /home/sourced/Projects/ast2vec/enry <path>
        /home/sourced/Projects/ast2vec/enry [-json] [-breakdown] <path>
        /home/sourced/Projects/ast2vec/enry [-json] [-breakdown]

flag provided but not defined: -version

Update: I thought that such a flag was defined (the version was printed), but I was wrong. Thus I propose adding -version to print the version.
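With the standard flag package, this could be as small as the sketch below. The `version` and `commit` variables are placeholders (a build would typically set them via `-ldflags "-X main.version=..."`), and `versionString` is a hypothetical helper; the output format is illustrative and does not reproduce enry's banner.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Placeholder values; assumed to be overridden at build time.
var (
	version = "undefined"
	commit  = "undefined"
)

// versionString formats the version line printed for -version.
func versionString(name string) string {
	return fmt.Sprintf("%s %s commit: %s", name, version, commit)
}

func main() {
	showVersion := flag.Bool("version", false, "print the version and exit")
	flag.Parse()
	if *showVersion {
		fmt.Println(versionString(os.Args[0]))
		return
	}
	fmt.Println("path:", flag.Arg(0))
}
```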

Question: GetLanguageByContent

Hi,

I'd like to use your library to detect the programming language from user input. So I won't have a file extension. I only have a string with source code. Can I achieve my goal using this project?

Thanks.

Nikolay

fails to build: incorrect usage of internal package

go get gopkg.in/src-d/enry.v1/...
package gopkg.in/src-d/enry.v1/internal/code-generator
	imports gopkg.in/src-d/simple-linguist.v1/internal/code-generator/generator: use of internal package not allowed
go get gopkg.in/src-d/enry.v1/cli/enry
package gopkg.in/src-d/enry.v1/cli/enry: cannot find package "gopkg.in/src-d/enry.v1/cli/enry" in any of:
	/usr/lib/go-1.8/src/gopkg.in/src-d/enry.v1/cli/enry (from $GOROOT)
	/home/sourced/Projects/ast2vec/enry/src/gopkg.in/src-d/enry.v1/cli/enry (from $GOPATH)

This works after symlinking github.com/src-d/enry to gopkg.in/src-d/enry.v1

go get github.com/src-d/enry/cli/enry

CLI documented in README

This might already be planned but I was reviewing the PR for the blog post and realised that when coming to the GH repository there is no documentation on the CLI commands (could just be the help output) in the README. Would be great to add before we announce (similar to pygments, most people will use it initially as a CLI).

Fix Henry Higgins guessing abilities in the README.md

From the readme:

Why Enry?

In the movie My Fair Lady, Professor Henry Higgins is one of the main characters. Henry is a linguist and at the very beginning of the movie enjoys guessing the nationality of people based on their accent.

Enry Iggins is how Eliza Doolittle, pronounces the name of the Professor during the first half of the movie.

Is he really guessing nationality or more like neighborhood where the person was raised?

Higgins claims that his knowledge of ‘simple phonetics’ (a branch of linguistics concerned with the study of the nature, production, and perception of sounds of speech) allows him to deduce a person’s origins to within six miles. Within London, he says, he can place a man within two miles, ‘sometimes within two streets’. It’s this knowledge that he uses to transform Eliza Doolittle into a socially acceptable semblance of a ‘lady’. The character of Higgins is said to have been inspired by Henry Sweet (1845–1912), a great phonetician whose works, including his History of English

Source: http://blog.oxforddictionaries.com/2013/03/my-fair-lady/

Weird SIGSEGV

I ran the same source{d} engine pipeline on "100 repos" dataset (/data/siva on science-3, @bzz knows) nearly 50 times and it worked without errors. On 51st, I got the following stack trace:

unexpected fault address 0x0
fatal error: fault
panic: runtime error: slice bounds out of range

goroutine 17 [running, locked to thread]:
bytes.Count(0x7ff5ec077fe0, 0xa, 0x0, 0x1c42005cc20, 0x1, 0x20, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7ff5ec077fe0, 0xa, 0x0, 0x66, 0x6, 0x7ff5d828aed8)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x7ff5a2524a73, 0xc, 0x1c42001cde0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x1c42005ce48, 0x1c420018500)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x1c420078098, 0x1c4200007e0)
	/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x0, 0x0)
	command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
panic: runtime error: slice bounds out of range

goroutine 53 [running, locked to thread]:
bytes.Count(0x7ff5e4080020, 0xa, 0x0, 0x1c42005fc20, 0x1, 0x20, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7ff5e4080020, 0xa, 0x0, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x7ff5a2524a73, 0xc, 0x1c4200206e0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x1c42005fe48, 0x1c4200c4040)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x1c4200781d8, 0x1c420085680)
	/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x0, 0x0)
	command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x7ff5a23fbbb1]

goroutine 51 [running, locked to thread]:
runtime.throw(0x7ff5a24caf58, 0x5)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/panic.go:596 +0x97 fp=0x1c42005dc60 sp=0x1c42005dc40
runtime.sigpanic()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/signal_unix.go:297 +0x290 fp=0x1c42005dcb0 sp=0x1c42005dc60
path/filepath.Base(0x7ff600003dc0, 0x7ff694011000, 0x2, 0x7ff600052360)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/path/filepath/path.go:431 +0x31 fp=0x1c42005dcc0 sp=0x1c42005dcb0
gopkg.in/src-d/enry%2ev1.GetLanguagesByFilename(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:264 +0x3b fp=0x1c42005dcf8 sp=0x1c42005dcc0
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x7ff5a2524a73, 0xc, 0x1c42001e0e0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129 fp=0x1c42005ddb0 sp=0x1c42005dcf8
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x1c42005de48, 0x1c4200c6040)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55 fp=0x1c42005de00 sp=0x1c42005ddb0
main.GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x1c420078138, 0x1c420084cc0)
	/home/travis/build/src-d/enry/shared/enry.go:11 +0x55 fp=0x1c42005de48 sp=0x1c42005de00
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x0, 0x0)
	command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a fp=0x1c42005de90 sp=0x1c42005de48
runtime.call64(0x0, 0x7ff6197ea008, 0x7ff6197ea0a0, 0x38)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:515 +0x4a fp=0x1c42005dee0 sp=0x1c42005de90
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:301 +0x1a1 fp=0x1c42005df58 sp=0x1c42005dee0
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86 fp=0x1c42005dfc0 sp=0x1c42005df58
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71 fp=0x1c42005dfe0 sp=0x1c42005dfc0
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1 fp=0x1c42005dfe8 sp=0x1c42005dfe0

goroutine 17 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 50 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078017, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005bf08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 52 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078117, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005ef08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 53 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 54 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078117, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c4207bbf08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 55 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078217, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c420058f08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 56 [syscall, locked to thread]:
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 57 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078217, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005af08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

I cannot reproduce this easily; it is very rare. Engine version is 0.1.7.

Bug: the return code is non-zero in case of warnings

At the moment, when enry encounters a non-critical problem, such as a path it cannot read, it still exits with a non-zero return code.

For example:

enry/bin/enry /home/sourced/Projects/ast2vec
2017/06/16 09:53:51 read /home/sourced/Projects/ast2vec/enry/src/gopkg.in/src-d/enry.v1: is a directory
{
  "Go": [
    "enry/src/github.com/src-d/enry/alias.go",
    "enry/src/github.com/src-d/enry/classifier.go",
    "enry/src/github.com/src-d/enry/cli/enry/main.go",
    "enry/src/github.com/src-d/enry/common.go",
    "enry/src/github.com/src-d/enry/common_test.go",
    "enry/src/github.com/src-d/enry/content.go",
    "enry/src/github.com/src-d/enry/documentation.go",
    "enry/src/github.com/src-d/enry/extension.go",
    "enry/src/github.com/src-d/enry/filename.go",
    "enry/src/github.com/src-d/enry/frequencies.go",
    "enry/src/github.com/src-d/enry/generate.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/aliases.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/documentation.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/extensions.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/filenames.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/generator.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/generator_test.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/heuristics.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/interpreters.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/langinfo.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/samplesfreq.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/types.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/vendor.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/main.go",
    "enry/src/github.com/src-d/enry/internal/tokenizer/tokenize.go",
    "enry/src/github.com/src-d/enry/internal/tokenizer/tokenize_test.go",
    "enry/src/github.com/src-d/enry/interpreter.go",
    "enry/src/github.com/src-d/enry/modeline.go",
    "enry/src/github.com/src-d/enry/shebang.go",
    "enry/src/github.com/src-d/enry/type.go",
    "enry/src/github.com/src-d/enry/utils.go",
    "enry/src/github.com/src-d/enry/utils_test.go",
    "enry/src/github.com/src-d/enry/vendor.go",
    "enry/src/github.com/toqueteos/trie/example_test.go",
    "enry/src/github.com/toqueteos/trie/trie.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/bytes.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/bytes_test.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/lib.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/lib_test.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/string.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/string_test.go"
  ],
  "Makefile": [
    "enry/src/github.com/src-d/enry/Makefile"
  ],
  "Python": [
    "ast2vec/__init__.py",
    "ast2vec/__main__.py",
    "ast2vec/dataset.py",
    "ast2vec/df.py",
    "ast2vec/glove_to_shards.py",
    "ast2vec/id2vec.py",
    "ast2vec/id_embedding.py",
    "ast2vec/prep.py",
    "ast2vec/repo2base.py",
    "ast2vec/repo2coocc.py",
    "ast2vec/repo2nbow.py",
    "ast2vec/swivel.py"
  ],
  "Ruby": [
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/test_files/heuristics.test.rb"
  ],
  "Text": [
    "requirements.txt"
  ]
}

Return code is 2.

This is bad because it breaks any Python subprocess call as it checks the return code and raises an Exception if it is non-zero. Yet we've only got a non-critical warning.

I propose to return 0 in case of non-critical warnings. This is a standard convention of UNIX apps: if you survived, do not break the errcode.
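The proposal above can be sketched as follows. This is only an illustration of the exit-status policy, not enry's actual code: the `result` type, `errUnreadableRoot`, and `exitCode` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errUnreadableRoot is a hypothetical fatal error: nothing could be scanned.
var errUnreadableRoot = errors.New("cannot read root path")

// result is a hypothetical summary of one directory scan.
type result struct {
	warnings []error // recoverable problems, e.g. one unreadable entry
	fatal    error   // unrecoverable problem; no usable output was produced
}

// exitCode maps a scan result to a process exit status.
// Warnings alone never change the status; only a fatal error does.
func exitCode(r result) int {
	if r.fatal != nil {
		return 2
	}
	return 0
}

func main() {
	r := result{warnings: []error{errors.New("read some/path: is a directory")}}
	for _, w := range r.warnings {
		fmt.Fprintln(os.Stderr, "warning:", w) // report on stderr, keep going
	}
	if code := exitCode(r); code != 0 {
		os.Exit(code)
	}
}
```

With this policy, a caller that checks the return code only fails when no usable output was produced, while warnings remain visible on stderr.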

Surprising programming languages

I decided to list all of the languages that appear in Public Git Archive and I got a couple of surprising results.

Some of the results:

  • desktop
  • Regular Expression
  • Raw token data
  • Public Key
  • HTTP
  • NumPy

Not all of the results above are technically languages: some are protocols, some are libraries, and some are ... something else completely!

Should we add a clarification regarding these? I think it'd be interesting to have it for Public Git Archive. Currently I get over 455 languages, but I suspect some of them are not technically languages.

The results differ significantly from linguist

sourced@sourced-MacBookPro:/tmp$ git clone https://github.com/src-d/enry &>/dev/null
sourced@sourced-MacBookPro:/tmp$ cd enry
sourced@sourced-MacBookPro:/tmp/enry$ time linguist
99.25%  Go
0.35%   Shell
0.24%   Java
0.08%   Ruby
0.05%   Makefile
0.01%   Scala
0.01%   Gnuplot

real    0m1.945s
user    0m1.848s
sys    0m0.056s
sourced@sourced-MacBookPro:/tmp/enry$ time /home/sourced/Projects/ast2vec/enry 
3.28%    Makefile
63.93%    Go
9.84%    CSV
6.56%    Shell
1.64%    Gnuplot
1.64%    Text
3.28%    Ruby
3.28%    Scala
6.56%    Java

real    0m0.084s
user    0m0.072s
sys    0m0.008s
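One possible reading of the two outputs, offered as an assumption rather than a confirmed diagnosis: the enry percentages are consistent with shares of a 61-file total (for instance 39/61 is 63.93% and 6/61 is 9.84%), whereas linguist weights languages by size in bytes. The sketch below contrasts the two ways of computing a breakdown; the `file` type and both function names are hypothetical.

```go
package main

import "fmt"

// file is a hypothetical record of one classified file.
type file struct {
	lang string
	size int64 // size in bytes
}

// shareByCount returns the percentage of files per language.
func shareByCount(files []file) map[string]float64 {
	out := make(map[string]float64)
	if len(files) == 0 {
		return out
	}
	counts := make(map[string]int)
	for _, f := range files {
		counts[f.lang]++
	}
	for lang, n := range counts {
		out[lang] = 100 * float64(n) / float64(len(files))
	}
	return out
}

// shareByBytes returns the percentage of bytes per language.
func shareByBytes(files []file) map[string]float64 {
	out := make(map[string]float64)
	sizes := make(map[string]int64)
	var total int64
	for _, f := range files {
		sizes[f.lang] += f.size
		total += f.size
	}
	if total == 0 {
		return out
	}
	for lang, n := range sizes {
		out[lang] = 100 * float64(n) / float64(total)
	}
	return out
}

func main() {
	files := []file{{"Go", 300}, {"Shell", 100}}
	fmt.Println(shareByCount(files)) // each language has one file
	fmt.Println(shareByBytes(files)) // Go holds three quarters of the bytes
}
```

A small repository with one large Go file and several tiny scripts would look very different under the two measures, which may account for part of the gap shown above.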

Severe performance degradation

Something happened, and enry now runs even slower than linguist. Testing on src-d/ast2vec:

time ./enry .
{
  "Python": [
    "ast2vec/__init__.py",
    "ast2vec/__main__.py",
    "ast2vec/df.py",
    "ast2vec/dump.py",
    "ast2vec/enry.py",
    "ast2vec/id2vec.py",
    "ast2vec/id_embedding.py",
    "ast2vec/meta.py",
    "ast2vec/model.py",
    "ast2vec/nbow.py",
    "ast2vec/publish.py",
    "ast2vec/repo2base.py",
    "ast2vec/repo2coocc.py",
    "ast2vec/repo2nbow.py",
    "ast2vec/swivel.py",
    "ast2vec/tests/__init__.py",
    "ast2vec/tests/models.py",
    "ast2vec/tests/test_dump.py",
    "ast2vec/tests/test_enry.py"
  ],
  "Text": [
    "requirements.txt"
  ]
}

real	0m14.339s
user	0m10.740s
sys	0m1.972s
time linguist
100.00% Python

real	0m2.192s
user	0m2.016s
sys	0m0.104s

cli usage message is wrong

Flags must be provided before the path, but the usage message also shows them after it:

$ enry -h
enry, A simple (and faster) implementation of github/linguist 
usage: enry <path>
       enry <path> [-json] [-breakdown]
       enry [-json] [-breakdown]
$ enry internal 
100.00%	Go
$ enry internal -json
100.00%	Go
$ enry -json internal
{"Go":["code-generator/generator/aliases.go","code-generator/generator/documentation.go","code-generator/generator/extensions.go","code-generator/generator/filenames.go","code-generator/generator/generator.go","code-generator/generator/generator_test.go","code-generator/generator/heuristics.go","code-generator/generator/interpreters.go","code-generator/generator/langinfo.go","code-generator/generator/linguist-commit.go","code-generator/generator/samplesfreq.go","code-generator/generator/types.go","code-generator/generator/vendor.go","code-generator/main.go","tokenizer/tokenize.go","tokenizer/tokenize_test.go"]}%
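The behaviour above matches how Go's standard flag package works: parsing stops at the first non-flag argument, so anything after the path is treated as positional. The sketch below assumes the CLI relies on that package; `parse` and the flag set name are hypothetical and only reproduce the effect.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parse mimics a CLI with a boolean -json flag followed by a positional path.
func parse(args []string) (jsonOut bool, rest []string, err error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	fs.BoolVar(&jsonOut, "json", false, "emit JSON")
	err = fs.Parse(args)
	return jsonOut, fs.Args(), err
}

func main() {
	// Flag after the path: parsing stops at "internal", -json is ignored.
	j, rest, _ := parse([]string{"internal", "-json"})
	fmt.Println(j, rest)

	// Flag before the path: -json is recognized.
	j, rest, _ = parse([]string{"-json", "internal"})
	fmt.Println(j, rest)
}
```

Under that assumption, the simplest fix is to change the usage text to `enry [-json] [-breakdown] <path>`.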

CLI output for a single file is not JSON encoded

When run with -json on a single file, the output is not JSON:

$ go get gopkg.in/src-d/enry.v1/...
$ enry -json ../../github.com/src-d/lookout/vendor/golang.org/x/crypto/cast5/cast5.go
cast5.go: 526 lines (493 sloc)
  type:      Text
  mime_type: text/x-go
  language:  Go

The same flag works for a directory:

enry -json ../../github.com/src-d/lookout/vendor/golang.org/x/crypto/cast5
{"Go":["cast5.go"]}
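A minimal sketch of what JSON output for a single file could look like, using encoding/json from the standard library. The `fileInfo` type and its field names are assumptions that simply mirror the plain-text fields shown above; they are not an existing enry format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileInfo is a hypothetical JSON shape for the single-file report.
type fileInfo struct {
	Filename string `json:"filename"`
	Lines    int    `json:"lines"`
	SLOC     int    `json:"sloc"`
	Type     string `json:"type"`
	MimeType string `json:"mime_type"`
	Language string `json:"language"`
}

// encodeFile renders one file report as a compact JSON object.
func encodeFile(info fileInfo) (string, error) {
	out, err := json.Marshal(info)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := encodeFile(fileInfo{
		Filename: "cast5.go",
		Lines:    526,
		SLOC:     493,
		Type:     "Text",
		MimeType: "text/x-go",
		Language: "Go",
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(s)
}
```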

GetLanguageByContent does nothing

GetLanguageByContent (https://github.com/src-d/enry/blob/master/common.go#L88) passes "" as the filename to GetLanguagesByContent (https://github.com/src-d/enry/blob/master/common.go#L382). GetLanguagesByContent always returns nil when the extension is not matched, and that is always the case here because GetLanguageByContent explicitly passes an empty string.

We should either fix this or remove this exported function, because it does nothing.
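The problem can be reproduced with a simplified model. The code below is not enry's implementation: `contentMatchers`, `languagesByContent`, and `languageByContent` are hypothetical stand-ins that only keep the relevant shape, namely a lookup keyed by file extension that is called with an empty filename.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// contentMatchers is a toy stand-in for extension-keyed content heuristics.
var contentMatchers = map[string]func(content []byte) []string{
	".h": func(content []byte) []string {
		if strings.Contains(string(content), "class ") {
			return []string{"C++"}
		}
		return []string{"C"}
	},
}

// languagesByContent picks a heuristic by extension and returns nil
// when no heuristic is registered for that extension.
func languagesByContent(filename string, content []byte) []string {
	ext := strings.ToLower(filepath.Ext(filename))
	matcher, ok := contentMatchers[ext]
	if !ok {
		return nil
	}
	return matcher(content)
}

// languageByContent mirrors the reported bug: the filename is always "",
// so the extension is always "" and no heuristic can ever match.
func languageByContent(content []byte) []string {
	return languagesByContent("", content)
}

func main() {
	src := []byte("class Foo {};")
	fmt.Println(languagesByContent("foo.h", src)) // heuristic for .h applies
	fmt.Println(languageByContent(src))           // no extension, nothing matches
}
```

In this model the wrapper can never return a result, whatever the content is, which is the behaviour described above.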

enry core

I use Java bindings (maven version 1.6.2).

Here is the log:

panic: runtime error: slice bounds out of range
goroutine 26 [running, locked to thread]:
bytes.Count(0x7fc5dc00e3c0, 0x47, 0x0, 0x1c4206fbc20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7fc5dc00e3c0, 0x47, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:166 +0xae
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x7fc5861f7c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x7fc585723aae, 0xc, 0x1c420024de0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x1c4206fbe48, 0x1c4209cc040)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x1c42012e1d8, 0x1c420499340)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
Aborted (core dumped)

Sync with github/linguist

We synchronized with github/linguist in November 2017; an update is long overdue ;)

Latest enry v1.6.7 from Oct 24, 2018 is based on Linguist v5.2.0 commit 4cd558 from Sep 17, 2017.

This is an ☂️ issue with the goal of making enry use at least Linguist v7.1.3 from Dec 12, 2018.

Slice out of range error

When calling the GetLanguage method, we sometimes receive this error:

panic: runtime error: slice bounds out of range
goroutine 17 [running, locked to thread]:
bytes.Count(0x7f627818c0d0, 0x4b, 0x0, 0x1c420038c20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x7f62482e6c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x1c420038e48, 0x1c42001a500)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
