seq-lang / seq
A high-performance, Pythonic language for bioinformatics
Home Page: https://seq-lang.org
License: Apache License 2.0
Nested try-except is currently not supported. This issue is for adding nested try-except support.
Is something like Set([ ... ]) or [ ... ] |> some_collection_lib.unique available?
Fix documentation:
Two approaches:
self.foo as in Python and then deduce types/members during the compile stage (might be easier?)
Idea: try (2) and see whether it works.
Depends on:
Support cases such as:
a = []
...
a.append(1)
or
a = None
...
if foo:
    a = X()
...
return a
Seq:
print len('\x00') # 0
Python:
print len('\x00') # 1
Support Pythonic built-ins:
k[3], k[:4])
Incorporate LLVM debug info in the generated IR to allow interoperability with tools like GDB/LLDB.
Implement JIT.
with can be transformed into try...finally at the compiler level:
with <expr> as v:
    <block>
can be converted to:
# new scope so `v` is not accessible outside `with` body
v = <expr>
v.__enter__()
try:
    <block>
finally:
    v.__exit__()
withs with multiple clauses can be converted to single-clause nested withs:
with <e1> as v1, <e2> as v2, ..., <eN> as vN:
    <block>
becomes
with <e1> as v1:
    with <e2> as v2:
        ...
            with <eN> as vN:
                <block>
before applying the preceding transformation.
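The two rewrites above can be checked against plain Python, as a rough sketch. The Resource class and both function names are hypothetical, and Python's __exit__ takes three exception arguments, unlike the zero-argument call shown in the proposal.

```python
class Resource:
    """Hypothetical context manager that records enter/exit events."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append("enter " + self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit " + self.name)
        return False  # do not swallow exceptions


def with_statement(log):
    # Multi-clause form as written by the user.
    with Resource("a", log) as v1, Resource("b", log) as v2:
        log.append("body")


def desugared(log):
    # Nested single-clause form, each lowered to try/finally by hand.
    v1 = Resource("a", log)
    v1.__enter__()
    try:
        v2 = Resource("b", log)
        v2.__enter__()
        try:
            log.append("body")
        finally:
            v2.__exit__(None, None, None)
    finally:
        v1.__exit__(None, None, None)


log_with, log_desugared = [], []
with_statement(log_with)
desugared(log_desugared)
```

Both logs record the same order of events, which is the property the compiler-level rewrite needs to preserve.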
What needs to be done to run HapTreeX with Seq
Compiler tasks:
Manual interventions:
{}, set() etc)
Parallelism can perhaps be tied to branching. One possible syntax is to have a special parallel pipe operator like |>(42), which would use 42 threads to process the previous stage's output.
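As an illustration only, the behaviour of such a stage can be approximated in Python with a thread pool. The names parallel_stage and gc_count are hypothetical, and this says nothing about how Seq would implement the operator.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_stage(items, stage, threads):
    """Apply `stage` to every item using `threads` worker threads.

    Results come back in input order, like a sequential pipe stage.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(stage, items))


def gc_count(read):
    # Example stage: count G and C bases in one read.
    return sum(1 for base in read if base in "GC")


reads = ["ACGT", "GGCC", "ATAT"]
counts = parallel_stage(reads, gc_count, threads=42)
```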
The following code:
a, b = 0, 0
x, y, z = 0, 0, 0
Errors with "tuple index 2 out of bounds (len: 2)" on line 2. Commenting either line avoids the error, so assuming this is a parsing issue.
Support named arguments:
(work is being done on the ocaml-generics branch).
It would be helpful to have a complete list of all standard Python modules and a guide as to which ones are supported, in progress, going to be supported, or not going to be supported ever. For example, is the threading module ever going to be supported? Will threading be handled with Seq's pipelines only? Will the multiprocessing module be supported?
Having this list will encourage people to help your efforts if it looks like most of what they need is going to eventually be part of Seq, and will also help them decide whether Seq will never be what they need.
Typical pattern matching from e.g. OCaml or Rust should be fairly easy to implement.
More interesting is pattern matching for sequence types. Many sequence computations can potentially be expressed with this. For example, a simple hash function:
fun hash(s: Seq) -> Int:
    s match A t => 0 + 4*hash(t)
          | C t => 1 + 4*hash(t)
          | G t => 2 + 4*hash(t)
          | T t => 3 + 4*hash(t)
          | _ => 0
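For comparison, the same recursion can be written in ordinary Python. This is a sketch that assumes the wildcard arm covers both the empty sequence and any base other than A, C, G or T; the name seq_hash is made up here.

```python
# Base-to-digit table taken from the match arms above.
DIGITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def seq_hash(s):
    # Wildcard arm: an empty sequence or an unrecognised base hashes to 0.
    if not s or s[0] not in DIGITS:
        return 0
    # Head/tail arm: digit of the first base plus 4 times the hash of the rest.
    return DIGITS[s[0]] + 4 * seq_hash(s[1:])
```

With this definition seq_hash("GT") is 2 + 4*3 = 14.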
We should provide a .messages file to Menhir or switch to the incremental API to provide more informative error messages during parsing (currently any malformed grammar yields just "Menhir error").
This case:
x = 0
def f():
    x += 5 # should error here
Currently no error is reported.
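For reference, CPython rejects the same program, although only when f is called rather than at compile time. A small sketch (the outcome variable is just for illustration):

```python
x = 0


def f():
    # `x += 5` makes `x` local to `f`, so reading it before assignment fails.
    x += 5


try:
    f()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```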
Fix __argv__ (i.e. link that symbol to module->getArgVar(); accessible with get_module_arg in ocaml.cpp)
Multiple for-loop iteration variables: for a,b in c: ... (same for comprehensions). Can be converted in parser to:
for _x in c:
    a = _x[0]
    b = _x[1]
    ...
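A quick Python check of the proposed rewrite, using made-up data, shows both forms visiting the same values:

```python
pairs = [(1, "one"), (2, "two")]

# Form written by the user.
direct = []
for a, b in pairs:
    direct.append((a, b))

# Form the parser would produce.
desugared = []
for _x in pairs:
    a = _x[0]
    b = _x[1]
    desugared.append((a, b))
```

One difference worth keeping in mind: the indexed form requires each element to support indexing, whereas Python's own unpacking accepts any iterable of the right length.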
Explicitly realizing generic methods (on master these should be MethodExpr objects -- similar to GetElemExpr but with type parameters; see method_expr in ocaml.cpp):
class A():
    def foo[`s](s: `s):
        pass
A.foo[int](10) # error: cannot find method '__getitem__' for type 'function[void,`s]' with specified argument types (int)
A().foo[int](10) # similar error
global variables (should be as simple as calling ->setGlobal() on corresponding Var*)
assert (not on master?)
Generators (done on LLVM side; see gen_expr in ocaml.cpp)
Lambdas (I suggest we disallow outer variable references in lambdas for now, as they just complicate things)
Exceptions (need LLVM-side support)
realize_type and realize_func throw uncaught exceptions (low priority)
Allow referencing generics in return type of a function. Example:
def none[`t]() -> `t: # right now the `t return type fails -- symbol not found
    return None
Allow implicit generator argument. Example:
print sum(i*i for i in range(10)) # fails: parsing error
print sum((i*i for i in range(10))) # works
Improve support for Nones and optional types.
Support
def __mul__(x: Secure[`t], y): # Errors as y is generic
    """
    Protocol 5: MultiplyPublic (p.8)
    """
    return Secure[`t](x.sh * y)
def __mul__(a: Secure[`t], b: Secure[`t]): # Only call this if b is Secure
    """
    Protocol 9: EvaluatePolynomial (p.12)
    restricted to f := xy
    """
    ar, am = __mpc_env__.beaver_partition(a.sh)
    br, bm = __mpc_env__.beaver_partition(b.sh)
    c = __mpc_env__.beaver_mult(ar, am, br, bm)
    return Secure[`t](__mpc_env__.beaver_reconstruct(c))
These are generics corner-cases that seem to be hard to deal with, and currently break the generics system. Will update this as I find more bad cases.
EDIT: This is fixed in the latest commit.
This is a case where we end up with T -> S -> Float and Q -> Int. This breaks somewhere along the line, and we end up with the error "generic type 'S' not yet realized". Have not been able to simplify this case any further while preserving the error.
class B[T](b of T):
    def bar[Q](self of B[T]):
        pass
def foo[S](s of S):
    b = B[S](s)
    b.bar[Int]()
foo[Float](3.14)
Provide proper Pythonic import syntax:
import x
from x import y
from x import *
Remaining items (higher priority is at the top):
[hard] Better error reporting (i.e. no more "Menhir error")
import from, import scoping
[hard] Extern support for Python and R
[hard] macros via sexps
None for pointers
is/is not for pointers (e.g. if self.root is None)
>== and maybe <|> for collecting?)
[hard?] ADTs OR exceptions (i.e. how to indicate that an item is not found?)
a, b = f(x)
t = (1, 2); f(*t) (star-unpacking)
f[1, 2]
[1:2:3], reverse [::-1]
[1,2], [:,3:4]
global statement
assert statement
Possible bioinformatics features:
k-mer would be represented by a <k x i2> or something similar.
Provide full support for closures:
Shouldn't this code work? I thought that if a function was called with different arg types, the compiler would generate two different functions, one for each arg type. Instead, it looks like the compiler is complaining because it is inferring the type of x to be both int and str.
[root@hbseq ~]# cat argtype2.py
def fn(x):
    if x == 'abc':
        print 'str'
    elif x == 1:
        print 'int'
    else:
        print 'other'
fn('abc')
fn(1)
fn([1,2])
[root@hbseq ~]# python2 argtype2.py
str
int
other
[root@hbseq ~]# ./s argtype2
+ seqc -o argtype2.bc argtype2.seq
seqc: /lib64/libtinfo.so.5: no version information available (required by /root/.seq/libseq.so)
argtype2.py:4:9: error: unsupported operand type(s) for ==: 'str' and 'int'
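The error is consistent with one copy of fn being generated per argument type: in the copy where x is a str, the comparison x == 1 has operands of type str and int. In Python, the per-type behaviour the report expects can be spelled out explicitly with functools.singledispatch. This is only a sketch of that idea in Python (returning strings instead of printing), not a statement about what Seq supports.

```python
from functools import singledispatch


@singledispatch
def fn(x):
    # Fallback for any type without a dedicated version.
    return "other"


@fn.register(str)
def _(x):
    # Version used for str arguments: only str-to-str comparison occurs here.
    return "str" if x == "abc" else "other"


@fn.register(int)
def _(x):
    # Version used for int arguments: only int-to-int comparison occurs here.
    return "int" if x == 1 else "other"
```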
Only one character is needed to reproduce this:
'
Output:
Uncaught exception:
(Failure "lexing: empty token")
Raised by primitive operation at file "lexing.ml", line 65, characters 15-37
Called from file "seqaml/lexer.ml", line 1298, characters 8-65
Called from file "grammar.ml", line 24008, characters 15-27
Called from file "grammar.ml", line 22305, characters 22-49
Called from file "seqaml/codegen.ml", line 17, characters 14-56
Called from file "runner.ml", line 15, characters 12-77
Called from file "runner.ml", line 32, characters 4-28
Called from file "runner.ml", line 61, characters 19-43
I am trying to download/install the prebuilt binaries for seq, but the install.sh script does not seem to do the job. I am running the command wget -O - https://raw.githubusercontent.com/seq-lang/seq/master/install.sh | bash as found in the install documentation.
On CentOS 7.6.180:
$> wget -O - https://raw.githubusercontent.com/seq-lang/seq/master/install.sh | bash
--2020-01-10 11:30:54-- https://github.com/seq-lang/seq/releases/latest/download/seq-linux-x86_64.tar.gz
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/seq-lang/seq/releases/download/v0.9.2/seq-linux-x86_64.tar.gz [following]
--2020-01-10 11:30:55-- https://github.com/seq-lang/seq/releases/download/v0.9.2/seq-linux-x86_64.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/118039967/d6bed000-2060-11ea-8c79-7fd1a81d11ec?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200110%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200110T163055Z&X-Amz-Expires=300&X-Amz-Signature=0c835357968e918d639ad5887954c1cb7d52d8bd1276464bcf01576d622c8427&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dseq-linux-x86_64.tar.gz&response-content-type=application%2Foctet-stream [following]
--2020-01-10 11:30:55-- https://github-production-release-asset-2e65be.s3.amazonaws.com/118039967/d6bed000-2060-11ea-8c79-7fd1a81d11ec?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200110%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200110T163055Z&X-Amz-Expires=300&X-Amz-Signature=0c835357968e918d639ad5887954c1cb7d52d8bd1276464bcf01576d622c8427&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dseq-linux-x86_64.tar.gz&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.141.164
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.141.164|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44849156 (43M) [application/octet-stream]
Saving to: 'STDOUT'
100%[======================================================================================================================================================================>] 44,849,156 12.3MB/s in 3.7s
2020-01-10 11:30:59 (11.7 MB/s) - written to stdout [44849156/44849156]
--2020-01-10 11:30:59-- https://github.com/seq-lang/seq/releases/latest/download/seq-stdlib.tar.gz
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/seq-lang/seq/releases/download/v0.9.2/seq-stdlib.tar.gz [following]
--2020-01-10 11:30:59-- https://github.com/seq-lang/seq/releases/download/v0.9.2/seq-stdlib.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/118039967/d7effd00-2060-11ea-9a3e-34d1163a9de2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200110%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200110T163059Z&X-Amz-Expires=300&X-Amz-Signature=817015814cf6b79955ad0caccebbe1c8ec74a32b29f2f32fadec17bdb7fa38da&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dseq-stdlib.tar.gz&response-content-type=application%2Foctet-stream [following]
--2020-01-10 11:30:59-- https://github-production-release-asset-2e65be.s3.amazonaws.com/118039967/d7effd00-2060-11ea-9a3e-34d1163a9de2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200110%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200110T163059Z&X-Amz-Expires=300&X-Amz-Signature=817015814cf6b79955ad0caccebbe1c8ec74a32b29f2f32fadec17bdb7fa38da&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dseq-stdlib.tar.gz&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.141.204
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.141.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 59956 (59K) [application/octet-stream]
Saving to: 'STDOUT'
100%[======================================================================================================================================================================>] 59,956 --.-K/s in 0.03s
2020-01-10 11:30:59 (1.65 MB/s) - written to stdout [59956/59956]
Seq installed at: /workdir/err87/.seq
Make sure to add the following lines to ~/.bash_profile:
export PATH="/workdir/err87/.seq:$PATH"
export SEQ_PATH="/workdir/err87/.seq/stdlib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/workdir/err87/.seq"
After adding the last three lines above to ~/.bash_profile and starting a new shell:
$> seqc -h
seqc: error while loading shared libraries: libomp.so.5: cannot open shared object file: No such file or directory
I can see that the symlink at /workdir/err87/.seq/libomp.so.5 is broken, as there is nothing in /usr/lib64 matching the prefix /usr/lib64/libomp.so*, so I'm guessing this is just a missing dependency. However, I don't have root privileges on this server (university HPC), so I am not able to manually install the library.
I tried the same install command on two other machines which I DO have root privileges on, one running macOS and the other Ubuntu, but encountered similar problems.
On macOS (10.14.6):
The output of the install command is the same, except that LD_LIBRARY_PATH is DYLD_LIBRARY_PATH.
$> seqc -h
dyld: Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
Referenced from: /Users/err87-admin/.seq/seqc
Reason: image not found
Abort trap: 6
On Ubuntu (18.04.3):
The install command also complains:
find: ‘/usr/lib64/’: No such file or directory
find: ‘/usr/lib64/’: No such file or directory
But seqc returns the same error as on CentOS:
$> seqc -h
seqc: error while loading shared libraries: libomp.so.5: cannot open shared object file: No such file or directory
I'm guessing the fix is to install libomp, but the documentation doesn't mention this as necessary when using the pre-built binaries.
Looking forward to trying out the language!
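The broken-symlink observation from the CentOS run can be reproduced with a few lines of Python. This is a diagnostic sketch only; the function name broken_symlinks is made up, and it does not install or locate libomp.

```python
import os


def broken_symlinks(directory):
    """Return the names of symlinks in `directory` whose targets do not exist."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() is true for any symlink; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return broken
```

Calling broken_symlinks on the install directory (for example os.path.expanduser("~/.seq")) would list libomp.so.5 if its target is missing.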
A file test.seq containing
import test
will fail to parse with a stack overflow OCaml exception. We should catch these cases and handle them gracefully.
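One common way to handle this gracefully is to track which modules are currently being loaded and report a cycle instead of recursing. The Python sketch below illustrates that idea; the `imports` mapping stands in for parsing each file, and all names are hypothetical rather than taken from the Seq compiler.

```python
class CircularImportError(Exception):
    pass


def load_order(module, imports, loading=None, loaded=None):
    """Return modules in the order they finish loading, dependencies first.

    `imports` maps a module name to the names it imports. Raises
    CircularImportError instead of recursing forever on a cycle.
    """
    loading = [] if loading is None else loading  # modules currently being parsed
    loaded = [] if loaded is None else loaded     # modules fully processed
    if module in loaded:
        return loaded
    if module in loading:
        chain = " -> ".join(loading + [module])
        raise CircularImportError("circular import: " + chain)
    loading.append(module)
    for dep in imports.get(module, []):
        load_order(dep, imports, loading, loaded)
    loading.pop()
    loaded.append(module)
    return loaded
```

For the case in this report, load_order("test", {"test": ["test"]}) raises an error naming the chain test -> test.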
Support basic ADTs, e.g.
type Foo = A | B(foo: int) | C(bar: str, i: int) | D
y = Foo.A
match y:
    case B(foo): ...
Creating Int, UInt or Kmer types with an out-of-bounds width/length causes an uncaught C++ exception. This exception should be caught on the OCaml side. Example:
Int[-1](0)
This fails miserably:
N = 5
class Generator:
    def __init__(self: Generator):
        self.state = array[u64](N)
        self.state_size = 0
        self.initf = 0
        self.next = 0
g = Generator()
Error:
Assertion failed: (type), function getType, file /Users/inumanag/Desktop/Projekti/seq/test/compiler/lang/var.cpp, line 159.
Ultimately we have to start writing real programs and benchmarking them, both in terms of runtime/memory and code usage.
Potential applications:
We don't necessarily need to reimplement the entire application, just some key kernels of it. Using Seq should lead to some performance boost in each of these cases.
Another interesting avenue (perhaps down the road) is to look at applications using GPUs. LLVM supports an NVPTX back-end that makes targeting the GPU relatively easy. Potential applications to look at would be e.g. MEGAHIT or BarraCUDA.
_PyTime_GetMonotonicClock
We require C interoperability, which should be easy to implement using LLVM. Python and R interoperability will likely be harder to implement, but are also essential.
One possibility for language-level syntax:
extern c fun my_c_function(n: Int, s: Str) -> Int # defines external C function
extern py fun my_py_function(n: Int, s: Str) -> Int # defines external Python function
extern r fun my_r_function(n: Int, s: Str) -> Int # defines external R function
Just like standard PATH: e.g. /some/path:/some/other/path, where both given paths would be searched.
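A minimal Python sketch of that lookup, assuming the value lists directories to search for module files. The function name find_module and the .seq default extension are illustrative assumptions.

```python
import os


def find_module(name, search_path, extension=".seq"):
    """Return the first `name + extension` found in the colon-separated
    `search_path`, or None. Directories are tried left to right."""
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name + extension)
        if os.path.isfile(candidate):
            return candidate
    return None
```

As with PATH, the earliest directory containing a match wins.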
++1, --1 and ~~1 fail to parse. These work as expected in Python.
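For reference, Python parses each of these as a unary operator applied twice, as in this short check (variable names are just for illustration):

```python
# Each expression applies the same unary operator twice to the literal 1.
double_plus = ++1    # +(+1)
double_minus = --1   # -(-1)
double_invert = ~~1  # ~(~1), and ~1 is -2
```

All three evaluate to 1.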
Add htslib bindings to Seq for SAM/CRAM/VCF I/O.
I ran a quick 10M {int:int} dict benchmark with Python 2.7 and Seq and was quite impressed. It's posted on your Hacker News announcement. The Python version used 1.1 GB and 8 seconds, Seq used 395 MB and 5 seconds. I didn't add any type info, just changed one line that created an empty dict. Congratulations!
I did some poking around and it looks like supporting the Python stdlib requires porting all the C code to Seq. I also saw that Seq can import regular Python code with pyimport. What I was wondering is: is it possible, or would it make sense, to allow importing Cython C extensions by providing an interface spec file, like os.pyi for example? This spec file would document the types, classes, return values, etc. and then allow a Seq program to use the standard Python built-ins without a rewrite.
There would have to be a wrapper generated for each function called, or even multiple wrappers if different types are used. The wrapper would have to create CPython objects for each argument, take the GIL, call the C extension, and then convert any returned values from CPython objects to Seq objects. The interface spec file would have to document not only the types, but which arguments might be modified. For example, passing a list to a C extension might cause the list to be modified (sort, for example), or the list might not be modified, e.g. len(list).
Hi Jordan,
let's start with the string library:
https://docs.python.org/3/library/stdtypes.html#textseq
You can avoid string.format part for now.
Some functions (like str.split) are implemented in multiple places: for example, split on a single character is different than a split that operates on multi-character patterns.
Also, for each stdlib file, add the docs and implement a test suite as follows. For str.seq, add test_str.seq and there test each function, e.g.:
str.seq:
class str:
    def isspace(self: str) -> bool:
        """
        Doc
        """
        ...
test_str.seq:
def test_isspace():
    assert ' '.isspace() == True
    assert 'x'.isspace() == False
    # ... etc
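To make the pattern concrete, here is one possible shape for the function and its test, written in plain Python as a standalone function. It is a sketch that assumes ASCII-only input and treats the six characters in Python's string.whitespace as whitespace; Python's own str.isspace accepts additional characters.

```python
ASCII_WHITESPACE = " \t\n\r\x0b\x0c"


def isspace(s):
    """Return True if `s` is non-empty and every character is ASCII whitespace."""
    if not s:
        return False
    return all(ch in ASCII_WHITESPACE for ch in s)


def test_isspace():
    assert isspace(" ") == True
    assert isspace("x") == False
    assert isspace("") == False       # empty string is not whitespace
    assert isspace(" \t\n") == True
    assert isspace(" x ") == False


test_isspace()
```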
Please check the function once done:
This works:
'''
'a'
'''
This fails:
'''
"a"
'''
with error
Uncaught exception:
(Scanf.Scan_failure
"scanf: bad input at char number 3: end of input not found")
Raised at file "scanf.ml" (inlined), line 444, characters 18-40
Called from file "scanf.ml", line 1164, characters 4-75
Called from file "seqaml/lexer.mll", line 52, characters 25-33
Called from file "grammar.ml", line 23209, characters 15-27
Called from file "grammar.ml", line 23236, characters 24-51
Called from file "seqaml/codegen.ml", line 17, characters 14-56
Called from file "runner.ml", line 15, characters 12-77
Called from file "runner.ml", line 32, characters 4-28
Called from file "runner.ml", line 61, characters 19-43
Support tuple unpacking and indexing:
a, b = f(x)
t = (1, 2); f(*t) (star-unpacking)
f[1, 2]
f[:,1:2]
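For reference, this is how Python itself treats these forms: a comma-separated subscript reaches __getitem__ as a single tuple, and each slice inside it becomes a slice object. The Probe class and the function f below are illustrative only.

```python
class Probe:
    """Returns whatever key object the subscript syntax passes in."""

    def __getitem__(self, key):
        return key


def f(a, b):
    return a + b


p = Probe()
pair_key = p[1, 2]      # a tuple of two ints
slice_key = p[:, 1:2]   # a tuple of two slice objects
t = (1, 2)
star_result = f(*t)     # same as f(1, 2)
```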