hs-conllu's People

Contributors

arademaker, k-bx, odanoburu

hs-conllu's Issues

feature values can be lists

A feature can carry several comma-separated values. Feature names can also have a bracketed suffix, as in Number[psor] below; I don't know what it is used for.

8	قوللىرى	قول	NOUN	N	Case=Nom|Number=Plur|Number[psor]=Plur,Sing|Person[psor]=3	10	nsubj	_	Translit=qolliri

from the Uyghur UD test set.
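To make the request concrete, here is a minimal sketch of how the FEATS column could be read so that both value lists and the bracketed part are preserved. It uses only base; the `Feat` type and the function names are hypothetical and are not hs-conllu's API.

```haskell
-- | One morphological feature, e.g. Number[psor]=Plur,Sing.
-- Hypothetical type for illustration; not hs-conllu's own.
data Feat = Feat
  { featName   :: String        -- ^ e.g. "Number"
  , featLayer  :: Maybe String  -- ^ bracketed part, e.g. Just "psor"
  , featValues :: [String]      -- ^ one or more comma-separated values
  } deriving (Eq, Show)

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse a single "Name[layer]=V1,V2" item.
parseFeat :: String -> Either String Feat
parseFeat item = case break (== '=') item of
  (_, [])        -> Left ("missing '=' in feature: " ++ item)
  (lhs, _ : rhs) ->
    case break (== '[') lhs of
      (name, []) -> Right (Feat name Nothing (splitOn ',' rhs))
      (name, _ : layer)
        | not (null layer), last layer == ']' ->
            Right (Feat name (Just (init layer)) (splitOn ',' rhs))
        | otherwise -> Left ("unclosed '[' in feature: " ++ item)

-- | Parse the whole FEATS column; "_" means the token has no features.
parseFeats :: String -> Either String [Feat]
parseFeats "_" = Right []
parseFeats col = mapM parseFeat (splitOn '|' col)
```

Applied to the FEATS column of the Uyghur token above, `parseFeats` would yield four features, the third being `Feat "Number" (Just "psor") ["Plur","Sing"]`.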

suggestion: two low level functions for parsing and printing

For users of your parser who do not use it standalone, but integrate it with other code, two simpler functions would be beneficial:

-- | Parse a text (no IO) into sentences.
parseConllu :: P.Parser [T.Sentence] -> Text -> ErrOrVal [T.Sentence]
parseConllu parser text =
    case r of
        Left err -> Left (s2t $ M.parseErrorPretty err)
        Right ss -> Right ss
    where
        -- "" is the source name megaparsec shows in error messages (why is it required?)
        r = M.parse parser "" (t2s text)

-- | Pretty-print a single sentence.
prettyPrintConlluSentence :: T.Sentence -> Text
prettyPrintConlluSentence = s2t . Pr.fromDiffList . Pr.printSent

Here ErrOrVal is Either Text, and s2t is the String-to-Text conversion pack from Data.Text (t2s is the inverse, unpack).

In case you wonder: I use your package to parse output from CoreNLP (udfeats) and load it into a triplestore. Thank you for your effort!

Full validation feature

Is there a plan to incorporate the same level of validation performed by the Universal Dependencies tool validate.py, found at https://github.com/UniversalDependencies/tools? Also, is there a plan to do a performance comparison between the official Universal Dependencies validation script and hs-conllu?

The Universal Dependencies validation script provides 5 levels of validation. Because it is written in Python, I suspect it is slower than an equivalent written in Haskell or C++ would be.

parse with recovery

so that we can accumulate all errors and report them, instead of getting them one at a time.
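A minimal sketch of the idea, assuming recovery happens at line boundaries: each token line is checked on its own, and failures are collected instead of aborting the parse. This uses only base and is not how hs-conllu works today; megaparsec has a `withRecovery` combinator intended for this purpose, which the sketch deliberately does not use. `LineError`, `parseTokenLine` and `parseAllLines` are hypothetical names.

```haskell
import Data.Either (partitionEithers)

-- | An error tagged with its line number (hypothetical type for illustration).
data LineError = LineError { errLine :: Int, errMsg :: String }
  deriving (Eq, Show)

-- | Split a line on tab characters.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- | Check one token line: a CoNLL-U token line has exactly 10 tab-separated fields.
parseTokenLine :: (Int, String) -> Either LineError [String]
parseTokenLine (n, line)
  | len == 10 = Right fields
  | otherwise = Left (LineError n ("expected 10 fields, found " ++ show len))
  where
    fields = splitTabs line
    len    = length fields

-- | Check every token line, so that all errors are reported together.
-- Blank lines and '#' comment lines are skipped but still counted.
parseAllLines :: String -> ([LineError], [[String]])
parseAllLines =
  partitionEithers . map parseTokenLine . filter isTokenLine . zip [1 ..] . lines
  where
    isTokenLine (_, l) = case l of
      []        -> False
      ('#' : _) -> False
      _         -> True
```

The first component of the result holds every malformed line with its line number; the second holds the fields of the lines that were well formed.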

error in dependencies

% cabal install hs-conllu
Resolving dependencies...
cabal: Could not resolve dependencies:
[__0] trying: hs-conllu-0.1.2 (user goal)
[__1] next goal: megaparsec (dependency of hs-conllu)
[__1] rejecting: megaparsec-9.0.1 (conflict: hs-conllu => megaparsec>=6 && <7)
[__1] skipping: megaparsec-9.0.0, megaparsec-8.0.0, megaparsec-7.0.5,
megaparsec-7.0.4, megaparsec-7.0.3, megaparsec-7.0.2, megaparsec-7.0.1,
megaparsec-7.0.0 (has the same characteristics that caused the previous
version to fail: excluded by constraint '>=6 && <7' from 'hs-conllu')
[__1] trying: megaparsec-6.5.0
[__2] next goal: base (dependency of hs-conllu)
[__2] rejecting: base-4.13.0.0/installed-4.13.0.0 (conflict: megaparsec =>
base>=4.7 && <4.13)
[__2] skipping: base-4.14.1.0, base-4.14.0.0, base-4.13.0.0 (has the same
characteristics that caused the previous version to fail: excluded by
constraint '>=4.7 && <4.13' from 'megaparsec')
[__2] rejecting: base-4.12.0.0, base-4.11.1.0, base-4.11.0.0, base-4.10.1.0,
base-4.10.0.0, base-4.9.1.0, base-4.9.0.0, base-4.8.2.0, base-4.8.1.0,
base-4.8.0.0, base-4.7.0.2, base-4.7.0.1, base-4.7.0.0, base-4.6.0.1,
base-4.6.0.0, base-4.5.1.0, base-4.5.0.0, base-4.4.1.0, base-4.4.0.0,
base-4.3.1.0, base-4.3.0.0, base-4.2.0.2, base-4.2.0.1, base-4.2.0.0,
base-4.1.0.0, base-4.0.0.0, base-3.0.3.2, base-3.0.3.1 (constraint from
non-upgradeable package requires installed instance)
[__2] fail (backjumping, conflict set: base, hs-conllu, megaparsec)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: base, megaparsec, hs-conllu

printSent produces unnecessary newline

The function `printSent` seems to add a newline at the beginning of a sentence (and one at the end as well), which causes `print . read` not to be the identity. Perhaps you could remove the line which is commented out:

printSent :: Sentence -> DiffList Char
printSent ss =
  mconcat
    [ printComments (_meta ss)
--    , diffLSpace   -- causes an extra space initially  
    , printTks (_tokens ss)
    , diffLSpace
    ]
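The round-trip property at stake can be illustrated with a toy model: if the printer emits no leading blank line and exactly one trailing blank line per sentence, reading the printed text back gives the original sentences. The `Sent` type and both functions below are hypothetical stand-ins using only base, not hs-conllu's `Sentence`, `printSent` or parser.

```haskell
-- | Toy sentence: comment lines (without the leading '#') and raw token lines.
data Sent = Sent { sentComments :: [String], sentTokens :: [String] }
  deriving (Eq, Show)

-- | Print comments, then tokens, then a single blank line; nothing before the comments.
renderSent :: Sent -> String
renderSent s = unlines (map ('#' :) (sentComments s) ++ sentTokens s) ++ "\n"

-- | Read back sentences separated by blank lines.
readSents :: String -> [Sent]
readSents = go . lines
  where
    go [] = []
    go ls = case break null ls of
      ([], rest)    -> go (drop 1 rest)              -- skip a stray blank line
      (block, rest) -> toSent block : go (drop 1 rest)
    toSent block =
      Sent [c | '#' : c <- block] [t | t <- block, take 1 t /= "#"]
```

For sentences with at least one line, `readSents (concatMap renderSent ss)` gives back `ss`, which is the identity the issue asks for.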

Plans for the library

I wrote the text below as an open reply to @arademaker for our conversation on #32 about plans for the library.


I'd like to change the structure of the library a bit: first have a really dumb parser that accepts anything remotely matching the conllu format, then do light validation on top of it according to a user specification. This would mean not hardcoding deprels and other tags, but reading a file that lists the acceptable entities (such files already exist for the canonical validating script, and users could tweak them if they wanted to).
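As a rough sketch of that two-step design: tokens are kept as raw columns, and a separate check compares the DEPREL column against a user-supplied list. The one-tag-per-line file format here is an assumption and may not match the files used by the official script; `RawToken`, `readTagList` and `checkDeprels` are hypothetical names, and only base is used.

```haskell
-- | A loosely parsed token: just its raw columns, with no tagset baked in.
type RawToken = [String]

-- | Read a tag list from file contents: one tag per line
-- (assumed format); blank lines and '#' comment lines are ignored.
readTagList :: String -> [String]
readTagList = filter keep . lines
  where
    keep l = case l of
      []        -> False
      ('#' : _) -> False
      _         -> True

-- | Report every token whose DEPREL column (the 8th) is not in the allowed list.
-- Tokens with fewer than 8 columns are left to other checks.
checkDeprels :: [String] -> [RawToken] -> [String]
checkDeprels allowed toks =
  [ "unknown deprel " ++ show rel ++ " on token " ++ tokId
  | tok@(tokId : _) <- toks
  , length tok >= 8
  , let rel = tok !! 7
  , rel `notElem` allowed
  ]
```

With this shape, accepting an extra relation would only require adding a line to the tag file, with no change to the parser.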

I also think that the megaparsec library might be unnecessary since the conllu format is so simple, but its performance is not bad and the error-reporting facilities are great (are we using them as well as we could?), so maybe I'd leave that be. If there's a performance need, then we might think about it.

I don't think it's worth it to implement full conllu validation, for the reasons I said on #34.

At some point I had plans for a query interface like the one in http://match.grew.fr (see master...query), but honestly I don't think it's worth implementing, since just loading the data into a graph database would give better-performing queries and facilities for visualization for free :)

Finally, I started writing this library a long time ago when I first started learning Haskell, so I would also change the code quite a bit to reflect some of what I learned since then.

missing UD : dobj

I used the parser on a small example and found that dobj is missing from the Dep enumerated type.
It might be better to move the tagsets into separate modules to be imported qualified (to avoid constructions like AUXpos and similar), and to publish the tagsets independently in a separate package for all to use and improve. I have constructed one which I hope is more complete and may submit a PR.

error in the cabal installation

Resolving dependencies...
cabal: Could not resolve dependencies:
trying: Hs-conllu-0.0.1 (user goal)
next goal: base (dependency of Hs-conllu-0.0.1)
rejecting: base-4.8.1.0/installed-075... (conflict: Hs-conllu => base>=4.9 &&
<5)
rejecting: base-4.10.0.0, 4.9.1.0, 4.9.0.0, 4.8.2.0, 4.8.1.0, 4.8.0.0,
4.7.0.2, 4.7.0.1, 4.7.0.0, 4.6.0.1, 4.6.0.0, 4.5.1.0, 4.5.0.0, 4.4.1.0,
4.4.0.0, 4.3.1.0, 4.3.0.0, 4.2.0.2, 4.2.0.1, 4.2.0.0, 4.1.0.0, 4.0.0.0,
3.0.3.2, 3.0.3.1 (global constraint requires installed instance)
Dependency tree exhaustively searched.
