stemmatology's Issues
make_tradition.pl script shouldn't create empty traditions
The make_tradition script flags an error if the given input type doesn't exist, but a tradition is still created (and put in the database, if applicable). This may also happen when an error is thrown. Either way the empty tradition should not be created.
TEI parser can break if run twice in the same Perl invocation
If the TEI parser has to parse a file with namespaces more than once (or, presumably, a file without namespaces after a file with), it will break with an error like this:
XPath error : Invalid expression
//tei:tei:listWit/tei:tei:witness
^ at /opt/local/lib/perl5/site_perl/5.16.1/Text/Tradition/Parser/TEI.pm line 128.
Overhaul CTE to cope with intermixed witStart/witEnd and multiply-specified witness readings
Text direction should be configurable
so that graph export for RTL texts goes from R to L.
Finish and test method for (re-)rooting stemma graph
Integration with a phylogeny or other stemma generation package means that we will sometimes have unrooted stemmas. We need to give the user some way to assign a root (archetype) to an unrooted stemma.
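As a rough illustration of what assigning a root could involve, here is a minimal sketch in Python (not the project's Perl). It assumes the unrooted stemma is available as a plain list of undirected edges and that the user names the archetype; `root_stemma` and its arguments are hypothetical names, not part of Text::Tradition. The edges are oriented away from the chosen node by a breadth-first walk, so a graph with cycles would be reduced to a spanning tree.

```python
from collections import deque

def root_stemma(edges, archetype):
    """Orient an undirected edge list away from the chosen archetype (hypothetical helper)."""
    # Build an adjacency map from the undirected edges.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if archetype not in neighbours:
        raise ValueError(f"{archetype!r} is not a node in the stemma")

    directed = []
    seen = {archetype}
    queue = deque([archetype])
    while queue:
        parent = queue.popleft()
        # Sort so the output order does not depend on set iteration order.
        for child in sorted(neighbours[parent]):
            if child not in seen:
                seen.add(child)
                directed.append((parent, child))
                queue.append(child)
    return directed
```

Calling `root_stemma([("A", "B"), ("B", "C"), ("B", "D")], "B")` would treat B as the archetype and return edges pointing from B to A, C and D.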
Need to deal with Moose exceptions being objects
The exception raising mechanism in Text::Tradition::Error tends to assume that whatever error it is asked to raise is a string. Moose has turned all of its exceptions into objects, which causes an exception if we try to treat it as a string when throwing an exception (confused yet?) We need to add a check for this.
Collation merge_readings can be unpredictable if propagation hasn't happened
Depending on the order returned by $collation->readings(), if you try to merge/collapse readings by relationship type and the relationship type has not been applied transitively (i.e. propagated) then the merge may or may not be complete. We need to do this recursively to be sure.
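One way to make the result independent of reading order is to compute the connected groups up front. The sketch below is in Python and uses a union-find structure in place of the recursion suggested above; `merge_groups` and the tuple layout of `relationships` are assumptions made for illustration, not the module's API.

```python
def merge_groups(readings, relationships, rel_type):
    """Group readings linked, directly or transitively, by rel_type (hypothetical helper)."""
    parent = {r: r for r in readings}

    def find(r):
        # Follow parent links to the group representative, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b, kind in relationships:
        if kind == rel_type:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for r in readings:
        groups.setdefault(find(r), []).append(r)
    # Only groups with more than one member need merging.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Because every reading ends up in the group of everything it is transitively related to, the same groups come out whatever order the readings or relationships are supplied in.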
TEI parallel segmentation parsing should recognise 'lem' tags
A <lem> tag in a TEI_PS file should result in readings with is_lemma set to true.
Need ability to read & save undirected stemma graphs
The Stemma object needs to be able correctly to parse a dot file with an undirected graph description, as generated from a Newick specification.
Failing Collation.pm test on Text::Tradition
t/text_tradition_collation.t .................... 9/?
# Failed test 'Reading r7.5 correctly removed'
# at t/text_tradition_collation.t line 89.
# got: ''
# expected: '1'
# Failed test 'Reading r7.6 correctly retained'
# at t/text_tradition_collation.t line 89.
# got: '1'
# expected: ''
t/text_tradition_collation.t .................... 159/? # Looks like you failed 2 tests of 171.
t/text_tradition_collation.t .................... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/171 subtests
Support TEI double-endpoint-attachment output
This needs to conform to what Classical Text Editor expects.
Add facility to parse Stemweb results into one or more stemmata for a tradition
Need to implement the conversion of a Stemweb calculation into one or more Text::Tradition::Stemma objects tied to a particular tradition, as described in the API given here:
http://treeoftexts.arts.kuleuven.be/?p=58
CTE parser not recognizing witStart and witEnd tags
A use case was submitted that makes use of the apparatus codicum, so that witStart tags and witEnd tags are present in the XML. These need to be parsed correctly.
Write tests for CSV and TSV export
In fixing #7 I found that the CSV export was breaking on a spurious decode_utf8() call. That should have been in a test.
Need to know what relationships disappear when reading is duplicated.
The call to duplicate_readings can cause certain relationships to no longer be valid, and it will remove them. Right now this is done silently, but for UI purposes the relationship removal needs to be propagated.
Add facility to export tab-separated "CSV"
The CSV export should allow tabs as well as commas for separation purposes.
UTF-8 bug in mysql storage
Need finally to trace and zap the UTF-8 encoding bug in tradition names in the MySQL tables.
Perl 5.18 doesn't like open file handles on (char) strings
Tests start failing when we open a file handle to read from a UTF-8 character string. We need to stop doing that, by converting them to byte strings first.
Restriction on merge_readings throws up bug in equivalence graph
It turns out that, when a check is made to prevent merge of readings that shouldn't be merged (i.e. as in 27e161b), a tight cycle is found in the equivalence graph. This breaks things.
Failing POD test on Text::Tradition
# Failed test 'POD test for blib/lib/Text/Tradition/Collation.pm'
# at /opt/perl-5.21.5/lib/site_perl/5.21.5/Test/Pod.pm line 186.
# blib/lib/Text/Tradition/Collation.pm (653): Non-ASCII character seen before =encoding in ''Mü11475''. Assuming UTF-8
# Looks like you failed 1 test of 18.
t/02pod.t ....................................... Dubious, test returned 1 (wstat 256, 0x100)
Failing POD coverage test on Text::Tradition
not ok 2 - Pod coverage on Text::Tradition::Parser::CTE
# Failed test 'Pod coverage on Text::Tradition::Parser::CTE'
# at t/03podcoverage.t line 29.
# Coverage for Text::Tradition::Parser::CTE is 50.0%, with 1 naked subroutine:
# do_warn
Collect IDP utility functions into a proper library
All the IDP scripts need to be refactored around a central library.
JSON parsing gets the ranks wrong
Looks like a straightforward off-by-one error.
Make decent workaround for Graph::Reader::Dot
The Graph::Reader::Dot module fails to parse name tokens with characters outside the ASCII \w range unless the tokens are wrapped in double-quotes. Various hacks exist but this needs a real workaround.
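A possible shape for such a workaround is to quote identifiers on the way out, so that the reader never sees a bare non-ASCII token. The Python sketch below is only an illustration of that idea; `dot_id` is a hypothetical helper, and the choice to quote anything that is not a plain ASCII identifier (including DOT keywords) is an assumption rather than the project's behaviour.

```python
import re

# Bare tokens that are safe to leave unquoted: ASCII letters, digits, underscore.
_SAFE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# DOT keywords are case-insensitive and must be quoted when used as names.
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}

def dot_id(name):
    """Return name as a DOT identifier, double-quoted when necessary (hypothetical helper)."""
    if _SAFE_ID.fullmatch(name) and name.lower() not in _KEYWORDS:
        return name
    # Inside a quoted DOT string only the double quote needs escaping.
    return '"' + name.replace('"', '\\"') + '"'
```

With this, a sigil such as `Mü11475` would be written as `"Mü11475"`, while `A` stays as it is.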
Add logic concerning a reading's normal form when there is a lemma set
The term 'lemma' is annoyingly overloaded, but...
When a reading is chosen as a lemma, its spelling and orthographic (and arguably punctuation) variants should have the same normal form as the lemma.
Make a test file for CTE parsing
The CTE parser pretty badly needs some tests added to it...
CollateX JSON output format has changed
...and our JSON parser ought to reflect this.
Make IDP solver URL configurable
Native XML export should include witness information
At the moment, any witness information is thrown away. This is unfortunate.
Need to be able to restore traditions with active Stemweb job IDs.
JSON parser doesn't work with a.c. witnesses
...because ' (a.c.)' fails XML Name validation.
CSV/TSV formats need option to exclude a.c. wits
At the moment it makes little sense to include a.c. witnesses in stemmatological reckoning. So when generating the collation table for those, we should exclude them. Related to tla/stemmaweb#29
compress_readings fails when there are readings with join_prior or join_next
The compress_readings method is failing when it shouldn't be. This is because of a naive means of assembling the original text in the sanity checking code.
Analysis: Transposition symmetry check evidently relying on arbitrary order of witnesses
It seems that when we check for transposition symmetry in the Analysis module, we've been comparing stringified versions of witness sets without sorting them first. Not sure how this ever reliably worked.
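The fix amounts to normalising each witness set before it is turned into a string. A minimal Python sketch of that idea follows; `witness_key` is a hypothetical name and the comma separator is an arbitrary choice.

```python
def witness_key(witnesses):
    """Build an order-independent string key for a collection of witness sigla."""
    # Sorting first means the same set always yields the same string.
    return ",".join(sorted(witnesses))
```

Two collections holding the same sigla in different orders, for example `["C", "A", "B"]` and `["B", "C", "A"]`, then produce identical keys and compare as equal.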
Consider making CTE parser cope with 'post X transp.' notation in the apparatus.
It is very common for scholars using CTE to note a transposition via 'post X transp.' in the apparatus. This has all kinds of pitfalls, but it might be nice if we make a first pass at solving it.
Support TEI parallel-segmentation output
Repetition relationship can't be set on witness + a.c. witness
It should be possible to set a repetition relationship between a reading from X and a later reading from X (a.c.) - at the moment this is excluded.
CTE parsing does not handle witStart and witEnd tags correctly.
Need to revamp handling of witStart and witEnd, to accord with what the XML is actually trying to represent. Current implementation is a misinterpretation.
Pull graph / dot manipulation functions into StemmaUtil
We should not be calling ::Stemma for graph manipulation that doesn't concern an actual Stemma object. Thus need to refactor in order to avoid circular dependency on ::Stemma <-> ::StemmaUtil.
CollateX format
I realise that there hasn't been much development on this in the last decade, but I'm still interested as a Perl user of 25 years working on the text editing and analysis tool at menotag.ku.dk
I'm putting in CollateX JSON output but running into two problems: CollateX may have changed its format since this tool was updated, because (when using the JSON string):
Can't use an undefined value as an ARRAY reference at /usr/local/share/perl5/Text/Tradition/Parser/JSON.pm line 160, line 41.
Additionally, I'm using string input because Text::Tradition won't read the JSON file:
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/local/share/perl5/Text/Tradition/Parser/JSON.pm line 117.
The JSON file itself is perfectly fine.
Reading duplication can cause invalid graphs
It seems that the duplicate_reading function can, if used unwisely, lead to a bad graph. There needs to be a check to ensure that neither the new reading nor the old reading will get dissociated from the graph, which is to say, they both need to have at least one witness afterward.
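The check described above can be sketched as a precondition on the witness sets involved. The Python below is an illustration under assumptions: it supposes a duplication moves a chosen subset of witnesses from the old reading to the new one, and `check_duplication` is a hypothetical name rather than anything in Text::Tradition.

```python
def check_duplication(reading_witnesses, moved_witnesses):
    """Validate that both readings keep at least one witness (hypothetical helper)."""
    current = set(reading_witnesses)
    moved = set(moved_witnesses)
    if not moved:
        raise ValueError("the new reading would have no witnesses")
    unknown = moved - current
    if unknown:
        raise ValueError(f"witnesses not attested in this reading: {sorted(unknown)}")
    remaining = current - moved
    if not remaining:
        raise ValueError("the original reading would be left with no witnesses")
    # Return the witness sets for the old and the new reading respectively.
    return remaining, moved
```

Running the check before any graph change means a rejected duplication leaves the collation untouched.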
CollateX input parser should account for a.c. witnesses
Some users might want to collate a.c. readings in their manuscripts. If a witness is marked a.c., then the CollateX parser should treat it as a variant of the base witness and not as a new witness in its own right.