beelsebob / coreparse

A shift/reduce parsing framework for Mac OS X and iOS

License: BSD 3-Clause "New" or "Revised" License

Objective-C 100.00%

coreparse's Issues

A new version tag

There have been some fixes since 1.1. If you feel comfortable, would you please push a new version tag to GitHub?

Parenthesised sections of rules do not allow |s

The grammar parser currently will not parse this rule:

A ::= ( | <C>) because of the use of '|' inside parentheses.

Core Parse should be able to parallelise better

At the moment a tokeniser must produce its entire output before the parser can start. The two should be able to run as producer and consumer on separate threads.

Problem creating parser: "Could not insert reduce in action table for state 14, token -"

I was creating a parser:

Expression  ::= (<Orientation>)?
        (<Superview><Connection>)?
        <View> (<Connection><View>)*
        (<Connection><Superview>)?;
Connection  ::= '-' | '-' 'Number' '-';
View        ::= '[' 'Identifier' ']';
Superview   ::= '|';
Orientation ::= 'V:' | 'H:';

I get an error when I try to create any of the parsers (SLR, LR(1), LALR(1)): "Could not insert reduce in action table for state 14, token -".

Test Case:

    NSError* error;
    NSString* grammarPath = [[NSBundle mainBundle] pathForResource:@"VFL" ofType:@"grammar"];
    NSString* grammarString = [NSString stringWithContentsOfFile:grammarPath
                                                        encoding:NSUTF8StringEncoding
                                                           error:&error];
    CPGrammar* grammar = [[CPGrammar alloc] initWithStart:@"Expression"
                                           backusNaurForm:grammarString
                                                    error:&error];
    NSLog(@"grammar: %@", grammarString);

    CPTokeniser* tokenizer = [[CPTokeniser alloc] init];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"V:"]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"H:"]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"|"]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"-"]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@","]];

    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"("]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@")"]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"["]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"]"]];

    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@">="]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"<="]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"=="]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@">"]];
    [tokenizer addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"<"]];
    [tokenizer addTokenRecogniser:[CPIdentifierRecogniser identifierRecogniser]];
    [tokenizer addTokenRecogniser:[CPNumberRecogniser numberRecogniser]];
    [tokenizer addTokenRecogniser:[CPWhiteSpaceRecogniser whiteSpaceRecogniser]];

    CPTokenStream* stream = [tokenizer tokenise:@"H:|-50-[label]-50-|"];
    NSLog(@"stream: %@", stream);

    CPParser* parser = [[CPLALR1Parser alloc] initWithGrammar:grammar];
    CPSyntaxTree* ast = [parser parse:stream];
    NSLog(@"ast: %@", ast);

If I remove (<Connection><View>)* from the grammar, it works. Can I rewrite the grammar so that it works with CoreParse?

Tokens should keep track of their line & column numbers.

As title.
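
To make the request concrete, here is a minimal sketch of mapping a character offset to a line and column. It is written in plain C, which compiles unchanged inside an Objective-C file. The `SourcePosition` type and `position_for_offset` function are hypothetical names invented for this illustration, not CoreParse API, and the sketch assumes that only '\n' ends a line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for this sketch; not part of CoreParse. */
typedef struct {
    size_t line;   /* 1-based line number */
    size_t column; /* 1-based column within that line */
} SourcePosition;

/* Returns the position of the character at `offset` in `text`.
   Only '\n' is treated as a line terminator. Offsets past the end of
   the string yield the position just after the last character. */
SourcePosition position_for_offset(const char *text, size_t offset)
{
    assert(text != NULL);
    SourcePosition pos = { 1, 1 };
    for (size_t i = 0; i < offset && text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            pos.line++;      /* start of a new line */
            pos.column = 1;
        } else {
            pos.column++;
        }
    }
    return pos;
}
```

A real tokeniser would more likely update the line and column incrementally as it consumes characters, and store the values on each token, rather than rescanning from the start for every lookup.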

Push latest version to CocoaPods

Thanks in advance!

CPTokeniser should be able to work on a stream of input rather than the whole NSString

This would allow it to begin tokenising midway through downloading content, rather than requiring all of the content to be in memory before beginning.
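
As a rough illustration of the idea, the following plain C sketch shows a tokeniser that is fed input in chunks and carries a partially read token across chunk boundaries. It only splits on whitespace, and every name in it (`StreamTokeniser`, `stream_tokeniser_feed`, `record_token`, and so on) is hypothetical. It does not reflect how CPTokeniser is implemented.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LENGTH 63

/* Called once for each completed token. */
typedef void (*TokenCallback)(const char *token, void *context);

/* Hypothetical chunk-fed tokeniser state; not part of CoreParse. */
typedef struct {
    char pending[MAX_TOKEN_LENGTH + 1]; /* token seen so far, possibly partial */
    size_t length;                      /* number of characters in `pending` */
    TokenCallback emit;
    void *context;
} StreamTokeniser;

void stream_tokeniser_init(StreamTokeniser *t, TokenCallback emit, void *context)
{
    t->length = 0;
    t->emit = emit;
    t->context = context;
}

/* Emits the pending token, if there is one, and resets the buffer. */
static void flush_pending(StreamTokeniser *t)
{
    if (t->length > 0) {
        t->pending[t->length] = '\0';
        t->emit(t->pending, t->context);
        t->length = 0;
    }
}

/* Consumes one chunk. A token that straddles two chunks stays in
   `pending` until the whitespace that ends it arrives. */
void stream_tokeniser_feed(StreamTokeniser *t, const char *chunk, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (isspace((unsigned char)chunk[i])) {
            flush_pending(t);
        } else if (t->length < MAX_TOKEN_LENGTH) {
            t->pending[t->length++] = chunk[i];
        } /* over-long tokens are silently truncated in this sketch */
    }
}

/* Call at end of input to emit any final token. */
void stream_tokeniser_finish(StreamTokeniser *t)
{
    flush_pending(t);
}

/* Example callback: counts tokens and remembers the most recent one. */
typedef struct {
    int count;
    char last[MAX_TOKEN_LENGTH + 1];
} TokenLog;

void record_token(const char *token, void *context)
{
    TokenLog *seen = context;
    seen->count++;
    strcpy(seen->last, token);
}
```

Feeding "foo ba" and then "r baz" emits "foo" during the first call and "bar" during the second. The final "baz" is only emitted by `stream_tokeniser_finish`, because the tokeniser cannot know the token is complete until the input ends.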

Tokeniser error recovery

When the tokeniser fails to recognise anything, a delegate method should be called allowing the user to skip over characters and/or insert an error token in the token stream.

Summary:
At the moment, CPParseResult implementations are highly reliant on the position of elements within productions. This means that if the grammar is edited, significant modifications are needed to the relevant initWithSyntaxTree: method. It would be extremely useful to provide a way of accessing elements without relying on their position in the rule.

Proposal:
Allow keys in productions, like so:
A ::= b@ c@<C> ('{' d@* '}')?
CPSyntaxTree would gain an additional method, -childForKey:, which in this instance would give the following results:
[tree childForKey:@"b"] – equal to the current [[tree children] objectAtIndex:0].
[tree childForKey:@"c"] – equal to the current [[tree children] objectAtIndex:1].
[tree childForKey:@"d"] – equal to the current [[[[[tree children] objectAtIndex:2] objectAtIndex:0] objectAtIndex:0] objectAtIndex:1] if the ? production exists, nil otherwise.
[tree childForKey:@"e"] – nil.

The use of keys inside ()*s and ()+s would be disallowed, as would duplicate keys on the same section of an alternative. All of the following would be invalid:
A ::= b@ b@
A ::= ( b@ <C> )*
A ::= ( b@ <C> )+
This however would be allowed:
A ::= b@ | b@<C>

Notes: It has been suggested that : is a better character than @ for keys:
A ::= b: c:<C> ('{' d:* '}')?

CoreParse cannot handle ambiguous grammars

CoreParse needs a method of handling shift/reduce and reduce/reduce conflicts to allow for dealing with ambiguous grammars.

Syntax for specifying grammars.

Grammars should be specifiable using single quote marks around terminals, to avoid having to escape double quotes in strings all the time.

Lack of Documentation

CoreParse needs documentation for all classes.

Including CoreParse in an ARC-enabled iOS project is not trivial

I'm new to Xcode and iOS and the procedure to add CoreParse into my iOS project is not straightforward. When I do the basic stuff I am still getting linker errors, I believe because of CoreParse's use of categories.

Step-by-step instructions on how best to do this would be helpful.

No LALR(1) Parser

CoreParse does not currently support creating LALR(1) parsers.

CoreParse should work on iOS

Currently the project contains only a Mac OS X target. An iOS static library would be nice too.

Add better error handling support

CoreParse currently just bombs out, printing the remaining token stream, when it fails to parse. This needs to be improved, both by making it return errors rather than logging them, and by making it able to recover from errors and continue parsing.

How do you tokenize Python style blocks?

What's a recommended way to create a tokenizer that will handle Python-style blocks, where the blocks are denoted by indentation?
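
One common technique, used by Python's own tokeniser, is to keep a stack of indentation widths and turn changes in indentation into explicit INDENT and DEDENT tokens, which a grammar can then treat like opening and closing braces. The plain C sketch below shows only that stack logic. The names (`IndentStack`, `indent_stack_update`, `leading_spaces`) are hypothetical, it counts spaces only (no tabs), and how the resulting tokens would be fed into a CoreParse token stream is left open.

```c
#include <stddef.h>

#define MAX_DEPTH 32

/* Returned when a line's indentation matches no enclosing block,
   or when blocks are nested deeper than MAX_DEPTH. */
enum { INDENT_ERROR = -1000 };

/* Hypothetical helper type; not part of CoreParse. */
typedef struct {
    int widths[MAX_DEPTH]; /* indentation widths of the enclosing blocks */
    int depth;             /* entries in use; widths[0] is always 0 */
} IndentStack;

void indent_stack_init(IndentStack *s)
{
    s->widths[0] = 0;
    s->depth = 1;
}

/* Counts the leading spaces of a line (tabs are not handled here). */
int leading_spaces(const char *line)
{
    int n = 0;
    while (line[n] == ' ') {
        n++;
    }
    return n;
}

/* Call once per non-blank line with its indentation width (>= 0).
   Returns 1 if one INDENT token should be emitted, 0 if the block is
   unchanged, -n if n DEDENT tokens should be emitted, or INDENT_ERROR.
   After INDENT_ERROR the stack has already been partially unwound. */
int indent_stack_update(IndentStack *s, int width)
{
    if (width > s->widths[s->depth - 1]) {
        if (s->depth == MAX_DEPTH) {
            return INDENT_ERROR;
        }
        s->widths[s->depth++] = width; /* a new, deeper block opens */
        return 1;
    }
    int dedents = 0;
    while (width < s->widths[s->depth - 1]) {
        s->depth--; /* close one enclosing block */
        dedents++;
    }
    if (width != s->widths[s->depth - 1]) {
        return INDENT_ERROR; /* dedent to a width that was never opened */
    }
    return -dedents;
}
```

For lines indented by 0, 4, 8, 8 and 0 spaces, the calls return 0, 1, 1, 0 and -2: two blocks are opened and then both are closed by the final line. At end of input, one DEDENT would typically be emitted for each entry still on the stack above the first.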

Remember to Push a New Tag for CocoaPods

There have been a lot of interesting changes in CP recently, including some API changes and a large number of optimizations.

This is just a reminder from your friendly CocoaPod spec maintainer to push up a new tag when you feel like you have a stable/finished release. When you do, close this issue and I'll have an updated Podspec committed in a short while. (The existing spec resides here.)

Thanks!

Grammar error: "Could not insert reduce in action table for state 4, token #"

I'm working on a grammar that should accept optional whitespace. When I added the optional whitespace rule, the parser complained with the error "Could not insert reduce in action table for state 4, token #".

I cannot tell whether this is a problem in my grammar or something CoreParse does not support. I'd appreciate any pointers on this error!

Before I added optional whitespace:

CSSSelectors                    ::= <CSSSelectorSequence> (<CSSCombinator> <CSSSelectorSequence>)*;
CSSCombinator                   ::= <Greater>  'Whitespace'* | <Plus>  'Whitespace'* | <Tilde> 'Whitespace'*| 'Whitespace'+;
Plus                            ::= '+';
Greater                         ::= '>';
Tilde                           ::= '~';

After I added optional whitespace, the parser returns the error:

CSSSelectors                    ::= <CSSSelectorSequence> (<CSSCombinator> <CSSSelectorSequence>)*;
CSSCombinator                   ::= <Greater>  'Whitespace'* | <Plus>  'Whitespace'* | <Tilde> 'Whitespace'*| 'Whitespace'+;
Plus                            ::= 'Whitespace'* '+';
Greater                         ::= 'Whitespace'* '>';
Tilde                           ::= 'Whitespace'* '~';

Full grammar: https://github.com/siuying/CSSSelectorConverter/blob/core-parse/CSSSelectorConverter/CSSSelectorGrammar.txt

Allow recovery from parse errors.

Parsers should have a method of recovering from errors and continuing parsing.

Memory leak

We are developing an app that uses CoreParse for parsing specific strings. I realised that the library leaks memory in several places. The most critical leak is in CPShiftReduceParser, when instantiating a CPRHSItemResult.

To fix this leak I modified this line:
result = [(id)[c alloc] initWithSyntaxTree:tree];

and added the autorelease call:
result = [[(id)[c alloc] initWithSyntaxTree:tree] autorelease];

This fixed several leaks for me.

Example code implemented

I've fleshed out the example described on the main CoreParse web page:

https://github.com/gavineadie/ParseTest

Some grammars are not properly archived on iOS

I noticed that some CPGrammar instances, when archived and then unarchived, are broken and cannot be used. This only happens on iOS.

An example test case: siuying@1344c73

Somehow the order of the rules changes after archiving and unarchiving, and the resulting grammar is not correct.