Code Monkey home page Code Monkey logo

Comments (2)

thekevinscott avatar thekevinscott commented on June 27, 2024 1

As far as I can tell, that line of documentation was authored by @ejones - maybe they have some insight into the original intent?

My initial instinct would be to tighten the rules, in case the parser needs to be refactored in the future. In the examples you provided ---dashmaster--- and 12345 seem like particularly egregious rule names that could conflict. (Also, presumably --- is a valid rule too? That seems nasty.)

On the other hand, if it doesn't cause any parsing issues with the current state, maybe it's worth leaving as is and just updating the documentation, particularly since your integration test above now explicitly tests for these edge cases.

from llama.cpp.

HanClinto avatar HanClinto commented on June 27, 2024

@thekevinscott Thank you for the report!

I just wanted to write and confirm that your report is indeed correct.

Looking through the code, the set of valid identifiers is much larger, and includes anything in the realm of a-zA-Z0-9, and "-". This means that we can have identifiers that are all numbers, pure dashes, identifiers that have the same name but are different cases ("upper-lower-case" and "UPPER-LOWER-CASE" are distinct from each other), etc.

I built an integration test that compiles and runs correctly -- this is a valid grammar (according to the engine) that seems to stretch the rules, and certainly violate the documentation (as you noted):

root ::= simple-identifier
simple-identifier ::= ---dashmaster--- | mIxEd-CaSe | UPPER-LOWER-CASE | upper-lower-case | 12345
---dashmaster--- ::= [-_]+
mIxEd-CaSe ::= [a-z][A-Z][a-z][A-Z]
UPPER-LOWER-CASE ::= [A-Z][A-Z]
upper-lower-case ::= [a-z][a-z]
12345 ::= "67890"

@ochafik / @ggerganov -- feels like a minor issue, so I feel a little bad pinging y'all on this one, but I also struggle with direction-level decisions on my own. Do either of you have opinions? Should we:

  • Tighten up the rules re: identifiers (this might break existing grammars that people have been relying on -- I.E. the example from c.gbnf)
  • Or loosen the documentation to note the (greatly) increased flexibility of identifier naming (compared with the current restrictions from the docs) -- such as permitting numbers and upper-case letters?
  • Or some mix of the above...?

from llama.cpp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.