parser-utils's Introduction

Renovate parser utils

A code parsing library that fills the gap between ad-hoc regular expressions and parsers generated from a complete grammar description.

License: MIT

Motivation

We are creating Renovate as a multi-language tool for keeping dependency versions up to date. While most package managers can rely on the programming language they're written in, we need some uniform way to deal with the variety of dependency description conventions using only TypeScript.

Some package managers use the relatively simple JSON format, like Node.js for example. Other tools, mostly related to DevOps, use the more elaborate YAML format. The trickiest thing is to deal with dependencies described by particular programming languages: for example, Gemfiles and Podfiles are written in Ruby, build.gradle files use Groovy, sbt leverages Scala, while bazel created its own language Starlark.

One approach is to use regular expressions, which is very easy but doesn't scale well to cover all syntactic variations. For example, we want to treat string literals 'foobar', "foobar" and """foobar""" as equivalent.

Another approach could be that we describe languages with tools like PEG.js or nearley.js. Although these are great tools, this approach has downsides for our use-case:

  • We have to define and test the complete grammar for each language, even if we're interested mostly in string literals, variable definitions and their scopes
  • Even small source errors lead to rejecting the whole file, while we want to skip the fragments that are misunderstood by the parser
  • We would still need to deal with a variety of language-specific AST formats, which may or may not have things in common.

The parser-utils library is an attempt to fill the gap between the approaches mentioned above. We leverage the moo library to produce tokens, which we group into a tree that you can query. The query API is inspired by parsimmon, though it operates on the token level instead of the raw character sequence.

Goals

  • Be good enough for source code written well enough.
  • Go much further than is possible with regular expressions.
  • Respect location info. Once something interesting is found, it can be located in the source text via offset info. Once something is written, it should not affect the formatting of the rest of the document.
  • Incorporate poorly recognized fragments into the output and continue parsing.
  • Expressive API which helps you focus on syntactic structure, not the space or quote variations.
  • Make it quick to define a language of interest. Provide definitions for popular languages out-of-the-box.

Non-goals

  • Catch all syntactic edge-cases
  • Compete with parsing tools with strict grammar definitions

Install

npm install @renovatebot/parser-utils

or

yarn add @renovatebot/parser-utils

Details

The library is divided into multiple levels of abstraction, from the lowest to the highest one:

The lowest level, the lexer, configures the moo tokenizer for specific language features such as:

  • Brackets: (), {}, [], etc
  • Strings: 'foo', "bar", """baz""", etc
  • Templates: ${foo}, {{bar}}, $(baz), etc
  • Single-line comments: #..., //..., etc
  • Multi-line comments: /*...*/, (*...*), etc
  • Identifiers: foo, Bar, _baz123, etc
  • Line joins: if the line ends with \, the next one will be treated as its continuation

Refer to the LexerConfig interface for more details. Also check out our usage example for Python.
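To give a feel for what such a configuration has to express, here is a minimal, dependency-free sketch of a tokenizer that covers a few of the features listed above. It does not use moo or the real LexerConfig interface; the names `Token`, `rules` and `tokenize`, as well as the chosen token types, are assumptions made for illustration only.

```typescript
// Hypothetical token shape; not the library's actual token type.
interface Token {
  type: string;
  value: string;
  offset: number;
}

// Ordered rules: the first rule that matches at the current position wins.
// Each pattern is anchored with `^` and applied to the remaining input.
const rules: Array<[string, RegExp]> = [
  ['whitespace', /^[ \t\r\n]+/],
  ['comment', /^#[^\n]*/],
  // Triple-quoted strings must be tried before ordinary double-quoted ones.
  ['string', /^(?:"""[\s\S]*?"""|"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')/],
  ['bracket', /^[()[\]{}]/],
  ['number', /^\d+/],
  ['identifier', /^[A-Za-z_][A-Za-z0-9_]*/],
  ['operator', /^[-+*\/=<>!.,:]+/],
];

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let offset = 0;
  while (offset < input.length) {
    const rest = input.slice(offset);
    let matched = false;
    for (const [type, regex] of rules) {
      const m = regex.exec(rest);
      if (m !== null) {
        tokens.push({ type, value: m[0], offset });
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Keep unrecognized characters as tokens instead of rejecting the input.
      tokens.push({ type: 'unknown', value: input.charAt(offset), offset });
      offset += 1;
    }
  }
  return tokens;
}

// All three quoting styles end up as a single token of type 'string'.
const quoted = ["'foobar'", '"foobar"', '"""foobar"""'].map((s) => tokenize(s));
```

Every token keeps its offset, which is what makes it possible to locate a match in the original source text later on.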

The next level, the parser, is responsible for transforming the token sequence into a nested tree with the tokens as leaves. Internally, we use a functional zipper data structure to perform queries on the tree.
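The grouping step can be pictured with the following sketch, which nests a flat list of token values by bracket pairs. It uses plain recursion rather than a zipper, and `TreeNode`, `closerFor` and `buildTree` are hypothetical names, so treat it as an illustration of the idea and not as the library's implementation.

```typescript
// Hypothetical node shape: either a leaf token or a bracketed subtree.
type TreeNode =
  | { kind: 'token'; value: string }
  | { kind: 'tree'; open: string; children: TreeNode[] };

// Returns the closing bracket for an opening one, or null for other tokens.
function closerFor(open: string): string | null {
  switch (open) {
    case '(':
      return ')';
    case '[':
      return ']';
    case '{':
      return '}';
    default:
      return null;
  }
}

function buildTree(tokens: string[]): TreeNode[] {
  let pos = 0;

  // Collects nodes until the expected closing bracket (or end of input).
  function parseUntil(closer: string | null): TreeNode[] {
    const nodes: TreeNode[] = [];
    while (pos < tokens.length) {
      const value = tokens[pos];
      pos += 1;
      if (closer !== null && value === closer) {
        return nodes;
      }
      const expected = closerFor(value);
      if (expected !== null) {
        nodes.push({ kind: 'tree', open: value, children: parseUntil(expected) });
      } else {
        // Stray closing brackets are kept as ordinary tokens.
        nodes.push({ kind: 'token', value });
      }
    }
    // Unbalanced input: return what was collected instead of failing.
    return nodes;
  }

  return parseUntil(null);
}

// "(1 + (2 + 2)) = 4" becomes three top-level nodes: a tree, "=", and "4".
const nested = buildTree(['(', '1', '+', '(', '2', '+', '2', ')', ')', '=', '4']);
```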

The highest level is the query layer. To understand parser-utils queries, it helps to keep in mind how regular expressions work. Each query represents a sequence of adjacent tokens and tree elements.

For example, consider the following query:

q.num('2').op('+').num('2').op('=').num('4');

It matches the fragments 2 + 2 = 4 and 2+2=4, but not 2+2==4 or 4=2+2.
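The behaviour described above can be reproduced with a small standalone sketch that compares a pattern against adjacent tokens. It is not the library's query engine: `Tok`, `lex`, `num`, `op` and `findMatch` are hypothetical helpers, and the sketch assumes that `==` is lexed as one operator token, which is why 2+2==4 does not match.

```typescript
interface Tok {
  type: 'num' | 'op';
  value: string;
}

const num = (value: string): Tok => ({ type: 'num', value });
const op = (value: string): Tok => ({ type: 'op', value });

// Numbers and runs of operator characters become tokens;
// everything else (whitespace included) is skipped for brevity.
function lex(input: string): Tok[] {
  const toks: Tok[] = [];
  const re = /(\d+)|([-+*\/=<>!]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    toks.push(m[1] !== undefined ? num(m[1]) : op(m[0]));
  }
  return toks;
}

// True if `pattern` matches the tokens starting exactly at index `start`.
function matchesAt(tokens: Tok[], pattern: Tok[], start: number): boolean {
  if (start + pattern.length > tokens.length) {
    return false;
  }
  return pattern.every(
    (p, i) => tokens[start + i].type === p.type && tokens[start + i].value === p.value
  );
}

// Returns the token index of the first match, or -1 if there is none.
function findMatch(input: string, pattern: Tok[]): number {
  const tokens = lex(input);
  for (let start = 0; start + pattern.length <= tokens.length; start++) {
    if (matchesAt(tokens, pattern, start)) {
      return start;
    }
  }
  return -1;
}

// Counterpart of q.num('2').op('+').num('2').op('=').num('4')
const twoPlusTwo = [num('2'), op('+'), num('2'), op('='), num('4')];
```

Because the comparison happens on tokens, spacing differences disappear before matching, while a different operator or a different token order still causes a mismatch.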

Once brackets are defined, their inner contents will be wrapped into a tree node. It's possible to query tree nodes:

q.tree({
  search: q.num('2').op('+').num('2'),
})
  .op('=')
  .num('4');

The above query will match these strings:

  • (2 + 2) = 4
  • [2 + 2] = 4
  • (1 + 2 + 2 - 1) = 4
  • (1 + (2 + 2) - 1) = 4

It won't match 2 + 2 = 4 because there is no nesting.
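One way to read these results is that the tree query looks for its inner sequence anywhere inside a bracketed node, at any depth, and then requires the following tokens to be adjacent to that node. The sketch below models that reading on plain nested arrays; `Item`, `lex`, `nest`, `contains` and `matchesTreeQuery` are hypothetical names, bracket kinds are not checked against each other, and the real library may behave differently in cases not listed above.

```typescript
// A token value, or the contents of one bracket pair.
type Item = string | Item[];

// Split into number, operator and bracket tokens; whitespace is dropped.
function lex(input: string): string[] {
  return input.match(/\d+|[-+*\/=]+|[()[\]{}]/g) ?? [];
}

// Nest tokens by brackets (simplified: any closer ends the current tree).
function nest(tokens: string[]): Item[] {
  let pos = 0;
  const walk = (insideBrackets: boolean): Item[] => {
    const items: Item[] = [];
    while (pos < tokens.length) {
      const t = tokens[pos++];
      if ('([{'.indexOf(t) !== -1) {
        items.push(walk(true));
      } else if (')]}'.indexOf(t) !== -1) {
        if (insideBrackets) {
          return items;
        }
        // A stray closer at the top level is skipped.
      } else {
        items.push(t);
      }
    }
    return items;
  };
  return walk(false);
}

// True if `seq` occurs as adjacent tokens in `items` or in any nested tree.
function contains(items: Item[], seq: string[]): boolean {
  for (let i = 0; i + seq.length <= items.length; i++) {
    if (seq.every((s, j) => items[i + j] === s)) {
      return true;
    }
  }
  return items.some((it) => Array.isArray(it) && contains(it, seq));
}

// Sketch of: q.tree({ search: 2 + 2 }).op('=').num('4')
function matchesTreeQuery(input: string): boolean {
  const items = nest(lex(input));
  for (let i = 0; i + 3 <= items.length; i++) {
    const node = items[i];
    if (
      Array.isArray(node) &&
      contains(node, ['2', '+', '2']) &&
      items[i + 1] === '=' &&
      items[i + 2] === '4'
    ) {
      return true;
    }
  }
  return false;
}
```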

Contributing

Add link to CONTRIBUTING.md file that will explain how to get started developing for this package. This can be done once things stabilize enough for us to accept external contributions.

parser-utils's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Pending Approval

These branches will be created by Renovate only once you click their checkbox below.

  • chore: update dependency jest-watch-typeahead to v2

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • chore(deps): lock file maintenance

Rate Limited

These updates are currently rate limited. Click on a checkbox below to force their creation now.

  • chore(deps): update dependency release-it to v15.1.3
  • chore(deps): update dependency ts-jest to v28.0.7
  • chore(deps): update jest monorepo (@types/jest, jest)
  • chore(deps): update actions/setup-node action to v3.4.1

Pending Branch Automerge

These updates await pending status checks before automerging. Click on a checkbox to abort the branch automerge, and create a PR instead.

  • chore(deps): update dependency @types/node to v16.11.45
  • chore(deps): update dependency comment-parser to v1.3.1
  • chore(deps): update dependency http-server to v14.1.1
  • chore(deps): update dependency jest-watch-typeahead to v1.1.0
  • chore(deps): update dependency prettier to v2.7.1
  • chore(deps): update dependency ts-node to v10.9.1
  • chore(deps): update dependency typescript to v4.7.4
  • chore(deps): update linters (@typescript-eslint/eslint-plugin, @typescript-eslint/parser, eslint, eslint-config-prettier, eslint-plugin-import)

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

github-actions
.github/workflows/pr.yml
  • actions/checkout v3.0.2@2541b1294d2704b0964813337f33b291d3f8596b
  • actions/setup-node v3.3.0@eeb10cff27034e7acf239c5d29f62154018672fd
.github/workflows/trunk.yml
  • actions/checkout v3.0.2@2541b1294d2704b0964813337f33b291d3f8596b
  • actions/setup-node v3.3.0@eeb10cff27034e7acf239c5d29f62154018672fd
  • actions/checkout v3.0.2@2541b1294d2704b0964813337f33b291d3f8596b
  • EndBug/version-check v2.1.0@2512d2ac9130c439e045344eb6603a81f4b0f915
  • actions/setup-node v3.3.0@eeb10cff27034e7acf239c5d29f62154018672fd
  • actions/setup-node v3.3.0@eeb10cff27034e7acf239c5d29f62154018672fd
npm
package.json
  • @thi.ng/zipper 1.0.3
  • @types/moo 0.5.5
  • deep-freeze-es6 1.4.1
  • klona 2.0.5
  • moo 0.5.1
  • @homer0/prettier-plugin-jsdoc 4.0.6
  • @renovate/eslint-plugin v0.0.4
  • @types/jest 28.1.4
  • @types/node 16.11.33
  • @typescript-eslint/eslint-plugin 5.4.0
  • @typescript-eslint/parser 5.4.0
  • eslint 8.2.0
  • eslint-config-prettier 8.3.0
  • eslint-plugin-import 2.25.3
  • eslint-plugin-only-warn 1.0.3
  • http-server 14.0.0
  • husky 8.0.1
  • jest 28.1.2
  • jest-watch-select-projects 2.0.0
  • jest-watch-suspend 1.1.2
  • jest-watch-typeahead 1.0.0
  • npm-run-all 4.1.5
  • prettier 2.4.1
  • pretty-quick 3.1.3
  • release-it 15.1.1
  • ts-jest 28.0.5
  • ts-node 10.7.0
  • ttypescript 1.5.13
  • typescript 4.5.4
  • upath 2.0.1
  • comment-parser 1.3.0

  • Check this box to trigger a request for Renovate to run again on this repository

Add branch protection rules that require certain tests to pass

Right now we don't require any tests to pass before hitting the "Squash and merge" button on an incoming PR.
This should be changed so that all 6 tests we currently run must pass before we can merge... 😉

This prevents broken stuff from getting merged by accident.

Improve issue templates

What

Improve issue templates, migrate to Issue Forms, standardize forms.

Why

Make issue templates look more like the ones we have over on the main Renovate repository.
Right now they're using an outdated format, and are harder to fill out than having a form.

How

I propose we move them to an Issue Form, and make them similar to the ones we use on the main Renovate repository.

What kind of issue forms do we need? I think we need at least these:

  • Bug report form
  • Feature request form
  • Refactor form for other things like refactoring, docs proposals, and so on

Use same labels as we do over on the main Renovate repo

What

Add status:*, type:* and priority:* labels that are the same as the ones we use on the main renovatebot/renovate repository.

Why

If I'm going to the trouble of fixing up our issue templates/forms in #32, I'd like to be able to add proper labels as well.

How

Manually add labels, pick the same colors as on the main repository.

I do need to know which types of labels you want/need before I start, or else I'll spend time making labels that you do not want in the first place... 😉

Improve Renovate bot configuration on this repository

What

Standardize and improve our Renovate bot setup.

Why

We're not consistent with the setup we use on this repository, compared to the setup we have on the main renovate bot repo.

In any case we're missing these things that I think are important/handy to have:

  • We're not extending from our global renovatebot/.github Renovate config
  • Wait 3 days until getting updates for npm packages (can be unpublished within 72 hours)
  • Pin GitHub Actions to SHA / Digest

We also have some overly large groups; I'm not sure what we want to do with them:

"packageRules": [
  {
    "groupName": "renovate-meta",
    "updateTypes": ["lockFileMaintenance", "pin"],
    "labels": ["type/chore"],
    "semanticCommitType": "chore",
    "semanticCommitScope": "deps"
  },
  {
    "groupName": "dependencies (non-major)",
    "depTypeList": ["dependencies"],
    "updateTypes": ["patch", "minor"],
    "labels": ["type/deps"],
    "semanticCommitType": "deps"
  },
  {
    "groupName": "devDependencies (major)",
    "depTypeList": ["devDependencies"],
    "updateTypes": ["major"],
    "labels": ["type/chore"],
    "semanticCommitType": "chore",
    "semanticCommitScope": "deps"
  },
  {
    "groupName": "devDependencies (non-major)",
    "depTypeList": ["devDependencies"],
    "updateTypes": ["patch", "minor"],
    "labels": ["type/chore"],
    "semanticCommitType": "chore",
    "semanticCommitScope": "deps"
  }
]

I also don't like how the bot adds labels. Our other Renovate bot PRs do not add any labels, so we might want to strip the labeling part out of the config as well.

How

Discuss what changes we want/need to our Renovate config on this repository.
Make PR to make the changes.

Add supported engines to package.json

Perceived Problem

Users can run this library on older, unsupported Node.js versions and get failures.

Ideas / Proposed Solution(s)

We should add supported node engines to package.json.

Support for HEREDOC strings

Perceived Problem

We can't define them at the lexer level because code may contain non-standard begin/end markers.

Ideas / Proposed Solution(s)

These two moo.Lexer methods should be used:

save(): LexerState;
reset(chunk?: string, state?: LexerState): this;

The idea:

  • Stop normal processing: call lexer.save() once HEREDOC start is encountered
  • Process HEREDOC fragment using custom code or sub-lexer and return single string token as result
  • Continue normal processing: call lexer.reset(input, state) with the input and state values corresponding to the position immediately after the HEREDOC
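The middle step, processing the HEREDOC fragment with custom code, could look roughly like the sketch below. It deliberately leaves moo out: `readHeredoc` and `HeredocResult` are hypothetical names, and the sketch assumes a simplified syntax in which `<<MARKER` ends its line and the terminator stands alone on a later line.

```typescript
interface HeredocResult {
  // The whole HEREDOC collapsed into a single string token.
  token: { type: 'string'; value: string; offset: number };
  // Position immediately after the terminator, where normal lexing resumes.
  resumeOffset: number;
}

function readHeredoc(input: string, start: number): HeredocResult | null {
  // Read the begin marker, e.g. `<<EOS` followed by a newline.
  const head = /^<<([A-Za-z_][A-Za-z0-9_]*)\n/.exec(input.slice(start));
  if (head === null) {
    return null; // not positioned at a HEREDOC start
  }
  const marker = head[1];
  const bodyStart = start + head[0].length;

  // The body ends at the first line consisting solely of the marker.
  const body: string[] = [];
  let offset = bodyStart;
  for (const line of input.slice(bodyStart).split('\n')) {
    if (line === marker) {
      return {
        token: { type: 'string', value: body.join('\n'), offset: start },
        resumeOffset: offset + line.length,
      };
    }
    body.push(line);
    offset += line.length + 1; // +1 for the newline removed by split
  }
  return null; // unterminated HEREDOC
}

const source = 'x = <<EOS\nline 1\nline 2\nEOS\ny = 1\n';
const heredoc = readHeredoc(source, 4);
```

In the scheme proposed above, the state returned by lexer.save() would be captured before calling such a helper, and the input from `resumeOffset` onwards would then be handed back through lexer.reset together with that state.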
