flourite's People

Contributors

aldy505, elianiva, farhan443, hosein2398, niek, seanf, semla, ts95, vinmaster

flourite's Issues

Missing boundaries to separate keywords from other strings

Many regex patterns in many languages are missing boundaries that separate the keywords from other strings, which means they can match even when the keyword sits inside another word.

Example:

Python's regex that matches class keyword:

/class\s*\w+(\(\s*\w+\s*\))?\s*:/

It can match:

  • def upper-class name(param):
  • subclass name(param):
  • classroom1(3):
  • classmate__(_):
  • classic(a):

These are not class declarations, but they still match because the regex only checks whether the text contains "class" and does not check whether the keyword is surrounded by other letters.

A simple solution would be to surround the keywords with \b. This prevents them from matching when they are adjacent to other word characters ([A-Za-z0-9_]). However, they will still match when they are adjacent to punctuation.

This may or may not be a problem, depending on the language and the punctuation. In JavaScript, any statement can be preceded by a semicolon, because semicolons terminate statements. That might not be the case in other languages.

Another fairly common solution is to surround the keywords with \s, which ensures that they can only be surrounded by whitespace. This introduces another problem: the keywords can no longer match at the start or the end of a line.

An optimal solution would be to use an alternation and a custom character set to define the possible separators manually, e.g. (^|[\s;,]). While this would be effective, it could be harder to implement because you have to know precisely which positions and characters can validly surround the keywords.
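The trade-offs between these approaches can be compared with a small script. This is only a sketch: the two variant patterns and the separator set `[\s;]` are assumptions chosen for illustration, not patterns taken from the repository, and the last sample string was added here to show the punctuation case.

```javascript
// The Python "class" pattern quoted in this issue.
const original = /class\s*\w+(\(\s*\w+\s*\))?\s*:/;

// Variant 1 (assumption): \b on both sides of the keyword.
const withWordBoundary = /\bclass\b\s*\w+(\(\s*\w+\s*\))?\s*:/;

// Variant 2 (assumption): an explicit separator set before the keyword
// (start of input, whitespace, or ";") and mandatory whitespace after it.
const withSeparators = /(^|[\s;])class\s+\w+(\(\s*\w+\s*\))?\s*:/;

const samples = [
  'class Foo(Bar):',          // a real class declaration
  'subclass name(param):',    // keyword inside another word
  'classroom1(3):',           // keyword as a prefix of another word
  'upper-class name(param):', // keyword next to punctuation
];

const results = samples.map((text) => ({
  text,
  original: original.test(text),
  withWordBoundary: withWordBoundary.test(text),
  withSeparators: withSeparators.test(text),
}));

console.table(results);
```

In this sketch the original pattern accepts all four samples, the \b variant still accepts the hyphenated one, and only the explicit separator variant narrows the matches down to the real declaration.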

refactor points system

The current point system seems dumb. It only uses +1, +2, -1 and -50. What if we run into keywords that are unique to a certain language?

Might change it to a range from -20 to +5.
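A rough sketch of what a ranged point system could look like is given here. Everything in it is hypothetical: the `clamp` and `scoreLine` helpers, the checker list and the individual weights are made up for illustration and do not reflect flourite's actual code.

```javascript
// Hypothetical bounds taken from the range proposed in this issue.
const MIN_POINTS = -20;
const MAX_POINTS = 5;

// Keep any configured weight inside the allowed range.
const clamp = (points) => Math.min(MAX_POINTS, Math.max(MIN_POINTS, points));

// Illustrative checkers only: a more distinctive keyword earns more points,
// and strong evidence for another language subtracts heavily.
const pythonCheckers = [
  { pattern: /\bdef\s+\w+\s*\(/, points: 2 },
  { pattern: /\belif\b/, points: 5 },
  { pattern: /\bfunction\s+\w+\s*\(/, points: -20 },
];

// Sum the clamped points of every checker that matches the line.
function scoreLine(line, checkers) {
  let total = 0;
  for (const { pattern, points } of checkers) {
    if (pattern.test(line)) total += clamp(points);
  }
  return total;
}

console.log(scoreLine('elif x > 1:', pythonCheckers));
console.log(scoreLine('function f() {', pythonCheckers));
```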

new language: Elixir

new language: Pascal

Regular expression caused exponential backtracking on Java

  1. This part of the regular expression may cause exponential backtracking on strings starting with 'class' and containing many repetitions of 'a'.
    { pattern: /(public\s*)?class\s*(.*)+(\s)?\{/, type: 'keyword' },

According to the LGTM rule:

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

See LGTM for detailed issue.
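One possible rewrite, offered as a sketch and not as the fix the maintainers chose: `(.*)+` accepts exactly the same text as `.*`, so dropping the outer quantifier keeps the set of matching strings while removing the nested repetition. The names `flagged`, `rewritten` and `hostile` are made up for this example.

```javascript
// Pattern flagged in this report.
const flagged = /(public\s*)?class\s*(.*)+(\s)?\{/;

// Hedged rewrite: a single ".*" replaces "(.*)+"; the optional "\s" is kept
// so that a brace on the following line is still accepted.
const rewritten = /(public\s*)?class\s*.*\s?\{/;

// Both patterns should agree on ordinary, short inputs.
const samples = ['public class Main {', 'class Foo\n{', 'class Foo', 'int x = 1;'];
const agree = samples.every((s) => flagged.test(s) === rewritten.test(s));

// Input shaped like the one in the report: "class" followed by many "a"s.
// Only the rewritten pattern is run against it.
const hostile = 'class' + 'a'.repeat(5000);
const hostileMatch = rewritten.test(hostile);

console.log({ agree, hostileMatch });
```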

new language: Dart

bug: JS/HTML wrongly detected

This was detected as Julia:

<script>
	import 'nprogress/nprogress.css'
	import nprogress from 'nprogress'
	import { onMount } from 'svelte'

	onMount(() => {
		const onNavigationStart = () => {
			nprogress.start()
		}

		const onNavigationEnd = () => {
			nprogress.done()
		}

		window.addEventListener('sveltekit:navigation-start', onNavigationStart)
		window.addEventListener('sveltekit:navigation-end', onNavigationEnd)

		return () => {
			window.removeEventListener('sveltekit:navigation-start', onNavigationStart)
			window.removeEventListener('sveltekit:navigation-end', onNavigationEnd)
		}
	})
</script>

<slot />

Thanks to @lamualfa https://gist.github.com/lamualfa/fc53f45eaac1bc630e7c0329e4c821d9

Publish v1.3.0

I created this issue so that I remember to do it later this weekend. I have a long weekend from Thursday to Sunday this week, so I should be able to do this.

Notable changes that will be included in this release:

Regular expression causes exponential backtracking on Lua

  1. This part of the regular expression may cause exponential backtracking on strings starting with '{!,' and containing many repetitions of '!,'.

    { pattern: /{\s*(\S+)((,|;)\s*\S+)*\s*}/, type: 'constant.array' },

  2. This part of the regular expression may cause exponential backtracking on strings starting with '{!=!,' and containing many repetitions of '!=!,'.

    { pattern: /{\s*(\S+\s*=\s*\S+)((,|;)\s*\S+\s*=\s*\S+)*\s*}/, type: 'constant.dictionary' },

According to the LGTM rule:

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

See LGTM for the detailed issue.
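A possible direction for both Lua patterns, sketched under the assumption that elements never contain a separator: replacing `\S+` with a negated character class that excludes `,` and `;` (and `=` for dictionaries) leaves the regex only one way to split the input. These rewrites are slightly stricter than the originals, for example they reject a trailing separator, and the names `arrayPattern` and `dictionaryPattern` are hypothetical.

```javascript
// Elements may not contain whitespace or a separator, so an element can no
// longer swallow "," or ";" the way \S+ could.
const arrayPattern = /\{\s*[^\s,;]+([,;]\s*[^\s,;]+)*\s*\}/;

// Same idea for key = value pairs: keys and values also exclude "=".
const dictionaryPattern =
  /\{\s*[^\s,;=]+\s*=\s*[^\s,;=]+([,;]\s*[^\s,;=]+\s*=\s*[^\s,;=]+)*\s*\}/;

console.log(arrayPattern.test('{1, 2, 3}'));
console.log(dictionaryPattern.test('{a = 1, b = 2}'));

// Inputs shaped like the ones in the report, with no closing brace.
console.log(arrayPattern.test('{' + '!,'.repeat(2000)));
console.log(dictionaryPattern.test('{' + '!=!,'.repeat(2000)));
```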

new language: Dockerfile

new language: Perl

refactor algorithm & do benchmark

Self-explanatory: running lots of regexes for lots of languages will slow everything down.

Right now we're using array.map, array.reduce and so forth. That might be changed to more performant code.

We also need to run some benchmarks with kruonis, but I'm still waiting for my PR to be merged: most-inesctec/kruonis#5
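As a sketch of what a loop-based version might look like, the snippet here scores every language with plain `for` loops and tracks the best candidate as it goes. The `languages` table and the `detect` function are hypothetical and much simpler than the real implementation, and whether this is actually faster would still need to be measured.

```javascript
// Hypothetical data shape: language name -> list of { pattern, points }.
const languages = {
  Python: [{ pattern: /^\s*def\s+\w+\s*\(/, points: 2 }],
  JavaScript: [{ pattern: /^\s*(const|let|var)\s+\w+/, points: 2 }],
};

// Score each language with nested loops and remember the best one so far,
// instead of building intermediate arrays with map/reduce and sorting them.
function detect(snippet) {
  const lines = snippet.split('\n');
  let best = 'Unknown';
  let bestScore = 0;
  for (const [language, checkers] of Object.entries(languages)) {
    let score = 0;
    for (const line of lines) {
      for (const { pattern, points } of checkers) {
        if (pattern.test(line)) score += points;
      }
    }
    if (score > bestScore) {
      best = language;
      bestScore = score;
    }
  }
  return best;
}

console.log(detect('def main():\n    pass'));
```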

new language: Swift

Optimize the usage of regular expressions

Some regular expressions currently in the /src/language directory are somewhat inefficient.

Take this as an example:

{ pattern: /^(\s+)?\(ns(\s+)(.*)(\))?$/, type: 'meta.module' },
{ pattern: /^(\s+)?\(print(ln)?(\s+)(.*)(\))$/, type: 'keyword.print' },
{ pattern: /^(\s+)?\((de)?fn(-)?(\s+)(.*)(\))?$/, type: 'keyword.function' },
{ pattern: /^(\s+)?\((let|def)(\s+)(.*)(\))?$/, type: 'keyword.variable' },

The first pattern could be optimized to /^\s*\(ns(\s+)(.*)(\))?$/ to reduce the number of steps needed to reach the same result.

I believe more patterns like this can be optimized.
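A quick way to check that such a rewrite does not change behavior is to run both versions over a few samples. The `minimal` variant is an extra assumption: it drops the capture groups and the optional `(\))?`, which `.*` already covers, so it is only appropriate if nothing reads the captured groups.

```javascript
// Pattern from the issue and the optimized form proposed in it.
const original = /^(\s+)?\(ns(\s+)(.*)(\))?$/;
const optimized = /^\s*\(ns(\s+)(.*)(\))?$/;

// Assumption: a further simplification that is valid only when the capture
// groups are unused, since ".*" can already absorb a trailing ")".
const minimal = /^\s*\(ns\s+.*$/;

const samples = [
  '(ns my.app.core)',
  '  (ns my.app.core',
  '(nsx)',
  'foo (ns bar)',
  '(ns)',
];

// True when all three patterns give the same answer for every sample.
const allAgree = samples.every((s) => {
  const expected = original.test(s);
  return optimized.test(s) === expected && minimal.test(s) === expected;
});

console.log(allAgree);
```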

[RFC]: add shebang support

Flourite should detect #!/usr/bin/env language and return the correct language instead of guessing the rest of the file. No one writes #!/usr/bin/env bash and then proceeds to write javascript. idk maybe put like a ridiculous amount of points for shebang like 99999
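A minimal sketch of how shebang detection could work, assuming it runs before the regex-based scoring. The `INTERPRETERS` table and the `detectShebang` function are hypothetical names; the real language labels would have to come from the library.

```javascript
// Hypothetical interpreter-to-language table, for illustration only.
const INTERPRETERS = new Map([
  ['bash', 'Bash'],
  ['node', 'JavaScript'],
  ['python', 'Python'],
  ['python3', 'Python'],
  ['ruby', 'Ruby'],
  ['lua', 'Lua'],
]);

// Returns a language name when the first line is a recognized shebang,
// otherwise null so the caller can fall back to the usual guessing.
function detectShebang(snippet) {
  const firstLine = snippet.split('\n', 1)[0];
  const match = /^#!\s*(\S+)(?:\s+(\S+))?/.exec(firstLine);
  if (!match) return null;
  // "/usr/bin/env bash" names the interpreter in the first argument,
  // while "/bin/bash" names it in the path itself.
  const command = match[1].split('/').pop();
  const interpreter = command === 'env' ? match[2] : command;
  return INTERPRETERS.get(interpreter) ?? null;
}

console.log(detectShebang('#!/usr/bin/env bash\necho hi'));
```

Returning early on a recognized shebang would make the "ridiculous amount of points" unnecessary, although adding points instead would keep everything inside the existing scoring path.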

I found a serious problem in a regular expression

In the Markdown regular expression:

{ pattern: /^(?!!)(=|-){2,}(?<!>)$/, type: 'meta.module' }

The negative lookbehind "(?<!" is not supported in Safari and Firefox.
How can I fix it?
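One way to avoid the lookbehind, offered as a sketch: between ^ and $ the pattern only admits "=" and "-", so the first character can never be "!" and the last can never be ">". Both lookarounds therefore look redundant, and a plain character class should match the same strings in engines that lack lookbehind support. The `headingUnderline` name is made up for this example.

```javascript
// Lookaround-free form of the Markdown rule quoted in this issue.
const headingUnderline = { pattern: /^[=-]{2,}$/, type: 'meta.module' };

const samples = ['===', '---', '=-=', '=', '!==', '==>', 'a--'];

// Keep only the samples accepted by the rewritten pattern.
const matches = samples.filter((s) => headingUnderline.pattern.test(s));

console.log(matches); // [ '===', '---', '=-=' ]
```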

TODO: Rename repository to be more catchy

language-detector is straightforward, yes, but a package with the same name already exists on npm. I don't feel like creating a new organization just to publish under this name.

It seems better to have a fresh new name, though.

C# detected incorrectly

Expected: C#
Actual: Java

namespace ConsoleApplication13
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] a = { 5, 3, 6, 4, 2, 9, 1, 8, 7 };
            QuickSort(a);
        }

        static void QuickSort(int[] a)
        {
            QuickSort(a, 0, a.Length - 1);
        }

        static void QuickSort(int[] a, int start, int end)
        {
            if (start >= end)
            {
                return;
            }

            int num = a[start];

            int i = start, j = end;

            while (i < j)
            {
                while (i < j && a[j] > num)
                {
                    j--;
                }

                a[i] = a[j];

                while (i < j && a[i] < num)
                {
                    i++;
                }

                a[j] = a[i];
            }

            a[i] = num;
            QuickSort(a, start, i - 1);
            QuickSort(a, i + 1, end);
        }
    }
}

Stuff to clean up

A few things are up for grabs while I'm gone this week:

  1. Change these to flourite. It would also be nice to update the README.

    "name": "language-detector",

    "url": "https://github.com/teknologi-umum/language-detector"

    "url": "https://github.com/teknologi-umum/language-detector/issues"

    "homepage": "https://github.com/teknologi-umum/language-detector#readme",

  2. Replace this with husky install, then delete the prepare.cjs file.

    "prepare": "node prepare.cjs",

  3. Add proper file paths so the package can be imported by other projects as a dependency. Reference:
    https://github.com/aldy505/sql-dsl/blob/5d6c461ccf32004b91d8c7af79f219227935e7ca/package.json#L15-L25

  4. Append "prettier" to this list:

    "extends": ["eslint:recommended", "plugin:@typescript-eslint/recommended"],

Regular expression caused exponential backtracking on Ruby

  1. This part of the regular expression may cause exponential backtracking on strings starting with 'do|' and containing many repetitions of 'a' and on strings starting with 'do|a,' and containing many repetitions of 'aaa,'.
    { pattern: /(do\s*\|(\w+(,\s*\w+)?)+\|)/, type: 'keyword.control' },

According to the LGTM rule:

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

See LGTM for the detailed issue.
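A possible rewrite for the Ruby pattern, given as a sketch: writing the parameter list as one name followed by zero or more ", name" groups removes the nested quantifier, because each character can then be consumed in only one way. Note that this form is slightly more permissive than the flagged pattern, since it also accepts lists such as |a, b, c|. The `blockParams` name is hypothetical.

```javascript
// One parameter name, then any number of ", name" groups, between pipes.
const blockParams = /do\s*\|\w+(,\s*\w+)*\|/;

console.log(blockParams.test('items.each do |item|'));
console.log(blockParams.test('hash.each do |key, value|'));

// Input shaped like the one in the report: "do|" followed by many "a"s.
console.log(blockParams.test('do|' + 'a'.repeat(5000)));
```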
