teknologi-umum / flourite
Automatically detects a programming language from a given string
License: MIT License
Many regex patterns in many languages are missing boundaries that separate the keywords from other strings, which means they can match even when the keyword sits inside another word.
Example:
Python's regex that matches the class keyword:
/class\s*\w+(\(\s*\w+\s*\))?\s*:/
It can match:
They're not class declarations, but they still get matched because the regex only checks whether they contain "class" and doesn't check whether the keyword is surrounded by other letters.
A simple solution would be to surround the keywords with \b. This prevents them from being matched when they are next to other word characters ([A-Za-z0-9_]). However, they will still match if they're next to punctuation.
This may or may not be a problem, depending on the language and the punctuation. In JavaScript, any statement can be preceded by a semicolon, because semicolons are used to terminate statements. The same might not be true in other languages.
Another solution, which is pretty common, is to surround the keywords with \s. This ensures that they can only be surrounded by whitespace. It brings another problem, though: they can no longer be matched at the start or the end of the line.
An optimal solution would be to use an alternation and a custom character set to define the possible separators manually, e.g. (^|[\s;,]). While this would be effective, it could be harder to implement because you have to know precisely which positions and/or characters can validly surround the keywords.
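The three options above can be compared side by side. This is a minimal sketch, assuming the Python class pattern quoted in the issue; the sample inputs and the separator set [\s;,] are illustrative assumptions, not flourite's actual test cases.

```typescript
// Pattern quoted in the issue: nothing separates "class" from its neighbours.
const unbounded = /class\s*\w+(\(\s*\w+\s*\))?\s*:/;

// Option 1: word boundaries around the keyword.
const wordBoundary = /\bclass\b\s*\w+(\(\s*\w+\s*\))?\s*:/;

// Option 2: whitespace required on both sides of the keyword.
const whitespaceOnly = /\sclass\s+\w+(\(\s*\w+\s*\))?\s*:/;

// Option 3: start of input or an explicit separator set (assumed: whitespace, ';', ',').
const customSeparators = /(^|[\s;,])class\s+\w+(\(\s*\w+\s*\))?\s*:/;

console.log(unbounded.test("subclass foo:"));           // true  (false positive)
console.log(wordBoundary.test("subclass foo:"));        // false
console.log(wordBoundary.test(";class Foo:"));          // true  (punctuation still allowed)
console.log(whitespaceOnly.test("class Foo:"));         // false (missed at start of line)
console.log(customSeparators.test("class Foo:"));       // true
console.log(customSeparators.test("subclass foo:"));    // false
```

Which separators belong in the custom set depends on the language being detected, so the set would likely have to be chosen per language.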
The only difference is that clisp doesn't have [] or {}; everything is ().
The current point system seems too coarse: it only uses +1, +2, -1 and -50. What if we come across keywords that are unique to a certain language?
We might change it to a range from -20 to +5.
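A wider range could be sketched roughly like this. Everything here is hypothetical: the WeightedPattern shape, the clamp helper, and the Elixir-like patterns are assumptions made for illustration and are not taken from flourite's source.

```typescript
// Hypothetical shape for a pattern with a weight in the proposed range.
interface WeightedPattern {
  pattern: RegExp;
  points: number;
}

const MIN_POINTS = -20;
const MAX_POINTS = 5;

// Keep any configured weight inside the proposed [-20, +5] range.
function clampPoints(points: number): number {
  return Math.min(MAX_POINTS, Math.max(MIN_POINTS, points));
}

// Sum the clamped weights of every pattern that matches the line.
function scoreLine(line: string, patterns: WeightedPattern[]): number {
  let total = 0;
  for (const { pattern, points } of patterns) {
    if (pattern.test(line)) total += clampPoints(points);
  }
  return total;
}

// Illustrative weights: a keyword unique to one language gets the top score,
// a strong counter-signal gets the bottom score.
const elixirLike: WeightedPattern[] = [
  { pattern: /\bdefmodule\s+[A-Z]\w*/, points: 5 },
  { pattern: /\bdo\s*$/, points: 1 },
  { pattern: /;\s*$/, points: -20 },
];

console.log(scoreLine("defmodule Foo do", elixirLike)); // 6
console.log(scoreLine("int x = 1;", elixirLike));       // -20
```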
Add Elixir as a supported language to detect.
You can learn Elixir first here:
Some test cases:
test cases:
language references:
This part of the regular expression may cause exponential backtracking on strings starting with 'class' and containing many repetitions of 'a'.
flourite/src/languages/java.ts
Line 15 in 313def7
According to the LGTM rule (click that link to see detailed rule):
Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.
See LGTM for the detailed issue.
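The rule quoted above can be illustrated with a small sketch. The actual expression at that line of java.ts is not shown here, so both patterns below are hypothetical stand-ins chosen to show the general shape of the problem and one common way to remove the ambiguity.

```typescript
// Hypothetical vulnerable shape: the inner \w+ and the outer + can split one
// run of word characters in exponentially many ways before the match fails.
const vulnerable = /(\w+\s*)+\{/;

// Same set of matched strings, but consecutive words must be separated by
// whitespace, so each run of word characters can only be consumed one way.
const safer = /\w+(\s+\w+)*\s*\{/;

// On short inputs both patterns agree.
console.log(vulnerable.test("class Foo {"), safer.test("class Foo {")); // true true
console.log(vulnerable.test("class Foo"), safer.test("class Foo"));     // false false

// Only the rewritten pattern is run on a long non-matching input. The
// vulnerable one is deliberately not run here because it may not finish.
console.log(safer.test("a".repeat(2000))); // false
```

The rewritten pattern can still take quadratic time on an unanchored search, so anchoring it or limiting input length may be worth considering as well.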
Test cases:
You can add more tests if necessary.
Add Dart as a supported language to detect.
You can learn Dart here:
Some test cases:
This was detected as Julia:
<script>
import 'nprogress/nprogress.css'
import nprogress from 'nprogress'
import { onMount } from 'svelte'
onMount(() => {
const onNavigationStart = () => {
nprogress.start()
}
const onNavigationEnd = () => {
nprogress.done()
}
window.addEventListener('sveltekit:navigation-start', onNavigationStart)
window.addEventListener('sveltekit:navigation-end', onNavigationEnd)
return () => {
window.removeEventListener('sveltekit:navigation-start', onNavigationStart)
window.removeEventListener('sveltekit:navigation-end', onNavigationEnd)
}
})
</script>
<slot />
Thanks to @lamualfa https://gist.github.com/lamualfa/fc53f45eaac1bc630e7c0329e4c821d9
I created this issue so that I remember to do it later this weekend. I have a long weekend on Thursday - Sunday this week, so I should be able to do this.
Notable changes that will be included in this release:
This part of the regular expression may cause exponential backtracking on strings starting with '{!,' and containing many repetitions of '!,'.
Line 30 in 313def7
This part of the regular expression may cause exponential backtracking on strings starting with '{!=!,' and containing many repetitions of '!=!,'.
Line 32 in 313def7
According to the LGTM rule (click that link to see detailed rule):
Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.
See LGTM for the detailed issue.
Add Dockerfile as a supported language to detect
You can learn Dockerfile here:
Some test cases:
As always, you may add your own test cases.
Add Perl as a supported language to detect.
You can learn Perl first here:
Some test cases:
Self-explanatory: running lots of regexps across lots of languages will slow everything down.
Right now we're using array.map, array.reduce and so forth. That could be changed to more performant code.
We also need to run some benchmarks with kruonis, but I'm still waiting for my PR to be merged here: most-inesctec/kruonis#5
Add Swift as a supported language to detect
Learn Swift here:
Some test cases:
Some regular expressions currently in the /src/languages directory are somewhat inefficient.
Take this as an example:
flourite/src/languages/clojure.ts
Lines 4 to 7 in 32bc6b7
That pattern might be optimized to /^\s*\(ns(\s+)(.*)(\))?$/ to minimize the steps needed to achieve the same result.
I believe there are more patterns like that which can be optimized.
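A quick way to check the proposed pattern is to run it against a few sample lines. The sketch below copies the pattern from the issue; the shorter variant is a hypothetical alternative, assuming only a yes/no answer is needed and the input is a single line, since the greedy (.*) already consumes the rest of the line.

```typescript
// Pattern proposed in the issue, copied verbatim.
const proposed = /^\s*\(ns(\s+)(.*)(\))?$/;

// Hypothetical shorter form (an assumption, not flourite's code): for a
// boolean test on a single line, only "(ns" followed by whitespace matters.
const shorter = /^\s*\(ns\s/;

// Illustrative sample lines, not flourite's real test cases.
const samples = ["(ns my.app.core)", "  (ns foo", "(nsfoo)", '(println "hi")', "(ns)"];

for (const line of samples) {
  // Prints the line followed by the result of each pattern.
  console.log(JSON.stringify(line), proposed.test(line), shorter.test(line));
}
```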
Flourite should detect a shebang such as #!/usr/bin/env language and return the correct language instead of guessing from the rest of the file. No one writes #!/usr/bin/env bash and then proceeds to write JavaScript. Maybe assign a ridiculously large number of points to a shebang match, like 99999.
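One possible shape for this, as a minimal sketch: check the first line before any scoring happens and return early on a known interpreter. The function name, the interpreter table, and the language labels below are all hypothetical and not part of flourite's API.

```typescript
// Hypothetical mapping from interpreter name to a language label.
const INTERPRETER_LANGUAGES = new Map<string, string>([
  ["bash", "Bash"],
  ["sh", "Bash"],
  ["node", "JavaScript"],
  ["python", "Python"],
  ["ruby", "Ruby"],
  ["perl", "Perl"],
]);

// Returns a language label when the first line is a recognised shebang,
// otherwise undefined so the caller can fall back to pattern scoring.
function detectShebang(source: string): string | undefined {
  const firstLine = source.split(/\r?\n/, 1)[0] ?? "";
  // Handles both "#!/usr/bin/env bash" and "#!/bin/bash" style lines.
  const match = /^#!\s*\S*\/(?:env\s+)?([\w.-]+)/.exec(firstLine);
  if (match === null) return undefined;
  // Strip a trailing version such as the "3" in "python3".
  const interpreter = (match[1] ?? "").replace(/[\d.]+$/, "");
  return INTERPRETER_LANGUAGES.get(interpreter);
}

console.log(detectShebang("#!/usr/bin/env bash\nconsole.log(1)")); // "Bash"
console.log(detectShebang("console.log(1)"));                      // undefined
```

Returning early avoids having to pick an arbitrary large score, though adding a large bonus to the existing point total would work too.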
In the Markdown regular expressions:
{ pattern: /^(?!!)(=|-){2,}(?<!>)$/, type: 'meta.module' }
The "?<!" negative lookbehind is not supported in Safari and Firefox.
How can I fix it?
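One possible fix, offered as a sketch rather than the maintainers' answer: in this particular pattern every matched character must be '=' or '-', so the line can neither start with '!' nor end with '>'. Under that reading both lookarounds are redundant and can be dropped, which avoids lookbehind altogether.

```typescript
// Lookbehind-free candidate. Every character between ^ and $ is '=' or '-',
// so the (?!!) and (?<!>) checks in the original can never reject a match.
const withoutLookaround = /^(=|-){2,}$/;

console.log(withoutLookaround.test("====")); // true
console.log(withoutLookaround.test("--"));   // true
console.log(withoutLookaround.test("="));    // false (needs at least two characters)
console.log(withoutLookaround.test("!=="));  // false
console.log(withoutLookaround.test("==>"));  // false
```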
language-detector is a straightforward name, yes, but a package with the same name already exists on NPM. I don't feel like creating a new organization just to publish under this name.
It seems better to have a fresh new name, though.
Expected: C#
Actual: Java
namespace ConsoleApplication13
{
class Program
{
static void Main(string[] args)
{
int[] a = { 5, 3, 6, 4, 2, 9, 1, 8, 7 };
QuickSort(a);
}
static void QuickSort(int[] a)
{
QuickSort(a, 0, a.Length - 1);
}
static void QuickSort(int[] a, int start, int end)
{
if (start >= end)
{
return;
}
int num = a[start];
int i = start, j = end;
while (i < j)
{
while (i < j && a[j] > num)
{
j--;
}
a[i] = a[j];
while (i < j && a[i] < num)
{
i++;
}
a[j] = a[i];
}
a[i] = num;
QuickSort(a, start, i - 1);
QuickSort(a, i + 1, end);
}
}
}
I get this error
flourite__WEBPACK_IMPORTED_MODULE_12___default(...) is not a function
When running the example: const code = flourite('console.log("Hello World");');
A few things are up for grabs while I'm gone this week:
Change these to flourite. Also it'd be nice if the README is also changed.
Line 2 in 91e59f4
Line 21 in 91e59f4
Line 48 in 91e59f4
Line 50 in 91e59f4
Replace this with husky install, then delete the prepare.cjs file.
Line 9 in 91e59f4
Add proper file path so it could be imported by other projects as a dependency. Reference:
https://github.com/aldy505/sql-dsl/blob/5d6c461ccf32004b91d8c7af79f219227935e7ca/package.json#L15-L25
Append "prettier"
Line 7 in 91e59f4
This part of the regular expression may cause exponential backtracking on strings starting with 'do|' and containing many repetitions of 'a', and on strings starting with 'do|a,' and containing many repetitions of 'aaa,'.
flourite/src/languages/ruby.ts
Line 23 in 313def7
According to the LGTM rule (click that link to see detailed rule):
Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.
See LGTM for the detailed issue.
In https://github.com/teknologi-umum/flourite/blob/master/src/languages/javascript.ts#L19 :
function\*?(\s+[$\w]+\s*\(.*\)|\s*\(.*\))
This regex can match:
They should not match, because a function name can't start with a digit.
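A possible tightening is to require the first character of the name to be a non-digit. This is a sketch under an assumption: it only covers ASCII identifiers ($, _, letters), whereas JavaScript also allows Unicode letters in names. The sample inputs are illustrative.

```typescript
// Pattern quoted in the issue: [$\w]+ lets the name start with a digit.
const original = /function\*?(\s+[$\w]+\s*\(.*\)|\s*\(.*\))/;

// Stricter candidate: first character must be $, _ or an ASCII letter.
const stricter = /function\*?(\s+[$A-Za-z_][$\w]*\s*\(.*\)|\s*\(.*\))/;

console.log(original.test("function 1abc() {}"));       // true  (false positive)
console.log(stricter.test("function 1abc() {}"));       // false
console.log(stricter.test("function foo() {}"));        // true
console.log(stricter.test("function* gen() {}"));       // true
console.log(stricter.test("const f = function () {}")); // true
```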