anthonynsimon / jurl
Fast and simple URL parsing for Java, with UTF-8 and path resolving support
License: MIT License
Possibly related to #2, but I have some URLs that do not round-trip, e.g. one with a trailing "?":
scala> import com.anthonynsimon.url.URL
scala> URL.parse("http://example.com/?")
res0: com.anthonynsimon.url.URL = http://example.com/
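One way to make a trailing "?" survive a round trip is to distinguish "no query" from "empty query" in the parsed model. The sketch below is a hypothetical, stdlib-only illustration of that idea, not jurl's actual classes: `QuerySplit` and its fields are made-up names, and fragments are ignored to keep it short.

```java
import java.util.Objects;

/** Hypothetical minimal model; not part of jurl. */
final class QuerySplit {
    final String beforeQuery; // everything before the first '?'
    final String query;       // null = no '?' present, "" = bare trailing '?'

    private QuerySplit(String beforeQuery, String query) {
        this.beforeQuery = beforeQuery;
        this.query = query;
    }

    /** Splits at the first '?'. Fragments are not handled in this sketch. */
    static QuerySplit parse(String url) {
        Objects.requireNonNull(url, "url");
        int q = url.indexOf('?');
        if (q < 0) {
            return new QuerySplit(url, null);
        }
        return new QuerySplit(url.substring(0, q), url.substring(q + 1));
    }

    /** Re-emits the '?' whenever one was present, even if the query is empty. */
    @Override
    public String toString() {
        return query == null ? beforeQuery : beforeQuery + "?" + query;
    }
}
```

With this model, `QuerySplit.parse("http://example.com/?").toString()` gives back the input unchanged, while a URL without "?" keeps `query == null`.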
I'm not sure if this is because of jitpack.io, but I can't download the sources jar for jurl.
This makes debugging and understanding the library more difficult.
Maybe only an option is missing in the Gradle build file.
Thanks for the cool library!
Hi,
How did you get "4+ million URLs per second"?
I tried parsing 10,000,000 URLs and it took 75,416 ms:
final int count = 10000000;
List<String> urls = new ArrayList<>(count); // plain Java instead of Lombok's val; needs java.util.List and java.util.ArrayList
for (int i = 0; i < count; i++) {
urls.add("http://user@domain" + i + ".com:12345/a/great/path/?with=query&unicode_parameter=๐¬hing#cool");
}
long start = System.currentTimeMillis();
for (String url : urls) {
com.anthonynsimon.url.URL.parse(url);
}
System.out.println("time: " + (System.currentTimeMillis() - start) + " ms");
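For reference, 10,000,000 URLs in 75,416 ms works out to roughly 132,600 URLs per second. The loop above is timed once, so JIT compilation is included in the measurement, and the parse results are discarded. A slightly more careful sketch might warm up first, use `System.nanoTime()`, and consume the results. The class below is a hypothetical helper (the names `ParseBenchmark`, `urlsPerSecond` and `sink` are assumptions), it takes the parser as a function so that something like `com.anthonynsimon.url.URL::parse` could be plugged in, and it uses ASCII-only sample URLs. A harness such as JMH would be the more rigorous choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical micro-benchmark helper; a rough sketch, not a replacement for JMH. */
final class ParseBenchmark {
    /** Written after each pass so the parse results count as used. */
    static volatile long sink;

    /** Returns URLs parsed per second during the single timed pass. */
    static double urlsPerSecond(Function<String, ?> parser, int count, int warmupRounds) {
        List<String> urls = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            urls.add("http://user@domain" + i + ".com:12345/a/great/path/?with=query#cool");
        }
        // Warm-up passes give the JIT a chance to compile the parser before timing.
        for (int round = 0; round < warmupRounds; round++) {
            sink = runOnce(parser, urls);
        }
        long start = System.nanoTime();
        sink = runOnce(parser, urls);
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count * 1e9 / elapsedNanos;
    }

    /** Parses every URL and folds the results into a checksum. */
    private static long runOnce(Function<String, ?> parser, List<String> urls) {
        long checksum = 0;
        for (String url : urls) {
            checksum += Objects.hashCode(parser.apply(url));
        }
        return checksum;
    }
}
```

A call could look like `ParseBenchmark.urlsPerSecond(java.net.URI::create, 1_000_000, 5)`; absolute numbers will still vary with the JVM, hardware and URL shape.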
Your parser is the best solution I've found, but it does not parse the port. Is there a reason why you didn't implement port parsing?
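To illustrate what port parsing could involve, here is a minimal stdlib-only sketch that splits an authority string such as "user@domain.com:12345" into host and port. `HostPort` is a hypothetical class, not jurl API; the sketch assumes userinfo ends at the last "@" and that IPv6 literals are bracketed.

```java
/** Hypothetical helper; not part of jurl. */
final class HostPort {
    final String host;
    final int port; // -1 when the authority carries no port

    private HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static HostPort parse(String authority) {
        // Drop optional userinfo ("user@" or "user:secret@").
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1);
        int colon = hostPort.lastIndexOf(':');
        int bracket = hostPort.lastIndexOf(']');
        // A colon before ']' belongs to a bracketed IPv6 literal, not to a port.
        if (colon < 0 || colon < bracket) {
            return new HostPort(hostPort, -1);
        }
        String digits = hostPort.substring(colon + 1);
        String host = hostPort.substring(0, colon);
        if (digits.isEmpty()) {
            return new HostPort(host, -1); // "host:" with an empty port
        }
        if (digits.length() > 5 || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("invalid port: " + digits);
        }
        int port = Integer.parseInt(digits);
        if (port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new HostPort(host, port);
    }
}
```

For example, `HostPort.parse("user@domain.com:12345")` yields host "domain.com" and port 12345, while `HostPort.parse("[::1]")` yields port -1.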
The string serialization seems incorrect. Maybe not all URL segments are properly URL-encoded?
Also, UTF-8 is assumed in URL-encoded parts, but my corpus from the web contains some Latin-1-encoded examples. I'm not sure what the standard says, but Chrome recognizes them.
import com.anthonynsimon.url.URL;

public class JurlTest {
    public static void main(String[] args) {
        test("http://abc.net/1160x%3E/quality/");
        test("http://db-engines.com/en/system/PostgreSQL%3BRocksDB");
        test("http://xzy.org/test/hei%DFfl"); // latin1
        test("http://www.net/decom/category/AA/A_%26_BBB/AAA_%26_BBB/"); // !!!
        test("https://en.wikipedia.org/wiki/Eat_one%27s_own_dog_food");
    }

    private static void test(String url) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
        }
        try {
            URL parse = URL.parse(url);
            if (!parse.toString().equals(url)) {
                System.out.print("NOT EQUAL: ");
                System.out.println(url);
                System.out.println(parse.toString());
                System.out.println();
            }
        } catch (Exception e) {
            System.out.print("KAPUTT: ");
            System.out.println(url);
            e.printStackTrace();
            System.out.println();
        }
    }
}
results in
NOT EQUAL: http://abc.net/1160x%3E/quality/
http://abc.net/1160x>/quality/
NOT EQUAL: http://db-engines.com/en/system/PostgreSQL%3BRocksDB
http://db-engines.com/en/system/PostgreSQL;RocksDB
java.lang.StringIndexOutOfBoundsException: String index out of range: 15
at java.lang.String.substring(String.java:1963)
at com.anthonynsimon.url.PercentEncoder.decode(PercentEncoder.java:181)
at com.anthonynsimon.url.DefaultURLParser.parse(DefaultURLParser.java:85)
at com.anthonynsimon.url.URL.parse(URL.java:73)
at JurlTest.test(JurlTest.java:20)
at JurlTest.main(JurlTest.java:9)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
KAPUTT: http://xzy.org/test/hei%DFfl
NOT EQUAL: http://www.net/decom/category/AA/A_%26_BBB/AAA_%26_BBB/
http://www.net/decom/category/AA/A_&_BBB/AAA_&_BBB/
NOT EQUAL: https://en.wikipedia.org/wiki/Eat_one%27s_own_dog_food
https://en.wikipedia.org/wiki/Eat_one's_own_dog_food
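The Latin-1 case above could be handled leniently by decoding each run of percent-escapes as strict UTF-8 and falling back to ISO-8859-1 when the bytes are not valid UTF-8. The sketch below is a stdlib-only illustration under that assumption; `LenientPercentDecoder` is a hypothetical name and this is not how jurl's `PercentEncoder` is implemented. Malformed or truncated escapes are kept as literal text instead of throwing. Note that this is a heuristic: some Latin-1 byte sequences are also valid UTF-8, and an exact round trip of the original string would still need the raw, undecoded form to be kept alongside the decoded one.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical lenient decoder; not part of jurl. */
final class LenientPercentDecoder {
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int hi = (c == '%' && i + 2 < s.length()) ? hex(s.charAt(i + 1)) : -1;
            int lo = hi >= 0 ? hex(s.charAt(i + 2)) : -1;
            if (lo >= 0) {
                run.write((hi << 4) | lo); // collect consecutive escaped bytes
                i += 3;
            } else {
                flush(run, out);
                out.append(c); // literal char, including a malformed or truncated '%'
                i++;
            }
        }
        flush(run, out);
        return out.toString();
    }

    /** Decodes the pending byte run: strict UTF-8 first, ISO-8859-1 as fallback. */
    private static void flush(ByteArrayOutputStream run, StringBuilder out) {
        if (run.size() == 0) {
            return;
        }
        byte[] bytes = run.toByteArray();
        run.reset();
        try {
            out.append(StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes)));
        } catch (CharacterCodingException e) {
            out.append(new String(bytes, StandardCharsets.ISO_8859_1));
        }
    }

    /** Returns the value of an ASCII hex digit, or -1 if the char is not one. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

With this sketch, `decode("hei%DFfl")` falls back to Latin-1 because a lone 0xDF byte is not valid UTF-8, while `decode("%C3%BC")` is decoded as UTF-8.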