
dateparser's People

Contributors

fdlsk2r, kiruthikaaarthi, sisyphsu

dateparser's Issues

Strange timezone offsets

Hi,

public static void main(String[] args) {
    final DateParser dp = DateParser.newBuilder().build();
    final String date = "2020-06-08T13:45:05-00:00";
    System.out.println(dp.parseDate(date).toString());
    System.out.println(dp.parseDateTime(date).toString());
    System.out.println(dp.parseOffsetDateTime(date).toString());
}

The code above gives me the results:

Mon Jun 08 14:45:05 CEST 2020
2020-06-08T14:45:05
2020-06-08T13:45:05Z

My local time was 13:45:05, and I'm using GMT+2.
I expected to get something like 15:45:05 (13:45:05 + 2 hours).
Am I doing something wrong?

The output above is from versions 1.0.2 to 1.0.4.

Versions 1.0.0 to 1.0.1 give the following:

Mon Jun 08 15:45:05 CEST 2020
2020-06-08T15:45:05
2020-06-08T13:45:05Z
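As a cross-check that does not involve dateparser at all, the JDK's own java.time classes can convert the same timestamp to a GMT+2 wall-clock time. This is a minimal sketch: `Europe/Berlin` is an assumption standing in for the reporter's CEST zone, and `OffsetCrossCheck` is a hypothetical class name.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

final class OffsetCrossCheck {
    // Converts an ISO-8601 timestamp with an offset to the wall-clock time in the given zone.
    static LocalDateTime toLocal(String isoWithOffset, String zoneId) {
        OffsetDateTime parsed = OffsetDateTime.parse(isoWithOffset); // "-00:00" is read as UTC
        return parsed.atZoneSameInstant(ZoneId.of(zoneId)).toLocalDateTime();
    }

    public static void main(String[] args) {
        // Europe/Berlin is assumed as an example of a zone on CEST (GMT+2) in June.
        System.out.println(toLocal("2020-06-08T13:45:05-00:00", "Europe/Berlin"));
    }
}
```

With current time-zone data this prints 2020-06-08T15:45:05, which matches the 1.0.0 to 1.0.1 output rather than the later one.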

Support modern JDK versions by updating lombok to 1.18.10 (or newer)

See original Lombok issue here: projectlombok/lombok#2790

It looks like updating to a more recent version of lombok will do the trick!

I use OpenJDK 17 and get the following error during compilation:

java.lang.RuntimeException: java.lang.IllegalAccessError: class lombok.javac.apt.LombokProcessor (in unnamed module @0x4d0cbf3f) cannot access class com.sun.tools.javac.processing.JavacProcessingEnvironment (in module jdk.compiler) because module jdk.compiler does not export com.sun.tools.javac.processing to unnamed module @0x4d0cbf3f

Thank you for this wonderful date parsing solution!

(How to?) Improve performance when parsing many strings in the same format

I was wondering if there is an option to improve the performance even further when parsing many strings that are all in the same format.
My use case is parsing timestamps from a CSV file with millions of rows, where every timestamp is in the same format.
It would be ideal if I could just say to the parser: "remember that format you detected for the previous string. I'm pretty sure this string is in the same format, so try that first when parsing this string".

My situation is similar to the following benchmark:

package com.github.sisyphsu.dateparser.benchmark;

import com.github.sisyphsu.dateparser.DateParser;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 2, time = 2)
@BenchmarkMode(Mode.AverageTime)
@Fork(2)
@Measurement(iterations = 3, time = 3)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MultiSameBenchmark {

    // 10 million timestamps that all share the same layout: "yyyy-MM-dd HH:mm:ss UTC"
    private static final String[] TEXTS = new String[10_000_000];

    static {
        Random random = new Random(123456789L); // fixed seed for reproducible input
        for (int i = 0; i < TEXTS.length; i++) {
            TEXTS[i] = String.format("2020-0%d-1%d 00:%d%d:00 UTC",
                    random.nextInt(8) + 1,
                    random.nextInt(8) + 1,
                    random.nextInt(5),
                    random.nextInt(9));
        }
    }

    @Benchmark
    public void parser(Blackhole bh) {
        DateParser parser = DateParser.newBuilder().build();
        for (String text : TEXTS) {
            // Consume the result so the JIT cannot eliminate the parse as dead code.
            bh.consume(parser.parseDate(text));
        }
    }
}

Is there already such an option on the parser that I overlooked?
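Whether dateparser exposes such an option is not established here. As a workaround sketch outside the library, a small wrapper can try the format that matched the previous row first. `LastFormatFirstParser` is a hypothetical helper built on plain java.time formatters, so the candidate formats must be listed up front rather than detected automatically.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

// Hypothetical helper, not part of dateparser: remembers which formatter
// matched last and tries it first on the next call.
final class LastFormatFirstParser {
    private final List<DateTimeFormatter> candidates;
    private int lastHit = 0; // index of the most recently successful formatter

    LastFormatFirstParser(List<DateTimeFormatter> candidates) {
        this.candidates = candidates;
    }

    LocalDateTime parse(String text) {
        try {
            // Fast path: rows in the same CSV column usually share one format.
            return LocalDateTime.parse(text, candidates.get(lastHit));
        } catch (DateTimeParseException ignored) {
            // Fall through and try the remaining candidates.
        }
        for (int i = 0; i < candidates.size(); i++) {
            if (i == lastHit) {
                continue; // already tried above
            }
            try {
                LocalDateTime result = LocalDateTime.parse(text, candidates.get(i));
                lastHit = i; // remember the winner for the next row
                return result;
            } catch (DateTimeParseException ignored) {
                // Try the next candidate.
            }
        }
        throw new DateTimeParseException("No candidate format matched", text, 0);
    }

    int lastHit() {
        return lastHit;
    }
}
```

Exceptions on the slow path are comparatively expensive, so this only pays off when nearly every row shares one format, as in the CSV case described here.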

Custom Rule Failure

I have the following code, which I would expect to parse dates like 28X01X2020 (ddXmmXyyyy):

DateParser parser = DateParser.newBuilder()
    .addRule("(?<day>[0-9]{2})X(?<month>[0-9]{2})X(?<year>[0-9]{4})")
    .build();

System.out.println(parser.parseDate("28X01X2020"));

However, on running this, I get the following error:

java.time.format.DateTimeParseException: Text 28X01X2020 cannot parse at 0

I can't see how this differs significantly from the example in the README. Am I doing something wrong, or is this a bug?
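One way to narrow this down is to test the rule in isolation with java.util.regex. If the pattern matches the whole input on its own, the failure more likely comes from how the parser applies the rule than from the regex syntax. The sketch below (hypothetical class name) does that and builds a LocalDate from the named groups.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RuleCheck {
    // Same regex string that was passed to addRule(...) in the issue.
    static final Pattern RULE =
            Pattern.compile("(?<day>[0-9]{2})X(?<month>[0-9]{2})X(?<year>[0-9]{4})");

    // Builds a LocalDate from the named groups, failing if the whole input does not match.
    static LocalDate parse(String text) {
        Matcher m = RULE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Rule does not match: " + text);
        }
        return LocalDate.of(
                Integer.parseInt(m.group("year")),
                Integer.parseInt(m.group("month")),
                Integer.parseInt(m.group("day")));
    }

    public static void main(String[] args) {
        System.out.println(parse("28X01X2020")); // prints 2020-01-28
    }
}
```

The regex is case-sensitive, so `28x01x2020` (lowercase x) does not match it. If the library normalizes the case of the input before matching (not verified here), that would be one thing worth checking.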

Locale support

Currently, localized dates such as the German "23. März 1999" (the 23rd of March 1999) are not detected.
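For comparison, the JDK can parse this particular German format when the pattern and locale are given explicitly. This sketch does not use dateparser; the pattern `d. MMMM uuuu` and the class name are assumptions chosen to fit the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class GermanDateExample {
    // Full month names ("MMMM") are resolved using the German locale.
    private static final DateTimeFormatter GERMAN_LONG =
            DateTimeFormatter.ofPattern("d. MMMM uuuu", Locale.GERMAN);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, GERMAN_LONG);
    }

    public static void main(String[] args) {
        // The month name is written with a Unicode escape to avoid source-encoding issues.
        System.out.println(parse("23. M\u00e4rz 1999")); // prints 1999-03-23
    }
}
```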

Weird bug - Custom date parser for parsing dates with zero prefixes

For a specific usecase, I needed to parse a date of format yyyy-mm-dd where each component might be prefixed by a zero

I am trying to write a custom parser that can handle this.

For example, 02022-012-009 should be parsed as 2022-12-09.

This is my code:


import com.github.sisyphsu.dateparser.DateParser;

public class DateUtilsApplication  {

    public static void main(String[] args) {
        DateParser dateParser = DateParser.newBuilder()
                .addRule("0?(?<year>\\d{4})\\W{1}0?(?<month>\\d{1,2})\\W{1}0?(?<day>\\d{1,2})")
                .build();

        //example 1  (no zeros)
        System.out.println(dateParser.parseDateTime("2022-12-09").toLocalDate());
        //prints "2022-12-09"

        //example 2 (year, month and date have zero prefix)
        System.out.println(dateParser.parseDateTime("02022-012-009").toLocalDate());
        //prints "2022-12-09"

        //example 3 (month and date have zero prefix)
        System.out.println(dateParser.parseDateTime("2022-012-009").toLocalDate());
        //prints "2022-12-09"

        //example 4 (date has zero prefix)
        System.out.println(dateParser.parseDateTime("2022-12-009").toLocalDate());
        //expected  "2022-12-09", but errors out


    }

}

All examples use the same parser, but the fourth one errors out.
This is very strange, because I made the leading zero optional (0?) for all three components.

What is the problem here?
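As a workaround sketch (not a diagnosis of the library's behavior), the leading zeros can be stripped before parsing, after which a non-padded java.time pattern handles every example. `ZeroPrefixDates` is hypothetical, and stripping all leading zeros of a component is a slightly broader choice than the single optional `0?` in the rule above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class ZeroPrefixDates {
    // Non-padded pattern: single-letter M and d accept "9", "09" or "12".
    private static final DateTimeFormatter YMD = DateTimeFormatter.ofPattern("uuuu-M-d");

    // Removes zeros at the start of each numeric component; the lookahead
    // keeps the final digit, so a component is never emptied.
    static String stripLeadingZeros(String text) {
        return text.replaceAll("\\b0+(?=\\d)", "");
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(stripLeadingZeros(text), YMD);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2022-12-09", "02022-012-009", "2022-012-009", "2022-12-009"}) {
            System.out.println(s + " -> " + parse(s)); // each line ends in 2022-12-09
        }
    }
}
```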

Fatal Exception: java.lang.NoClassDefFoundError

We get a crash on Android 6 and 7 devices (LG and Samsung so far):

Fatal Exception: java.lang.NoClassDefFoundError: Failed resolution of: Ljava/time/ZoneId;
       at com.github.sisyphsu.dateparser.DateBuilder.<clinit>(DateBuilder.java:21)
       at com.github.sisyphsu.dateparser.DateParser.<init>(DateParser.java:23)
       at com.github.sisyphsu.dateparser.DateParserBuilder.build(DateParserBuilder.java:207)
       at com.github.sisyphsu.dateparser.DateParserUtils.<clinit>(DateParserUtils.java:20)
       at com.github.sisyphsu.dateparser.DateParserUtils.parseDate(DateParserUtils.java:29)

java.time.ZoneId (https://developer.android.com/reference/java/time/ZoneId) is only available on Android 8.0 (API level 26) and later, so the library does not support older OS versions.

Get date format

Hi,
I would like to be able to get the date format of a string. For example, a getFormatPattern method could return a format string for use in further business logic.
My use case:
I am writing a CSV parser and need to determine the data type of each column. This library could help me if it had this feature.
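Until such a method exists (getFormatPattern is only a proposal), a CSV column can be typed by testing a fixed list of candidate patterns against its sample values. A minimal sketch with plain java.time; the class and the candidate list are hypothetical and would need adapting to the data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

final class ColumnFormatGuesser {
    // Hypothetical candidate list, tried in order; adapt it to the data.
    static final List<String> CANDIDATES =
            List.of("uuuu-MM-dd", "dd/MM/uuuu", "MM/dd/uuuu", "dd.MM.uuuu");

    // Returns the first candidate pattern that parses every sample value, if any.
    static Optional<String> guessPattern(List<String> samples) {
        for (String pattern : CANDIDATES) {
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
            boolean allParse = true;
            for (String sample : samples) {
                try {
                    LocalDate.parse(sample, formatter);
                } catch (DateTimeParseException e) {
                    allParse = false;
                    break;
                }
            }
            if (allParse) {
                return Optional.of(pattern);
            }
        }
        return Optional.empty();
    }
}
```

A column such as 03/04/2021 fits both dd/MM and MM/dd, so the candidate order decides which pattern is reported.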

Returning a hint whether the parsing was deterministic or not

We have a use case where we need all possible dates when the format is non-deterministic. For example, in 3/4/2023 the parser cannot know which number is the day and which is the month. In this case, could the library offer one of the following options:

  • Either provide both possible dates
  • Or return a hint indicating which group was treated as "dayOfMonth", so the caller can use the parsed date to derive the alternative reading and resolve the conflict later, based on the use case.
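The first option can be sketched outside the library for the simple numeric `a/b/yyyy` case: try both orders and keep every reading that forms a valid date. A result of size one means the input was unambiguous. `AmbiguousDates` is a hypothetical helper, and it assumes four-digit years and `/` separators.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class AmbiguousDates {
    // Returns every valid reading of "a/b/yyyy": month-first first, then day-first.
    // A single-element result means the input was unambiguous.
    static List<LocalDate> candidates(String text) {
        String[] parts = text.split("/");
        int a = Integer.parseInt(parts[0]);
        int b = Integer.parseInt(parts[1]);
        int year = Integer.parseInt(parts[2]);
        List<LocalDate> readings = new ArrayList<>();
        addIfValid(readings, year, a, b);     // a = month, b = day
        if (a != b) {
            addIfValid(readings, year, b, a); // b = month, a = day
        }
        return readings;
    }

    private static void addIfValid(List<LocalDate> out, int year, int month, int day) {
        try {
            out.add(LocalDate.of(year, month, day));
        } catch (DateTimeException e) {
            // Not a valid calendar date under this reading (e.g. month 17).
        }
    }
}
```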

preferMonthFirst is not reset when month is given greater than 12

Hi,

When using DateParser with the en_US locale, which puts the month first, parsing works as expected as long as the month value is 12 or less, e.g. 09/21/2021. When the first value is greater than 12 (e.g. 17/03/2021), however, the behavior is different. As I understand the documentation, preferMonthFirst should not apply when that value is greater than 12, so the expected date in this case is March 17, 2021. Instead, it returns May 3, 2022.

Case 1: prefer day first:
In the example below, the second value (the month under day-first order) is greater than 12, so the parser treats the first value as the month. This is the expected, correct behavior.

dateParser.setPreferMonthFirst(false);
dateParser.parseDate("12/17/21")
Result: Date@64 "Fri Dec 17 00:00:00 IST 2021"

Case 2: prefer month first:
In the example below, the first value (the month under month-first order) is greater than 12, so the parser should fall back to day-first order.

dateParser.setPreferMonthFirst(true);
dateParser.parseDate("17/09/2021")
Result: Date@72 "Mon May 09 00:00:00 IST 2022"
Expected value: Fri Sep 17 00:00:00 IST 2021

We are not sure why it behaves this way; we would expect it to parse without applying preferMonthFirst in this case.

Could you please advise how we can handle this so that day-first order is used only in this edge case?
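The fallback the reporter expects can be written as a small rule: honor the preferred order only when the value in its month slot is at most 12, and otherwise swap. A sketch of that rule with java.time (hypothetical class, four-digit years assumed). Unlike the reported May 2022 result, LocalDate.of rejects an out-of-range month instead of rolling it over into the next year.

```java
import java.time.LocalDate;

final class SlashDateResolver {
    // Parses "a/b/yyyy". The preferred order is used unless the value in its
    // month slot exceeds 12, in which case the other order is used.
    static LocalDate resolve(String text, boolean preferMonthFirst) {
        String[] parts = text.split("/");
        int first = Integer.parseInt(parts[0]);
        int second = Integer.parseInt(parts[1]);
        int year = Integer.parseInt(parts[2]);
        boolean monthFirst = preferMonthFirst ? first <= 12 : second > 12;
        int month = monthFirst ? first : second;
        int day = monthFirst ? second : first;
        // LocalDate.of is strict: an impossible date throws a DateTimeException
        // instead of rolling over into a later month or year.
        return LocalDate.of(year, month, day);
    }
}
```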

Dateparser parsed the date incorrectly

Input string: "Sat, 29 Feb 2020 01:21:19+5:30"

Output from dateparser: Sat, 29 Feb 2020 00:00:19 UTC

Expected output: Fri Feb 28 19:51:19 UTC 2020

Code used:

Date date = DateParserUtils.parseDate("Sat, 29 Feb 2020 01:21:19+5:30");
SimpleDateFormat dateFormat = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z");
dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
System.out.println("Output: " + dateFormat.format(date));
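For reference, the expected value can be reproduced with the JDK once the single-digit offset hour is padded (`+5:30` to `+05:30`), since the `XXX` offset pattern expects two-digit hours. This sketch bypasses dateparser; the normalization regex and the class name are assumptions for illustration.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class PaddedOffsetParse {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE, dd MMM uuuu HH:mm:ssXXX", Locale.ENGLISH);

    // Pads a one-digit offset hour, e.g. "+5:30" becomes "+05:30".
    static String padOffsetHour(String text) {
        return text.replaceAll("(?<sign>[+-])(?<hour>\\d):", "${sign}0${hour}:");
    }

    // Parses the padded text and re-expresses the same instant in UTC.
    static OffsetDateTime toUtc(String text) {
        return OffsetDateTime.parse(padOffsetHour(text), FORMAT)
                .withOffsetSameInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(toUtc("Sat, 29 Feb 2020 01:21:19+5:30")); // prints 2020-02-28T19:51:19Z
    }
}
```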

Unneeded patterns/rules influence the result of the parsing

This might be an issue with retree rather than with dateparser itself, though.

The following test (which you cannot execute via the public API) fails:

    @Test
    public void parserWithLimitedPatterns(){
        List<String> rules = Arrays.asList(
          "(?<year>\\d{4})\\W{1}(?<month>\\d{1,2})\\W{1}(?<day>\\d{1,2})[^\\d]?",
          "\\W*(?:at )?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?",
          " ?(?<zoneOffset>[-+]\\d{1,2}:?(?:\\d{2})?)"
        );

        DateParser dateParser = new DateParser(rules, new HashSet<>(rules), Collections.emptyMap(), true, false);
        String input = "2022-08-09 19:04:31.600000+00:00";
        Date date = dateParser.parseDate(input);
        assertEquals(parser.parseDate(input), date);
    }

Note that these three rules should be sufficient to parse the date:

  • There is a rule for the year-month-day part
  • There is a rule for the hours:minutes:seconds.ns part
  • There is a rule for the zone offset part

However, during parsing the zone offset rule is never used. Instead, the rule for the hours is used twice.

The weird thing is that when I add a rule that should not be used (` ?(?<year>\d{4})$`), the test suddenly succeeds:

    @Test
    public void parserWithLimitedPatterns(){
        List<String> rules = Arrays.asList(
          "(?<year>\\d{4})\\W{1}(?<month>\\d{1,2})\\W{1}(?<day>\\d{1,2})[^\\d]?",
          " ?(?<year>\\\\d{4})$",
          "\\W*(?:at )?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?",
          " ?(?<zoneOffset>[-+]\\d{1,2}:?(?:\\d{2})?)"
        );

        DateParser dateParser = new DateParser(rules, new HashSet<>(rules), Collections.emptyMap(), true, false);
        String input = "2022-08-09 19:04:31.600000+00:00";
        Date date = dateParser.parseDate(input);
        assertEquals(parser.parseDate(input), date);
    }

The position where I add that additional rule matters. For example, adding it at the end of the list instead of at index 1 makes the test fail again.

I ran into this issue while working on PR #28, where I try to reduce the number of rules used for parsing to improve performance.
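The reported symptom can be reproduced with a deliberately simplified model: consume the input left to right and, at each offset, take the first rule in list order that matches there. This model is an assumption for illustration, not dateparser's actual retree-based matching. Under it, the leading `\W*` of the time rule also swallows the `+` of `+00:00`, so the time rule beats the zone-offset rule for the last segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstMatchTokenizer {
    // Simplified model (assumption): at each offset, use the FIRST rule in list
    // order that matches there, then continue after its match.
    static List<Integer> tokenize(String input, List<Pattern> rules) {
        List<Integer> used = new ArrayList<>();
        int offset = 0;
        while (offset < input.length()) {
            int next = -1;
            for (int i = 0; i < rules.size() && next < 0; i++) {
                Matcher m = rules.get(i).matcher(input);
                m.region(offset, input.length());
                if (m.lookingAt() && m.end() > offset) {
                    used.add(i);
                    next = m.end();
                }
            }
            if (next < 0) {
                throw new IllegalStateException("cannot parse at " + offset);
            }
            offset = next;
        }
        return used;
    }

    public static void main(String[] args) {
        List<Pattern> rules = List.of(
                Pattern.compile("(?<year>\\d{4})\\W{1}(?<month>\\d{1,2})\\W{1}(?<day>\\d{1,2})[^\\d]?"),
                Pattern.compile("\\W*(?:at )?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?"),
                Pattern.compile(" ?(?<zoneOffset>[-+]\\d{1,2}:?(?:\\d{2})?)"));
        // Prints [0, 1, 1]: the time rule also consumes "+00:00".
        System.out.println(tokenize("2022-08-09 19:04:31.600000+00:00", rules));
    }
}
```

In this model, listing the zone-offset rule before the time rule gives each segment its own rule, which is consistent with the observation that rule order changes the outcome.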

Date Parsing Problem

Hi,
In Slovenia, the date format is "d. M. yyyy" (example: "13. 4. 2022"), and the problem is that this dateParser can't parse it:

Exception in thread "main" java.time.format.DateTimeParseException: Text 12. 4. 2022 cannot parse at 0
at com.github.sisyphsu.dateparser.DateParser.error(DateParser.java:401)
at com.github.sisyphsu.dateparser.DateParser.error(DateParser.java:397)
at com.github.sisyphsu.dateparser.DateParser.parse(DateParser.java:131)
at com.github.sisyphsu.dateparser.DateParser.parseDate(DateParser.java:67)
at com.github.sisyphsu.dateparser.DateParserUtils.parseDate(DateParserUtils.java:29)
at si.zzi.eforms.wp.utils.DateJSFConverter.main(DateJSFConverter.java:51)

Can you help? Regards
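For comparison, java.time parses this format directly with a non-padded pattern. The pattern `d. M. uuuu` and the class name are assumptions matching the example, and this sketch does not go through dateparser.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class SlovenianDateExample {
    // Single-letter d and M accept both "4" and "04"; ". " is matched literally.
    private static final DateTimeFormatter SLOVENIAN = DateTimeFormatter.ofPattern("d. M. uuuu");

    static LocalDate parse(String text) {
        return LocalDate.parse(text, SLOVENIAN);
    }

    public static void main(String[] args) {
        System.out.println(parse("13. 4. 2022")); // prints 2022-04-13
    }
}
```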

Missing date format

Hi,

The following format throws an exception:

2020-30-03T18:28:47.382Z

I tried to add a customized rule:
DateParser parser = DateParser.newBuilder().addRule("(?\d{4})\W{1}(?\d{1,2})\W{1}(?\d{1,2})[^\\d]?(?\d{1,2}):(?\d{1,2})(?:(?\d{1,2}))?(?:.,)?(?z)?").build();

But it didn't work. What am I doing wrong?

I need to support both YYYY-mm-dd and YYYY-dd-mm.

Thank you.
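A sketch of the requirement itself using java.time instead of a custom dateparser rule: try `yyyy-MM-dd` first and fall back to `yyyy-dd-MM`. The class name and patterns are assumptions fitted to the example above.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class EitherOrder {
    static final DateTimeFormatter MONTH_DAY =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSX");
    static final DateTimeFormatter DAY_MONTH =
            DateTimeFormatter.ofPattern("uuuu-dd-MM'T'HH:mm:ss.SSSX");

    // Tries month-before-day first; a month value above 12 fails validation,
    // and the day-before-month pattern is tried instead.
    static OffsetDateTime parse(String text) {
        try {
            return OffsetDateTime.parse(text, MONTH_DAY);
        } catch (DateTimeParseException e) {
            return OffsetDateTime.parse(text, DAY_MONTH);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-30-03T18:28:47.382Z")); // prints 2020-03-30T18:28:47.382Z
    }
}
```

For an input such as 2020-03-04 both orders are valid, so the result depends on which pattern is tried first; only values above 12 disambiguate.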

Format exception

Hi, I want to parse the input "17JAN2023/00:00".
I use the rule "(?\d{2})\W+(?%s)\W+(?\d{4})./(?\d{2})$" but get the following error:
Exception in thread "main" java.time.format.DateTimeParseException: Text 17JAN2023/00:00 cannot parse at 0
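As posted, the rule requires at least one non-word character (`\W+`) between the day, month, and year, and `17JAN2023` contains none. For comparison, a case-insensitive java.time formatter parses the input directly. This is a sketch outside dateparser, and the pattern and class name are assumptions fitted to this one example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

final class CompactMonthExample {
    // parseCaseInsensitive() lets "JAN" match the English short month name "Jan".
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("ddMMMuuuu/HH:mm")
            .toFormatter(Locale.ENGLISH);

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("17JAN2023/00:00")); // prints 2023-01-17T00:00
    }
}
```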

Wrong selection of a matching rule

Hi, I'm trying to parse dates in month-year format. This format, without a day, is very common in documents such as CVs. However, I found that I cannot add a working custom rule for dates such as:

September 2010
September/2003

DateParser parser = DateParser.newBuilder()
                    .addRule("(?<month>september)\\s{1,4}(?<year>\\d{4})")
                    .addRule("(?<month>\\w+)\\s{1,4}(?<year>\\d{4})")
                    .addRule("(?<month>\\w+)/(?<year>\\d{4})")
                    .build();
Calendar calendar = parser.parseCalendar(date.toLowerCase());

I added the custom rules and checked that they work as ordinary regular expressions. But I get the error "Text september 2010 cannot parse at 12". The reason lies in this code:

    private void parse(final CharArray input) { // in DateParser
        matcher.reset(input);
        int offset = 0;
        int oldEnd = -1;
        while (matcher.find(offset)) {
       // ....
        }
        if (offset != input.length()) {
            throw error(offset);
        }
    }

Every time, matcher.re() is (?<month>september)\W+(?<day>\d{1,2})(?:th)?\W* and the offset ends at 12 instead of 14, so the match clearly does not cover the whole input.

Is there any way to force the longest match instead of the first one? Or to return all candidate matches instead of failing outright?
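Longest-match selection can be expressed with java.util.regex: at a given offset, run every candidate rule with `lookingAt()` and keep the one that ends furthest to the right. This is a hypothetical helper illustrating the idea, not an existing dateparser option; whether the library could adopt it depends on its retree-based internals.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatch {
    // Returns the index of the rule whose match starting exactly at `offset`
    // ends furthest to the right, or -1 if no rule matches there.
    static int pickLongest(String input, int offset, List<Pattern> rules) {
        int best = -1;
        int bestEnd = offset;
        for (int i = 0; i < rules.size(); i++) {
            Matcher m = rules.get(i).matcher(input);
            m.region(offset, input.length());
            // lookingAt() anchors at the region start; a strictly longer match wins,
            // so on a tie the earlier rule is kept.
            if (m.lookingAt() && m.end() > bestEnd) {
                best = i;
                bestEnd = m.end();
            }
        }
        return best;
    }
}
```

With the month-day rule from the error message (ends at 12) and the custom month-year rule (ends at 14), the month-year rule is selected for "september 2010".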

Thank you so much for writing this library!

I was about to write my own "lenient" datetime parser when I stumbled across this project. It works so well, and saved me many hours. Thank you!

(Feel free to close this 🙂)

ISO formatted string is parsed wrong

    @Test
    public void testISOString() throws ParseException {
        String input = "2016-10-29T09:20:19.000Z";
        Date simpleDateFormatDate = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX").parse(input);
        Date instantDate = Date.from(Instant.parse(input));
        Date parsedDate = parser.parseDate(input);

        assert Objects.equals(simpleDateFormatDate, instantDate);
        //The following assert fails
        assert Objects.equals(simpleDateFormatDate, parsedDate);
    }

The test above fails.

The input should parse to (as the JDK code does):

Sat Oct 29 11:20:19 CEST 2016

But you get

Sat Oct 29 10:20:19 CEST 2016

The ISO timestamp is expressed in GMT. CEST is Central Europe Summer Time and is GMT+2. So 09:20 in GMT becomes 11:20 CEST. To me it looks like the JDK code is correct, and this parser is 1 hour off.

This was tested with OpenJDK11, with

  • Locale.getDefault(Locale.Category.FORMAT): en_US
  • TimeZone.getDefault(): id="Europe/Brussels"
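The arithmetic can be confirmed with the JDK's zone rules: on that date Europe/Brussels is still on summer time, so the offset is +02:00 and 09:20:19Z maps to 11:20:19 local time. A minimal check (hypothetical class name):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class BrusselsCheck {
    static final ZoneId BRUSSELS = ZoneId.of("Europe/Brussels");

    // Offset that Europe/Brussels applies at the given instant (DST-aware).
    static ZoneOffset offsetAt(String isoInstant) {
        return BRUSSELS.getRules().getOffset(Instant.parse(isoInstant));
    }

    // Local wall-clock time in Brussels for the given instant.
    static LocalDateTime localTime(String isoInstant) {
        return Instant.parse(isoInstant).atZone(BRUSSELS).toLocalDateTime();
    }

    public static void main(String[] args) {
        String input = "2016-10-29T09:20:19.000Z";
        System.out.println(offsetAt(input) + " " + localTime(input)); // prints +02:00 2016-10-29T11:20:19
    }
}
```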

Parsing a date with some negative offsets raises an exception.

Steps to reproduce:
Try to parse a date with a negative time zone offset whose minutes are 30, such as “2020-12-31 01:33-09:30” or “2020-12-31 07:33-03:30”. The following error is raised: “Zone offset minutes and seconds must be negative because hours is negative.”
Please note: parsing works for most time zones (negative and positive). The problem occurs only when a negative offset has non-zero minutes.

UTC-09:30 and UTC-03:30 are real offsets:
https://en.wikipedia.org/wiki/List_of_UTC_time_offsets
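The error text matches the check in java.time.ZoneOffset that rejects hour and minute parts with different signs, which suggests (unconfirmed) that the sign is being applied to the hours only. A sketch of parsing such offsets with the sign applied to both parts; the class is hypothetical and accepts only `±h:mm` or `±hh:mm` input.

```java
import java.time.ZoneOffset;

final class SignedOffset {
    // Parses "+hh:mm", "-hh:mm" (or one-digit hours), applying the sign to
    // BOTH the hours and the minutes.
    static ZoneOffset parseOffset(String text) {
        int sign = text.charAt(0) == '-' ? -1 : 1;
        String[] parts = text.substring(1).split(":");
        int hours = Integer.parseInt(parts[0]);
        int minutes = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // ZoneOffset.ofHoursMinutes(-9, 30) throws, because hours and minutes must
        // share a sign; ofHoursMinutes(-9, -30) is the valid form of -09:30.
        return ZoneOffset.ofHoursMinutes(sign * hours, sign * minutes);
    }

    public static void main(String[] args) {
        System.out.println(parseOffset("-09:30")); // prints -09:30
    }
}
```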
