
dateparser's Introduction

Python parser for human readable dates



Key Features

  • Support for almost every existing date format: absolute dates, relative dates ("two weeks ago" or "tomorrow"), timestamps, etc.
  • Support for more than 200 language locales.
  • Language autodetection
  • Customizable behavior through settings.
  • Support for non-Gregorian calendar systems.
  • Support for dates with timezone abbreviations or UTC offsets ("August 14, 2015 EST", "21 July 2013 10:15 pm +0500"...)
  • Search dates in longer texts.

Online demo

Do you want to try it out without installing any dependency? Now you can test it quickly by visiting this online demo!

How To Use

The most straightforward way to parse dates with dateparser is to use the dateparser.parse() function, which wraps most of the functionality of the module.

>>> import dateparser

>>> dateparser.parse('Fri, 12 Dec 2014 10:55:50')
datetime.datetime(2014, 12, 12, 10, 55, 50)

>>> dateparser.parse('1991-05-17')
datetime.datetime(1991, 5, 17, 0, 0)

>>> dateparser.parse('In two months')  # today is 1st Aug 2020
datetime.datetime(2020, 10, 1, 11, 12, 27, 764201)

>>> dateparser.parse('1484823450')  # timestamp
datetime.datetime(2017, 1, 19, 10, 57, 30)

>>> dateparser.parse('January 12, 2012 10:00 PM EST')
datetime.datetime(2012, 1, 12, 22, 0, tzinfo=<StaticTzInfo 'EST'>)

As you can see, dateparser works with different date formats, but it can also be used directly with strings in different languages:

>>> dateparser.parse('Martes 21 de Octubre de 2014')  # Spanish (Tuesday 21 October 2014)
datetime.datetime(2014, 10, 21, 0, 0)

>>> dateparser.parse('Le 11 Décembre 2014 à 09:00')  # French (11 December 2014 at 09:00)
datetime.datetime(2014, 12, 11, 9, 0)

>>> dateparser.parse('13 января 2015 г. в 13:34')  # Russian (13 January 2015 at 13:34)
datetime.datetime(2015, 1, 13, 13, 34)

>>> dateparser.parse('1 เดือนตุลาคม 2005, 1:00 AM')  # Thai (1 October 2005, 1:00 AM)
datetime.datetime(2005, 10, 1, 1, 0)

>>> dateparser.parse('yaklaşık 23 saat önce')  # Turkish (23 hours ago), current time: 12:46
datetime.datetime(2019, 9, 7, 13, 46)

>>> dateparser.parse('2小时前')  # Chinese (2 hours ago), current time: 22:30
datetime.datetime(2018, 5, 31, 20, 30)

You can control multiple behaviors by using the settings parameter:

>>> dateparser.parse('2014-10-12', settings={'DATE_ORDER': 'YMD'})
datetime.datetime(2014, 10, 12, 0, 0)

>>> dateparser.parse('2014-10-12', settings={'DATE_ORDER': 'YDM'})
datetime.datetime(2014, 12, 10, 0, 0)

>>> dateparser.parse('1 year', settings={'PREFER_DATES_FROM': 'future'})  # Today is 2020-09-23
datetime.datetime(2021, 9, 23, 0, 0)

>>> import datetime
>>> dateparser.parse('tomorrow', settings={'RELATIVE_BASE': datetime.datetime(1992, 1, 1)})
datetime.datetime(1992, 1, 2, 0, 0)

To see more examples on how to use the settings, check the settings section in the docs.

False positives

dateparser will do its best to return a date, dealing with multiple formats and different locales. For that reason, it is important that the input is a valid date; otherwise, it could return false positives.

To reduce the possibility of receiving false positives, make sure that:

  • The input string is a valid date and doesn't contain any other words or numbers.
  • If you know the language or languages beforehand, you pass them through the languages or locales parameters.

On the other hand, if you want to exclude any of the default parsers (timestamp, relative-time...) or change the order in which they are executed, you can do so through the PARSERS setting.

Installation

Dateparser supports Python >= 3.7. You can install it by doing:

$ pip install dateparser

If you want to use the jalali or hijri calendar, you need to install the calendars extra:

$ pip install dateparser[calendars]

Common use cases

dateparser can be used for a wide range of purposes, but it stands out when it comes to:

Consuming data from different sources:

  • Scraping: extract dates from different places with several different formats and languages
  • IoT: consuming data coming from different sources with different date formats
  • Tooling: consuming dates from different logs / sources
  • Format transformations: when transforming dates coming from different files (PDF, CSV, etc.) to other formats (database, etc).

Offering natural interaction with users:

  • Tooling and CLI: allow users to write “3 days ago” to retrieve information.
  • Search engine: allow people to search by date in an easier, more natural format.
  • Bots: allow users to interact with a bot easily

You may also like...

  • price-parser - A small library for extracting price and currency from raw text strings.
  • number-parser - Library to convert numbers written in natural language to their equivalent numeric forms.
  • Scrapy - Web crawling and web scraping framework

License

BSD 3-Clause

dateparser's People

Contributors

allactaga, ammar-azif, arnavkapoor, asadurski, atharmohammad, csalazar, d10xa, eliasdorneles, elrull, eragnms, eszakharova, gallaecio, gavishpoddar, grestonian, horva, hristo-vrigazov, jbkahn, lopuhin, markbaas, noviluni, redapple, sardok, sarthakmadaan, scop, serhii73, thomasst, tsrdatatech, waqasshabbir, watchful1, wrar


dateparser's Issues

For date parsing, the time component is being cached between calls.

Maybe this is a "feature", but it smells more like a bug. Reporting it just in case, since it has caused some headaches.
Using dateparser 0.3.1

Expected behavior:
The "time" part of the datetime object should be the current time when parsing a value with no time information, like today.

Current behavior:
When parsing today, if you call it again at a later time, the time from the first call is returned, as if it were cached between calls. This happens even when calling with a different value such as hoy ("today" in Spanish).
This is a bit surprising. I haven't tested what happens when the time rolls over to a different date, but I assume that could be problematic as well.

Python 2.7.10 (default, Sep 30 2015, 17:12:08)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dateparser, time
>>> dateparser.parse('today')
datetime.datetime(2015, 11, 11, 17, 28, 50, 234704)
>>> time.sleep(2)
>>> dateparser.parse('today')
datetime.datetime(2015, 11, 11, 17, 28, 50, 234704)
>>> time.sleep(2)
>>> dateparser.parse('hoy')
datetime.datetime(2015, 11, 11, 17, 28, 50, 234704)

Thanks! 🍻

Dateparser returning timezone aware datetime depending on existence of date_format arg (with any value)

Here are some self-explanatory examples.

In [22]: ddp.get_date_data('2014-10-09T17:57:39+00:00')['date_obj']
Out[22]: datetime.datetime(2014, 10, 9, 17, 57, 39)

In [23]: ddp.get_date_data('2014-10-09T17:57:39+00:00', '')['date_obj']
Out[23]: datetime.datetime(2014, 10, 9, 17, 57, 39)

In [24]: ddp.get_date_data('2014-10-09T17:57:39+00:00', '%Y')['date_obj']
Out[24]: datetime.datetime(2014, 10, 9, 17, 57, 39, tzinfo=tzutc())

French `moins de 21s` not getting parsed.

French dates like the one in the subject seem to translate fine, but they are not getting parsed.

>>> parse('moins de 21s')
>>>
>>> language_loader._data['fr'].translate('moins de 21s')
'21 s'
>>> parse('21 s')
datetime.datetime(2015, 7, 13, 9, 19, 43, 484810)

Should replace date_range with rrule?

>>> start = datetime.strptime('2015-01-07 04:00:00', "%Y-%m-%d %H:%M:%S")
>>> end = datetime.strptime('2015-01-17 04:00:00', "%Y-%m-%d %H:%M:%S")
>>>
>>> for x in dateutil.rrule.rrule(DAILY, dtstart=start, until=end):
...     print x
... 
2015-01-07 04:00:00
2015-01-08 04:00:00
2015-01-09 04:00:00
2015-01-10 04:00:00
2015-01-11 04:00:00
2015-01-12 04:00:00
2015-01-13 04:00:00
2015-01-14 04:00:00
2015-01-15 04:00:00
2015-01-16 04:00:00
2015-01-17 04:00:00
>>>
>>> for x in dateparser.date.date_range(start, end, days=1):
...     print x
... 
2015-01-07 04:00:00
2015-01-08 04:00:00
2015-01-09 04:00:00
2015-01-10 04:00:00
2015-01-11 04:00:00
2015-01-12 04:00:00
2015-01-13 04:00:00
2015-01-14 04:00:00
2015-01-15 04:00:00
2015-01-16 04:00:00

Which is the right implementation?
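The two transcripts above differ only in whether the end bound is included: the rrule loop yields 2015-01-17, while date_range stops at 2015-01-16. The following is a minimal standard-library sketch that makes that choice explicit. The `daily_range` helper and its `inclusive` flag are hypothetical and are not part of dateparser or dateutil.

```python
from datetime import datetime, timedelta


def daily_range(start, end, inclusive=False):
    """Yield datetimes one day apart, from start up to end.

    Hypothetical helper for illustration only. With inclusive=False the
    end bound is excluded (a half-open interval, like Python's range());
    with inclusive=True it is included (like rrule's `until`).
    """
    current = start
    while current < end or (inclusive and current == end):
        yield current
        current += timedelta(days=1)


start = datetime(2015, 1, 7, 4)
end = datetime(2015, 1, 17, 4)
half_open = list(daily_range(start, end))
closed = list(daily_range(start, end, inclusive=True))
print(len(half_open), half_open[-1])
print(len(closed), closed[-1])
```

Neither behavior is wrong in itself; what matters is that the convention is documented and consistent with what callers expect.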

Support for date strings with mixed languages

Old topic: "Date string like 'Marzo 2, 2015 at 8:56 pm' not being parsed"

In [7]: DateDataParser().get_date_data('Marzo 2, 2015 at 8:56 pm')
Out[7]: {'date_obj': None, 'period': 'day'}

Should have returned {'date_obj': datetime(2015, 3, 2, 20, 56), 'period': 'day'}

packaging issues

Hi,

I think there are several issues with setup.py:

  1. It imports from dateparser to get __version__. This is not good because if the import fails (e.g. because the dateutil dependency is missing), installation will fail. This means that listing dateutil in install_requires won't help if dateutil is not already installed. It is better to either extract __version__ using a regex or even have it duplicated.
  2. setup.py tries to use distutils if setuptools is not available, but there are setuptools-specific options like include_package_data. If include_package_data is needed, then dateparser won't work with distutils. It is not needed, though. I think it is better to either remove the distutils fallback or make sure setup.py works with distutils. I'd also remove include_package_data.
  3. setup.py reads install_requires from the requirements.txt file. I think this is the wrong approach: requirements.txt should specify package versions that are known to work, while install_requires should exclude versions that are known not to work. That is, using foo==1.0 in requirements.txt is good (because it ensures users will get a working build when they follow requirements.txt), but foo==1.0 in install_requires is bad because it prevents the package from being used with an updated version of a dependency, and it may cause an unintended package downgrade for the end user. The difference is that users can't opt out of install_requires, so we should be careful about what is put there; the less strict install_requires is, the better.
  4. Because of (3), the wheel package is in install_requires. It is unnecessary: the wheel package is not needed to install Python wheels, only to create them. The wheel version is pinned, so by installing dateparser users could get their local wheel upgraded or downgraded, and they might need specific wheel versions for other software.
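Point 1 above suggests reading __version__ with a regex instead of importing the package. A minimal sketch of that idea follows; the `extract_version` helper and the sample source text are assumptions made for illustration, not code from the project's setup.py.

```python
import re

# Matches a line such as: __version__ = '0.3.1'
VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def extract_version(source_text):
    """Return the version string found in the given module source text."""
    match = VERSION_RE.search(source_text)
    if match is None:
        raise RuntimeError("Unable to find __version__ in source")
    return match.group(1)


# In a real setup.py the text would be read from the package's __init__.py;
# an inline sample keeps this sketch self-contained.
sample_source = "import dateutil  # never executed by setup.py\n__version__ = '0.3.1'\n"
print(extract_version(sample_source))
```

Because the file is only read as text, a missing dependency such as dateutil cannot break installation at this step.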

Missing date parts

Sometimes dates don't carry all the information needed to determine the exact date, like "December 21" or "Friday". We can assume either the current year and week, or the most recent matching date in the past.

To achieve that, instead of calling the parse method in the dateutil_parse function, we would need to call the _parse method of the dateutil parser. Then, once we know which parts were parsed (and depending on configuration), we choose either the date in the current week/month/year or the last one seen.

Wrong date parsing when year changes

During scraping a website I encountered this issue:

Dec 14 11:00 is parsed as datetime.datetime(2015, 12, 14, 11, 0) whereas it was supposed to mean datetime.datetime(2014, 12, 14, 11, 0) because it was a post of 2014.

I think there should be a parameter like only_allow_past_dates that disables future date parsing and interprets such input only as a date that has already passed.
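The requested behavior can be sketched with the standard library alone. The `resolve_year` helper and its `prefer_past` flag are hypothetical names used for illustration; they are not dateparser's API.

```python
from datetime import datetime


def resolve_year(month, day, hour, minute, now, prefer_past=True):
    """Attach a year to a date string that did not include one.

    Hypothetical helper: starts with the current year and, when
    prefer_past is set and the result lies in the future, steps back
    one year. February 29 is not handled in this sketch.
    """
    candidate = datetime(now.year, month, day, hour, minute)
    if prefer_past and candidate > now:
        candidate = candidate.replace(year=now.year - 1)
    return candidate


# "Dec 14 11:00" scraped in early January 2015 refers to a 2014 post.
scrape_time = datetime(2015, 1, 5, 9, 0)
print(resolve_year(12, 14, 11, 0, scrape_time))
print(resolve_year(12, 14, 11, 0, scrape_time, prefer_past=False))
```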

parsed time zone different for fixed and relative dates

It appears that dateparser.parse converts a fixed date and time to the local time zone and a relative date and time to UTC? I was wondering if it might be possible to add an option to make them both return the same, or to set the time zone in the returned datetime object? I would like to convert the parsed date and time to a Unix time stamp, but I don't know whether the user will enter a fixed or relative date and time. This is with version 0.2.1 on Ubuntu 14.04.2 LTS. Thank you!

Special parser to break date string and passing identifiable chunks to relevant sub parsers

As of now, we've mixed time parsing logic in FreshnessDateParser. This is kind of breaking SRP. Ideally, we'd like to have a special parser which would break string into multiple parts, directing them to relevant sub parsers -- eventually consolidating the separate results to return one datetime object.

It's open to suggestions. Above is just a recommendation.

Please add support for default UNIX "date" command format

Most Unix/Linux flavors use %a %b %e %T %Z %Y as the default date format. However dateparser does not support that format.

> date
Tue Oct 13 20:18:56 CDT 2015

> python
>>> import dateparser
>>> type(dateparser.parse('Tue Oct 13 20:18:56 CDT 2015'))
<class 'NoneType'>
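As a rough illustration of what supporting this format involves, here is a standard-library sketch. The `parse_unix_date` helper and the small `TZ_OFFSETS` table are assumptions for illustration; real abbreviation handling is harder because abbreviations such as CST are ambiguous. The sketch also assumes English month names.

```python
from datetime import datetime, timedelta, timezone

# Assumed, deliberately tiny abbreviation table (UTC offsets in hours).
TZ_OFFSETS = {"UTC": 0, "EST": -5, "EDT": -4, "CST": -6, "CDT": -5}


def parse_unix_date(text):
    """Parse the default output of `date`, e.g. 'Tue Oct 13 20:18:56 CDT 2015'."""
    # split() also copes with the space padding that %e produces for days 1-9.
    _weekday, month, day, clock, tz_abbr, year = text.split()
    naive = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    offset = timezone(timedelta(hours=TZ_OFFSETS[tz_abbr]), tz_abbr)
    return naive.replace(tzinfo=offset)


parsed = parse_unix_date("Tue Oct 13 20:18:56 CDT 2015")
print(parsed.isoformat())
```

The weekday token is ignored here rather than validated against the date.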

Incorrect Portuguese translation of the 'second' keyword in languages.yaml

dateparser v0.3.0 on Ubuntu 14.04

The languages.yaml file has the incorrect English to Portuguese Translation of 'second.'
'segunda' is the feminine adjective form of second as an ordinal position, whereas 'segundo' is the desired term for the unit of time. The bug manifests itself as follows:

>>> parse(u'1 segundo atrás')
>>>
>>> parse(u'1 segunda atrás')
datetime.datetime(2015, 7, 13, 9, 19, 43, 484810)

Vietnamese month uncertainty (discussion)

We need to come up with generic solution for this.

The Vietnamese language does not have names for months and simply uses "Month One", "Month Two", etc.
Some sites use the numeric form, like "Month 1", "Month 2", etc. So when we translate tokens from Vietnamese for dates like "1 Year 1 Month 1 Day", it is not clear whether it means "1 year 1 month 1 day" or "1 year 1 January day".

Provide a way to check supported languages

Suggested on #6

The idea is to have something like dateparser.languages that would allow one to check support for a given language.

Right now, there is language-specific code in both date_parser.py and freshness_date_parser.py.
Any thoughts on how we should do that?

Configuration

It looks like we need to parametrize parsing behavior. Instead of adding more parameters to the parse function, I suggest creating a registry that can be changed with a configure(key=value) function.
Example settings could be NO_DATES_FROM_FUTURE (for parsing dates from the web, where parts of the creation date are sometimes missing and we assume the past) or SUPPORT_BEFORE_COMMON_ERA (to use a custom datetime class that inherits from datetime but supports astronomical year numbering).
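A minimal sketch of such a registry is shown below. The `Settings` class is hypothetical, and the two setting names are simply the examples proposed in this issue rather than settings the library is known to ship.

```python
class Settings:
    """Tiny settings registry; class and keys are illustrative only."""

    _defaults = {
        "NO_DATES_FROM_FUTURE": False,
        "SUPPORT_BEFORE_COMMON_ERA": False,
    }

    def __init__(self):
        self._values = dict(self._defaults)

    def configure(self, **overrides):
        """Override known settings; reject misspelled or unknown keys."""
        unknown = set(overrides) - set(self._defaults)
        if unknown:
            raise KeyError(f"Unknown settings: {sorted(unknown)}")
        self._values.update(overrides)

    def __getitem__(self, name):
        return self._values[name]


settings = Settings()
settings.configure(NO_DATES_FROM_FUTURE=True)
print(settings["NO_DATES_FROM_FUTURE"], settings["SUPPORT_BEFORE_COMMON_ERA"])
```

Rejecting unknown keys keeps a typo in a setting name from silently having no effect.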

We need support for 'Today 01:56 AM'

Currently it returns something like datetime.datetime(2014, 12, 9, 15, 17, 21, 562654). The date is correct, but the time corresponds to import time because of the last line in freshness_date_parser.py. We should consider changing this, maybe by reinitializing the freshness parser on each call.
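The difference between capturing the clock once and reading it on every call can be sketched as follows. Both classes are hypothetical stand-ins and do not reproduce the code in freshness_date_parser.py.

```python
from datetime import datetime


class StaleParser:
    """Captures the clock once, at construction (for example at import time)."""

    def __init__(self):
        self.now = datetime.now()

    def parse_today(self):
        return self.now  # the same value on every call


class FreshParser:
    """Reads the clock on every call; `clock` is injectable for tests."""

    def __init__(self, clock=datetime.now):
        self.clock = clock

    def parse_today(self):
        return self.clock()


stale = StaleParser()
print(stale.parse_today() == stale.parse_today())

# A fake clock that advances two seconds between calls.
ticks = iter([datetime(2014, 12, 9, 1, 56, 0), datetime(2014, 12, 9, 1, 56, 2)])
fresh = FreshParser(clock=lambda: next(ticks))
first, second = fresh.parse_today(), fresh.parse_today()
print(second - first)
```

Injecting the clock also makes relative-date tests deterministic.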

Support for mixed languages

Hi guys, dateparser isn't detecting dates like Diciembre 23, 2014 at 3:43 am, which is actually a mix between Spanish (Diciembre) and English (at). Which would be the best way to deal with it?

Better period extracting from dateutil parser

Since we now know exactly which date units (year, month, hour) were parsed by the dateutil parser, we can guess the date period with more precision. For example, if any of the day, hour, minute, second, or microsecond units was parsed, the period is day; otherwise, if the month was parsed, the period is month; and finally, if only the year was parsed, the period is year. I am not sure whether a week period is applicable to dates passed to the dateutil parser.

This period should be passed all the way back to call from _DateLanguageParser.
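The rule described above is small enough to sketch directly. The `guess_period` function is a hypothetical name, and the fallback to "day" when nothing was parsed is an assumption based on the 'period': 'day' default seen in other reports.

```python
def guess_period(parsed_units):
    """Map the set of units that were actually parsed to a period name."""
    day_level = {"day", "hour", "minute", "second", "microsecond"}
    if day_level & set(parsed_units):
        return "day"
    if "month" in parsed_units:
        return "month"
    if "year" in parsed_units:
        return "year"
    return "day"  # assumed fallback when no unit was recognised


print(guess_period({"year", "month", "day"}))
print(guess_period({"year", "month"}))
print(guess_period({"year"}))
```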

Invalid date getting parsed

Hi guys, the string u'Wed, 30 Nov -0001 00:00:00 +0000' is getting parsed to datetime.datetime(2001, 11, 30, 2, 0) which is wrong.

File base configuration

The Settings object must be able to take a settings file on initialization (as a path string or an already opened file) and default to data/settings.yaml if none is given. All settings should be moved to this file instead of living in class attributes. This is how it is already done for the LanguageDataLoader class.

There may be some additional notes on improving the existing code once the pull request is ready.

Improper parsing of relative date with absolute time of exactly midnight

When parsing a relative date with an absolute time (e.g., 1 week ago at 12 am), the parser ignores the time portion if it is exactly midnight (00:00:00 or the equivalent). If the time portion is one minute later, it works properly. This is with Python 2.7.9 and dateparser 0.3.0. Thanks.

In [1]: import dateparser as dp

In [2]: dp.parse('1 week ago at 12:00 am')
Out[2]: datetime.datetime(2015, 8, 24, 18, 35, 6, 272800)

In [3]: dp.parse('1 week ago at 12:01 am')
Out[3]: datetime.datetime(2015, 8, 24, 0, 1)
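One plausible cause, which the report does not confirm, is a truthiness check on the parsed time: before Python 3.5, a naive datetime.time representing midnight evaluated as false, so a test like `if parsed_time:` would skip it. The hypothetical `apply_time` helper below compares against None instead.

```python
from datetime import datetime, time


def apply_time(base, parsed_time):
    """Overlay an explicitly parsed time onto a relative date.

    `parsed_time` is None when the input carried no time component.
    Comparing against None, rather than relying on truthiness, keeps
    midnight from being mistaken for "no time given".
    """
    if parsed_time is None:
        return base
    return datetime.combine(base.date(), parsed_time)


one_week_ago = datetime(2015, 8, 24, 18, 35, 6, 272800)
print(apply_time(one_week_ago, time(0, 0)))
print(apply_time(one_week_ago, time(0, 1)))
print(apply_time(one_week_ago, None))
```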

Date strings with `year` in them are not parsing correctly

Dates like '19 February 2013 year 09:10' do not parse correctly.

DateDataParser().get_date_data('19 February 2013 year 09:10')

returns

datetime(2, 1, 8, 10, 30, 38, 116715)

while the correct date should be:

datetime(2013, 2, 19, 9, 10)

Extend language redetection to every subparser used.

For now, language detection works only with the subparser that extends dateutil behavior. We should move it one level up to the main parser, so that the same detected language can be used for all approaches, including the freshness subparser and formats. This way we can set default formats specific to some languages.

Upper limit for years and months

It looks like the upper limit for years is 19 and for months it's 12. It's quite common to have mentions like "25 years ago", "50 years ago" or "24 months ago" on webpages, and dateparser returns None for them :P .

ddp.get_date_data('19 years ago')
{'date_obj': datetime.datetime(1995, 11, 25, 6, 17, 17, 980574), 'period': u'years'}

ddp.get_date_data('20 years ago')
{'date_obj': None, 'period': 'day'}

ddp.get_date_data('12 months ago')
{'date_obj': datetime.datetime(2013, 11, 25, 6, 17, 17, 980574), 'period': u'months'}

ddp.get_date_data('13 months ago')
{'date_obj': None, 'period': 'day'}

This is quite interesting. I can get past the 19 years barrier with these queries:

ddp.get_date_data('19 years 12 months ago')
{'date_obj': datetime.datetime(1994, 11, 25, 6, 17, 17, 980574), 'period': u'months'}

ddp.get_date_data('19 years 12 months 1000 weeks ago')
{'date_obj': datetime.datetime(1975, 9, 26, 6, 17, 17, 980574), 'period': u'weeks'}

But then it's quite rare to find text like the above two examples on webpages.

Inconsistent return types between calendars.

HijriCalendar and JalaliCalendar implement BaseCalendar, but the return types of their get_date functions differ. This should be fixed by defining an interface for BaseCalendar.

12 am/pm

According to this wiki page, noon and midnight can be written in different ways. We should check whether we are parsing 12 noon/noon correctly and also add an option to choose how 12 am/pm should be treated.
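A sketch of the conversion, with a switch for the less common convention, might look like this. The `to_24_hour` function and its `noon_is_12pm` flag are hypothetical names used only to illustrate the option being requested.

```python
def to_24_hour(hour, meridiem, noon_is_12pm=True):
    """Convert a 12-hour clock reading to a 24-hour hour value.

    With the common convention (noon_is_12pm=True), 12 am is midnight (0)
    and 12 pm is noon (12). The flag is a hypothetical option for sources
    that use the opposite convention for the 12 o'clock hour.
    """
    if not 1 <= hour <= 12:
        raise ValueError("hour must be in 1..12")
    meridiem = meridiem.lower().replace(".", "").strip()
    if meridiem not in ("am", "pm"):
        raise ValueError("meridiem must be 'am' or 'pm'")
    is_pm = meridiem == "pm"
    if hour == 12 and not noon_is_12pm:
        is_pm = not is_pm
    return hour % 12 + (12 if is_pm else 0)


print(to_24_hour(12, "am"), to_24_hour(12, "pm"), to_24_hour(1, "p.m."))
```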

Cannot parse foreign (i.e: arabic) dates in dateparser

DateParser seems to suffer from the same pains as Arrow in arrow-py/arrow#152... it seems unable to parse Arabic dates:

In [1]: import dateparser

In [2]: from dateparser.date import DateDataParser

In [3]: ddp = DateDataParser()

In [4]: ddp.get_date_data('२०१४-०४-२८')
Out[4]: {'date_obj': None, 'period': 'day'}

In [5]: ddp.get_date_data('۱۳۹۳-۰۲-۱۰')
Out[5]: {'date_obj': None, 'period': 'day'}
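The two inputs above are written with Devanagari and Persian (Extended Arabic-Indic) digits respectively. A possible first step is to normalize any Unicode decimal digit to ASCII before parsing, sketched here with the standard library; `normalize_digits` is a hypothetical helper name.

```python
import unicodedata


def normalize_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )


print(normalize_digits("२०१४-०४-२८"))
print(normalize_digits("۱۳۹۳-۰۲-۱۰"))
```

This only converts the digits. If the second value is a date in a non-Gregorian calendar, it would still need calendar conversion afterwards.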

support for words(noon, midnight, etc.) as time

Hi, it would be nice to add support for words as time, especially noon.

Here is an example I got:

ERROR: Unknown date format u'Oct. 26, 2012 at noon' in http://www.realbuzz.com/forums/gear/?&items=30

Thanks.

'wheel' dependency

Hi,

requirements.txt states that the project depends on the wheel library, but there are no imports in the source that use it. I think that means this library is not actually required for dateparser to work.

Perhaps it could be removed from requirements.txt?

Please at least indicate python 3 compatibility.

I am looking for a Python 3 date parsing library, and I actually wound up trying to install this library before realizing it's only Py2k compatible.

If you aren't planning on supporting python 3, can you please at least note that (hopefully rather prominently) somewhere in the readme?

Iranian calendar

Iranian sites often use the Iranian calendar instead of the Gregorian one. We need to add support for it for the Persian language. Leonid pushed some work in commit 5c8611fd52752e8890681fd403f02de633f7f20d in another repository. He has also done some research, so feel free to contact him on this matter.

Intended PyPI update?

Current PyPI release (0.1.0) is from November 2014. With the latest changes on here, I've found much better accuracy with hours, minutes, and seconds. Do you have an intended date for the next release? This is a really great library.

Add Support for CJK Languages

Hi,

I'm working on adding support for parsing CJK (Chinese, Japanese, Korean) languages and had a few questions.

I have defined a zh_parserinfo class and have been able to get a test case like this to work:

def test_zh_dates(self):
    date = DateParser(language='zh').parse(u'2014年10月4日', date_format='%Y年%m月%d日')   
    self.assertEqual(date.year, 2014)
    self.assertEqual(date.month, 10)
    self.assertEqual(date.day, 4)

However, this is not ideal. Dates are always written year-month-day, so it would be nice if parse handled it by default, and the above test case would pass without specifying this date_format. There are also two cases to support:

# 年 means year, 月 means month, 日 means day
# Dates are always written year first.
date = DateParser(language='zh').parse(u'2014年10月4日')   

# This is the same date, but it is written with Chinese numbers. 
# 二 = 2, 〇 = 0, 一 = 1, 四 =  4, 十 = 10, etc...
date = DateParser(language='zh').parse(u'二〇一四年十月四日')  # formal usage

Where would you suggest putting the code to handle the mapping of Chinese numbers to standard numbers? Any other suggestions for implementing this are appreciated. Thanks!

Edit: I'm not familiar with Arabic, but in #6 it looks like a similar mapping is needed. Something generic that could work for CJK languages, Arabic, and whatever other languages might need this would be best.
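To make the numeral-mapping question concrete, here is a small standard-library sketch covering the two cases listed above. The helper names are hypothetical, and only digit-by-digit numerals and values below 100 are handled.

```python
import re
from datetime import date

CJK_DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
              "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}


def cjk_to_int(text):
    """Convert a short Chinese numeral (or ASCII digits) to an int.

    Handles digit-by-digit forms such as 二〇一四 and tens forms such as
    十, 十一, 二十 and 三十一; larger units (百, 千, 万) are out of scope.
    """
    if text.isascii() and text.isdigit():
        return int(text)
    if "十" in text:
        tens, _, units = text.partition("十")
        tens_value = CJK_DIGITS[tens] if tens else 1
        units_value = CJK_DIGITS[units] if units else 0
        return tens_value * 10 + units_value
    return int("".join(str(CJK_DIGITS[ch]) for ch in text))


def parse_cjk_date(text):
    """Parse a 年/月/日 date written with ASCII or Chinese numerals."""
    match = re.fullmatch(r"(.+?)年(.+?)月(.+?)日", text.strip())
    if match is None:
        raise ValueError(f"Not a year-month-day date: {text!r}")
    year, month, day = (cjk_to_int(part) for part in match.groups())
    return date(year, month, day)


print(parse_cjk_date("2014年10月4日"))
print(parse_cjk_date("二〇一四年十月四日"))
```

Keeping the numeral conversion in its own function would let other scripts plug in their own digit tables.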

Offer a more direct way to parse a date

The most common use case is by far just getting the date for a given date string, without really caring about which language it is in.

Right now, the way to do that has been:

>>> from dateparser.date import DateDataParser
>>> ddp = DateDataParser(allow_redetect_language=True)
>>> ddp.get_date_data(u'24 de Janeiro de 2014')['date_obj']
datetime.datetime(2014, 1, 24, 0, 0)
>>> ddp.get_date_data(u'January 1st 2014')['date_obj']
datetime.datetime(2014, 1, 1, 0, 0)

What I'd like to be able to do:

>>> import dateparser
>>> dateparser.parse_date(u'24 de Janeiro de 2014')
datetime.date(2014, 1, 24)
>>> dateparser.parse_date(u'January 1st 2014')
datetime.date(2014, 1, 1)
>>> dateparser.parse_datetime(u'24 de Janeiro de 2014, 13:23')
datetime.datetime(2014, 1, 24, 13, 23)

What do you think, folks?

Simple arithmetic for the words

We need to be able to transform token sequences like "seven hundred and sixty-five thousand, four hundred and thirty-two" into "765432". Such tokens may be handled differently in different languages (for example, Roman numerals involve subtraction). So for now, let's focus only on how English tokens are transformed into numbers. Let's call this approach "general" (later we would define which approach should be used in the languages.yaml file).
The initial idea is to iterate through the list of tokens, skipping tokens that are in skip or that match [\W_]+. Each token should be present in a dictionary (the numbers section of the language).

If the number represented by the current token is less than the previous one, we use addition. If it is greater than several of the preceding nearby numbers, then those smaller numbers describe the bigger one and we use multiplication. Be sure to use multiplication only with those preceding numbers that are 1) less than the current one and 2) directly chained with it.

This approach should of course be properly tested.
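A compact sketch for English is shown below. It uses a common simplification of the addition/multiplication rule (a running group that is scaled by "hundred" and flushed by "thousand" or "million") rather than the exact procedure described in this issue, and the tables and function name are assumptions.

```python
import re

SMALL = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
SMALL.update({word: 10 * (index + 2) for index, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())})
MULTIPLIERS = {"thousand": 1000, "million": 1000000}
SKIP = {"and"}


def words_to_number(text):
    """Convert an English number phrase to an int."""
    total = 0    # value of completed groups (thousands, millions)
    current = 0  # value of the group currently being built
    for token in re.split(r"[\W_]+", text.lower()):
        if not token or token in SKIP:
            continue
        if token in SMALL:
            current += SMALL[token]            # smaller value: addition
        elif token == "hundred":
            current = max(current, 1) * 100    # scales the chained group
        elif token in MULTIPLIERS:
            total += max(current, 1) * MULTIPLIERS[token]
            current = 0
        else:
            raise ValueError(f"Unknown number word: {token!r}")
    return total + current


phrase = "seven hundred and sixty-five thousand, four hundred and thirty-two"
print(words_to_number(phrase))
```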

"Feb 2011" parsing fails but Jan, Mar-Dec works

>>> from dateutil import parser
>>> parser.parse('Jan 2011', fuzzy=True)
datetime.datetime(2011, 1, 30, 0, 0)
>>> parser.parse('Feb 2011', fuzzy=True)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/umair/dev/venvs/nathanartz/local/lib/python2.7/site-packages/dateutil/parser.py", line 743, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/umair/dev/venvs/nathanartz/local/lib/python2.7/site-packages/dateutil/parser.py", line 310, in parse
    ret = default.replace(**repl)
ValueError: day is out of range for month
>>> parser.parse('Dec 2011', fuzzy=True)
datetime.datetime(2011, 12, 30, 0, 0)
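The traceback ends in default.replace(**repl), and the other results carry day 30, which suggests the missing day is filled from a default date whose day does not exist in February. One way around this, sketched with the standard library, is to clamp the day to the length of the target month; `replace_clamped` is a hypothetical helper and not part of dateutil.

```python
import calendar
from datetime import datetime


def replace_clamped(default, year, month):
    """Replace year and month, clamping the day to the month's length."""
    last_day = calendar.monthrange(year, month)[1]
    return default.replace(year=year, month=month, day=min(default.day, last_day))


default = datetime(2015, 1, 30)  # stands in for "today" on the 30th
print(replace_clamped(default, 2011, 1))
print(replace_clamped(default, 2011, 2))
```

Another option would be to pass an explicit default with day 1 so the result never depends on the current date.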

Add support for timezone info in pytz

This means you don't have to duplicate that work or hardcode that information.

Such as

set(['PMDT',
'BAKT',
'CPT',
'KUYT',
'WAT',
'TKT',
'CHAST',
'NOVST',
'FJST',
'ALMST',
'SHEST',
'SCT',
'PDDT',
'BRST',
'VLAST',
'NPT',
'CVST',
'QYZT',
'PMMT',
'NEST',
'AQTST',
'LHDT',
'VUST',
'MDDT',
'CMT',
'SRET',
'zzz',
'HKST',
'LST',
'CHOST',
'EEST',
'MAGT',
'WGT',
'NFT',
'TJT',
'BEAUT',
'PLMT',
'SMT',
'RET',
'COST',
'FJT',
'BST',
'TASST',
'JST',
'UYST',
'TAST',
'MDT',
'VET',
'CLST',
'HST',
'FMT',
'TBIST',
'ORAST',
'SBT',
'PYST',
'MMT',
'LMT',
'YAKT',
'MART',
'EDT',
'MAWT',
'AHST',
'VOST',
'TOT',
'CAT',
'MADMT',
'VOLST',
'ROTT',
'ISST',
'PGT',
'KGST',
'CHOT',
'YEKST',
'YPT',
'AZOMT',
'FKST',
'FORT',
'NCT',
'PNT',
'WGST',
'ARST',
'KIZT',
'KWAT',
'SAMT',
'FNT',
'AKDT',
'LINT',
'EGT',
'DUST',
'WITA',
'JCST',
'NZDT',
'JWST',
'SHET',
'GBGT',
'PHST',
'UYT',
'HOVT',
'MALST',
'PYT',
'APT',
'PEST',
'WEMT',
'FRUST',
'KST',
'STAT',
'HDT',
'VLAT',
'YST',
'PKT',
'HMT',
'SJMT',
'MADT',
'CET',
'BMT',
'SAKST',
'ChST',
'AFT',
'CST',
'BTT',
'SST',
'AWDT',
'MUT',
'IRST',
'IST',
'SAST',
'SET',
'ORAT',
'RMT',
'AST',
'NUT',
'SWAT',
'ECT',
'AQTT',
'YERT',
'TLT',
'PDT',
'TOST',
'IMT',
'HAST',
'NOVT',
'YWT',
'AKST',
'GYT',
'CEST',
'BEAT',
'TBIT',
'WART',
'CWT',
'NEGT',
'TFT',
'FKT',
'PHOT',
'IHST',
'BDST',
'DDUT',
'EASST',
'NRT',
'URAT',
'BAKST',
'CKT',
'FRUT',
'MUST',
'AWT',
'PKST',
'AMST',
'SDMT',
'AHDT',
'BOST',
'BNT',
'WET',
'ADMT',
'NZST',
'ANAT',
'ADDT',
'CEMT',
'CANT',
'ALMT',
'CKHST',
'PHT',
'SVET',
'EMT',
'DMT',
'LHST',
'AZST',
'TRST',
'SAMST',
'GET',
'MALT',
'MHT',
'ASHST',
'MOT',
'ANT',
'TSAT',
'TBMT',
'GEST',
'PST',
'DAVT',
'TMT',
'COT',
'PET',
'AZOST',
'TAHT',
'VUT',
'KMT',
'IRKT',
'CAST',
'MAGST',
'KDT',
'GALT',
'OMST',
'KIZST',
'SRT',
'KOST',
'NDT',
'NMT',
'CDT',
'SAKT',
'DUSST',
'FNST',
'CVT',
'WAST',
'PPT',
'CGST',
'NST',
'UTC',
'MEST',
'VOLT',
'ACWST',
'CHADT',
'ULAT',
'IDDT',
'SDT',
'PWT',
'ART',
'HOVST',
'ULAST',
'MADST',
'GST',
'EPT',
'BORT',
'BOT',
'OMSST',
'XJT',
'URAST',
'PPMT',
'AWST',
'YERST',
'UYHST',
'IOT',
'MYT',
'HKT',
'SVEST',
'YDT',
'PMST',
'CAWT',
'WSDT',
'WMT',
'ACWDT',
'KRAT',
'ACDT',
'UZST',
'AKTT',
'IRKST',
'MDST',
'MWT',
'EET',
'BURT',
'EST',
'JDT',
'LKT',
'NWT',
'WSST',
'JMT',
'EGST',
'CDDT',
'AMT',
'CHDT',
'CAPT',
'BDT',
'MIST',
'TRT',
'EWT',
'BORTST',
'YDDT',
'MPT',
'LRT',
'HADT',
'GAMT',
'KUYST',
'IDT',
'IRDT',
'AEDT',
'YAKST',
'ACT',
'NET',
'PMT',
'NZMT',
'QMT',
'ANAST',
'YEKT',
'NDDT',
'EAST',
'CGT',
'EDDT',
'ADT',
'CUT',
'FET',
'GHST',
'SYOT',
'GMT',
'EHDT',
'WIB',
'BRT',
'QYZST',
'MET',
'WIT',
'AKTST',
'KRAST',
'KART',
'MST',
'MSM',
'AEST',
'MSK',
'GFT',
'MVT',
'MSD',
'AZT',
'ACST',
'SGT',
'CLT',
'PETT',
'UZT',
'DACT',
'EAT',
'FFMT',
'PETST',
'WARST',
'MOST',
'AZOT',
'ICT',
'KGT',
'NCST',
'WEST',
'JAVT',
'ASHT'])

Uniform unit tests

It seems that we have plenty of methodologies used in unit tests.
Some of them are:

    date = DateParser(language='cz').parse('pon 16. čer 2014 10:07:43')
    self.assertEqual(date.year, 2014)
    self.assertEqual(date.month, 6)
    self.assertEqual(date.day, 16)
    self.assertEqual(date.hour, 10)
    self.assertEqual(date.minute, 7)
    self.assertEqual(date.second, 43)

    parser = DateParser()
    date_fixtures = [
        ('13 iunie 2013', datetime(2013, 6, 13)),
        ('14 aprilie 2014', datetime(2014, 4, 14)),
        ('18 martie 2012', datetime(2012, 3, 18)),
    ]

    for dt_string, correct_date in date_fixtures:
        parsed = parser.parse(dt_string)
        self.assertEquals(correct_date.date(), parsed.date())

    @parameterized.expand([
        param('Sep 03 2014 | 4:32 pm EDT', datetime(2014, 9, 3, 21, 32)),
        param('17th October, 2034 @ 01:08 am PDT', datetime(2034, 10, 17, 9, 8)),
        param('15 May 2004 23:24 EDT', datetime(2004, 5, 16, 4, 24)),
        param('15 May 2004', datetime(2004, 5, 15, 0, 0)),
        param('Nov 25 2014 10:17 pm EST', datetime(2014, 11, 26, 4, 17)),
    ])

    date = DateParser(language='pl').parse('Środa, 26 listopada 2014 10:11:12')
    self.assertEqual(date.timetuple()[:6], (2014, 11, 26, 10, 11, 12))

Maybe we could create a simple method/function for asserting the correct date? E.g.

    date = DateParser(language='en').parse('Tue, 25 Dec, 2012 12:00')
    self.assertDate(date, 2012, 12, 25, 12, 0)

Depending on the number of parameters passed to assertDate, the date is checked at the corresponding resolution.
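A possible shape for that helper is sketched below as a unittest mixin. The mixin name and the example test are hypothetical; only the assertDate call signature comes from the proposal above.

```python
import unittest
from datetime import datetime


class DateAssertionsMixin:
    """Adds assertDate to a TestCase."""

    def assertDate(self, date, *expected):
        """Compare only as many leading fields as were given.

        Fields are taken in order: year, month, day, hour, minute, second.
        """
        if not 1 <= len(expected) <= 6:
            raise ValueError("expected 1 to 6 date components")
        actual = date.timetuple()[:len(expected)]
        self.assertEqual(actual, tuple(expected))


class ExampleTest(DateAssertionsMixin, unittest.TestCase):
    def test_minute_resolution(self):
        date = datetime(2012, 12, 25, 12, 0, 59)
        self.assertDate(date, 2012, 12, 25, 12, 0)  # seconds are ignored
```

A mixin keeps the assertion reusable across the existing test classes without changing their base class.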

Dates detection

The current implementation of language detection works on the assumption that "when we ask for the next language, the previous language did not work on the given date and should be dropped". We need to change this behavior so that the working set of languages is modified only if at least one language was applicable. If possible, we should also find a way to make language detection clearer for the reader.
This needs to be done without a significant increase in code complexity.
The current test case is:

In [1]: from dateparser import DateDataParser
In [2]: parser = DateDataParser()
In [3]: parser.get_date_data(u'01-01-15 06:47 AM')
Out[3]: {'date_obj': datetime.datetime(2015, 1, 1, 6, 47), 'period': 'day'}
In [4]: parser.get_date_data(u'foo')
Out[4]: {'date_obj': None, 'period': 'day'}
In [5]: parser.get_date_data(u'01-01-15 06:47 AM')
Out[5]: {'date_obj': None, 'period': 'day'}

Stop parsing invalid dates

We should not really parse dates like this:

>>> from dateparser import parse
>>> parse("2015-03-17T16:37:51+00:002015-03-17T15:24:37+00:002015-03-17T15:02:08+00:002015-03-17T13:09:31+00:002015-03-17T11:34:21+00:002015-03-16T17:49:15+00:002015-03-16T17:33:30+00:002015-03-16T16:49:46+00:002015-03-16T15:50:57+00:002015-03-16T13:26:50+00:00 ")
datetime.datetime(2015, 3, 17, 13, 26, 50)

Skipped tests should be converted

During the migration to the declarative languages approach, we marked some tests as skipped because the code in those places changed almost completely. Now that the behavior of working with languages is close to final, those skipped tests should be rewritten to check that the new code still works for the inputs from the old cases. Tests should be designed in the way described here.

"in 5 min" returns None

I just downloaded this module and it's fantastic. Thank you for it.

I noticed that "5 min ago" works. "in 5 min" returns None.
