Code Monkey home page Code Monkey logo

date-parser-tutorial's Introduction

A Complete Date Parser Tutorial

This article covers everything you need to know about parsing dates in Python. By the end of this article, you will be able to extract human-readable strings from dates and parse strings into dates.

Table of Contents

Introduction

Python comes with a very powerful module for handing dates—datetime. This module ships with a class that also has the same name—datetime.

This class is very powerful and it can handle aware and naïve objects. Aware objects are the ones that have time zone information attribute, tzinfo. While naïve objects are open to interpretation because there is no time zone information.

Quite often, there will be a need to represent the datetime object as string. On the other hand, there will be a need to parse datetime objects from string objects.

Let's explore extracting string from datetime objects first.

Parsing Dates Into String

The datetime class has a useful method - today(). This method returns a new datetime object representing the current time.

>>> from datetime import datetime
>>> published_date = datetime.today()
>>> published_date
datetime.datetime(2021, 9, 17, 17, 21, 2, 397259)
>>> print(today)
2021-09-17 17:21:02.397259

As evident here, published_date is an object of datetime. Printing this on the console prints out the year, month, date, hours, minutes, seconds, and milliseconds in one long string.

Quick Conversion to String Using Attributes

This object has some useful attributes which can help in getting the day, month, year, etc. Here are few examples.

>>> published_date.year
2021
>>> published_date.month
5
>>> published_date.day
20

Similarly, there are other useful attributes like hour, minute, second.

>>> published_date.hour
17
>>> published_date.minute
21
>>> published_date.second
2

These attributes may be good enough for few cases.

Typically, a more refined control is needed.

Extract String With a Finer Control

The datetime.datetime object can be easily converted to strings in any desired format, thanks to the strftime function.

Here is one example:

>>> published_date.strftime('%m/%d/%Y')
'09/17/2021'

Note that Y in this example is an uppercase Y. A lowercase y prints the year in a two-digit format.

>>> published_date.strftime('%m/%d/%y')
'09/17/21'

For month, %B will extract the Month as full name, and %b will extract the abbreviated month name:

>>> published_date.strftime('%b %d,%Y')
'Sep 17,2021'
>>> published_date.strftime('%B %d,%Y')
'September 17,2021'

The only difficult aspect of strftime function is that remembering all these format codes is not easy.

Format Codes Quick Reference

Here is a quick reference of the commonly used codes. For the full list, see official documentation.

Note: Every number is zero-padded by default.

Directive Meaning Example
%d Day of the month. 17
%b Month as abbreviated name. Sep
%B Month as full name. September
%m Month as number. 09
%y Year without century. 21
%Y Year with century. 2021
%H Hour (24-hour clock). 17
%I Hour (12 hour clock). 05
%M Minute. 21
%S Second. 02

Parsing Dates From String

Often, there is a need to convert a string representation of a date to a datetime object. The source of these strings can be anything such as user input, files, scraped data, etc.

Any string that represents date and/or time can be converted into datetime objects by using Python's strpdate function. This function uses the same format codes that are used by strtftime.

The Let's look at an example:

>>> date_mdy = '09/17/21'
>>> type(date_mdy)
<class 'str'>
>>> datetime.strptime(date_mdy,'%m/%d/%y')
datetime.datetime(2021, 9, 17, 0, 0)

This function takes two arguments: the first is the string that represents the date. The second parameter is the format. If this function cannot parse the string into datetime using the format supplied, it will raise a ValueError.

The format string can be customized to convert a variety of scenarios.

>>> date_Bdy ='September 17,2021'
>>> datetime.strptime(date_Bdy,'%B %d,%Y')
datetime.datetime(2021, 9, 17, 0, 0)

Note that even one extra space will result in failed parsing.

>>> date_Bdy ='September 17,2021' # No space before year
>>> datetime.strptime(date_Bdy,'%B %d, %Y') # Space before year
 ...  
ValueError: time data 'September 17,2021' does not match format '%B %d, %Y'

Moreover, for every format, the format string has to be carefully constructed. This involves rereferring the official documentation to find the format codes, taking care of all the edge cases. Handling relative dates like "tomorrow" are even more difficult. Other problems are handling time zones and various locales.

All these complexities can be reduced with the help of Date Parser package.

Date Parser Package

Date Parser is a package that aims to make parsing dates from string much easier, without going into the complexities required by strptime function, while still being powerful.

Date Parser package can be installed easily from the terminal.

pip install dateparser

If you are using Anaconda, it is available on the default channel as well as conda-forge.

conda install dateparser

The next step is to explore the basic usage of Date Parser.

Basic Usage of Date Parser

The most common way of working with date parser is to use the parse function. The only mandatory argument for this function is the string that represents the date.

For most of the commonly used date formats, simply supplying the date string to the parse string works.

>>> from dateparser import parse
>>> parse('09/17/21')
datetime.datetime(2021, 9, 17, 0, 0)

This method takes away the complication of building the format string and create the datetime object.

The parse method is forgiving about the separator and space. Here are few more examples of how easily date parser takes care of parsing without worrying about format strings.

>>> parse('09/17/21')
datetime.datetime(2021, 9, 17, 0, 0)
>>> parse('09-17-21')
datetime.datetime(2021, 9, 17, 0, 0)
>>> parse('09 17 21')
datetime.datetime(2021, 9, 17, 0, 0)
>>> parse('09~17~21')
datetime.datetime(2021, 9, 17, 0, 0)

Note that, unlike datetime.strptime function, date parser's parse method does not raise ValueError if it is unable to convert to a datetime object.

>>> print(parse('091721'))
None

Apart from the numeric date formats, date parser can handle month names.

>>> parse('September 17,2021')
datetime.datetime(2021, 9, 17, 0, 0)
>>> parse('September17,2021') # No space anywhere
datetime.datetime(2021, 9, 17, 0, 0)

Apart from the date, date parser can also parse strings that contain time values. For example, lets take the string '2021-09-17 17:21:02.397259'. This contains year, date, month, hours, minutes, seconds and microseconds. The parse method handles it perfectly:

>>> parse('2021-09-17 17:21:02.397259')
datetime.datetime(2021, 9, 17, 17, 21, 2, 397259)

Here are few more examples:

>>> dateparser.parse('Wed, 17 May 2021 17:21:02')
datetime.datetime(2021, 5, 17, 17, 21, 2)

This takes care of the most common date formats. However, often there are cases when this is not sufficient. For example, the date string '09/07/21' is ambiguous. The month can be 09 or 07, based on the source of the data. There are few ways to handle this.

Parsing Ambiguous Dates with Date Parser

The parse method has many optional parameters. One of the parameter is date_formats. Note that this parameter expects a list of format strings. This will contains a list of possible formats that date parser can use to detect the date.

>>> parse('09/07/21')
datetime.datetime(2021, 9, 7, 0, 0)
>>> parse('09/07/21', date_formats=['%d/%m/%y']) # Month is 07
datetime.datetime(2021, 7, 9, 0, 0)

The other way is to provide the locale. For example, for locale en-US month is written first, while for locale en-GB, the day is written first. The locale information can be supplied using the optional parameter locales, which is again a list of possible locales.

>>> parse('09/07/21')
datetime.datetime(2021, 9, 7, 0, 0)
>>> parse('09/07/21', locales=['en-GB'])
datetime.datetime(2021, 7, 9, 0, 0)

Another option is to use the settings parameter. This parameter expects a dictionary. Here is an example to handle this date:

>>> parse('09/07/21', settings={'DATE_ORDER': 'MDY'}) # Month first
datetime.datetime(2021, 9, 7, 0, 0)
>>> parse('09/07/21', settings={'DATE_ORDER': 'DMY'}) # Day first
datetime.datetime(2021, 7, 9, 0, 0)

Parsing Relative Dates With Date Parser

Strings like today, tomorrow, yesterday can be parsed directly with Date Parser. More impressively, relative dates like 3 days ago, in two days work as well. Be careful with this as this works only with specific phrases. For example, in two days can be parsed work but aft two days does not work.

>>> parse('today')
datetime.datetime(2021, 9, 17, 10, 54, 17, 197078)
>>> parse('tomorrow')
datetime.datetime(2021, 9, 18, 10, 54, 23, 906776)
>>> parse('yesterday')
datetime.datetime(2021, 9, 16, 10, 54, 31, 875047)
>>> parse('3 days ago')
datetime.datetime(2021, 9, 14, 10, 54, 43, 614972)
>>> parse('in 3 days')
datetime.datetime(2021, 9, 20, 10, 54, 57, 176714)
>>> parse('after 3 days') # Can not be parsed

Handling Time Zones and Languages

Date Parser can handle time zone information in various ways. The first and easiest way is to supply the time zone abbreviation in the parse string directly. Three-letter abbreviations like EST and UTC offsets like -0500 both work.

>>> parse('09/07/21 9:30pm EST')
datetime.datetime(2021, 9, 7, 21, 30, tzinfo=<StaticTzInfo 'EST'>)
>>> parse('09/07/21 9:30pm -0500')
datetime.datetime(2021, 9, 7, 21, 30, tzinfo=<StaticTzInfo 'UTC\-05:00'>)

Another way of attaching time zone information is to set it in settings parameter. Note that this would need both TIMEZONE and RETURN_AS_TIMEZONE_AWARE to be supplied.

>>> parse('09/07/21 9:30pm', settings={'TIMEZONE': 'EST','RETURN_AS_TIMEZONE_AWARE': True})
datetime.datetime(2021, 9, 7, 21, 30, tzinfo=<StaticTzInfo 'EST'>)

Apart from time zones, Date Parser can handle over 200 locales.

>>> parse('27 सितंबर, 2021') # Hindi
datetime.datetime(2021, 9, 27, 0, 0)
>>> parse('17 сентября 2021') # Russian
datetime.datetime(2021, 9, 27, 0, 0)
>>> parse('2021年9月17日') # Simplified Chinese
datetime.datetime(2021, 9, 27, 0, 0)

date-parser-tutorial's People

Contributors

gabijafatenaite avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.