
Faster date parsing in Python

I’ve been optimizing a complex real-time trading application written in Python. One of the many lessons learned was just how slow date/time parsing can be. strptime is a great universal function, but its versatility comes at a cost. Fortunately, there is another way.

When you know your date format in advance, it is often much faster to parse dates manually. For example, suppose most of your dates come in the format “20140920-17:59:47.133986”. You can write a custom parsing function that handles this special case and falls back to the standard datetime.datetime.strptime for everything else:

import datetime

def fastStrptime(val: str, fmt: str) -> datetime.datetime:
    n = len(val)
    # Fast path: the compact format with either milliseconds (21 chars)
    # or microseconds (24 chars) after the dot.
    if fmt == '%Y%m%d-%H:%M:%S.%f' and (n == 21 or n == 24):
        us = int(val[18:24])
        # If only milliseconds are given we need to convert to microseconds.
        if n == 21:
            us *= 1000
        return datetime.datetime(
            int(val[0:4]),    # %Y
            int(val[4:6]),    # %m
            int(val[6:8]),    # %d
            int(val[9:11]),   # %H
            int(val[12:14]),  # %M
            int(val[15:17]),  # %S
            us,               # %f
        )

    # Default to the native strptime for other formats.
    return datetime.datetime.strptime(val, fmt)
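The function above only takes the fast path when the fraction has exactly 3 or 6 digits. `%f` itself accepts anywhere from 1 to 6 digits, so here is a minimal sketch of a variant that covers that whole range by right-padding the fraction with zeros. The name `fast_strptime_compact` and the extra separator checks are my own illustration, not part of the original function, and it still assumes the compact `YYYYmmdd-HH:MM:SS.f` layout; anything that doesn't match goes to strptime.

```python
import datetime

COMPACT_FMT = '%Y%m%d-%H:%M:%S.%f'

def fast_strptime_compact(val: str, fmt: str = COMPACT_FMT) -> datetime.datetime:
    """Hypothetical variant of fastStrptime that accepts 1-6 fractional digits."""
    # 18 fixed characters ("YYYYmmdd-HH:MM:SS.") followed by the fraction.
    frac = val[18:]
    # The length check runs before any indexing, so short strings never
    # reach val[17] (the `and` chain short-circuits).
    if (fmt == COMPACT_FMT
            and 1 <= len(frac) <= 6 and frac.isdigit()
            and val[8] == '-' and val[11] == ':'
            and val[14] == ':' and val[17] == '.'):
        # Right-pad the fraction: '133' -> 133000 µs, '5' -> 500000 µs,
        # which matches how %f interprets short fractions.
        us = int(frac.ljust(6, '0'))
        return datetime.datetime(
            int(val[0:4]), int(val[4:6]), int(val[6:8]),       # date
            int(val[9:11]), int(val[12:14]), int(val[15:17]),  # time
            us)
    # Anything else goes to the general-purpose parser.
    return datetime.datetime.strptime(val, fmt)
```

Like the original, the fast path trusts the digit fields (`int()` will, for example, accept a leading space), so it is slightly more lenient than strptime. Input that fails the cheap separator and `isdigit()` checks is handed to strptime, which raises its usual `ValueError`.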

A quick test shows that this special-case parsing is more than 5 times faster than plain strptime (5.4 µs vs 28.2 µs per call):

In [1]: fmt = '%Y%m%d-%H:%M:%S.%f'

In [2]: d = '20140920-17:59:47.133'

In [3]: %timeit datetime.datetime.strptime(d, fmt)
10000 loops, best of 3: 28.2 µs per loop

In [4]: %timeit fastStrptime(d, fmt)
100000 loops, best of 3: 5.43 µs per loop
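If you want to repeat the comparison outside IPython, the standard `timeit` module can do roughly what `%timeit` does. A minimal sketch, assuming you pass in any parser with the `(value, format)` signature; the helper name `us_per_call` and its default loop counts are my own choices. It reports the best of several runs, as `%timeit` does. The timings it prints depend on your machine and Python version, so don't expect the numbers above.

```python
import datetime
import timeit
from typing import Callable

def us_per_call(parser: Callable[[str, str], datetime.datetime],
                value: str, fmt: str,
                number: int = 20_000, repeat: int = 5) -> float:
    """Best-of-`repeat` time for a single parser call, in microseconds."""
    # timeit.Timer accepts a callable; the lambda adds the same small
    # overhead to every parser, so comparisons stay fair.
    timer = timeit.Timer(lambda: parser(value, fmt))
    # repeat() returns the total seconds for each batch of `number` calls.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6

if __name__ == '__main__':
    fmt = '%Y%m%d-%H:%M:%S.%f'
    d = '20140920-17:59:47.133'
    t = us_per_call(datetime.datetime.strptime, d, fmt)
    print(f'strptime: {t:.2f} µs per call')
```

To compare, pass `fastStrptime` in place of `datetime.datetime.strptime`.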

This makes a real difference when you parse a lot of dates and times. In my case, ingesting Level 2 (depth of market) data, it was a few hundred dates per second. Adding a custom strptime similar to the one above immediately reduced CPU usage.

September 20, MMXIV — Python, Finance.