Faster date parsing in Python
I’ve been optimizing a complex real-time trading application written in Python. One of the many lessons learned was just how slow date/time parsing can be.
strptime is a great universal function, but its versatility comes at a cost. Fortunately, there is another way.
When you know your date format in advance, it is much faster to parse dates manually. For example, suppose most of your dates come in the format “20140920-17:59:47.133986”. You can write a custom parsing function that covers this special case and falls back to the standard datetime.datetime.strptime for all other cases:
    import datetime


    def fastStrptime(val: str, format: str) -> datetime.datetime:
        l = len(val)
        if format == '%Y%m%d-%H:%M:%S.%f' and (l == 21 or l == 24):
            us = int(val[18:24])
            # If only milliseconds are given we need to convert to microseconds.
            if l == 21:
                us *= 1000
            return datetime.datetime(
                int(val[0:4]),    # %Y
                int(val[4:6]),    # %m
                int(val[6:8]),    # %d
                int(val[9:11]),   # %H
                int(val[12:14]),  # %M
                int(val[15:17]),  # %S
                us,               # %f
            )

        # Default to the native strptime for other formats.
        return datetime.datetime.strptime(val, format)
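Before relying on a shortcut like this, it is worth checking that it agrees with strptime on the inputs you expect. The sketch below is one way to do that. It uses a condensed copy of the function above under the assumed name `fast_strptime`, and the sample strings are only illustrative.

```python
import datetime

FMT = '%Y%m%d-%H:%M:%S.%f'


def fast_strptime(val: str, fmt: str) -> datetime.datetime:
    """Condensed copy of the fast path; falls back to strptime otherwise."""
    n = len(val)
    if fmt == FMT and n in (21, 24):
        us = int(val[18:24])
        if n == 21:  # milliseconds only: scale to microseconds
            us *= 1000
        return datetime.datetime(
            int(val[0:4]), int(val[4:6]), int(val[6:8]),
            int(val[9:11]), int(val[12:14]), int(val[15:17]), us,
        )
    return datetime.datetime.strptime(val, fmt)


# Microsecond and millisecond variants of the special-case format.
samples = ['20140920-17:59:47.133986', '20140920-17:59:47.133']
for s in samples:
    # Both parsers must produce the same datetime on fast-path inputs.
    assert fast_strptime(s, FMT) == datetime.datetime.strptime(s, FMT)

# A different format skips the fast path and goes through strptime.
fallback = fast_strptime('2014-09-20', '%Y-%m-%d')
assert fallback == datetime.datetime(2014, 9, 20)

# The fast path slices by position and never looks at the separators,
# so a malformed separator that strptime rejects is silently accepted.
lenient = fast_strptime('20140920X17:59:47.133', FMT)
print(lenient)
```

Part of the speed comes from skipping validation: the slicing version trusts the input layout, so it only suits feeds whose format you control or have already checked.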
A quick test shows that this special-case parsing is more than 5 times faster than plain strptime (5.4 µs vs 28.2 µs):
    In : fmt = '%Y%m%d-%H:%M:%S.%f'
    In : d = '20140920-17:59:47.133'
    In : %timeit datetime.datetime.strptime(d, fmt)
    10000 loops, best of 3: 28.2 µs per loop
    In : %timeit fastStrptime(d, fmt)
    100000 loops, best of 3: 5.43 µs per loop
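If you are not working in IPython, roughly the same comparison can be run with the standard timeit module. This is only a sketch: `parse_fixed` is a hypothetical stripped-down version of the fast path, the loop count is arbitrary, and the numbers will differ by machine and Python version.

```python
import datetime
import timeit

FMT = '%Y%m%d-%H:%M:%S.%f'
SAMPLE = '20140920-17:59:47.133'


def parse_fixed(val: str) -> datetime.datetime:
    """Hypothetical fast path for 'YYYYMMDD-HH:MM:SS.fff[fff]' strings."""
    us = int(val[18:24])
    if len(val) == 21:  # milliseconds only: scale to microseconds
        us *= 1000
    return datetime.datetime(
        int(val[0:4]), int(val[4:6]), int(val[6:8]),
        int(val[9:11]), int(val[12:14]), int(val[15:17]), us,
    )


n = 10_000  # calls per timing run (arbitrary choice)

# Take the best of 3 runs, mirroring what %timeit reports.
slow = min(timeit.repeat(
    lambda: datetime.datetime.strptime(SAMPLE, FMT), repeat=3, number=n))
fast = min(timeit.repeat(lambda: parse_fixed(SAMPLE), repeat=3, number=n))

print(f'strptime:    {slow / n * 1e6:.2f} µs per call')
print(f'parse_fixed: {fast / n * 1e6:.2f} µs per call')
```

The lambda wrappers add a small constant overhead to both measurements, so the ratio is more meaningful than the absolute figures.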
This really makes a difference when you’re parsing a lot of dates and times. In my case, ingesting Level 2 (depth of market) data, it was a few hundred dates per second. Adding a custom strptime similar to the one above immediately resulted in a drop in CPU usage.