Subtracting a data column from another yields a timedelta column with a very fine grained resolution. If all you need is the difference in days, the first instinct is to use dt.days
npr['diff_since_last'].dt.days
But this was quite slow, and it turns out that the following does the same and it is a lot quicker:
npr['diff_since_last']/np.timedelta64(1, 'D')
260 times faster!
It is a bit courious since one would think dt.days could use division, but there is probably a good reason for it. Stil, in many cases it seems division is all you need and with repeated use in large dataframes it becomes almost a requirement.
Here are some stats (dataframe with 1.6 million observations):
% timeit a= npr['diff_since_last']/np.timedelta64(1, 'D')
36.5 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit a=npr['diff_since_last'].dt.days
9.54 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
len(npr)
Out[69]: 1632061
Subscribe to:
Post Comments (Atom)
Difference in days between two date columns: Dividing by np.timedelta64(1, 'D') is way faster than dt.days
Subtracting a data column from another yields a timedelta column with a very fine grained resolution. If all you need is the difference in d...
-
We would like to have a query language that can easily express and answer the following: - How many individuals received pharmaceutical A ...
-
Subtracting a data column from another yields a timedelta column with a very fine grained resolution. If all you need is the difference in d...
No comments:
Post a Comment