python - How is timeit affected by the length of a list literal?
Update: apparently I'm timing the speed at which Python can read a list literal. That doesn't change the question, though.
So, I read this post the other day and wanted to compare the speeds to see what they looked like. I'm new to pandas, so any time I see an opportunity to do something moderately interesting, I jump on it. Anyway, I tested it out with 100 numbers, thinking that would be sufficient to satisfy my itch to play with pandas. The graph looked like this:
Notice that there are 3 different runs. These runs were run in sequential order, and all had a spike at the same 2 spots. The spots are at approximately 28 and 64. My initial thought was that it had something to do with bytes, perhaps 4 at a time: maybe the first byte contains additional information about being a list, the next byte holds data, and every 4 bytes after that causes a spike in speed. That kinda made sense, but I needed to test it with more numbers. I created a DataFrame of 3 sets of arrays, each with 1000 lists ranging in length from 0-999. I then timed them in the same manner, that is:
run 1: 0, 1, 2, 3, ...
run 2: 0, 1, 2, 3, ...
run 3: 0, 1, 2, 3, ...
What I expected to see was a dramatic increase approximately every 32 items in the array, but instead there's no recurrence of the pattern (I did zoom in and look for spikes):
However, you'll notice that the times vary a lot between the numbers 400 and 682. Oddly, not one run spikes in the same place there, making any pattern much harder to distinguish than at the 28 and 64 points in the first graph. The green line is all over the place, really. Shameful.
Question: what's happening at the initial 2 spikes, and why does the graph get "fuzzy" between 400 and 682? I just finished running the test on the 0-99 sets again, but this time doing simple addition on each item in the array, and the result was perfectly linear, so I think it has something to do with strings.
I tested with other methods first and got the same results, but the graph was messed up because I joined the results incorrectly, so I ran it again overnight (this took a long time) using the following code, to make sure the times were correctly aligned with the indexes and the runs were performed in the correct order:
import statistics as s
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([[('run_%s' % str(x + 1)), r, np.random.choice(100, r).tolist()]
                   for r in range(0, 1000)
                   for x in range(3)],
                  columns=['run', 'length', 'array']).sort_values(['run', 'length'])
df['time'] = df.array.apply(lambda x: s.mean(timeit.repeat(str(x))))

# graph
ax = df.groupby(['run', 'length']).mean().unstack('run').plot(y='time')
ax.set_ylabel('time [ns]')
ax.set_xlabel('array length')
ax.legend(loc=3)
I have the DataFrame pickled if you'd like to see the raw data.
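For reference, round-tripping such a DataFrame through a pickle file could look like the following. This is a minimal sketch: the data and the `timings.pkl` filename are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# A tiny stand-in for the timing DataFrame described above.
df = pd.DataFrame({'run': ['run_1', 'run_1'],
                   'length': [2, 3],
                   'array': [[5, 9], [1, 2, 3]]})

# Write to a pickle file and read it back; the path is hypothetical.
path = os.path.join(tempfile.mkdtemp(), 'timings.pkl')
df.to_pickle(path)
restored = pd.read_pickle(path)
print(restored.equals(df))
```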
You are severely overcomplicating things by using pandas and .apply here. There is no need for it - it's inefficient. The vanilla Python way:
In [3]: import timeit

In [4]: setup = "l = list(range({}))"

In [5]: test = "str(l)"
Note, the timeit functions take a number parameter, which is the number of times to run the statement. It defaults to 1000000, so let's make that more reasonable by using number=100, so we don't have to wait around forever...
In [8]: data = [timeit.repeat(test, setup.format(n), number=100) for n in range(0, 10001, 100)]

In [9]: import statistics

In [10]: mean_data = list(map(statistics.mean, data))
A visual inspection of the results:
in [11]: mean_data out[11]: [3.977467228348056e-05, 0.0012597616684312622, 0.002014552320664128, 0.002637979011827459, 0.0034494600258767605, 0.0046060653403401375, 0.006786816345993429, 0.006134035007562488, 0.006666974319765965, 0.0073876206879504025, 0.008359026357841989, 0.008946725012113651, 0.01020014965130637, 0.0110439983351777, 0.012085124345806738, 0.013095536657298604, 0.013812023680657148, 0.014505649354153624, 0.015109792332320163, 0.01541508767210568, 0.018623976677190512, 0.018014412683745224, 0.01837641668195526, 0.01806374565542986, 0.01866597666715582, 0.021138361655175686, 0.020885809014240902, 0.023644315680333722, 0.022424093661053728, 0.024507874331902713, 0.026360396664434422, 0.02618172235088423, 0.02721496132047226, 0.026609957004742075, 0.027632603014353663, 0.029077719994044553, 0.030218352350251127, 0.03213361800105, 0.0321545610204339, 0.032791375007946044, 0.033749551337677985, 0.03418213398739075, 0.03482868466138219, 0.03569800598779693, 0.035460735321976244, 0.03980560234049335, 0.0375820419867523, 0.03880414469555641, 0.03926491799453894, 0.04079093333954612, 0.0420664346893318, 0.044861480011604726, 0.045125720323994756, 0.04562378901755437, 0.04398221097653732, 0.04668888701902082, 0.04841196699999273, 0.047662509993339576, 0.047592316346708685, 0.05009777001881351, 0.04870589632385721, 0.0532167866670837, 0.05079756366709868, 0.05264475334358091, 0.05531930166762322, 0.05283398299555605, 0.055121281009633094, 0.056162080339466534, 0.05814277834724635, 0.05694748067374652, 0.05985202432687705, 0.05949359833418081, 0.05837553597909088, 0.05975819365509475, 0.06247356999665499, 0.061310798317814864, 0.06292542165222888, 0.06698586166991542, 0.06634997764679913, 0.06443380867131054, 0.06923895300133154, 0.06685209332499653, 0.06864909763680771, 0.06959929631557316, 0.06832000267847131, 0.07180017333788176, 0.07092387134131665, 0.07280202202188472, 0.07342300032420705, 0.0745120863430202, 0.07483605532130848, 0.0734497313387692, 
0.0763389469939284, 0.07811927401538317, 0.07915793966579561, 0.08072184936221068, 0.08046915601395692, 0.08565403800457716, 0.08061318534115951, 0.08411134833780427, 0.0865995019945937]
This looks pretty darn linear to me. Now, pandas is a handy way to graph things, if you want a convenient wrapper around matplotlib's API:
In [14]: import pandas as pd

In [15]: df = pd.DataFrame({'time': mean_data, 'n': list(range(0, 10001, 100))})

In [16]: df.plot(x='n', y='time')
Out[16]: <matplotlib.axes._subplots.AxesSubplot at 0x1102a4a58>
And here is the result:
This should get you on the right track to time what you've actually been trying to time. What you wound up timing instead, as explained in the comments: you are timing the result of str(x), which produces a list literal, so you are timing the interpretation of list literals, not the conversion of list -> str.
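To make that distinction concrete, here is a minimal sketch. Passing str(l) to timeit hands it the string "[0, 1, ..., 9]" as the statement, so every run just parses and rebuilds a list literal; timing the conversion itself requires passing the statement "str(l)" with the list built once in setup.

```python
import timeit

l = list(range(10))

# What the question's code measures: the STATEMENT given to timeit is the
# string "[0, 1, 2, ...]", so each run parses a list literal.
literal_stmt = str(l)
t_literal = min(timeit.repeat(literal_stmt, number=1000))

# What was intended: time the list -> str conversion itself, with the
# list created once in setup.
t_convert = min(timeit.repeat("str(l)", setup="l = list(range(10))",
                              number=1000))

print(t_literal, t_convert)
```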
I can only speculate that the patterns you are seeing are a result of that, and it would be interpreter/hardware dependent. Here are my findings on my machine:
In [18]: data = [timeit.repeat("{}".format(str(list(range(n)))), number=100) for n in range(0, 10001, 100)]
And using a range that isn't as large:
In [23]: data = [timeit.repeat("{}".format(str(list(range(n)))), number=10000) for n in range(0, 101)]
Which I guess sort of looks like yours. That is perhaps better suited to its own question, though.