python - How is timeit affected by the length of a list literal? -


update: apparently i'm timing speed python can read list. doesn't change question, though.

so, read this post other day , wanted compare speeds looked like. i'm new pandas time see opportunity moderately interesting, jump on it. anyway, tested out 100 numbers, thinking sufficient satisfy itch play pandas. graph looked like:

list string conversion 0-99

notice there 3 different runs. these runs run in sequential order, had spike @ same 2 spots. spots approximately 28 , 64. initial thought had bytes, 4. maybe first byte contains additional information being list, , next byte data , every 4 bytes after causes spike in speed, kinda made sense. needed test more numbers. created dataframe of 3 sets of arrays, each 1000 lists ranging in length 0-999. timed them in same manner, is:

run 1: 0, 1, 2, 3, ... run 2: 0, 1, 2, 3, ... run 3: 0, 1, 2, 3, ... 

what expected see dramatic increase approximately every 32 items in array, instead there's no recurrence pattern(i did zoom in , spikes):

list string conversion 0-999

however, you'll notice, vary lot between numbers 400 , 682. oddly, 1 run spike in same place making pattern harder distinguish in 28 , 64 points in graph. green line on place really. shameful.

question: what's happening @ initial 2 spikes , why "fuzzy" on graph between 400 , 682? finished running test on 0-99 sets time did simple addition each item in array , result linear, think has strings.

i tested other methods first, , got same results, graph messed because joined results wrong, ran again overnight(this took long time) using code make sure times correctly aligned indexes , runs performed in correct order:

import statistics s import timeit df = pd.dataframe([[('run_%s' % str(x + 1)), r, np.random.choice(100, r).tolist()]                     r in range(0, 1000) x in range(3)],                    columns=['run', 'length', 'array']).sort_values(['run', 'length']) df['time'] = df.array.apply(lambda x: s.mean(timeit.repeat(str(x))))  # graph ax = df.groupby(['run', 'length']).mean().unstack('run').plot(y='time') ax.set_ylabel('time [ns]') ax.set_xlabel('array length') ax.legend(loc=3) 

i have dataframe pickled if you'd see raw data.

you severely overcomplicating things using pandas , .apply here. there no need - inefficient. vanilla python way:

in [3]: import timeit  in [4]: setup = "l = list(range({}))"  in [5]: test = "str(l)" 

note, timeit functions take number parameter, number of times run. defaults 1000000, let's make more reasonable, using number=100, don't have wait around forever...

in [8]: data = [timeit.repeat(test, setup.format(n), number=100) n in range(0, 10001, 100)]  in [9]: import statistics  in [10]: mean_data = list(map(statistics.mean, data)) 

visual inspection of results:

in [11]: mean_data out[11]: [3.977467228348056e-05,  0.0012597616684312622,  0.002014552320664128,  0.002637979011827459,  0.0034494600258767605,  0.0046060653403401375,  0.006786816345993429,  0.006134035007562488,  0.006666974319765965,  0.0073876206879504025,  0.008359026357841989,  0.008946725012113651,  0.01020014965130637,  0.0110439983351777,  0.012085124345806738,  0.013095536657298604,  0.013812023680657148,  0.014505649354153624,  0.015109792332320163,  0.01541508767210568,  0.018623976677190512,  0.018014412683745224,  0.01837641668195526,  0.01806374565542986,  0.01866597666715582,  0.021138361655175686,  0.020885809014240902,  0.023644315680333722,  0.022424093661053728,  0.024507874331902713,  0.026360396664434422,  0.02618172235088423,  0.02721496132047226,  0.026609957004742075,  0.027632603014353663,  0.029077719994044553,  0.030218352350251127,  0.03213361800105,  0.0321545610204339,  0.032791375007946044,  0.033749551337677985,  0.03418213398739075,  0.03482868466138219,  0.03569800598779693,  0.035460735321976244,  0.03980560234049335,  0.0375820419867523,  0.03880414469555641,  0.03926491799453894,  0.04079093333954612,  0.0420664346893318,  0.044861480011604726,  0.045125720323994756,  0.04562378901755437,  0.04398221097653732,  0.04668888701902082,  0.04841196699999273,  0.047662509993339576,  0.047592316346708685,  0.05009777001881351,  0.04870589632385721,  0.0532167866670837,  0.05079756366709868,  0.05264475334358091,  0.05531930166762322,  0.05283398299555605,  0.055121281009633094,  0.056162080339466534,  0.05814277834724635,  0.05694748067374652,  0.05985202432687705,  0.05949359833418081,  0.05837553597909088,  0.05975819365509475,  0.06247356999665499,  0.061310798317814864,  0.06292542165222888,  0.06698586166991542,  0.06634997764679913,  0.06443380867131054,  0.06923895300133154,  0.06685209332499653,  0.06864909763680771,  0.06959929631557316,  0.06832000267847131,  0.07180017333788176,  0.07092387134131665,  0.07280202202188472,  0.07342300032420705,  0.0745120863430202,  0.07483605532130848,  0.0734497313387692,  0.0763389469939284,  0.07811927401538317,  0.07915793966579561,  0.08072184936221068,  0.08046915601395692,  0.08565403800457716,  0.08061318534115951,  0.08411134833780427,  0.0865995019945937] 

this looks pretty darn linear me. now, pandas is handy way graph things, if want convenient wrapper around matplotlib's api:

in [14]: import pandas pd  in [15]: df = pd.dataframe({'time': mean_data, 'n':list(range(0, 10001, 100))})  in [16]: df.plot(x='n', y='time') out[16]: <matplotlib.axes._subplots.axessubplot @ 0x1102a4a58> 

and here result:

enter image description here

this should on right track time you've been trying time. wound timing, explained in comments:

you timing result of str(x) results in list-literal, timing interpretation of list literals, not conversion of list->str

i can speculate patterns seeing result of that, interpreter/hardware dependent. here findings on machine:

in [18]: data = [timeit.repeat("{}".format(str(list(range(n)))), number=100) n in range(0, 10001, 100)] 

enter image description here

and using range isn't large:

in [23]: data = [timeit.repeat("{}".format(str(list(range(n)))), number=10000) n in range(0, 101)] 

and results: enter image description here

which guess sort of looks yours. perhaps better suited it's own question, though.


Comments

Popular posts from this blog

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

python Tkinter Capturing keyboard events save as one single string -

sql server - Why does Linq-to-SQL add unnecessary COUNT()? -