python - How is timeit affected by the length of a list literal?


Update: Apparently I'm timing the speed at which Python can read a list. That doesn't change my question, though.

So, I read this post the other day and wanted to compare what the speeds looked like. I'm new to pandas, so any time I see an opportunity to do something moderately interesting, I jump on it. Anyway, I initially tested this out with 100 numbers, thinking that would be sufficient to satisfy my itch to play with pandas. But this is what the graph looked like:

[Figure: list to string conversion, lengths 0-99]

Notice that there are 3 different runs. These runs were run in sequential order, but they all had a spike at the same 2 spots. The spots were at approximately 28 and 64. My initial thought was that it had something to do with bytes, specifically 4. Maybe the first byte contains additional information about it being a list, the next byte is all data, and every 4 bytes after that causes a spike in speed, which kinda made sense. So I needed to test it with more numbers. I created a DataFrame of 3 sets of arrays, each with 1000 lists ranging in length from 0 to 999. I then timed them all in the same manner, that is:

run 1: 0, 1, 2, 3, ...
run 2: 0, 1, 2, 3, ...
run 3: 0, 1, 2, 3, ...

What I expected to see was a dramatic increase approximately every 32 items in the array, but instead there's no recurrence to the pattern (I did zoom in and look for spikes):

[Figure: list to string conversion, lengths 0-999]

However, you'll notice that the runs all vary a lot between the numbers 400 and 682. Oddly, one run always gets a spike in the same place, which makes the pattern harder to distinguish than the 28 and 64 points in the first graph. The green line is all over the place, really. Shameful.

Question: What's happening at the initial 2 spikes, and why does it get "fuzzy" on the graph between 400 and 682? I just finished running a test over the 0-99 sets, but this time I did simple addition to each item in the array, and the result was exactly linear, so I think it has something to do with strings.

I tested with other methods first and got the same results, but the graph was messed up because I joined the results wrong, so I ran it again overnight (this took a long time) using this code to make sure the times were correctly aligned with their indexes and the runs were performed in the correct order:

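    import statistics as s
    import timeit

    import numpy as np
    import pandas as pd

    # One row per (run, length): 3 runs x lengths 0-999, each holding a random list
    df = pd.DataFrame([[('run_%s' % str(x + 1)), r, np.random.choice(100, r).tolist()]
                       for r in range(0, 1000) for x in range(3)],
                      columns=['run', 'length', 'array']).sort_values(['run', 'length'])
    # Note: the statement handed to timeit is the string str(x), i.e. the list literal
    df['time'] = df.array.apply(lambda x: s.mean(timeit.repeat(str(x))))

    # Graph (select 'time' before averaging so the list column is not aggregated)
    ax = df.groupby(['run', 'length'])[['time']].mean().unstack('run').plot(y='time')
    ax.set_ylabel('time [ns]')
    ax.set_xlabel('array length')
    ax.legend(loc=3)

I have the DataFrame pickled if you'd like to see the raw data.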

You are severely overcomplicating things by using pandas and .apply here. There is no need for it, and it is inefficient. Here is the vanilla Python way:

    In [3]: import timeit

    In [4]: setup = "l = list(range({}))"

    In [5]: test = "str(l)"

Note that the timeit functions take a number parameter, which is the number of times the statement is run. It defaults to 1000000, so let's make that more reasonable by using number=100, so we don't have to wait around forever...

    In [8]: data = [timeit.repeat(test, setup.format(n), number=100) for n in range(0, 10001, 100)]

    In [9]: import statistics

    In [10]: mean_data = list(map(statistics.mean, data))

Visual inspection of the results:

in [11]: mean_data out[11]: [3.977467228348056e-05,  0.0012597616684312622,  0.002014552320664128,  0.002637979011827459,  0.0034494600258767605,  0.0046060653403401375,  0.006786816345993429,  0.006134035007562488,  0.006666974319765965,  0.0073876206879504025,  0.008359026357841989,  0.008946725012113651,  0.01020014965130637,  0.0110439983351777,  0.012085124345806738,  0.013095536657298604,  0.013812023680657148,  0.014505649354153624,  0.015109792332320163,  0.01541508767210568,  0.018623976677190512,  0.018014412683745224,  0.01837641668195526,  0.01806374565542986,  0.01866597666715582,  0.021138361655175686,  0.020885809014240902,  0.023644315680333722,  0.022424093661053728,  0.024507874331902713,  0.026360396664434422,  0.02618172235088423,  0.02721496132047226,  0.026609957004742075,  0.027632603014353663,  0.029077719994044553,  0.030218352350251127,  0.03213361800105,  0.0321545610204339,  0.032791375007946044,  0.033749551337677985,  0.03418213398739075,  0.03482868466138219,  0.03569800598779693,  0.035460735321976244,  0.03980560234049335,  0.0375820419867523,  0.03880414469555641,  0.03926491799453894,  0.04079093333954612,  0.0420664346893318,  0.044861480011604726,  0.045125720323994756,  0.04562378901755437,  0.04398221097653732,  0.04668888701902082,  0.04841196699999273,  0.047662509993339576,  0.047592316346708685,  0.05009777001881351,  0.04870589632385721,  0.0532167866670837,  0.05079756366709868,  0.05264475334358091,  0.05531930166762322,  0.05283398299555605,  0.055121281009633094,  0.056162080339466534,  0.05814277834724635,  0.05694748067374652,  0.05985202432687705,  0.05949359833418081,  0.05837553597909088,  0.05975819365509475,  0.06247356999665499,  0.061310798317814864,  0.06292542165222888,  0.06698586166991542,  0.06634997764679913,  0.06443380867131054,  0.06923895300133154,  0.06685209332499653,  0.06864909763680771,  0.06959929631557316,  0.06832000267847131,  0.07180017333788176,  0.07092387134131665,  0.07280202202188472, 
 0.07342300032420705,  0.0745120863430202,  0.07483605532130848,  0.0734497313387692,  0.0763389469939284,  0.07811927401538317,  0.07915793966579561,  0.08072184936221068,  0.08046915601395692,  0.08565403800457716,  0.08061318534115951,  0.08411134833780427,  0.0865995019945937] 
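A note on units may help when reading numbers like these. Each value returned by timeit.repeat is the total time in seconds for number executions of the statement, not the time for a single call. Here is a minimal sketch of converting to an approximate per-call time; the list size of 1000 and repeat=3 are illustrative choices and are not taken from the session above.

```python
import timeit

number = 100  # executions per measurement, as in the session above
setup = "l = list(range(1000))"  # illustrative size, an assumption

# Each element of totals is the wall time in seconds for `number` runs of str(l).
totals = timeit.repeat("str(l)", setup=setup, number=number, repeat=3)

# Dividing by `number` gives an approximate time per single str(l) call.
per_call = [t / number for t in totals]

for total, single in zip(totals, per_call):
    print("total: %.6f s   per call: %.9f s" % (total, single))
```

With the default number=1000000, the total in seconds is numerically the same as the per-call time in microseconds.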

This looks pretty darn linear to me. Now, pandas is a handy way to graph things, especially if you want a convenient wrapper around matplotlib's API:

    In [14]: import pandas as pd

    In [15]: df = pd.DataFrame({'time': mean_data, 'n': list(range(0, 10001, 100))})

    In [16]: df.plot(x='n', y='time')
    Out[16]: <matplotlib.axes._subplots.AxesSubplot at 0x1102a4a58>

And here is the result:

[Figure: time plotted against n]
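If you would rather check the "pretty darn linear" impression numerically than by eye, one option is an ordinary least-squares fit. The sketch below is only an illustration: linear_fit is a hypothetical helper written for this example, and it uses just the first six means listed in the post (n = 0 to 500).

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# First six mean timings from the post, for n = 0, 100, ..., 500.
ns = [0, 100, 200, 300, 400, 500]
times = [3.977467228348056e-05, 0.0012597616684312622, 0.002014552320664128,
         0.002637979011827459, 0.0034494600258767605, 0.0046060653403401375]

slope, intercept, r_squared = linear_fit(ns, times)
print("slope per element:", slope)
print("R^2:", r_squared)
```

An R^2 close to 1 supports the linear reading. Passing the full mean_data list and range(0, 10001, 100) instead would cover the whole measurement.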

This should get you on the right track to time what you've actually been trying to time. What you wound up timing is what I explained in the comments:

You are timing the result of str(x), which is a list literal, so you are timing the interpretation of list literals, not the conversion of list -> str.

I can only speculate about the patterns you are seeing as a result of that, but they are likely interpreter/hardware dependent. Here are my findings on my machine:
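The difference is easy to see side by side. In this minimal sketch, the list length of 50 and the number and repeat values are arbitrary choices for illustration. The first measurement passes the text of the list to timeit, as the question's code does. The second passes the statement "str(x)" and supplies x through the globals argument (available since Python 3.5).

```python
import timeit

x = list(range(50))  # illustrative list, an assumption

# What the question's code does: the statement is the text "[0, 1, ..., 49]",
# so each execution evaluates a list literal and never calls str().
literal_stmt = str(x)
assert eval(literal_stmt) == x  # the statement is just the list written out
t_literal = min(timeit.repeat(literal_stmt, number=10_000, repeat=3))

# Timing the conversion itself: the statement is "str(x)", and x is made
# visible to the timed code through the globals mapping.
t_convert = min(timeit.repeat("str(x)", globals={"x": x}, number=10_000, repeat=3))

print("evaluate list literal:", t_literal)
print("convert list to str:  ", t_convert)
```

The two numbers measure different operations, so there is no reason to expect them to show the same spikes.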

    In [18]: data = [timeit.repeat("{}".format(str(list(range(n)))), number=100) for n in range(0, 10001, 100)]

[Figure: timings of the list literal for n from 0 to 10000 in steps of 100]

And using a range that isn't so large:

    In [23]: data = [timeit.repeat("{}".format(str(list(range(n)))), number=10000) for n in range(0, 101)]

And the results:

[Figure: timings of the list literal for n from 0 to 100]

Which I guess sort of looks like yours. Perhaps this is better suited to its own question, though.
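One way to judge how much of the spikiness is measurement noise is to compare the minimum and the mean of the repeats for a few lengths. The timeit documentation suggests the minimum is usually the more useful figure, since higher values tend to reflect interference from other processes. In this sketch, summarize is a hypothetical helper, and the lengths 0, 28 and 64 are taken from the spike positions mentioned in the question.

```python
import statistics
import timeit


def summarize(n, number=1000, repeat=5):
    """Time the list literal of length n; report min, mean and spread."""
    stmt = str(list(range(n)))  # same kind of statement as in In [23]
    runs = timeit.repeat(stmt, number=number, repeat=repeat)
    return {
        "n": n,
        "min": min(runs),
        "mean": statistics.mean(runs),
        "spread": max(runs) - min(runs),
    }


rows = [summarize(n) for n in (0, 28, 64)]
for row in rows:
    print(row)
```

If the spread is large compared with the minimum, a bump in the mean at that length may be noise and not a property of the list length.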

