python - Complex filtering in dask DataFrame


I'm used to doing "complex" filtering on pandas DataFrame objects:

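import numpy as np
import pandas as pd

# 10000 random points in a 512 x 512 square
data = pd.DataFrame(np.random.random((10000, 2)) * 512, columns=["x", "y"])
# keep the points within distance 1 of (200, 200)
data2 = data[np.sqrt((data.x - 200)**2 + (data.y - 200)**2) < 1]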

This produces no problems.

But with dask DataFrames I get:

import dask.dataframe

ddata = dask.dataframe.from_pandas(data, 8)
ddata2 = ddata[np.sqrt((ddata.x - 200)**2 + (ddata.y - 200)**2) < 1]
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-13-c2acf73dddf6> in <module>()
----> 1 ddata2 = ddata[np.sqrt((ddata.x - 200)**2 + (ddata.y - 200)**2) < 1]

~/anaconda3/lib/python3.6/site-packages/dask/dataframe/core.py in __getitem__(self, key)
   2115             return new_dd_object(merge(self.dask, key.dask, dsk), name,
   2116                                  self, self.divisions)
-> 2117         raise NotImplementedError(key)
   2118
   2119     def __setitem__(self, key, value):

NotImplementedError: 0       False

Meanwhile, a simpler operation:

ddata2 = ddata[ddata.x < 200] 

works fine.

I think the issue is that, with the "complex" math (i.e. np.sqrt), the result is no longer a lazy dask DataFrame.

Is there a way around this? Do I have to create a new column that I can filter on, or is there a better way?

If you replace np.sqrt with da.sqrt, it works fine.

import dask.array as da

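You may notice that np.sqrt of a dask Series produces a NumPy array, i.e. that step in the computation is not lazy and forces a concrete result. Indexing a dask DataFrame with a concrete object is what raises the NotImplementedError above. Use the dask equivalent of the function to maintain laziness and keep the whole expression dask-compliant.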

