PCA data preparation: centering, scaling and normalizing a sparse matrix


I have a large sparse data set of about 300000 × 10000, where each row is an instance and each column is a feature. However, many pixels (or features) in each instance are zeros, which means there is no data there. I add the zeros myself to keep the same shape for each instance. Also, the segments that have data can reside anywhere along the 10000 pixels. For example, the data can reside in pixels 0 to 2000, or 1500 to 7000, or 5000 to 10000. I have an example fake dataset below:

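import numpy as np

# Fake dataset: each row is an instance, zeros mark pixels with no data
data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])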
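For illustration only, here is how the same fake dataset might be held in SciPy's CSR format, so that only the nonzero entries are stored. This is an assumption about storage (the name `X` is just a placeholder), and note that CSR cannot tell a padded zero apart from a genuinely measured zero.

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])

# Only nonzero entries are stored explicitly in CSR
X = csr_matrix(data, dtype=np.float64)

# Fraction of entries that are actually stored
density = X.nnz / (X.shape[0] * X.shape[1])
print(X.shape, X.nnz, density)
```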

I have a few questions about data preparation:

1) How can I better deal with the pixels where data is missing, instead of adding 0s in those pixels?

2) If there is no better solution for the missing data, how can I center/scale/normalize the sparse data in the dataset before I use scipy.sparse.linalg.svds to decompose it?
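To make question 2 concrete, here is a minimal sketch of one possible approach, not a confirmed answer: subtracting column means directly would turn every zero into a nonzero, so the centering is applied implicitly through a `LinearOperator` and the matrix itself stays sparse. The names `Xc`, `mu`, `matvec` and `rmatvec` are my own, and the sketch treats the padded zeros as real values, which may not be appropriate if they truly mean "missing".

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, svds

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])

X = csr_matrix(data, dtype=np.float64)
n, d = X.shape

# Column means, flattened to a 1-D array of length d
mu = np.asarray(X.mean(axis=0)).ravel()

def matvec(v):
    # (X - 1 mu^T) v, computed without forming the dense centered matrix
    v = np.asarray(v).ravel()
    return X @ v - np.full(n, mu @ v)

def rmatvec(u):
    # (X - 1 mu^T)^T u = X^T u - mu * sum(u)
    u = np.asarray(u).ravel()
    return X.T @ u - mu * u.sum()

# Implicitly centered operator; X is never densified
Xc = LinearOperator((n, d), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)

# Top-2 singular triplets of the centered data
U, s, Vt = svds(Xc, k=2)
print(np.sort(s)[::-1])
```

Scaling each column by a constant, unlike centering, keeps zeros at zero, so it could be done on the sparse matrix directly before building the operator.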

I hope to hear some inspiring ideas on how to proceed.

