PCA data preparation - centering, scaling and normalizing a sparse matrix


I have a large sparse data set of shape ~(300000 * 10000), where each row is an instance and each column is a feature. However, many pixels (or features) of each instance are zeros, which means there is no data there. I add the zeros myself to keep the same shape for each instance. Also, the segment that has data can reside anywhere along the 10000 pixels. For example, the data can reside in pixels 0 to 2000, or 1500 to 7000, or 5000 to 10000. I have an example fake dataset below:

import numpy as np
data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])
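For context, here is a minimal sketch of how the fake dataset above might be held in sparse form, so that the padded zeros are not stored at all. It assumes `scipy.sparse.csr_matrix` as the container and that a stored (non-zero) entry always means "observed"; a genuinely measured value of 0 would be indistinguishable from a missing pixel in this representation.

```python
import numpy as np
from scipy import sparse

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])

# CSR keeps only the non-zero entries; the padding zeros take no storage.
X = sparse.csr_matrix(data, dtype=float)

# Number of stored entries overall and per column (how many instances
# actually have data in each pixel).
n_stored = X.nnz
observed_per_column = X.getnnz(axis=0)
```

The per-column counts give a quick view of how unevenly the pixels are covered, which matters for any per-feature statistic computed later.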

I have a few questions about data preparation:

1) How can I better deal with the pixels where data is missing, instead of adding 0s in those pixels?

2) If there is no better solution for the missing data, how can I center/scale/normalize the sparse data in this dataset before I use scipy.sparse.linalg.svds to decompose it?

I hope to hear some inspiring ideas on how to proceed.
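One possible direction for question 2, offered as a minimal sketch rather than a verified solution: subtracting the column means from a sparse matrix directly would make it dense, but the centering can be applied implicitly by wrapping the matrix in a `scipy.sparse.linalg.LinearOperator` and passing that to `svds`. The sketch assumes the padded zeros are treated as real values when computing the column means (which may not be appropriate if they truly mean "missing"), and the names `mu`, `matvec`, `rmatvec` and `centered` are illustrative only.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, svds

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])

X = sparse.csr_matrix(data, dtype=float)
n_rows, n_cols = X.shape

# Column means, with the padded zeros counted as values (assumption).
mu = np.asarray(X.mean(axis=0)).ravel()
ones = np.ones(n_rows)

def matvec(v):
    # (X - 1 mu^T) v = X v - 1 (mu . v), without forming the dense matrix
    v = np.ravel(v)
    return X @ v - ones * (mu @ v)

def rmatvec(u):
    # (X - 1 mu^T)^T u = X^T u - mu (1 . u)
    u = np.ravel(u)
    return X.T @ u - mu * u.sum()

centered = LinearOperator((n_rows, n_cols), matvec=matvec,
                          rmatvec=rmatvec, dtype=float)

# Truncated SVD of the implicitly centered matrix; k must be < min(shape).
U, s, Vt = svds(centered, k=2)

# Reorder so the largest singular value comes first.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order]
```

The rows of `Vt` then play the role of principal axes and `U * s` the projected instances. Per-column scaling could be folded into the same operator in a similar way, since multiplying by a diagonal matrix also keeps the data sparse.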

