PCA data preparation: centering, scaling and normalizing a sparse matrix
I have a large sparse dataset (~300000 × 10000), where each row is an instance and each column is a feature. However, many of the pixels (or features) in each instance are zeros, meaning there is no data there; I added those zeros myself so that every instance has the same shape. The segment that does have data can reside anywhere along the 10000 pixels. For example, the data can span pixels 0 to 2000, 1500 to 7000, or 5000 to 10000. Here is a fake example dataset:
import numpy as np
data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0], [5, 4, 8, 6, 10, 0, 0, 0, 0, 0], [0, 0, 4, 8, 7, 3, 2, 6, 9, 0], [0, 0, 0, 0, 10, 5, 2, 9, 3, 8], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])
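To see how sparse the padded data is and where each instance's segment lies, the toy array can be stored as a SciPy CSR matrix, which keeps only the nonzero entries. This is a minimal sketch, assuming every nonzero value is real data, every zero is padding, and every row has at least one nonzero value:

```python
import numpy as np
from scipy import sparse

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])

# CSR stores only the nonzero values plus their column indices and row pointers.
X = sparse.csr_matrix(data)
density = X.nnz / (X.shape[0] * X.shape[1])

# First and last nonzero column of each row, i.e. where that instance's segment sits.
# Assumes no row is entirely zero (min/max of an empty slice would fail).
starts = np.array([X.indices[X.indptr[i]:X.indptr[i + 1]].min() for i in range(X.shape[0])])
ends = np.array([X.indices[X.indptr[i]:X.indptr[i + 1]].max() for i in range(X.shape[0])])

print(f"nnz={X.nnz}, density={density:.2f}")
print(list(zip(starts.tolist(), ends.tolist())))
```

At the full size, a dense float64 array of 300000 × 10000 values would need about 24 GB, so keeping the data in CSR form matters for any later step.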
I have a few questions about data preparation:
1) How can I better deal with the pixels where data is missing, instead of filling them with 0s?
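One simple, commonly used alternative to zero-filling is mean imputation: replace each missing entry with the mean of that column computed over its observed values only. The sketch below is that swapped-in technique, not a recommendation specific to this dataset. It assumes that zero always means "missing" (a genuinely measured zero would also be imputed), and the helper names `observed_column_means` and `mean_impute` are hypothetical:

```python
import numpy as np
from scipy import sparse

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])


def observed_column_means(X):
    """Mean of each column over its nonzero (observed) entries only.

    Assumption: zeros are missing values. Columns with no observed entries get 0.
    """
    X = sparse.csr_matrix(X, dtype=float)
    col_sums = np.asarray(X.sum(axis=0)).ravel()
    col_counts = np.asarray((X != 0).sum(axis=0)).ravel()  # observed entries per column
    means = np.zeros(X.shape[1])
    np.divide(col_sums, col_counts, out=means, where=col_counts > 0)
    return means


def mean_impute(X):
    """Replace every zero (treated as missing) with its column's observed mean.

    Note: the result is a dense array, so this does not scale to the full dataset.
    """
    means = observed_column_means(X)
    dense = sparse.csr_matrix(X, dtype=float).toarray()
    rows, cols = np.nonzero(dense == 0)
    dense[rows, cols] = means[cols]
    return dense


filled = mean_impute(data)
print(np.round(filled, 2))
```

Two caveats: imputation turns the sparse matrix into a dense one (about 24 GB at 300000 × 10000 in float64), and filling many entries with a constant shrinks each column's variance, which in turn affects the PCA.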
2) If there is no better solution for the missing data, how can I center, scale or normalize the sparse data before using scipy.sparse.linalg.svds to decompose the dataset?
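Centering a sparse matrix explicitly makes it dense, because subtracting a column mean turns every zero into a nonzero value. A common workaround is to never build the centered matrix and instead give `svds` a `LinearOperator` that applies it implicitly. For the standardized matrix Z = (X − 1μᵀ)D⁻¹, where μ is the vector of column means and D the diagonal matrix of column standard deviations, the products are Zv = Xw − (μ·w)1 with w = D⁻¹v, and Zᵀu = D⁻¹(Xᵀu − μ·Σu). This is a minimal sketch under those definitions; `standardized_operator` is a hypothetical helper, and it treats the padding zeros as real values when computing μ and the standard deviations:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, svds

data = np.array([[0, 0, 0, 1, 3, 5, 0, 0, 0, 0],
                 [5, 4, 8, 6, 10, 0, 0, 0, 0, 0],
                 [0, 0, 4, 8, 7, 3, 2, 6, 9, 0],
                 [0, 0, 0, 0, 10, 5, 2, 9, 3, 8],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 [0, 0, 0, 4, 2, 6, 8, 9, 0, 0]])


def standardized_operator(X):
    """LinearOperator for (X - 1 mu^T) / sigma without densifying X.

    Zeros are included in the column statistics (population std, ddof=0).
    """
    X = sparse.csr_matrix(X, dtype=float)
    n_rows, n_cols = X.shape
    mu = np.asarray(X.mean(axis=0)).ravel()
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    sigma = np.sqrt(np.maximum(mean_sq - mu ** 2, 0.0))
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled instead of dividing by 0

    def matvec(v):
        w = np.ravel(v) / sigma
        return X @ w - mu.dot(w) * np.ones(n_rows)

    def rmatvec(u):
        u = np.ravel(u)
        return (X.T @ u - mu * u.sum()) / sigma

    return LinearOperator((n_rows, n_cols), matvec=matvec, rmatvec=rmatvec, dtype=float)


Z = standardized_operator(data)
U, s, Vt = svds(Z, k=2)  # k must be smaller than min(Z.shape)
order = np.argsort(s)[::-1]  # svds does not guarantee descending order
U, s, Vt = U[:, order], s[order], Vt[order]
scores = U * s  # principal-component scores, one row per instance
print(s)
```

Only X and a few length-10000 vectors are held in memory, so the same approach should apply at 300000 × 10000. If centering is not required, scikit-learn's `StandardScaler(with_mean=False)`, `MaxAbsScaler` and `sklearn.preprocessing.normalize` accept sparse input directly and keep the result sparse.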
I hope to hear some inspiring ideas on how to proceed.