python - Sklearn DBSCAN Cannot Fit CSR Sparse Data


I have sparse data that I transform into CSR sparse vectors:

from scipy.sparse import coo_matrix

num_news = indexed.agg(max(indexed["newsindex"])).take(1)[0][0] + 1  # maximum news index in the data, plus one

def get_matrix(news):
    row = [0] * len(news)   # every entry goes into the single row 0
    data = [1] * len(news)  # mark each news index as present
    return coo_matrix((data, (row, news)), shape=(1, num_news)).tocsr()

d['feature'] = d['newsarr'].apply(get_matrix)

Then I display the result using head():

   uuid             newsarr                                            feature
0  014324000050581  [300.0, 274.0]                                     (0, 274)\t1\n (0, 300)\t1
1  014379002854034  [3539.0, 1720.0, 402.0, 1787.0, 2854.0, 2500.0...  (0, 402)\t1\n (0, 492)\t1\n (0, 493)\t1\n ...
2  014379004874618  [346.0]                                            (0, 346)\t1
3  014379004904357  [592.0, 1586.0, 20.0, 4165.0, 19.0, 165.0, 12.0]   (0, 12)\t1\n (0, 19)\t1\n (0, 20)\t1\n (0...
4  014379004920072  [1658.0, 283.0, 7.0, 492.0]                        (0, 7)\t1\n (0, 283)\t1\n (0, 492)\t1\n (...

The output of d['feature'][:1].tolist() is the following:

[<1x93315 sparse matrix of type '<type 'numpy.int64'>' 2 stored elements in compressed sparse row format>] 

Then I want to use DBSCAN:

from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.3, min_samples=10).fit_predict(d['feature'])

However, I receive the following error:

ValueError: setting an array element with a sequence.

I don't think this error is reasonable, since each vector is 1 x num_news. So I tried using tolist():

db = DBSCAN(eps=0.3, min_samples=10).fit_predict(d['feature'].tolist())

Then the following error pops up:

ValueError: Expected 2D array, got 1D array instead:
array=[ <1x93315 sparse matrix of type '<type 'numpy.int64'>'
    2 stored elements in compressed sparse row format>
 <1x93315 sparse matrix of type '<type 'numpy.int64'>'
    19 stored elements in compressed sparse row format>
 <1x93315 sparse matrix of type '<type 'numpy.int64'>'
    1 stored elements in compressed sparse row format>
 ...,
 <1x93315 sparse matrix of type '<type 'numpy.int64'>'
    3 stored elements in compressed sparse row format>
 <1x93315 sparse matrix of type '<type 'numpy.int64'>'
    2 stored elements in compressed sparse row format>
 <1x93315 sparse matrix of type '<type 'numpy.int64'>'
    15 stored elements in compressed sparse row format>].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I know sklearn can take a CSR sparse matrix as input, so how can I do this here?
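
For reference, both errors seem to come from passing a collection of separate 1 x num_news matrices rather than one 2D matrix. A minimal sketch of one possible fix is to stack the rows with scipy.sparse.vstack into a single (n_samples, num_news) CSR matrix before fitting. The toy newsarr values, the small num_news, and the min_samples=2 setting below are made up for illustration and are not from my real data; the plain list stands in for d['feature'].tolist().

```python
from scipy.sparse import coo_matrix, vstack
from sklearn.cluster import DBSCAN

num_news = 10  # hypothetical vocabulary size (93315 in the real data)

def get_matrix(news):
    """Build a 1 x num_news CSR row with a 1 at every news index."""
    cols = [int(i) for i in news]   # indices arrive as floats, cast to int
    rows = [0] * len(cols)          # everything lives in row 0
    data = [1] * len(cols)
    return coo_matrix((data, (rows, cols)), shape=(1, num_news)).tocsr()

# Toy stand-in for d['newsarr']: three identical users and one different user.
newsarr = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [7.0]]

# Equivalent of d['feature'].tolist(): a list of 1-row sparse matrices.
features = [get_matrix(news) for news in newsarr]

# Stack the rows into ONE sparse matrix of shape (n_samples, num_news).
X = vstack(features, format="csr")

# DBSCAN accepts the stacked CSR matrix directly.
# min_samples=2 is chosen only so that the toy data forms a cluster.
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(X)
print(X.shape, labels)
```

With the real DataFrame, the same idea would presumably be X = vstack(d['feature'].tolist(), format="csr"), followed by DBSCAN(eps=0.3, min_samples=10).fit_predict(X).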

