python - sklearn DBSCAN cannot fit CSR sparse data
I have sparse data that I transform into CSR sparse vectors:

    from scipy.sparse import coo_matrix

    # maximum index of news in the data, plus one
    num_news = indexed.agg(max(indexed["newsindex"])).take(1)[0][0] + 1

    def get_matrix(news):
        # one row index (0) and one value (1) per news index
        row = [0 for _ in news]
        data = [1 for _ in news]
        return coo_matrix((data, (row, news)), shape=(1, num_news)).tocsr()

    d['feature'] = d['newsarr'].apply(get_matrix)

Then I show the result using pd.head:
       uuid             newsarr                                            feature
    0  014324000050581  [300.0, 274.0]                                     (0, 274)\t1\n (0, 300)\t1
    1  014379002854034  [3539.0, 1720.0, 402.0, 1787.0, 2854.0, 2500.0...  (0, 402)\t1\n (0, 492)\t1\n (0, 493)\t1\n ...
    2  014379004874618  [346.0]                                            (0, 346)\t1
    3  014379004904357  [592.0, 1586.0, 20.0, 4165.0, 19.0, 165.0, 12.0]   (0, 12)\t1\n (0, 19)\t1\n (0, 20)\t1\n (0...
    4  014379004920072  [1658.0, 283.0, 7.0, 492.0]                        (0, 7)\t1\n (0, 283)\t1\n (0, 492)\t1\n (...

The output of d['feature'][:1].tolist() is the following:
    [<1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 2 stored elements in Compressed Sparse Row format>]

Then I want to use DBSCAN:
    from sklearn.cluster import DBSCAN
    db = DBSCAN(eps=0.3, min_samples=10).fit_predict(d['feature'])

However, I receive the following error:
    ValueError: setting an array element with a sequence.
I don't think this is reasonable, since each vector is 1 x num_news. So I tried using tolist():

    db = DBSCAN(eps=0.3, min_samples=10).fit_predict(d['feature'].tolist())

The following error pops up:
    ValueError: Expected 2D array, got 1D array instead:
    array=[ <1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 2 stored elements in Compressed Sparse Row format>
     <1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 19 stored elements in Compressed Sparse Row format>
     <1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 1 stored elements in Compressed Sparse Row format>
     ...,
     <1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 3 stored elements in Compressed Sparse Row format>
     <1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 2 stored elements in Compressed Sparse Row format>
     <1x93315 sparse matrix of type '<type 'numpy.int64'>'
        with 15 stored elements in Compressed Sparse Row format>].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I know sklearn can take a CSR sparse matrix as input, so how can I pass my data to it?
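One possible way to do this, sketched on toy data rather than the original DataFrame: both errors seem to come from passing a collection of separate 1 x num_news matrices, whereas the estimator expects a single 2D matrix of shape (n_samples, num_news). Stacking the rows with scipy.sparse.vstack gives that shape. In the sketch below, num_news = 10, the news_lists values, and min_samples=2 are made-up stand-ins so the example runs on its own; with the real data the list would come from d['feature'].tolist().

```python
import numpy as np
from scipy.sparse import coo_matrix, vstack
from sklearn.cluster import DBSCAN

num_news = 10  # hypothetical size; the question's data has 93315 columns


def get_matrix(news):
    """Build a 1 x num_news CSR row with a 1 at each news index."""
    cols = np.asarray(news, dtype=np.int64)  # indices are floats in the question
    rows = np.zeros(len(cols), dtype=np.int64)
    data = np.ones(len(cols), dtype=np.float64)
    return coo_matrix((data, (rows, cols)), shape=(1, num_news)).tocsr()


# Toy stand-in for d['newsarr'] (assumed values, not from the question)
news_lists = [[3.0, 7.0], [3.0, 7.0], [3.0, 7.0], [1.0], [0.0, 2.0, 5.0]]
features = [get_matrix(n) for n in news_lists]  # plays the role of d['feature'].tolist()

# Stack the 1 x num_news rows into one (n_samples, num_news) CSR matrix.
X = vstack(features, format="csr")

# DBSCAN accepts a 2D sparse matrix directly.
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(X)
print(X.shape, labels)
```

With the original data this would presumably become X = vstack(d['feature'].tolist(), format="csr") followed by the same fit_predict(X) call. Note also that for 0/1 vectors under the default Euclidean metric, two rows that differ at all are at least distance 1 apart, so eps=0.3 can only group identical rows; a larger eps or a different metric (for example metric="cosine") may be worth trying.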