pandas - How can I change my index vector into a sparse feature vector that can be used in sklearn?

July 15, 2011

I am building a news recommendation system and need to build a table of users and the news they have read. The raw data looks like this:

    001436800277225 [12,456,157]
    009092130698762 [248]
    010003000431538 [361,521,83]
    010156461231357 [173,67,244]
    010216216021063 [203,97]
    010720006581483 [86]
    011199797794333 [142,12,86,411,201]
    011337201765123 [123,41]
    011414545455156 [62,45,621,435]
    011425002581540 [341,214,286]
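If the data arrives as one text dump like the sample above, it first has to be split into (user id, news id list) pairs. The sketch below assumes each record is a run of digits followed by a bracketed, comma-separated list; the names `RAW`, `RECORD` and `parse_records` are hypothetical, and user ids are kept as strings so their leading zeros survive.

```python
import ast
import re

# Hypothetical raw dump: "userid [id,id,...]" records separated by whitespace
# (only the first three records of the sample are reproduced here).
RAW = "001436800277225 [12,456,157] 009092130698762 [248] 010003000431538 [361,521,83]"

# A user id is a run of digits; its news ids follow in square brackets.
RECORD = re.compile(r"(\d+)\s*(\[[^\]]*\])")

def parse_records(text):
    """Return a list of (user_id, [news_id, ...]) pairs."""
    pairs = []
    for user_id, id_list in RECORD.findall(text):
        # literal_eval safely turns "[12,456,157]" into [12, 456, 157]
        pairs.append((user_id, list(ast.literal_eval(id_list))))
    return pairs

input_data = parse_records(RAW)
print(input_data)
# [('001436800277225', [12, 456, 157]), ('009092130698762', [248]), ('010003000431538', [361, 521, 83])]
```

The resulting list has the same shape as the `input_data` used in the answer, so it can be fed straight into the matrix-building code.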

The first column is the user ID and the second column is the list of news IDs. Each news ID is a column index: after the transformation, [12,456,157] in the first row means the user has read the 12th, 456th and 157th news items (in the sparse vector, columns 12, 456 and 157 are 1, while all other columns are 0). I want to turn this data into a sparse vector format that can be used as the input to the KMeans or DBSCAN algorithm in sklearn. How can I do that?
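The layout described in the question is essentially SciPy's CSR format: each user's list of news IDs becomes the column indices of one row. A minimal sketch, assuming news IDs are non-negative integers and that the column count can be taken as the largest ID plus one (pass the real catalogue size instead if it is known):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sample: each list holds the column indices of the news read.
input_data = [
    ("001436800277225", [12, 456, 157]),
    ("009092130698762", [248]),
    ("010003000431538", [361, 521, 83]),
]

user_ids = [user for user, _ in input_data]  # row r belongs to user_ids[r]
# Column indices of all rows, concatenated row by row.
indices = np.concatenate([np.asarray(news, dtype=np.int64) for _, news in input_data])
# indptr[r]:indptr[r + 1] is the slice of `indices` that belongs to row r.
indptr = np.cumsum([0] + [len(news) for _, news in input_data])
data = np.ones(len(indices), dtype=np.float64)  # 1 means "user read this item"

n_features = int(indices.max()) + 1  # must exceed the largest news id
X = csr_matrix((data, indices, indptr), shape=(len(input_data), n_features))

print(X.shape)                          # (3, 522)
print(sorted(X[0].indices.tolist()))    # [12, 157, 456]
```

Building CSR directly avoids an intermediate matrix, but a news ID repeated within one user's list would be stored twice here; the COO route summarises such repeats on conversion.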

One option is to construct the sparse matrix explicitly. I find it easier to build the matrix in COO format and then cast it to CSR format.
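COO is convenient here because you only append (row, column, value) triplets; when the matrix is converted to CSR, SciPy sums entries that share the same position. A small sketch with made-up triplets (the variable names are illustrative only):

```python
from scipy.sparse import coo_matrix

# Hypothetical triplets: row 0 read item 5 twice, row 1 read item 2 once.
rows = [0, 0, 1]
cols = [5, 5, 2]
vals = [1, 1, 1]

coo = coo_matrix((vals, (rows, cols)), shape=(2, 6))
csr = coo.tocsr()  # duplicate (row, col) entries are summed during conversion

print(csr[0, 5])  # 2
print(csr.nnz)    # 2
```

So appending a 1 for every read yields a read count per user and item; if only 0/1 values are wanted, the summed matrix can be clipped afterwards.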

    from scipy.sparse import coo_matrix

    input_data = [
        ("001436800277225", [12, 456, 157]),
        ("009092130698762", [248]),
        ("010003000431538", [361, 521, 83]),
        ("010156461231357", [173, 67, 244]),
    ]

    # The number of columns must be greater than the largest movie index in the data.
    number_movies = 1000
    number_users = len(input_data)  # number of users in the model

    # You'll want a way to look up the row index for a given user id.
    user_row_map = {}
    user_row_index = 0

    # Row indices, column indices and values for the COO format.
    i, j, data = [], [], []
    for user, movies in input_data:
        if user not in user_row_map:
            user_row_map[user] = user_row_index
            user_row_index += 1
        for movie in movies:
            i.append(user_row_map[user])
            j.append(movie)
            data.append(1)  # one per view; repeated views are summed into a count

    # Create the matrix in COO format, then cast it to CSR, which is easier to use.
    feature_matrix = coo_matrix((data, (i, j)), shape=(number_users, number_movies)).tocsr()
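The resulting CSR matrix can be passed to sklearn's clustering estimators as-is, since both `KMeans` and `DBSCAN` accept SciPy sparse input. A minimal sketch on a hypothetical 4 × 6 toy matrix; `n_clusters`, `eps` and `min_samples` are illustrative values, not tuned for real news data:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import DBSCAN, KMeans

# Hypothetical toy matrix: 4 users x 6 news items, 1 = "read".
X = csr_matrix(np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=np.float64))

# KMeans works on the sparse matrix directly (CSR is the preferred layout).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)

# DBSCAN also accepts sparse input; these eps/min_samples are illustrative only.
dbscan = DBSCAN(eps=1.5, min_samples=2).fit(X)
print(dbscan.labels_)
```

Row r of the matrix corresponds to the user whose value in `user_row_map` is r, so `labels_[user_row_map[user]]` gives that user's cluster.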
