python - Document plotting using Pyplot and sklearn
I am looking to gain insight into the layout of a document set. I am casting the documents to an array of numbers using the following approach with sklearn:
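from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
pipeline = Pipeline([("vect", CountVectorizer()), ("tfidf", TfidfTransformer())])
# toarray() gives a plain ndarray; todense() returns np.matrix, which newer scikit-learn rejects
matrix = pipeline.fit_transform(docs).toarray()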
If I cluster them, I use:
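from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2).fit(matrix)
# transform() returns each document's distance to the two cluster centres
data2d = kmeans.transform(matrix)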
Then I plot them using pyplot:
import matplotlib.pyplot as plt
plt.scatter(data2d[:, 0], data2d[:, 1], c=categories)
However, this generates a k-means representation of the dataset. Is there any way of summing the values in the matrix and plotting them so I can see how the documents relate to each other, without using k-means? The representation should be consistent every time.
For those coming after me: the principle in question is known as multidimensional scaling. Here is a useful blog post that explains the principles behind it: https://de.dariah.eu/tatom/working_with_text.html
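A minimal sketch of what that could look like with scikit-learn's `MDS`, reusing the pipeline from the question. The toy `docs` and `categories`, the helper name `mds_layout`, and the `seed` parameter are all made up for illustration. This version uses the default Euclidean dissimilarity rather than a precomputed distance matrix; since `TfidfTransformer` L2-normalises rows by default, Euclidean distance between rows should rank document pairs the same way cosine distance does. Fixing `random_state` is what makes the layout come out the same on every run.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.manifold import MDS
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus and labels, standing in for the question's
# `docs` and `categories`.
docs = [
    "the cat sat on the mat",
    "a cat and a dog played on the mat",
    "stock markets fell sharply today",
    "investors sold stock as markets fell",
]
categories = [0, 0, 1, 1]


def mds_layout(documents, seed=0):
    """Project documents to 2-D with multidimensional scaling.

    `seed` is an illustrative parameter: it is passed to MDS as
    random_state so that repeated calls return the same coordinates.
    """
    pipeline = Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
    ])
    # Dense ndarray of tf-idf weights, shape (n_docs, n_terms).
    matrix = pipeline.fit_transform(documents).toarray()

    # MDS looks for 2-D points whose pairwise distances approximate the
    # pairwise (Euclidean) distances between the tf-idf rows.
    mds = MDS(n_components=2, n_init=4, random_state=seed)
    return mds.fit_transform(matrix)


data2d = mds_layout(docs)  # shape (n_docs, 2)
```

The resulting `data2d` can be passed to the same `plt.scatter(data2d[:, 0], data2d[:, 1], c=categories)` call used in the question. Note that MDS coordinates are only meaningful up to rotation and reflection, so it is the relative distances between points that carry information, not the axes themselves.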