python - Document plotting using Pyplot and sklearn


I am looking to gain insight into the layout of a document set. I am casting the documents to a numeric array using the following approach with sklearn.

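    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([("vect", CountVectorizer()),
                         ("tfidf", TfidfTransformer())])
    matrix = pipeline.fit_transform(docs).toarray()  # dense ndarray of TF-IDF weights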

If I cluster them, I use

    from sklearn.cluster import KMeans

    kmeans = KMeans(n_clusters=2).fit(matrix)
    data2d = kmeans.transform(matrix)  # distance of each document to the two cluster centres

and then plot them using pyplot:

    import matplotlib.pyplot as plt
    plt.scatter(data2d[:, 0], data2d[:, 1], c=categories)

However, this generates a KMeans representation of the dataset. Is there any way of summarising the values in the matrix and plotting them so I can see how the documents relate to each other, without using KMeans? The representation should be consistent every time.

For those coming after me: the principle in question is known as multidimensional scaling. Here is a useful blog post that explains the principles behind it: https://de.dariah.eu/tatom/working_with_text.html
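
As a rough illustration of that idea, here is a minimal sketch using `sklearn.manifold.MDS`. It is not taken from the linked post. The toy `docs` list and the helper name `embed_documents` are made up for the example, and the fixed `random_state` is an assumption added so that repeated runs give the same layout. It relies on MDS's default Euclidean dissimilarity; because `TfidfTransformer` L2-normalises each row by default, Euclidean distance between rows increases with cosine distance, so the ordering of document distances is the same under either measure.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.manifold import MDS
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus, only here to make the sketch runnable.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets fell",
]


def embed_documents(docs, random_state=0):
    """Project documents to 2-D with multidimensional scaling (a sketch)."""
    pipeline = Pipeline([("vect", CountVectorizer()),
                         ("tfidf", TfidfTransformer())])
    # Dense TF-IDF matrix, one L2-normalised row per document.
    matrix = pipeline.fit_transform(docs).toarray()
    # MDS looks for 2-D points whose pairwise distances approximate the
    # pairwise Euclidean distances between the TF-IDF rows.
    # Fixing random_state (an assumption) makes the layout repeatable.
    mds = MDS(n_components=2, random_state=random_state)
    return mds.fit_transform(matrix)


coords = embed_documents(docs)
print(coords.shape)
```

The resulting `coords` array can be passed to the same scatter call as in the question, for example `plt.scatter(coords[:, 0], coords[:, 1], c=categories)`. Note that an MDS layout is only defined up to rotation and reflection, so the axes themselves carry no meaning; only the distances between points do.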

