Python 3.x - Adding documents to gensim model


I have a class wrapping the various objects required for calculating LSI similarity:

    from gensim import corpora, models, similarities

    class SimilarityFiles:
        def __init__(self, file_name, tokenized_corpus, stoplist=None):
            # Optionally drop stop words from every tokenized document
            if stoplist is None:
                self.filtered_corpus = tokenized_corpus
            else:
                self.filtered_corpus = []
                for convo in tokenized_corpus:
                    self.filtered_corpus.append([token for token in convo if token not in stoplist])
            # Token <-> id mapping, bag-of-words corpus, LSI model, similarity index
            self.dictionary = corpora.Dictionary(self.filtered_corpus)
            self.corpus = [self.dictionary.doc2bow(text) for text in self.filtered_corpus]
            self.lsi = models.LsiModel(self.corpus, id2word=self.dictionary, num_topics=100)
            self.index = similarities.MatrixSimilarity(self.lsi[self.corpus])

I want to add a function to the class that allows adding documents to the corpus and updating the model accordingly. I've found dictionary.add_documents and model.add_documents, but there are two things that aren't clear to me:

  1. When you create the LSI model, one of the parameters the function receives is id2word=dictionary. When updating the model, how do you tell it to use the updated dictionary? Is that actually unnecessary, or will it make a difference?
  2. How do I update the index? It looks from the documentation as if, when I use the Similarity class and not the MatrixSimilarity class, I can add documents to the index, but I don't see such functionality for MatrixSimilarity. If I understood correctly, MatrixSimilarity is better if my input corpus contains dense vectors (which it does, because I'm using the LSI model). Do I have to change it to Similarity just so that I can update the index? Or, conversely, what's the complexity of creating this index? If it's insignificant, should I just create a new index with my updated corpus, as follows:

Code:

    self.dictionary.add_documents(new_docs)    # new_docs is already after filtering stop words
    new_corpus = [self.dictionary.doc2bow(text) for text in new_docs]
    self.lsi.add_documents(new_corpus)
    self.index = similarities.MatrixSimilarity(self.lsi[self.corpus])

Thanks. :)
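
To make the "rebuild versus append" part of the second question concrete, here is a minimal pure-Python sketch of a dense cosine index. It is a toy stand-in, not gensim code: the class `DenseCosineIndex` and its methods are hypothetical names made up for illustration, and it assumes the index simply stores one unit-normalized vector per document. Under that assumption, building the index is a single pass over n vectors of k dimensions, and appending m vectors costs O(m * k).

```python
import math


class DenseCosineIndex:
    """Toy dense similarity index (hypothetical, not part of gensim)."""

    def __init__(self, vectors):
        self.rows = []
        self.add_documents(vectors)

    @staticmethod
    def _unit(vec):
        # Scale to unit length so a dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else list(vec)

    def add_documents(self, vectors):
        # Appending m vectors of k dimensions costs O(m * k);
        # rows that are already stored are left untouched.
        self.rows.extend(self._unit(v) for v in vectors)

    def query(self, vec):
        # Cosine similarity of the query against every stored document.
        q = self._unit(vec)
        return [sum(a * b for a, b in zip(row, q)) for row in self.rows]


old_vectors = [[1.0, 0.0], [0.0, 2.0]]
new_vectors = [[3.0, 3.0]]

# Full rebuild over all documents: one normalization pass, O(n * k).
rebuilt = DenseCosineIndex(old_vectors + new_vectors)

# Incremental update: only the new documents are processed.
appended = DenseCosineIndex(old_vectors)
appended.add_documents(new_vectors)

sims_rebuilt = rebuilt.query([1.0, 1.0])
sims_appended = appended.query([1.0, 1.0])
```

The two indexes agree here only because the old vectors did not change. Updating the LSI model changes its projection, so the vectors for the existing documents would change as well. In that situation the dominant cost is probably re-projecting the corpus through the model rather than building the index, so re-creating the index from the re-projected corpus looks like the safer option.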

