R - Fix sort order when using anti_join to remove stop words (creating n-grams)


I'm very new to R and to coding, and I'm trying to do a frequency analysis on a long list of sentences, each with a given weighting. I've un-nested and mutated the data, but when I try to remove stop words, the sort order of the words within each sentence gets randomized. I need to create bigrams later on, and I'd prefer them to be based on the original phrase.

Here's the relevant code. I can provide more if this is insufficient:

    library(dplyr)
    library(tidytext)

    data = data %>%
      anti_join(stop_words) %>%
      filter(!is.na(word))

What can I do to retain the original sort order within each sentence? I have the words in each sentence indexed so I can match them to their given weight. Is there a better way to remove stop words that doesn't mess up the sort order?
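A minimal sketch of keeping a per-sentence word index, assuming a hypothetical data frame `sentences` with `sentence_id`, `weight` and `text` columns (these names and the two toy sentences are illustrative, not from the original data). The position is recorded before `anti_join()`, so sorting by sentence and word position afterwards restores the original phrase order no matter how the join arranges its output, and each word keeps its sentence's weight:

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per sentence, each with a weight
sentences <- tibble(
  sentence_id = 1:2,
  weight = c(2, 5),
  text = c("The cat sat on the mat", "A dog chased the ball over the fence")
)

tokens <- sentences %>%
  unnest_tokens(word, text) %>%          # one row per word (lower-cased); other columns are kept
  group_by(sentence_id) %>%
  mutate(word_pos = row_number()) %>%    # position of each word within its sentence
  ungroup()

cleaned <- tokens %>%
  anti_join(stop_words, by = "word") %>% # drop stop words; do not rely on the output row order
  arrange(sentence_id, word_pos)         # restore the original order within each sentence

print(cleaned)
```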

I saw a similar question here, but it's unresolved: How to stop anti_join from reversing sort order in R?

I also tried this, but it didn't work: dplyr: How to sort groups within sorted groups?

I got help from a colleague in writing this, but unfortunately they're not available anymore, so any insight would be helpful. Thanks!

You can add a sort index to the data before removing the stop words, and then arrange by that index afterwards:

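    library(dplyr)
    library(tidytext)

    data = data %>%
      dplyr::mutate(idx = dplyr::row_number()) %>%   # remember the original row order
      dplyr::anti_join(stop_words, by = "word") %>%  # remove stop words
      dplyr::filter(!is.na(word)) %>%
      dplyr::arrange(idx)                            # restore the original order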

(The dplyr:: prefix is not necessary, but it helps me remember which package each function comes from.)
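For the bigrams mentioned in the question, one option is to pair each remaining word with the next remaining word in the same sentence once the rows are back in their original order. This is a sketch that reuses the `idx` approach above on a hypothetical `sentences` data frame; `lead()` is dplyr's window function, while `next_word` and `bigram` are illustrative column names:

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per sentence, each with a weight
sentences <- tibble(
  sentence_id = 1:2,
  weight = c(2, 5),
  text = c("The cat sat on the mat", "A dog chased the ball over the fence")
)

words <- sentences %>%
  unnest_tokens(word, text) %>%
  mutate(idx = row_number()) %>%          # global index, as in the answer
  anti_join(stop_words, by = "word") %>%
  arrange(idx)                            # back to the original word order

bigrams <- words %>%
  group_by(sentence_id) %>%
  mutate(next_word = lead(word)) %>%      # next remaining word in the same sentence
  ungroup() %>%
  filter(!is.na(next_word)) %>%           # the last word of a sentence has no partner
  mutate(bigram = paste(word, next_word)) %>%
  select(sentence_id, weight, bigram)

print(bigrams)
```

Note that these bigrams join words that were not adjacent in the original text whenever a stop word sat between them. If you instead want bigrams of the original phrase with stop words excluded, tidytext's documented approach is to tokenize with `unnest_tokens(bigram, text, token = "ngrams", n = 2)` first and then filter out pairs containing stop words.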

