html - How to remove <a> tag content from the content field before indexing/storing in Solr
I index HTML documents with Solr 6.6.0. There is a lot of link text in the content field, which dilutes the search results. So, how can I remove the <a> tag content from the "content" field before indexing/storing in Solr? Is there a way to do this with an UpdateRequestProcessorChain? Does anyone know a solution?
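Use the HTMLStripCharFilterFactory char filter in the field definition at index time. This char filter strips HTML from the input stream. Note that it removes the markup but keeps the text between the tags, and that it only affects the indexed terms, not the stored value.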
<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer ...>
  [...]
</analyzer>
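If the link text itself has to go before the field is indexed and stored, one option is the update processor chain the question asks about. The following is only a sketch for solrconfig.xml, assuming the "content" field still holds raw HTML when it reaches the chain. The chain name "strip-anchors" is made up for illustration, and a regular expression is a fragile way to handle HTML, so unusual markup may slip through.

```xml
<!-- Hypothetical chain name; the field name "content" is taken from the question. -->
<updateRequestProcessorChain name="strip-anchors">
  <!-- Replace every <a ...>...</a> element, including its link text, with a space.
       (?is) makes the Java regex case-insensitive and lets "." match newlines. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">(?is)&lt;a\b[^&gt;]*&gt;.*?&lt;/a&gt;</str>
    <str name="replacement"> </str>
  </processor>
  <!-- Optional: strip the remaining markup from the value that gets stored. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">content</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would be selected by adding update.chain=strip-anchors to the update request, or by marking it with default="true". Because it runs before the document is written, it changes both the indexed and the stored value.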