html - How to remove <a> tag content in the content field before indexing/storing in Solr


I index HTML documents with Solr 6.6.0. There is a lot of link text in the content field, which dilutes the search results. So, how can I remove the <a> tag content in the "content" field before indexing/storing in Solr? Is there a way to do this with an UpdateRequestProcessorChain? Does anyone know of other solutions?

Use HTMLStripCharFilterFactory in the analyzer of the field type definition at index time.

This char filter strips HTML markup from the input stream before it reaches the tokenizer.

<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer ...>
  [...]
</analyzer>
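Note that HTMLStripCharFilterFactory removes the tags but keeps the text between them, so the link text itself would still be indexed. One possible way to drop it, sketched below, is to put a PatternReplaceCharFilterFactory in front of the HTML stripper so that whole <a> elements are removed first. The field type name text_html_nolinks, the choice of StandardTokenizerFactory and LowerCaseFilterFactory, and the regular expression are assumptions made for illustration, not part of the original answer. The sketch also assumes the raw HTML actually reaches the "content" field.

```xml
<!-- Sketch only: the field type name, tokenizer and filter are assumptions. -->
<fieldType name="text_html_nolinks" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- 1. Replace each <a ...>...</a> element, including its link text, with a space.
         (?is) makes the match case-insensitive and lets "." span newlines;
         \b keeps tags such as <abbr> or <article> from matching. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?is)&lt;a\b[^&gt;]*&gt;.*?&lt;/a&gt;"
                replacement=" "/>
    <!-- 2. Strip the remaining markup and keep the surrounding text. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Char filters only change what is analyzed and indexed; the stored value of the field still contains the original HTML. If the stored value has to be cleaned as well, the text would need to be rewritten before it reaches the field, for example in an update request processor chain (Solr ships a RegexReplaceProcessorFactory that may fit) or in the client that sends the documents.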
