java - Spell-Check: Find one-to-one token difference mapping between two strings -


i stumbled on question on internet archive , having difficulty wrapping head around it. want find desired mapping amongst different tokens between 2 strings. output should string-to-string map.

for example:

string1: hewlottpackardenterprise helped american raleways in n y

string2: hewlett packard enterprise helped american railways in ny

output:
hewlottpackardenterprise -> hewlett packard enterprise

hewlott -> hewlett

raleways -> railways

n y -> ny

note: have been able write edit-distance method, finds types of edits (segregated types, deletion, substitution etc.) , can convert first string second convert method

what have tried far?

approach 1: began naive approach of splitting both strings space, inserting tokens of first string hash map , comparing tokens of other string hashmap. however, approach fails misses on relevant mappings.

approach 2: utilize covert method find edit positions in string, , type of edits. using space edits, i'm able create mapping hewlottpackardenterprise -> hewlett packardenterprise. however, method explodes more , more things need splitted within same word.

appreciate thoughts in regard! clear doubts in comments.

public string returnwhitespaceedittoken(editdone e, list<string> testtokens) {     int pos = e.pos, count=0, i=0;     string resulttoken = null;      if (e.type.equals(deleteedit)) {         (i=0;i<testtokens.size();i++) {             count+=testtokens.get(i).length();             if (count==pos) {                 break;             }             if (i!=testtokens.size()-1) {                 count++;             }         }          resulttoken = testtokens.get(i) + " " + testtokens.get(i+1);     } else if (e.type.equals(insertedit)) {         (i=0;i<testtokens.size();i++) {             count+=testtokens.get(i).length();             if (count>pos) {                 break;             }             if (i!=testtokens.size()-1) {                 count++;              }         }         string token = testtokens.get(i);         resulttoken = token.substring(count-token.length(), pos) + token.substring(pos, count);     }     return resulttoken; } 

a pretty common way of handling problems find longest common subsequence (or it's dual shortest edit script) between 2 strings , post-process output specific format want; in case string maps.

wikipedia has pretty decent introduction problem here: https://en.wikipedia.org/wiki/longest_common_subsequence_problem

and great paper "an o(nd) difference algorithm , variations" myers can found here. http://www.xmailserver.org/diff2.pdf


Comments

Popular posts from this blog

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

python Tkinter Capturing keyboard events save as one single string -

sql server - Why does Linq-to-SQL add unnecessary COUNT()? -