Encoding for strings (preferrably a value) such that closer values means more similar Strings? -


i looking encoding can encode every string unique number such ->

  1. every 2 strings similar must have values close each other.
  2. every 2 values close each other must represent similar strings.

similarity of strings mean few substitutions in 1 string can form string. no additions or deletions considered.

the string can have characters a, c, t , g (only 4 possibilities)

things have tried ->

  1. gray code -> satisfies second 1 doesn't satisfy first criteria. 2 string similar need not means have closer values in gray code.

  2. hamming distance reference string -> if hamming distance same not mean @ strings similar, equally far reference. not satisfy second criteria.

please suggest method if know particular problem.

i think you're looking space filling curve: a coloured hilbert curve

consider string n dimensional vector of chars, , have corresponding point in n dimensional space. 2 strings have manhattan distance equal sum of differences of characters, strings close in representation similar strings.

we transform our n dimensional vector number between 0 , n, n highest possible valued string, using hilbert curve. in image, have 2 dimensions, hillbert curves can generalised higher dimensions.

if @ image, line continuous, , satisfies condition 2. hillbert curves generalised gray code.

for part, condition 1 true. if @ image, colour of hilbert curve changes on it's length. colour between adjacent areas of hilbert curve normaly quite similar, exception in case halfway down left side, colour changes orange blue. however, hillbert curves fill small area before ever moving onto next one, therefore similar strings have integer representation close 1 another. it's not perfect it's good.


Comments

Popular posts from this blog

python Tkinter Capturing keyboard events save as one single string -

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

javascript - Z-index in d3.js -