c++ - Has this vector occured before -
i have lot of vectors (in order of 10^4, more!) , getting more vectors in input stream. so, example, have
v1 = 1 0 4 1 1
v2 = 1 1 2 5 3 6 2
v3 = 0 1 1 5 0
i have 10^4 such vectors now, in input vector v4 = 0 1 1 5 0
, , want check if has appeared before or not, how suggest me this?
i list techniques have thought of, , errors accompany them:
- to use
std::map
, orstd::set
same. but,std::map std::set
not support vector argument. - to convert every integer in vector string type, append them in order , store string in map. error: case of
v5 = 11 1 1 1
,v6 = 1 1 1 1 1
shown same. - similiar above, add delimiter after every integer. error: tedious code?
i wished know if can think of method achieve this?
edit: 10^4, it's achievable. new task requires me store upto 10^9. don't think stl have space, have thrown sigabrt error. know other efficient hashing method can work in case?
the simple method of doing store vectors in vector, , maintain order of std::sort() family of functions, using std::lexigraphical_compare sort predicate. allow binary searching list in o(log(n)) amortized time, @ expensive of semi-costly sort operation, can reduced playing games heapifying or partitioning list-of-vectors load it.
more efficient this, however, store vectors trie (https://en.wikipedia.org/wiki/trie), each path down trie stores unique sequence vectors. depending on variance in data, can more space-efficient, , both addition , search o(log(n)) operations.
take advice grain of salt, however, 10^4 elements tiny number. experience differences in efficiency sorting & searching algorithms start show on modern hardware when you're in 10^6-10^7 range data set. below scale, oft times simplest, cache-friendly algorithm wins out.
another alternative if you're going raw speed, , list of vectors scan known , static, use finite state machine accept/reject input. tools ragel can make short work of such problems.
Comments
Post a Comment