Algorithm to search list to see if item exists, with a twist
Hiya,
Non-computer scientist here looking for some algorithm help. I want to search a list of items to see if my item is in the list, or any rotation. So if I search for aabb, I want to know if aabb, abba, bbaa or baab is in the list.
All of the items in my list are four characters long, and my list will have at most 256 items in it, and is fixed (so I'm not worried about adding or removing items from the structure I'm searching once it's made). I would like to search the same list many thousands of times, for different items.
I'm mostly interested in speed rather than storage space, as my list of items to search is quite small and fixed. I don't want to know where in the list the item is, simply that it or a rotation exists in the list.
Which algorithms do you think I should look up to perform this search?
The simplest way to do this seems to be to quadruple the list of items, so that each rotation is in the list, then sort and search that. I've also thought about hashing the items, for example summing up the number of a's, b's etc in the list, then only searching manually that sub-list, but I'm wondering if someone smarter than me has already solved this :)
Thanks,
Douglas
I think hashing is the best solution
I think your hashing method is the best way to do this. I know you don't care about how much space the algorithm consumes, but if you can come up with an algorithm that consumes less space, why not use it?
If you go with a hash function method, your function should be something like h(x) = x[1] + x[2] + x[3] + ... + x[n]
Where x[i] is the ASCII code of character i in string x. This way, string x will always have the same hash key, regardless of the order of the characters in string x. (In general, this would be considered a bad hash function, but for your problem it's perfect.)
Then, instead of comparing to a list of strings, you can compare to a list of hash keys. If you sort the list of m hash keys, your running time will be O(log(m)) for every search. (Meaning that in the worst case, when performing a search for a particular hash key, you will compare the hash key to log(m) other hash keys.)