I have a set of more than a million items in the form:
// Just an example for simplicity
struct Item {
std::string name_;
int value_;
};
I need to be able to search items in the set (both by value and by name), add and remove items in the set.
And no two items can have same name or same value.
What is the best way to structure my data so all functions are optimized for speed?
Note that I’m doing this for homework so all libraries have to be mine or from STL.
(1) First idea was to put all items in a vector and have two binary search trees with a node like this:
struct TreeNode {
//index of an item in the vector
int index;
//pointers to branching nodes
TreeNode* left, * right;
};
and sort one by the name, one by value.
The problem here is, when removing items from the set, I need to update all larger indexes in both trees.
(2) Second idea was to put all items in a list and have tree nodes like this:
struct TreeNode {
//pointer to an item in the list
Item* item;
//pointers to branching nodes
TreeNode* left, * right;
};
but I don’t know if <list>
is going to move items around the memory under the hood when I’m adding and removing them and leave tree nodes with broken pointers.
None of your stated requirements involve sorted access. In that case, the fastest version using just the standard library is having two unordered_maps
that cross-reference each other.
#include <string>
#include <unordered_map>
class SiMap
{
using string_int_map = std::unordered_map<std::string, int>;
using int_string_map = std::unordered_map<int, const std::string*>;
string_int_map string2int;
int_string_map int2string;
public:
bool insert(const std::string& name, int value)
{
auto inserted = string2int.emplace(name, value);
if(! inserted.second) // name not unique
return false;
const std::string* ref = &inserted.first->first;
try {
auto second_inserted = int2string.emplace(value, ref);
if(! second_inserted.second) { // undo if value is not unique
string2int.erase(inserted.first);
return false;
}
} catch(...) {
string2int.erase(inserted.first);
throw;
}
return true;
}
bool erase(const std::string& name) noexcept
{
string_int_map::iterator found = string2int.find(name);
if(found == string2int.end())
return false;
int2string.erase(found->second);
string2int.erase(found);
return true;
}
bool erase(int value) noexcept
{
int_string_map::iterator found = int2string.find(value);
if(found == int2string.end())
return false;
string2int.erase(*found->second);
int2string.erase(found);
return true;
}
const int* find(const std::string& name) const noexcept
{
string_int_map::const_iterator found = string2int.find(name);
return found == string2int.end() ? nullptr : &found->second;
}
const std::string* find(int value) const noexcept
{
int_string_map::const_iterator found = int2string.find(value);
return found == int2string.end() ? nullptr : found->second;
}
};
Note that unordered_map
guarantees that pointers to elements remain valid even if the map itself changes. Iterators are invalidated, though. So it is safe to work with raw pointers.
Extending the interface for iteration or construction and lookup via std::string_view
or std::string&&
are left as an exercise to the reader.
If you can use standard containers, why custom binary search trees? Use two
std::[unordered_]map
s. “I don’t know if <list> is going to move items around” It will not, but neither will a map. Store the elements in one map (doesn’t matter which one), and store pointers to them in another map.The data structure that you are searching for, is called a bidirectional map (a.k.a bitmap). If you cannot use boost::bitmap, then try to implement it yourself as proposed by @HolyBlackCat. Some online examples exist, i.e. this one.
Cheat and embed SQLite3.
@TonySalimi “Bimap” presumably not “Bitmap”.
@tadman Correct. Meant to type ‘cool,’ accidentally wrote ‘fool.’ Autocorrect knows me too well 🙂