Union-Find is a data structure that efficiently handles the union and find operations on disjoint sets, commonly used in algorithms like Kruskal's Minimum Spanning Tree.
#include <iostream>
#include <vector>
class UnionFind {
public:
UnionFind(int size) : parent(size), rank(size, 1) {
for (int i = 0; i < size; ++i) parent[i] = i;
}
int find(int p) {
if (parent[p] != p) parent[p] = find(parent[p]);
return parent[p];
}
void unionSets(int p, int q) {
int rootP = find(p);
int rootQ = find(q);
if (rootP != rootQ) {
if (rank[rootP] > rank[rootQ]) parent[rootQ] = rootP;
else if (rank[rootP] < rank[rootQ]) parent[rootP] = rootQ;
else {
parent[rootQ] = rootP;
rank[rootP]++;
}
}
}
private:
std::vector<int> parent;
std::vector<int> rank;
};
Understanding the Basics of Union-Find
What is Union-Find?
Union-Find, also known as the Disjoint Set Union (DSU), is a data structure that efficiently handles the union and find operations. This structure is essential in scenarios where you need to maintain a collection of disjoint sets and perform quick queries on which set a particular item belongs to.
Historically, Union-Find was born from the need to solve problems related to connectivity, particularly in the context of networks and graph theory. It allows for the dynamic merging of sets and is widely used in various applications like network connectivity, image processing, and clustering.
Key Concepts
- Disjoint Sets: A collection of non-overlapping sets, where each element belongs to exactly one set.
- Union Operation: Combines two sets into one.
- Find Operation: Identifies the set that a particular element belongs to.
These operations are designed to be efficient, allowing amortized constant time complexity for each operation when implemented with specific optimizations.

Data Structure and Its Operations
Structure of Union-Find
The Union-Find data structure typically consists of two main arrays:
- A parent array that maps each element to its parent.
- A rank array that approximates the depth of the trees, helping in optimizing the union operation.
Here’s how we can implement the initialization of the union-find structure in C++:
class UnionFind {
public:
vector<int> parent, rank;
UnionFind(int n) {
parent.resize(n);
rank.resize(n, 0);
for (int i = 0; i < n; ++i) {
parent[i] = i; // Each element is its own parent
}
}
};
Union Operation
Union by rank is a technique that strives to keep the tree flat. When two sets are united, the root of the tree with lesser rank becomes a child of the tree with greater rank. This keeps the resulting tree shallow, improving performance.
Below is an implementation of the union operation:
void unionSets(int a, int b) {
int rootA = find(a);
int rootB = find(b);
if (rootA != rootB) {
if (rank[rootA] < rank[rootB]) {
parent[rootA] = rootB;
} else if (rank[rootA] > rank[rootB]) {
parent[rootB] = rootA;
} else {
parent[rootB] = rootA;
rank[rootA]++;
}
}
}
This function first finds the roots of both elements and then merges them based on their ranks.
Find Operation
To optimize the find operation, we utilize path compression. This technique flattens the structure of the tree whenever the find operation is called, ensuring that every node points directly to the root. As a result, future operations on the same elements become significantly faster.
Here’s the code implementing the find operation:
int find(int a) {
if (parent[a] != a) {
parent[a] = find(parent[a]); // Path compression
}
return parent[a];
}

Optimizations in Union-Find
Path Compression
Path compression greatly optimizes the find operation. By ensuring that every node traversed on the way to the root now points directly to the root, repeated lookups become almost instantaneous.
Union by Rank
Union by rank saves time in the union process by always attaching the shorter tree under the root of the taller tree. This keeps the overall height of the trees low, resulting in better performance during subsequent union and find operations.
When combined, these optimizations yield an almost constant time complexity for union and find operations, making the Union-Find structure very efficient.

Practical Applications of Union-Find
Solving the Connected Components Problem
One of the classic problems solvable by union-find is determining the connected components of a graph. For a given set of edges, we can merge sets that share connections, ultimately allowing us to identify how many distinct parts of the graph exist.
Here’s a C++ snippet that demonstrates this:
void connectedComponents(const vector<pair<int, int>>& edges, int n) {
UnionFind uf(n);
for (const auto& edge : edges) {
uf.unionSets(edge.first, edge.second);
}
// Counting distinct components ...
}
Example: Kruskal's Algorithm for Minimum Spanning Tree
Union-Find is essential in Kruskal's algorithm for constructing a minimum spanning tree (MST). The algorithm sorts all edges and then uses the union-find structure to ensure that adding an edge doesn’t create a cycle.
Here’s an example code snippet:
void kruskalAlgorithm(vector<Edge>& edges, int n) {
// Sorting edges by weight
sort(edges.begin(), edges.end());
UnionFind uf(n);
for (const auto& edge : edges) {
if (uf.find(edge.u) != uf.find(edge.v)) {
uf.unionSets(edge.u, edge.v);
// Add edge to MST ...
}
}
}

Performance Analysis
Time Complexity
The union and find operations can be executed in nearly constant time, specifically O(α(n)), where α is the Inverse Ackermann function. This is almost constant for all practical values of n, making union-find extremely efficient for large data sets.
Space Complexity
The space complexity typically involves storing two arrays: the parent and rank arrays, leading to an overall space complexity of O(n), where n is the number of elements in the union-find structure.

Troubleshooting Common Issues
Debugging Union-Find Implementations
Common mistakes when implementing union-find include failing to properly initialize the parent and rank arrays or erroneously merging elements. Careful attention to these intricate details is crucial.
Testing Your Union-Find Implementation
To ensure that your union-find implementation is working correctly, it’s essential to write unit tests. For example:
void testUnionFind() {
UnionFind uf(10);
uf.unionSets(1, 2);
assert(uf.find(1) == uf.find(2);
// More test cases...
}
Testing various scenarios will help identify potential shortcomings in the implementation.

Conclusion
The Union-Find data structure is a powerful tool in computer science, especially for solving connectivity problems efficiently. By mastering the techniques of union by rank and path compression, developers can dramatically improve the performance of their algorithms.
Incorporating union-find into your problem-solving toolkit will equip you with the skills needed to tackle a range of computational challenges. Explore further and deepen your understanding of algorithms to make the most out of this essential technique.