Maintain List Order While Eliminating Duplicates in Python
Chapter 1: Introduction
A common task in Python is removing duplicates from a list. The standard approach uses a set, which is an unordered collection of unique elements. The built-in set() function creates a set from any iterable of hashable items, and list() converts the set back to a list if you need one.
The following example removes duplicates with this approach:
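import random
# Build a list of 10 random integers between 0 and 10 (inclusive); repeats are likely
alist = [random.randint(0, 10) for _ in range(10)]
print(alist)
# Converting to a set drops the duplicates; list() turns the set back into a list
print(list(set(alist)))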
However, running this script shows that list(set(alist)) does not generally keep the elements in their original order. Sets make no guarantee about ordering, so the original sequence is lost when converting back to a list.
Section 1.1: Preserving Order
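If the order matters, you need a different technique. A common one is OrderedDict, which remembers the order in which keys were first inserted. OrderedDict.fromkeys() builds a dictionary whose keys are the list items; because keys are unique, each duplicate collapses into its first occurrence: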
from collections import OrderedDict
alist = [0, 6, 5, 9, 0, 7, 4, 7, 3, 2]
list(OrderedDict.fromkeys(alist))
# Output: [0, 6, 5, 9, 7, 4, 3, 2]
Since Python 3.7, the built-in dict is also guaranteed to preserve insertion order. If you are on Python 3.7 or later, you can use it directly:
alist = [0, 6, 5, 9, 0, 7, 4, 7, 3, 2]
list(dict.fromkeys(alist))
# Output: [0, 6, 5, 9, 7, 4, 3, 2]
Note that building a dictionary and then converting it to a list adds some overhead compared with building a set. If order doesn't matter, a set is usually the better choice, since it also supports operations such as union, intersection, and difference.
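To get a feel for the overhead on your own machine, a rough timing comparison along these lines may help. This is only a sketch: the data size, value range, and repetition count are arbitrary assumptions, and the results will vary with the Python version and the input.

```python
import random
import timeit

# Hypothetical test data: 100,000 integers with many repeats.
data = [random.randint(0, 1000) for _ in range(100_000)]

# Time each approach over 20 runs.
set_time = timeit.timeit(lambda: list(set(data)), number=20)
dict_time = timeit.timeit(lambda: list(dict.fromkeys(data)), number=20)

print(f"list(set(...)):           {set_time:.4f} s")
print(f"list(dict.fromkeys(...)): {dict_time:.4f} s")

# Both approaches keep the same unique values; only the order may differ.
assert set(data) == set(dict.fromkeys(data))
```

Measure with data that resembles your real input before deciding whether the difference matters.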
Subsection 1.1.1: Hashable Items Requirement
Both the set and the OrderedDict/dict methods require the items to be hashable. Python's built-in immutable types (numbers, strings, and tuples of hashable items) are hashable, while mutable containers such as lists, dicts, and sets are not. For unhashable items, you need a slower approach that compares each item against the items already kept, which amounts to a nested loop and takes quadratic time in the worst case.
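The nested-loop idea described above could be sketched as follows. The function name dedupe_unhashable is a hypothetical helper chosen for illustration, and the sketch assumes the items support equality comparison with ==.

```python
def dedupe_unhashable(items):
    """Return a new list without duplicates, keeping first occurrences.

    Relies only on ==, so it also works for unhashable items such as lists.
    Worst-case running time is O(n^2).
    """
    unique = []
    for item in items:
        # `item not in unique` scans the kept items with ==; this scan
        # is the inner loop of the nested-loop approach.
        if item not in unique:
            unique.append(item)
    return unique


rows = [[1, 2], [3, 4], [1, 2], [5], [3, 4]]
print(dedupe_unhashable(rows))  # [[1, 2], [3, 4], [5]]
```

For large inputs, it may be worth converting items to a hashable form first (for example, lists to tuples) so that the faster dict-based method applies.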
Chapter 2: Visual Learning
The first video titled "Remove duplicates based on sort order - Power BI Tips & Tricks #22 - YouTube" provides insights into managing duplicates while keeping track of sorting. This video is a helpful resource for understanding practical applications in data handling.
The second video, "Remove Duplicates Based On Sort Order - YouTube," further explores techniques for sorting data while removing duplicates. Watching this video will enhance your understanding of the topic.