Sets
Introduction
In Python, lists and sets are both mutable. The key differences are that a set is unordered and cannot contain duplicate elements. Sets are particularly useful for membership testing and for eliminating duplicates, even if lists are the more general-purpose structure.
Implementation
Sets can be declared using the set() function or curly braces {}. There are important differences between the two that are easier to explain with code than with words:
fruits = {'apple', 'banana', 'strawberry'}
type(fruits)
veggies = set('squash', 'zucchini', 'cauliflower')
structure = {}
type(structure)
empty = set()
type(empty)
<class 'set'>
`TypeError: set expected at most 1 argument, got 3`
<class 'dict'>
<class 'set'>
From this example, we see that filling curly braces with items results in a set, but passing several items to set() does not work, because the function takes at most one argument (an iterable). With this information, how do we declare an empty set? Using just {} results in an empty dictionary, since the two data structures share notation, but set() with no arguments gives us an empty set.
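As a minimal sketch of how the failing call above could be repaired: since set() accepts a single iterable, the items can be wrapped in a list first. The variable names here are illustrative only.

```python
# set() takes at most one argument, and that argument must be an iterable.
veggies = set(['squash', 'zucchini', 'cauliflower'])
print(type(veggies))   # <class 'set'>
print(len(veggies))    # 3

# Passing a single string is legal, but it is iterated character by character.
letters = set('squash')
print(letters == {'s', 'q', 'u', 'a', 'h'})   # True

# Empty braces create a dict; set() creates an empty set.
print(type({}) is dict, type(set()) is set)   # True True
```

Note that the single-string case is an easy mistake to make: set('squash') is a set of letters, not a set containing one word.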
Below is a list of set methods from w3schools.com, some of which we discuss later.
Method | Alternative Code | Description |
---|---|---|
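`add()` | | Adds an element to a set |
`clear()` | | Removes all the elements from a set |
`copy()` | | Returns a copy of a set |
`difference()` | `-` | Returns a new set of the elements in this set that are not in the other. Can list multiple arguments in the function or chain the operator |
`difference_update()` | `-=` | In-place version of `difference()` |
`discard()`, `remove()` | | Both functions remove the item from a set; `remove()` raises a `KeyError` if the item is missing, while `discard()` does not |
`intersection()` | `&` | Returns a set that includes the values that are in both sets |
`intersection_update()` | `&=` | In-place version of `intersection()` |
`isdisjoint()` | | Compares exactly two sets. Returns `True` if they have no elements in common |
`issubset()` | `<=` | Compares exactly two sets. Returns `True` if every element of this set is in the other |
`issuperset()` | `>=` | Compares exactly two sets. Returns `True` if this set contains every element of the other |
`pop()` | | Removes and returns an arbitrary element from a set |
`symmetric_difference()` | `^` | Compares exactly two sets. Returns a set that includes the items that are not shared between sets |
`symmetric_difference_update()` | `^=` | In-place version of `symmetric_difference()` |
`union()` | `\|` | Returns all the elements in all of the 2+ sets. Duplicates intrinsically disallowed |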
As you might be able to tell by the function names and alternative code, Python sets follow many of the same logical operations as mathematical sets.
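To make the correspondence with mathematical sets concrete, here is a short sketch comparing a few of the methods with their operator forms. The sets a and b are made-up values for illustration.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Each method has an operator form that yields the same result.
print(a.union(b) == (a | b))                  # True
print(a.intersection(b) == (a & b))           # True
print(a.difference(b) == (a - b))             # True
print(a.symmetric_difference(b) == (a ^ b))   # True

# sorted() is used only to get a predictable display order.
print(sorted(a | b))   # [1, 2, 3, 4, 5]
print(sorted(a & b))   # [3, 4]
print(sorted(a - b))   # [1, 2]
print(sorted(a ^ b))   # [1, 2, 5]

# Subset checks also have a comparison-operator form.
print({3, 4}.issubset(a), {3, 4} <= a)   # True True
print(a.isdisjoint({9, 10}))             # True
```

One difference worth knowing: the methods accept any iterable as an argument, while the operators require both operands to be sets.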
The add() function is where we see this disorder appear. If you add values to a set and run your code multiple times, you might notice that the order of the printed set differs between runs. The position of an item is determined by its hash value rather than by insertion order, and Python randomizes string hashes each time the interpreter starts. Because order carries no meaning, you can check whether two sets match by using == as follows:
fruits = {"apple", "banana", "strawberry"}
fruits2 = {"apple", "strawberry", "banana"}
print(fruits == fruits2)
True
The drawback here is that sets cannot be indexed like lists, so if you need to evaluate elements of your group at the item level instead of the group level, lists will better fit your needs.
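The following sketch illustrates the indexing limitation and one possible workaround, assuming a sorted list is an acceptable substitute when item-level access is needed.

```python
fruits = {"apple", "banana", "strawberry"}

# Membership testing works directly on the set.
print("banana" in fruits)   # True

# Indexing does not: sets have no positions, so this raises TypeError.
try:
    fruits[0]
except TypeError as err:
    print("TypeError:", err)

# If item-level access is needed, convert to a sorted list first.
fruit_list = sorted(fruits)
print(fruit_list[0])        # apple
```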
Sets support set comprehension, much like lists support list comprehension. This example is taken from the Python sets documentation.
magic = {x for x in 'abracadabra' if x not in 'abc'}
print(magic)
{'r', 'd'}
Wait, shouldn’t this return "{'r', 'd', 'r'}"? Not quite…
Duplicate Removal
The main appeal of sets is their quick handling of duplicates. While a list's unique elements can be found by wrapping it with the set() function (plain lists have no unique() method; that name belongs to libraries such as NumPy and pandas), you can use a set to prevent duplicates from being added in the first place. Take the following example:
fish = {'salmon', 'tuna', 'cod'}
fish.add('cod')
print(fish)
{'salmon', 'cod', 'tuna'}
Recall that even if the order of the set changes, as long as == returns True, the sets are equivalent. This example demonstrates that sets handle duplicates on their own: no exception is raised, even though "cod" is already in fish.
In our "abracadabra" example above, this is why "{'r', 'd'}" is the output: the set comprehension collected the unique non-abc letters in "abracadabra", not every instance of them.
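The same property can be used to strip duplicates from an existing list. The sketch below uses a made-up list named catch; the dict.fromkeys() variant is not a set technique but a common alternative when the original order should be kept.

```python
catch = ['cod', 'tuna', 'cod', 'salmon', 'tuna']

# Wrapping the list in set() drops the repeats, but order is not guaranteed.
unique_fish = set(catch)
print(len(unique_fish))                           # 3
print(unique_fish == {'cod', 'tuna', 'salmon'})   # True

# Alternative (not a set): dict keys are unique and keep first-seen order.
ordered_unique = list(dict.fromkeys(catch))
print(ordered_unique)   # ['cod', 'tuna', 'salmon']
```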
One application of this property is tracking the unique words in a document. As you parse the file, you can add each word to a set and be left with every distinct word that appears in the document. If you need to access individual elements for any reason, you can convert the set using list().
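A rough sketch of the unique-words idea follows. To keep it self-contained, it uses a short made-up string in place of a file, and it splits on whitespace only, which is a simplifying assumption (real text would also need punctuation and capitalization handled).

```python
text = "the quick brown fox jumps over the lazy dog the end"

unique_words = set()
for word in text.split():
    unique_words.add(word)   # repeats of "the" are silently ignored

print(len(unique_words))     # 9

# Convert to a sorted list when individual elements need to be accessed.
word_list = sorted(unique_words)
print(word_list[0])          # brown
```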
Examples
How would I take the word "banana" out of a set if I knew it was included?
Solution:
fruits = {'orange', 'grapefruit', 'banana'}
fruits.remove('banana')
print(fruits)
{'orange', 'grapefruit'}
Repeat the prior example, but what if we did not know whether "banana" was in the set?
Solution:
fruits = {'orange', 'grapefruit', 'banana'}
fruits.discard('banana')
print(fruits)
{'orange', 'grapefruit'}
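The reason for choosing discard() in the second solution can be seen by trying both methods on a set that does not contain the item. This is a small sketch with the same illustrative fruit names.

```python
fruits = {'orange', 'grapefruit'}

# discard() does nothing if the item is absent.
fruits.discard('banana')
print(fruits == {'orange', 'grapefruit'})   # True

# remove() raises KeyError if the item is absent.
try:
    fruits.remove('banana')
except KeyError:
    print("'banana' was not in the set")

# A membership check before remove() is another option.
if 'orange' in fruits:
    fruits.remove('orange')
print(fruits)   # {'grapefruit'}
```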