Python's `copy` Module

Python's `copy` Module

This article explains Python's copy module.

Focusing on the differences between shallow and deep copies, we provide a clear explanation—from the basic mechanisms of object duplication to applications in custom classes—along with practical examples.

YouTube Video

Python's copy Module

Python's copy module is the standard module for handling object duplication (copying). The goal is to understand the difference between shallow and deep copies and to be able to control copying behavior for custom objects.

Basics: What is a shallow copy?

Here we illustrate the behavior of shallow copies for mutable objects such as lists and dictionaries. A shallow copy duplicates only the top-level object and shares the references inside.

 1# Example: shallow copy with lists and dicts
 2import copy
 3
 4# A nested list
 5original = [1, [2, 3], 4]
 6shallow = copy.copy(original)
 7
 8# Mutate nested element
 9shallow[1].append(99)
10
11print("original:", original)  # nested change visible in original
12print("shallow: ", shallow)

In this code, the top-level list is duplicated, but the inner sublists are shared by reference, so shallow[1].append(99) is also reflected in original. This is a typical behavior of a shallow copy.

What is a deep copy?

A deep copy recursively duplicates the object and all of its internal referents, producing an independent object. Use it when you want to safely duplicate complex nested structures.

 1# Example: deep copy with nested structures
 2import copy
 3
 4original = [1, [2, 3], {'a': [4, 5]}]
 5deep = copy.deepcopy(original)
 6
 7# Mutate nested element of the deep copy
 8deep[1].append(100)
 9deep[2]['a'].append(200)
10
11print("original:", original)  # original remains unchanged
12print("deep:    ", deep)

In this example, changes to deep do not affect original. deepcopy copies the entire object graph, yielding an independent replica.

How to decide which one to use

Which copy to use should be determined by the object's structure and your purpose. If you do not want changes to internal mutable elements to affect the original object, deepcopy is appropriate. This is because it recursively duplicates the entire object, creating a completely independent copy.

On the other hand, if duplicating only the top-level object is sufficient and you value speed and memory efficiency, copy (a shallow copy) is more suitable.

  • You want to avoid the original changing when you modify internal mutable elements → use deepcopy.
  • Duplicating only the top level is enough and you want to prioritize performance (speed/memory) → use copy (shallow copy).

Handling built-in and immutable objects

Immutable objects (e.g., int, str, and tuples whose contents are immutable) usually do not need to be copied. copy.copy may return the same object when appropriate.

 1# Example: copying immutables
 2import copy
 3
 4a = 42
 5b = copy.copy(a)
 6print(a is b)  # True for small ints (implementation detail)
 7
 8s = "hello"
 9t = copy.deepcopy(s)
10print(s is t)  # True (strings are immutable)

Since copying immutable objects provides little efficiency benefit, the same object may be reused. This is rarely something you need to worry about in application design.

Custom classes: defining __copy__ and __deepcopy__

If the default copying is not what you expect for your class, you can provide your own copy logic. If you define __copy__ and __deepcopy__, copy.copy and copy.deepcopy will use them.

 1# Example: customizing copy behavior
 2import copy
 3
 4class Node:
 5    def __init__(self, value, children=None):
 6        self.value = value
 7        self.children = children or []
 8
 9    def __repr__(self):
10        return f"Node({self.value!r}, children={self.children!r})"
11
12    def __copy__(self):
13        # Shallow copy: create new Node, but reuse children list reference
14        new = self.__class__(self.value, self.children)
15        return new
16
17    def __deepcopy__(self, memo):
18        # Deep copy: copy value and deepcopy each child
19        new_children = [copy.deepcopy(child, memo) for child in self.children]
20        new = self.__class__(copy.deepcopy(self.value, memo), new_children)
21        memo[id(self)] = new
22        return new
  • By implementing __copy__ and __deepcopy__, you can flexibly control class-specific copying behavior. For example, you can handle cases where certain attributes should refer to (share) the same object, while other attributes should be duplicated as entirely new objects.
  • The memo argument passed to __deepcopy__ is a dictionary that records objects already processed during recursive copying and prevents infinite loops caused by circular references.
 1# Build tree
 2root_node = Node('root', [Node('child1'), Node('child2')])
 3
 4shallow_copy = copy.copy(root_node)   # shallow copy
 5deep_copy = copy.deepcopy(root_node)  # deep copy
 6
 7# Modify children
 8# affects both root_node and shallow_copy
 9shallow_copy.children.append(Node('child_shallow'))
10# affects deep_copy only
11deep_copy.children.append(Node('child_deep'))
12
13# Print results
14print("root_node:", root_node)
15print("shallow_copy:", shallow_copy)
16print("deep_copy:", deep_copy)
  • In this code, the variable shallow_copy is created via a 'shallow copy', so only the top level of the object is duplicated, and the objects it references internally (in this case, the children list) are shared with the original root_node. As a result, when you add a new node to shallow_copy, the shared children list is updated and the change is reflected in the contents of root_node as well.
  • On the other hand, the variable deep_copy is created via a 'deep copy', so the internal structure is recursively duplicated, producing a tree completely independent of root_node. Therefore, even if you add a new node to deep_copy, it does not affect root_node.

Cyclic references (recursive objects) and the importance of memo

When a complex object graph references itself (cyclic references), deepcopy uses a memo dictionary to track objects already copied and prevent infinite loops.

 1# Example: recursive list and deepcopy memo demonstration
 2import copy
 3
 4a = []
 5b = [a]
 6a.append(b)  # a -> [b], b -> [a]  (cycle)
 7
 8# deepcopy can handle cycles
 9deep = copy.deepcopy(a)
10print("deep copy succeeded, length:", len(deep))
  • Internally, deepcopy uses memo to refer to objects already duplicated, enabling safe copying even in the presence of cycles. You need a similar mechanism when performing recursion manually.

copyreg and serialization-style custom copying (advanced)

If you want to register special copying behavior in coordination with libraries, you can use the standard copyreg module to register how objects are (re)constructed. This is useful for complex objects and C extension types.

 1# Example: using copyreg to customize pickling/copy behavior (brief example)
 2import copy
 3import copyreg
 4
 5class Wrapper:
 6    def __init__(self, data):
 7        self.data = data
 8
 9def reduce_wrapper(obj):
10    # Return callable and args so object can be reconstructed
11    return (Wrapper, (obj.data,))
12
13copyreg.pickle(Wrapper, reduce_wrapper)
14
15w = Wrapper([1, 2, 3])
16w_copy = copy.deepcopy(w)
17print("w_copy.data:", w_copy.data)
  • Using copyreg, you can provide reconstruction rules that affect both pickle and deepcopy. It's an advanced API, and in most cases __deepcopy__ is sufficient.

Practical caveats and pitfalls

There are several important points to ensure correct behavior when using the copy module. Below we explain common pitfalls you may encounter in development and how to avoid them.

  • Performance deepcopy can consume significant memory and CPU, so consider carefully whether it's necessary.
  • Handling elements you want to share If you want some attributes, such as large caches, to remain shared, intentionally avoid copying their references inside __deepcopy__.
  • Immutable internal state If you hold immutable data internally, copying may be unnecessary.
  • Threads and external resources Resources that cannot be copied, such as sockets and file handles, are either meaningless to copy or will cause errors, so you must avoid copying them by design.

Practical example: a pattern for safely updating dictionaries

When updating a complex configuration object, this example uses deepcopy to protect the original settings.

 1# Example: safely update a nested configuration using deepcopy
 2import copy
 3
 4default_config = {
 5    "db": {"host": "localhost", "ports": [5432]},
 6    "features": {"use_cache": True, "cache_sizes": [128, 256]},
 7}
 8
 9# Create a working copy to modify without touching defaults
10working = copy.deepcopy(default_config)
11working["db"]["ports"].append(5433)
12working["features"]["cache_sizes"][0] = 512
13
14print("default_config:", default_config)
15print("working:", working)

Using deepcopy, you can safely create derived configurations without risking damage to the default settings. This is especially useful for nested mutable structures such as configurations.

Best practices

To use the copy module safely and effectively, it's important to keep the following practical guidelines in mind.

  • Choose between shallow and deep copies based on whether changes will occur and their scope.
  • Implement __copy__ and __deepcopy__ in custom classes to achieve the expected behavior.
  • Because deep copies are costly, reduce the need for copying through design when possible. Consider immutability and explicit clone methods, among other techniques.
  • When dealing with cyclic references, either leverage deepcopy or provide a memo-like mechanism manually.
  • Design so that external resources such as file handles and threads are not copied.

Conclusion

The copy module is the standard tool for object duplication in Python. By correctly understanding the differences between shallow and deep copies and implementing custom copying behavior when needed, you can perform duplication safely and predictably. By clarifying at the design stage whether copying is truly necessary and what should be shared, you can avoid unnecessary bugs and performance issues.

You can follow along with the above article using Visual Studio Code on our YouTube channel. Please also check out the YouTube channel.

YouTube Video