The `io` Module in Python
This article explains the io module in Python with practical examples.
Input/output processing forms the foundation for all kinds of data operations, such as files, networks, and standard I/O. Python's io module provides a set of abstract classes that unify these input/output operations. The key concept for understanding this module is the idea of a "stream".
What is a stream?
A stream is an abstract flow for reading and writing data sequentially and continuously.
Reading a file's contents byte by byte and sending or receiving data over a network can both be handled as data streams.
By abstracting this mechanism, different sources of I/O (files, memory, and networks) can be handled with common operations such as reading and writing.
Python's io module provides a unified interface for streams, allowing efficient handling of both text and binary data.
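As a minimal sketch of this idea, the function below accepts any readable text stream and does not care whether the data comes from memory or from disk. The helper name `count_words` and the file name `words.txt` are made up for illustration.

```python
import io
import os
import tempfile


def count_words(stream):
    """Count whitespace-separated words from any readable text stream."""
    return sum(len(line.split()) for line in stream)


# Source 1: an in-memory text stream
memory_count = count_words(io.StringIO("one two\nthree\n"))

# Source 2: a real file, created in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("one two\nthree\n")
    with open(path, encoding="utf-8") as f:
        file_count = count_words(f)

print(memory_count, file_count)  # 3 3
```

The same function body serves both sources because both objects expose the common stream interface.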
Basic Structure of the io Module
The io module has a three-layered hierarchy according to the nature of streams.
- Raw Layer (RawIOBase): RawIOBase handles the lowest-level byte I/O, such as OS file descriptors and devices.
- Buffered Layer (BufferedIOBase): BufferedIOBase provides a cache (buffer) to improve I/O efficiency. BufferedReader and BufferedWriter are typical examples.
- Text Layer (TextIOBase): TextIOBase converts byte sequences to strings and handles encoding. Usually, when opening a file in text mode with the open() function, TextIOWrapper from this layer is used.
Thanks to this structure, the io module clearly separates text and binary I/O while allowing flexible combinations.
RawIOBase handles OS file descriptors at the lowest layer, with BufferedIOBase adding a cache on top, and the top layer TextIOBase handling string conversions.
import io

# Check the core base class hierarchy
print(io.IOBase.__subclasses__())
- This code lists the abstract classes that inherit from IOBase. You can see TextIOBase, BufferedIOBase, and RawIOBase, confirming the hierarchical structure.
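The three layers can also be observed on an ordinary file object. The sketch below, which assumes CPython and uses a throwaway file name (`layers.txt`) chosen for illustration, opens a file in text mode and walks down through the `buffer` and `raw` attributes.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "layers.txt")
    with open(path, "w", encoding="utf-8") as f:
        # Text layer -> buffered layer -> raw layer
        layers = (type(f), type(f.buffer), type(f.buffer.raw))

print([cls.__name__ for cls in layers])
# ['TextIOWrapper', 'BufferedWriter', 'FileIO']
```

Each object wraps the one below it, which is why the layers can be combined freely.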
io.IOBase: The Base Class of All
IOBase is the abstract base class for all I/O objects, defining common methods such as close(), flush(), and seekable(). It is rarely used directly and is usually accessed through derived classes.
import io

f = io.StringIO("data")
print(f.seekable())  # True
print(f.readable())  # True
print(f.writable())  # True
f.close()
- This example shows that the common methods of IOBase are also available on derived classes. seekable() and readable() are useful for checking the properties of a stream.
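IOBase also supplies context-manager support, line iteration, and the `closed` attribute. The following sketch uses StringIO as a stand-in for any stream to show these shared behaviors.

```python
import io

with io.StringIO("first\nsecond\n") as f:
    # Line iteration is inherited from IOBase
    lines = [line.rstrip("\n") for line in f]
    closed_inside = f.closed

# Leaving the with block closes the stream
closed_after = f.closed

# Operating on a closed stream raises ValueError
try:
    f.read()
    read_error = None
except ValueError as exc:
    read_error = exc

print(lines)                      # ['first', 'second']
print(closed_inside, closed_after)  # False True
print(type(read_error).__name__)  # ValueError
```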
io.RawIOBase: The Lowest-Level Layer
RawIOBase is the layer closest to the OS file descriptor and does not perform buffering. The typical implementation is FileIO, which reads and writes raw bytes.
import io
import os

# Create a low-level FileIO object (no buffering)
fd = os.open('raw_demo.bin', os.O_RDWR | os.O_CREAT)
raw = io.FileIO(fd, mode='w+')
raw.write(b'abc123')
raw.seek(0)
print(raw.read(6))  # b'abc123'
raw.close()
- FileIO is a concrete implementation of RawIOBase; all reads and writes are performed as bytes. Efficiency can be improved by combining it with the upper BufferedIOBase layer.
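A raw stream can also be obtained without handling file descriptors yourself: passing `buffering=0` to `open()` in binary mode returns a FileIO object directly. The sketch below contrasts that with the default buffered binary mode; the temporary file name is arbitrary.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw_demo.bin")

    # buffering=0 skips the buffered layer and returns the raw FileIO
    with open(path, "wb", buffering=0) as raw:
        is_raw = isinstance(raw, io.FileIO)
        raw.write(b"abc123")

    # The default binary mode wraps a FileIO in a BufferedReader
    with open(path, "rb") as buffered:
        is_buffered = isinstance(buffered, io.BufferedReader)
        wraps_raw = isinstance(buffered.raw, io.FileIO)
        data = buffered.read()

print(is_raw, is_buffered, wraps_raw)  # True True True
print(data)                            # b'abc123'
```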
io.BufferedIOBase: Intermediate Layer (With Buffering)
BufferedIOBase is an intermediate layer that performs buffering, making disk access more efficient. The main implementations are BufferedReader, BufferedWriter, BufferedRandom, and BufferedRWPair.
import io

# Create a buffered binary stream on top of a BytesIO (simulating a file)
base = io.BytesIO()
buffered = io.BufferedWriter(base)
buffered.write(b'Python IO buffering')
buffered.flush()
base.seek(0)
print(base.read())  # b'Python IO buffering'
- In this example, data written via BufferedWriter is temporarily stored in a memory buffer and is transferred to the lower layer when flush() is called. The buffer is also written out when it fills up or when the stream is closed.
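To make the effect of the buffer visible, the sketch below inspects the underlying BytesIO before and after flush(). It assumes the default buffer size, which is far larger than the few bytes written here.

```python
import io

base = io.BytesIO()
buffered = io.BufferedWriter(base)

buffered.write(b"pending")
before = base.getvalue()  # nothing has reached the lower layer yet

buffered.flush()
after = base.getvalue()   # the buffer has now been written out

print(before)  # b''
print(after)   # b'pending'
```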
Example of BufferedReader
BufferedReader is a read-only buffered stream that supports efficient reading with peek() and read().
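import io

stream = io.BytesIO(b"1234567890")
reader = io.BufferedReader(stream)
# peek() may return more bytes than requested, so slice the result
print(reader.peek(5)[:5])  # b'12345' (non-destructive)
print(reader.read(4))  # b'1234'
print(reader.read(3))  # b'567'
- peek() only "peeks" at the data and does not move the pointer. It may return more or fewer bytes than requested, which is why the example slices the result. Combining it with read() lets you inspect upcoming data before consuming it.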
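One common use of peek() is sniffing a file signature before deciding how to read the stream. The sketch below checks for the two-byte gzip magic number; the helper name `looks_like_gzip` is hypothetical and not part of the io module.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(reader):
    """Check the first two bytes without consuming them."""
    return reader.peek(2)[:2] == GZIP_MAGIC


compressed = io.BufferedReader(io.BytesIO(gzip.compress(b"payload")))
plain = io.BufferedReader(io.BytesIO(b"payload"))

gz_flag = looks_like_gzip(compressed)
plain_flag = looks_like_gzip(plain)
position_after_peek = compressed.tell()

# The stream position is unchanged, so it can still be read from the start
restored = gzip.GzipFile(fileobj=compressed, mode="rb").read()

print(gz_flag, plain_flag)   # True False
print(position_after_peek)   # 0
print(restored)              # b'payload'
```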
io.TextIOBase: Text-Only Layer
TextIOBase is an abstraction layer for handling strings, internally performing decoding and encoding. A typical implementation class is TextIOWrapper.
import io

# Wrap a binary stream to handle text encoding
binary = io.BytesIO()
text_stream = io.TextIOWrapper(binary, encoding='utf-8')
text_stream.write("\u3053\u3093\u306B\u3061\u306F")
text_stream.flush()

# Reset stream position
binary.seek(0)

# Read bytes once
data = binary.read()

# Show both raw bytes and decoded text
print("Raw bytes:", data)
print("Decoded text:", data.decode('utf-8'))
- In this example, TextIOWrapper encodes the string to UTF-8 and writes it to the underlying binary stream.
Example of Reading with TextIOWrapper
Decoding is performed automatically when reading.
import io

binary_data = io.BytesIO("Python I/O".encode('utf-8'))
text_reader = io.TextIOWrapper(binary_data, encoding='utf-8')
print(text_reader.read())  # 'Python I/O'
- TextIOWrapper serves as the fundamental class for text I/O and forms the basis for almost all high-level file operations.
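Because decoding happens inside TextIOWrapper, the `encoding` and `errors` arguments decide how the same bytes are interpreted. The sketch below reads Latin-1 bytes once with the matching encoding and once as UTF-8 with `errors="replace"`; the sample text is an arbitrary choice.

```python
import io

latin1_bytes = "caf\u00e9".encode("latin-1")  # b'caf\xe9', not valid UTF-8

# Matching encoding: the text round-trips correctly
as_latin1 = io.TextIOWrapper(io.BytesIO(latin1_bytes), encoding="latin-1").read()

# Wrong encoding with errors="replace": the bad byte becomes U+FFFD
as_utf8 = io.TextIOWrapper(
    io.BytesIO(latin1_bytes), encoding="utf-8", errors="replace"
).read()

print(repr(as_latin1))  # 'caf\xe9' displayed as 'café'
print(ascii(as_utf8))   # 'caf\ufffd'
```

With the default `errors="strict"`, the second read would raise UnicodeDecodeError instead.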
io.StringIO: In-Memory Text Stream
StringIO is a class that allows you to handle strings in memory as if they were files. It is useful for I/O testing and temporary data generation.
import io

text_buf = io.StringIO()
text_buf.write("In-memory text stream")
text_buf.seek(0)
print(text_buf.read())  # 'In-memory text stream'
- StringIO allows file-like operations without using the disk and is widely used in unit testing.
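A typical testing pattern is to capture what a function prints. The sketch below redirects standard output into a StringIO with `contextlib.redirect_stdout`; the function `greet` is a made-up example of code under test.

```python
import contextlib
import io


def greet(name):
    """Hypothetical function under test that prints to stdout."""
    print(f"Hello, {name}!")


captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    greet("io")

output = captured.getvalue()
print(repr(output))  # 'Hello, io!\n'
```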
io.BytesIO: In-Memory Binary Stream
BytesIO is an in-memory file class for handling byte sequences (bytes). It is useful for situations such as binary processing or data compression where you do not want to use files.
import io

buf = io.BytesIO()
buf.write(b'\x01\x02\x03')
buf.seek(0)
print(list(buf.read()))  # [1, 2, 3]
- BytesIO implements the BufferedIOBase interface and can stand in for a binary file in many file APIs.
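As one illustration of the compression use case, the sketch below compresses and decompresses data entirely in memory by passing a BytesIO to `gzip.GzipFile` through its `fileobj` parameter. The sample payload is arbitrary.

```python
import gzip
import io

original = b"stream data " * 100

# Compress into an in-memory buffer instead of a file on disk
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(original)
compressed = buf.getvalue()

# Rewind and decompress from the same buffer
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    restored = gz.read()

print(len(compressed) < len(original))  # True
print(restored == original)             # True
```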
Custom Streams (Creating Original Classes)
The classes in io are extensible, allowing you to create your own stream classes. Below is an example of a TextIOBase subclass that converts all text to uppercase on write.
import io

class UpperTextIO(io.TextIOBase):
    """Text stream that upper-cases everything written to it."""

    def __init__(self):
        super().__init__()
        # Named "text" to avoid confusion with TextIOWrapper.buffer (the binary layer)
        self.text = ""

    def writable(self):
        return True

    def write(self, s):
        self.text += s.upper()
        return len(s)  # number of characters written

u = UpperTextIO()
u.write("hello io")
print(u.text)  # "HELLO IO"
- As long as you adhere to the contract of TextIOBase, you can define any custom behavior like this. It is also easy to extend streams for specific uses, such as files and networks.
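Custom classes can also plug into the lowest layer. The sketch below defines a hypothetical RawIOBase subclass, `ChunkedRaw`, that serves a fixed payload a few bytes at a time, then stacks the standard BufferedReader and TextIOWrapper on top of it. Only `readable()` and `readinto()` are implemented, which is enough for reading in this example.

```python
import io


class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream that returns a payload in small chunks."""

    def __init__(self, payload, chunk=4):
        super().__init__()
        self._data = payload
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Copy at most one chunk into the caller's buffer
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        if n == 0:
            return 0  # signals end of stream
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


raw = ChunkedRaw(b"line one\nline two\n")
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8")
lines = [line.rstrip("\n") for line in text]

print(lines)  # ['line one', 'line two']
```

The upper layers reassemble the small raw reads into complete lines, so the caller never sees the chunking.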
Summary
The io module organizes input/output processing into a hierarchy of abstract and concrete classes.
- RawIOBase is a class for OS-level byte I/O.
- BufferedIOBase is a class that provides an efficient cache layer.
- TextIOBase is a class that manages reading and writing of strings.
- StringIO and BytesIO are classes that provide in-memory streams.
Understanding these classes allows you to accurately grasp the workings of Python's I/O system and apply them to file operations, network communication, and the design of test streams.