The `io` Module in Python

This article explains Python's io module with practical examples.

Input/output processing is the foundation of all kinds of data handling, whether the data comes from files, networks, or standard input and output. Python's io module provides a set of abstract base classes, along with concrete implementations, that unify these operations. The key concept for understanding this module is the "stream."

What is a stream?

A stream is an abstract flow for reading and writing data sequentially and continuously.

Reading a file's contents byte by byte and sending or receiving data over a network can both be treated as data streams.

Abstracting this mechanism lets different sources of I/O, such as files, memory, and networks, be handled with the same operations, such as reading and writing.

Python's io module provides a unified interface for streams, allowing efficient handling of both text and binary data.
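To make the idea of a unified interface concrete, here is a minimal sketch. It uses a hypothetical helper, count_lines(), that accepts any readable text stream. The helper works the same way on an in-memory io.StringIO and on a real file returned by open(), because both support line iteration. The temporary file path exists only for this illustration.

```python
import io
import os
import tempfile

def count_lines(stream):
    """Count the lines in any readable text stream (hypothetical helper)."""
    return sum(1 for _ in stream)

# In-memory stream
mem_count = count_lines(io.StringIO("first\nsecond\nthird\n"))
print(mem_count)  # 3

# Real file on disk, consumed through the same interface
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\n")
with open(path, encoding="utf-8") as f:
    file_count = count_lines(f)
print(file_count)  # 2
```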

Basic Structure of the io Module

The io module has a three-layered hierarchy according to the nature of streams.

  1. Raw Layer (RawIOBase)

    RawIOBase handles the lowest-level byte I/O, such as OS file descriptors and devices.

  2. Buffered Layer (BufferedIOBase)

    BufferedIOBase provides a cache (buffer) to improve I/O efficiency. BufferedReader and BufferedWriter are typical examples.

  3. Text Layer (TextIOBase)

    TextIOBase converts between byte sequences and strings and handles encoding. When you open a file in text mode with the open() function, the object you get back is a TextIOWrapper from this layer.

Thanks to this structure, the io module clearly separates text and binary I/O while allowing flexible combinations.
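One way to see the three layers is to open a file and follow the wrapper chain. A text-mode file object exposes its buffered layer as .buffer, and the buffered object exposes its raw layer as .raw. This sketch assumes a writable temporary directory. It also shows that binary mode drops the text layer, and that buffering=0 drops the buffered layer as well.

```python
import io
import os
import tempfile

# Temporary path used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "layers.txt")

# Text mode: all three layers are stacked
with open(path, "w", encoding="utf-8") as f:
    text_layer = type(f)              # text layer
    buffered_layer = type(f.buffer)   # buffered layer
    raw_layer = type(f.buffer.raw)    # raw layer
print(text_layer.__name__, buffered_layer.__name__, raw_layer.__name__)
# TextIOWrapper BufferedWriter FileIO

# Binary mode skips the text layer; buffering=0 also skips the buffered layer
with open(path, "rb") as f:
    binary_type = type(f)
with open(path, "rb", buffering=0) as f:
    unbuffered_type = type(f)
print(binary_type.__name__, unbuffered_type.__name__)  # BufferedReader FileIO
```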

RawIOBase works directly with OS file descriptors at the lowest layer. BufferedIOBase adds a cache on top of it, and TextIOBase at the top converts between bytes and strings.

```python
import io

# List the classes that inherit directly from IOBase
print(io.IOBase.__subclasses__())
```
  • This code lists the abstract classes that inherit directly from IOBase. The output includes RawIOBase, BufferedIOBase, and TextIOBase, confirming the layered structure.
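Note that __subclasses__() only reports direct subclasses. In CPython, concrete classes such as FileIO, BytesIO, and StringIO are registered with these ABCs as virtual subclasses, so they do not appear in that list. The sketch below shows that isinstance() and issubclass() are the more reliable way to check which layer a class belongs to.

```python
import io

# Direct subclasses only: the three layer ABCs
direct = io.IOBase.__subclasses__()
print(io.FileIO in direct)  # False: FileIO is registered, not a direct subclass

# isinstance()/issubclass() also recognize registered (virtual) subclasses
checks = {
    "FileIO -> RawIOBase": issubclass(io.FileIO, io.RawIOBase),
    "BytesIO -> BufferedIOBase": isinstance(io.BytesIO(), io.BufferedIOBase),
    "StringIO -> TextIOBase": isinstance(io.StringIO(), io.TextIOBase),
    "StringIO -> BufferedIOBase": isinstance(io.StringIO(), io.BufferedIOBase),
}
for name, result in checks.items():
    print(f"{name}: {result}")
```

The last check is False because text streams and binary streams live in separate branches of the hierarchy.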

io.IOBase: The Base Class of All

IOBase is the abstract base class for all I/O objects, defining common methods such as close(), flush(), and seekable(). It is rarely used directly and usually accessed through derived classes.

```python
import io

f = io.StringIO("data")
print(f.seekable())   # True
print(f.readable())   # True
print(f.writable())   # True
f.close()
```
  • This example shows that the common IOBase methods are available on derived classes such as StringIO. seekable(), readable(), and writable() are useful for checking what a stream supports.
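IOBase also makes every stream usable as a context manager and as an iterator over lines. Once a stream is closed, further I/O raises ValueError. A short sketch with StringIO:

```python
import io

with io.StringIO("line one\nline two\n") as f:
    lines = [line.rstrip("\n") for line in f]  # IOBase iterates line by line

print(lines)     # ['line one', 'line two']
print(f.closed)  # True: leaving the with block called close()

try:
    f.read()
    read_after_close_failed = False
except ValueError:
    read_after_close_failed = True  # closed streams reject further I/O
print(read_after_close_failed)  # True
```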

io.RawIOBase: The Lowest-Level Layer

RawIOBase is the layer closest to the OS file descriptor and performs no buffering. Its typical implementation is FileIO, which reads and writes raw bytes.

```python
import io
import os

# Open an OS-level file descriptor, then wrap it in FileIO (no buffering)
fd = os.open('raw_demo.bin', os.O_RDWR | os.O_CREAT)
raw = io.FileIO(fd, mode='w+')  # FileIO takes ownership of fd and closes it on close()
raw.write(b'abc123')
raw.seek(0)
print(raw.read(6))  # b'abc123'
raw.close()
```
  • FileIO is a concrete implementation of RawIOBase. All reads and writes are performed on bytes, and each call goes straight to the operating system. Combining it with the BufferedIOBase layer above usually improves efficiency.
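As the note above suggests, FileIO can be combined with the buffered layer by wrapping it by hand in BufferedWriter and BufferedReader. This is roughly what open() does for binary modes. The sketch assumes a writable temporary directory. It performs many one-byte writes, which the buffer collects before passing them to FileIO.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered_demo.bin")

# Many small writes accumulate in BufferedWriter before reaching FileIO
with io.BufferedWriter(io.FileIO(path, mode="w")) as writer:
    for _ in range(1000):
        writer.write(b"x")

# BufferedReader fetches data from FileIO in larger blocks behind the scenes
with io.BufferedReader(io.FileIO(path, mode="r")) as reader:
    data = reader.read()

print(len(data))  # 1000
```

Closing the buffered wrapper also closes the FileIO beneath it, so a single with statement per layer stack is enough.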

io.BufferedIOBase: Intermediate Layer (With Buffering)

BufferedIOBase is an intermediate layer that buffers data. This reduces the number of calls made to the underlying raw stream and makes disk access more efficient. The main implementations are BufferedReader, BufferedWriter, BufferedRandom, and BufferedRWPair.

```python
import io

# Create a buffered binary stream on top of a BytesIO (standing in for a file)
base = io.BytesIO()
buffered = io.BufferedWriter(base)
buffered.write(b'Python IO buffering')
buffered.flush()  # push the buffered bytes down to base
base.seek(0)
print(base.read())  # b'Python IO buffering'
```
  • In this example, data written via BufferedWriter is held in a memory buffer and passed to the lower layer when flush() is called. It is also passed down when the buffer fills up or the stream is closed.
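You can observe the buffering by inspecting the underlying BytesIO before and after flush(). With the default buffer size, a short write stays inside BufferedWriter, so the lower layer is still empty until flush() runs. This sketch reflects CPython's behavior for writes smaller than the buffer. Larger writes may be handed down immediately.

```python
import io

base = io.BytesIO()
buffered = io.BufferedWriter(base)

buffered.write(b"pending")
before = base.getvalue()  # data is still in BufferedWriter's buffer
buffered.flush()
after = base.getvalue()   # data has now been pushed down to the BytesIO

print(before)  # b''
print(after)   # b'pending'
```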

Example of BufferedReader

BufferedReader is a read-only buffered stream that supports efficient reading with peek() and read().

```python
import io

stream = io.BytesIO(b"1234567890")
reader = io.BufferedReader(stream)
# peek() may return more bytes than requested, so slice to the size you need
print(reader.peek(5)[:5])  # b'12345' (non-destructive)
print(reader.read(4))      # b'1234'
print(reader.read(3))      # b'567'
```
  • peek() only "peeks" at buffered data and does not advance the stream position. It may return more or fewer bytes than requested. Combining it with read() lets you inspect upcoming data before consuming it.
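A common use of peek() is detecting a data format before consuming anything. The sketch below uses a hypothetical helper, is_gzip(), to check for the two-byte gzip magic number (1f 8b) without moving the stream position. It then reads the full payload from the beginning.

```python
import gzip
import io

def is_gzip(reader):
    """Return True if the next two bytes are the gzip magic number (hypothetical helper)."""
    # peek() may return more or fewer bytes than asked for, so slice explicitly
    return reader.peek(2)[:2] == b"\x1f\x8b"

payload = gzip.compress(b"hello")
reader = io.BufferedReader(io.BytesIO(payload))

detected = is_gzip(reader)
body = reader.read()  # peek() did not move the position, so this starts at byte 0

print(detected)         # True
print(body == payload)  # True
```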

io.TextIOBase: Text-Only Layer

TextIOBase is an abstraction layer for handling strings, internally performing decoding and encoding. A typical implementation class is TextIOWrapper.

```python
import io

# Wrap a binary stream to handle text encoding
binary = io.BytesIO()
text_stream = io.TextIOWrapper(binary, encoding='utf-8')
text_stream.write("\u3053\u3093\u306B\u3061\u306F")  # "こんにちは" (Japanese for "hello")
text_stream.flush()

# Reset the position of the underlying binary stream
binary.seek(0)

# Read the bytes once
data = binary.read()

# Show both the raw bytes and the decoded text
print("Raw bytes:", data)
print("Decoded text:", data.decode('utf-8'))
```
  • In this example, TextIOWrapper encodes the string to UTF-8 and writes it to the underlying binary stream.
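The encoding argument determines which bytes end up in the binary layer. The hypothetical helper encode_via_wrapper() below writes the same string through TextIOWrapper with two different encodings and returns the resulting bytes. It calls detach(), which flushes the wrapper and releases the BytesIO without closing it.

```python
import io

def encode_via_wrapper(text, encoding):
    """Write text through a TextIOWrapper and return the raw bytes (hypothetical helper)."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding)
    wrapper.write(text)
    wrapper.detach()  # flushes pending text, then unhooks raw without closing it
    return raw.getvalue()

utf8_bytes = encode_via_wrapper("café", "utf-8")
latin1_bytes = encode_via_wrapper("café", "latin-1")

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
```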

Example of Reading with TextIOWrapper

Decoding is performed automatically when reading.

```python
import io

binary_data = io.BytesIO("Python I/O".encode('utf-8'))
text_reader = io.TextIOWrapper(binary_data, encoding='utf-8')
print(text_reader.read())  # 'Python I/O'
```
  • TextIOWrapper is the fundamental class for text I/O. It is the object open() returns in text mode, so it underlies most everyday file handling.
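TextIOWrapper also handles line endings when reading. With the default newline=None (universal newlines mode), '\r\n' and '\r' are translated to '\n', so Windows-style input reads the same as Unix-style input. Passing newline="" still splits lines correctly but returns the original endings untranslated. A short sketch:

```python
import io

reader = io.TextIOWrapper(io.BytesIO(b"first\r\nsecond\r\n"), encoding="utf-8")  # newline=None
lines = reader.readlines()
print(lines)  # ['first\n', 'second\n']

raw_reader = io.TextIOWrapper(io.BytesIO(b"first\r\nsecond\r\n"), encoding="utf-8", newline="")
raw_lines = raw_reader.readlines()
print(raw_lines)  # ['first\r\n', 'second\r\n']
```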

io.StringIO: In-Memory Text Stream

StringIO is a class that allows you to handle strings in memory as if they were files. It is useful for I/O testing and temporary data generation.

```python
import io

text_buf = io.StringIO()
text_buf.write("In-memory text stream")
text_buf.seek(0)
print(text_buf.read())  # 'In-memory text stream'
```
  • StringIO allows file-like operations without using the disk and is widely used in unit testing.
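As an example of the unit-testing use case, a function that writes to a file-like argument can be tested by passing it a StringIO and checking getvalue() afterward. The function write_report() is hypothetical. getvalue() returns everything written so far without any need to seek.

```python
import io

def write_report(items, out):
    """Write one item per line to any writable text stream (hypothetical function)."""
    for item in items:
        print(item, file=out)

buf = io.StringIO()
write_report(["apples", "pears"], buf)
report = buf.getvalue()

print(repr(report))  # 'apples\npears\n'
```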

io.BytesIO: In-Memory Binary Stream

BytesIO is an in-memory file class for handling byte sequences (bytes). It is useful for binary processing, such as data compression, when you do not want to use real files.

```python
import io

buf = io.BytesIO()
buf.write(b'\x01\x02\x03')
buf.seek(0)
print(list(buf.read()))  # [1, 2, 3]
```
  • BytesIO is an implementation of BufferedIOBase, so it can stand in for a binary file in many APIs that expect a file object.
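For the compression case, this sketch passes a BytesIO to gzip.GzipFile through its fileobj parameter. Data is compressed and decompressed entirely in memory, with no file on disk. GzipFile does not close a fileobj it was given, so the same buffer can be rewound and read back.

```python
import gzip
import io

original = b"compress me " * 10
buf = io.BytesIO()

# Compress into the in-memory buffer instead of a file on disk
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(original)
compressed = buf.getvalue()

# Rewind the same buffer and decompress from it
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    restored = gz.read()

print(compressed[:2])        # b'\x1f\x8b' (gzip magic number)
print(restored == original)  # True
```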

Custom Streams (Creating Original Classes)

The classes in io are extensible, so you can create your own stream classes. Below is an example of a TextIOBase subclass that converts everything written to it to upper case.

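```python
import io

class UpperTextIO(io.TextIOBase):
    def __init__(self):
        super().__init__()
        # Stored text; named "value" because TextIOBase documents "buffer" as the underlying binary buffer
        self.value = ""

    def writable(self):
        return True  # IOBase.writable() returns False unless overridden

    def write(self, s):
        if self.closed:
            raise ValueError("I/O operation on closed file.")
        self.value += s.upper()
        return len(s)  # write() returns the number of characters written

u = UpperTextIO()
u.write("hello io")
print(u.value)  # HELLO IO
```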
  • As long as you follow the TextIOBase contract (for example, write() returns the number of characters written), you can implement any custom behavior like this. The same approach extends to streams built for specific purposes, such as files or network connections.
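Custom classes can also sit at the bottom of the hierarchy, with the standard layers stacked on top. The class ChunkedRaw below is hypothetical. It subclasses RawIOBase, implements readable() and readinto(), and deliberately returns at most a few bytes per call. Wrapping it in BufferedReader and then TextIOWrapper gives decoded, line-oriented reading without any extra code.

```python
import io

class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream serving in-memory bytes, at most chunk_size per call."""

    def __init__(self, data, chunk_size=3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def readinto(self, b):
        # Copy up to chunk_size bytes into the caller's buffer; returning 0 signals EOF
        n = min(len(b), self._chunk_size, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n

# Stack the standard buffered and text layers on top of the custom raw stream
raw = ChunkedRaw(b"line1\nline2\n")
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8")
lines = text.readlines()

print(lines)  # ['line1\n', 'line2\n']
```

The upper layers assemble complete lines even though the raw stream never returns more than three bytes at a time. This division of labor is the main benefit of the layered design.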

Summary

The io module organizes input/output processing into a hierarchy of abstract and concrete classes.

  • RawIOBase is a class for OS-level byte I/O.
  • BufferedIOBase is a class that provides an efficient cache layer.
  • TextIOBase is a class that manages reading and writing of strings.
  • StringIO and BytesIO are classes that provide in-memory streams.

Understanding these classes allows you to accurately grasp the workings of Python's I/O system and apply them to file operations, network communication, and the design of test streams.

You can follow along with the above article using Visual Studio Code on our YouTube channel.
