Refactoring Python Packages: Embracing Best Practices

Published: March 13, 2022

As a developer, when I found myself refactoring some of my old codebases, I had to decide which best practices to adopt to improve maintainability, readability, and overall code quality. In this article, I'll walk through the process of refactoring a Python package and adopting the industry-standard best practices that I settled on after some Googling.

In summary, I reformatted the files to follow the PEP 8 guidelines (the de facto canonical style standard for Python code), as well as the practices suggested in this Python guide and this blog post. The main things to note were:

Organizing Code into Packages

To achieve better organization and separation of concerns, we restructure the code into Python packages. Packages are directories with an __init__.py file, indicating that they contain Python code. They allow us to group related modules together, providing a more organized and scalable codebase.

By following this approach, we can export functionality in the __init__.py file, making it accessible to other parts of the application. This not only enhances modularity but also simplifies imports across the project.

That said, leaving an __init__.py file empty is considered normal, and even good practice, when the package's modules and sub-packages do not need to share any code.
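To make the re-export idea concrete, here is a minimal sketch. It builds a throwaway package in a temporary directory purely so the example can run anywhere; the names shapes, circle, and area are hypothetical and not from my actual codebase.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the example is runnable anywhere.
# "shapes", "circle", and "area" are hypothetical names for illustration.
root = Path(tempfile.mkdtemp())
pkg = root / "shapes"
pkg.mkdir()

# A regular module inside the package.
(pkg / "circle.py").write_text(
    "import math\n"
    "\n"
    "def area(radius):\n"
    "    return math.pi * radius ** 2\n"
)

# __init__.py re-exports the public name so callers can skip the submodule path.
(pkg / "__init__.py").write_text(
    "from shapes.circle import area\n"
    "\n"
    "__all__ = ['area']\n"
)

sys.path.insert(0, str(root))  # make the temporary directory importable
importlib.invalidate_caches()
shapes = importlib.import_module("shapes")

# shapes.area is the very same function object as shapes.circle.area.
print(shapes.area(1.0))
```

With this in place, the rest of the project can write `from shapes import area` without knowing which file the function lives in, so files can be moved around inside the package without touching the callers.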

File and Module Naming

Module file names should avoid characters like spaces or hyphens, which cannot appear in an import statement; rename such files to use underscores or, better, a single word. Although underscores in module names are permissible, it's best to keep module names short and concise enough that there is no need to separate words. And, most of all, don't namespace with underscores; use submodules instead.
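As a rough illustration of these naming rules, the sketch below checks candidate module names. The helper is_clean_module_name and its regular expression are my own assumptions: one reading of "short, all-lowercase, no spaces or hyphens", not an official PEP 8 validator.

```python
import keyword
import re

# Hypothetical pattern: lowercase letters and digits, with optional
# single underscores between words. Spaces, hyphens, and capitals fail.
_MODULE_NAME = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_clean_module_name(name: str) -> bool:
    """Return True if `name` fits the naming rules described above."""
    if keyword.iskeyword(name):
        return False  # e.g. "class" can never follow `import`
    return _MODULE_NAME.fullmatch(name) is not None


for candidate in ["parser", "data_loader", "my-module", "my module", "DataLoader"]:
    print(candidate, is_clean_module_name(candidate))
```

A check like this could run over the file names in a package during refactoring to flag the ones that need renaming.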

Using Docstrings for Documentation

Good documentation is crucial for code understanding and collaboration. I use docstrings in internal modules to provide clear and concise explanations of the module's purpose and functionality. This documentation not only helps fellow developers comprehend the code but also helps me with future maintenance and updates.
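A minimal sketch of what this looks like in practice, using a hypothetical temperature module rather than code from my own packages:

```python
"""Utilities for converting temperatures (hypothetical example module).

The first line is a one-sentence summary; the lines after it give the
holistic view of what the module is for.
"""


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Docstrings are kept at runtime, which is what help() and doc tools read.
print(celsius_to_fahrenheit.__doc__)
```

Because the docstring travels with the function object, calling `help(celsius_to_fahrenheit)` in a REPL shows the same text without opening the source file.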

Import Styles

The import style is adapted from this blog article. A consistent import style improves readability and maintenance. Absolute imports are favored: we specify the full path of the imported module, starting from the project root or any other directory on sys.path.

Python's import system already caches every imported module at a global level in sys.modules, although that cache is per process and is not shared between processes. Within one process, everything run from a single main module has access to the exact same set of imported module objects, each of which is loaded from disk only once.
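The caching behaviour is easy to check for yourself. This small sketch uses the standard-library json module as a stand-in for any module in your project:

```python
import importlib
import sys

import json  # absolute import of a top-level module

# After the first import, the module object is stored in sys.modules.
print(sys.modules["json"] is json)

# Importing again (here via importlib) returns the cached object;
# the file is not read from disk a second time.
json_again = importlib.import_module("json")
print(json_again is json)
```

Both checks print True, because every import of the same name within one process resolves to a single module object.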

Also, in general, when a class is defined in a module, I create an instance of that class at the bottom of the same module. I standardized this across my code so that the ready-made object is easy to import.
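Here is a sketch of that convention. Greeter and the module name greeter.py are hypothetical stand-ins for whatever class a module defines:

```python
class Greeter:
    """Hypothetical class standing in for any class in the package."""

    def __init__(self, greeting: str = "Hello") -> None:
        self.greeting = greeting

    def greet(self, name: str) -> str:
        return f"{self.greeting}, {name}!"


# Module-level instance, created once when the module is first imported.
# Other modules can then write `from greeter import greeter`
# (assuming this file is saved as greeter.py).
greeter = Greeter()

print(greeter.greet("world"))  # Hello, world!
```

Since modules are only loaded once per process, every importer ends up sharing this one instance, which is convenient but worth remembering if the object holds mutable state.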

.env for API Keys

Handling sensitive information like API keys securely is vital in any project. To avoid accidentally exposing such keys in public repositories on sites like GitHub, we can store them in a .env file that is kept out of version control. Following this blog, I used the decouple library (python-decouple on PyPI) to read values from the .env file easily, so that confidential information stays out of the source code. The usage is:

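secure.py
# Read credentials with python-decouple. It checks real environment
# variables before the .env file, so a generic name like USER may
# resolve to the shell's login name instead of the value in .env.
from decouple import config

API_USERNAME = config('USER')
API_KEY = config('KEY')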
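To show roughly what happens behind a call like config('KEY'), here is a simplified, standard-library-only sketch. It is not decouple's actual implementation; read_env_file, get_setting, and the DEMO_ variable names are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_setting(name, env_file_values):
    """Prefer a real environment variable, then fall back to the .env values."""
    if name in os.environ:
        return os.environ[name]
    return env_file_values[name]  # KeyError if the setting is missing everywhere


# Demo with a throwaway file; a real project keeps .env next to the code
# and lists it in .gitignore so it never reaches the repository.
demo_dir = Path(tempfile.mkdtemp())
env_path = demo_dir / ".env"
env_path.write_text(
    "# demo credentials\n"
    "DEMO_API_USERNAME=alice\n"
    "DEMO_API_KEY='s3cret'\n"
)

settings = read_env_file(env_path)
print(get_setting("DEMO_API_USERNAME", settings))
```

The point of the pattern is that the code only ever names the setting, while the value lives in a file that never gets committed.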
💡 What is the difference between a "module" and a "package"? A module is a single Python file, whereas a package is a directory that collects related modules.

Notes on Folder Structure

  • bin/: Holds any executable files. Executables shouldn’t contain much code, just an import and a call to a main function in your runner script. I also keep my setup references here (e.g. SQL).
  • docs/: Holds the documentation for all parts of the project, including any documentation for internal modules. I use docstrings inside the modules themselves, but these files are whole-module documentation that, at the very least, gives a holistic view of the purpose and function of each module or sub-module.
  • module_name/: The package itself, with subdirectories that split the application logic into more manageable chunks.
  • data/: Having this directory is helpful for testing. It’s a central location for any files that the application will ingest or produce, simplifying testing and internal use.
  • tests/: All tests: unit tests, execution tests, integration tests, etc.
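As an illustration of how thin a bin/ executable can be, here is a sketch. To keep it self-contained, main() is defined in the same file; in the layout above it would live in the package, for example in a hypothetical module_name/runner.py.

```python
import sys


def main(argv=None):
    """Stand-in for the runner's main() function."""
    args = sys.argv[1:] if argv is None else argv
    print(f"running with {len(args)} argument(s)")
    return 0  # conventional "success" status


# In a real bin/ script, the definition above would be replaced by
# `from module_name.runner import main` (hypothetical import path),
# leaving only the two lines below.
if __name__ == "__main__":
    main()
```

Keeping the logic inside the package means the same main() can be called directly from tests, without spawning a separate process.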

Conclusion

Refactoring a Python codebase to embrace best practices can significantly enhance code maintainability and developer productivity. By adhering to PEP 8 guidelines, adopting organized application layouts, and leveraging useful tools like decouple, we ensure that our codebase becomes more readable, scalable, and secure :)


If you want to learn more about setup.py files, check out this repository.