PySSA Code Rules

Authors: Hannah Kullik & Martin Urban

Description

Python is the main programming language of the PySSA project. This document describes the rules that must be followed whenever the source code is extended.

Dev-Environment

The conda environment used for the development must be created through an environment.yaml file from one of the authors. This ensures that the development environment is reproducible.

Pre-commit hook

Before any commit, a custom pre-commit hook runs automatically. You must resolve every reported issue before committing your changes!

You can also run the pre-commit hooks manually beforehand with the command:

pre-commit run --all-files

All available pre-commit hooks are listed here: https://pre-commit.com/hooks.html. Adding a pre-commit hook requires the approval of one of the authors.

To run the pre-commit hooks on Windows, it may be necessary to update the SSL library. To do this, download this OpenSSL Installer.

Linting

You have to run ruff over your code to check for static errors. The configuration is defined in pyproject.toml.

Type annotation

Python is a dynamically typed language, but in this project Python is used as a statically typed language. The decision emphasizes robust and less error-prone code. Therefore, you have to use Python's type annotation feature.

Annotations of python builtins

Annotate variables with Python builtins wherever possible.

i: int = 0

Annotations of pyssa builtins

Annotate variables with PySSA builtins wherever PySSA data structures are used.

protein_pairs_for_analysis: list['protein_pair.ProteinPair'] = []

Annotations of library builtins

Annotate variables with library builtins wherever data types of libraries are used.

import numpy as np

distances_of_amino_acid_pairs: np.ndarray = np.array([])  # np.array([]) creates an empty array

Annotations of return values

If a function/method has a return value that will not be used, the call must be wrapped in the rvoid function. rvoid is the only function that is imported as a function rather than through its module:

from pyssa.util.void import rvoid
from pyssa.util import main_window_util

# rvoid indicates that there is a return value, but it is not used
rvoid(main_window_util.setup_app_settings(self.app_settings))
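
The implementation of rvoid is not shown in this document. As a minimal sketch, assuming rvoid simply accepts any value and discards it, a stand-in could look like the following. The function `setup_app_settings` and the dictionary it receives are hypothetical placeholders, not the PySSA API.

```python
from typing import Any


def rvoid(a_return_value: Any) -> None:
    """Stand-in for pyssa.util.void.rvoid (assumed behavior): discards the value."""
    return None


def setup_app_settings(the_app_settings: dict) -> dict:
    """Hypothetical function whose return value the caller does not need."""
    the_app_settings["loaded"] = True
    return the_app_settings


tmp_settings: dict = {}
# The wrapper makes it explicit that the return value is ignored on purpose.
rvoid(setup_app_settings(tmp_settings))
```

The benefit is purely one of readability: a reader can tell an intentionally ignored return value from a forgotten one.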

Naming conventions

  • Package: snake_case
  • Module: snake_case
  • Class: PascalCase
  • Method: snake_case
    • private: _ prefix (single underscore)
    def _create_directory_structure(self) -> None:
  • Function: snake_case
  • Variable: snake_case
    • argument: a/an_var_name, if no specific variable is meant.
    def export_protein_as_pdb_file(a_filepath: str) -> None:
    • argument: the_var_name, if a specific variable is meant.
    def load_settings(the_app_settings: 'settings.Settings') -> None:
    • method/function scope: tmp_ prefix
    ...
    tmp_destination_filepath: str = "/home/rhel_user/scratch/log.txt"
    ...
  • Global variable: g_ prefix + snake_case
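
The naming rules above can be sketched together in one small example. The class `ProteinExporter` and all of its members are hypothetical and exist only to illustrate the conventions.

```python
g_default_file_extension: str = ".pdb"  # global variable: g_ prefix + snake_case


class ProteinExporter:  # class: PascalCase
    """Hypothetical class that only illustrates the naming rules."""

    def __init__(self, the_workspace_path: str) -> None:  # specific argument: the_ prefix
        self.workspace_path: str = the_workspace_path

    def build_filepath(self, a_protein_name: str) -> str:  # method: snake_case, generic argument: a_ prefix
        tmp_file: str = self._add_extension(a_protein_name)  # local variable: tmp_ prefix
        return f"{self.workspace_path}/{tmp_file}"

    def _add_extension(self, a_protein_name: str) -> str:  # private method: single underscore prefix
        return f"{a_protein_name}{g_default_file_extension}"
```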

Imports

Never use wildcard imports. Always import the module, not the class or function itself.

from pymol import cmd # Correct: Module is imported

from pymol import * # Wrong! Wildcard import
from os.path import exists # Wrong! Function/Class import

Use the conventional abbreviations for common Python libraries.

import numpy as np
import pandas as pd

Exception handling

Argument checks

Always check for None:

def copy_fasta_file(a_source_filepath: pathlib.Path, a_destination_filepath: pathlib.Path) -> None:
    if a_source_filepath is None:
        logger.error(f"The argument 'a_source_filepath' is illegal: {a_source_filepath}!")
        raise exception.IllegalArgumentError("An argument is illegal.")
    if a_destination_filepath is None:
        logger.error(f"The argument 'a_destination_filepath' is illegal: {a_destination_filepath}!")
        raise exception.IllegalArgumentError("An argument is illegal.")

Raise IllegalArgumentError if an unmodified argument is not usable for the function/method:

import os
import pathlib


def copy_fasta_file(a_source_filepath: pathlib.Path, a_destination_filepath: pathlib.Path) -> None:
    ...
    if not os.path.exists(a_source_filepath):  # argument is unmodified
        raise exception.IllegalArgumentError("An argument is illegal.")

Raise a custom exception if a modified argument is not usable for the function/method:

import os
import pathlib


def copy_fasta_file(a_source_filepath: pathlib.Path, a_destination_filepath: pathlib.Path) -> None:
    ...
    if not os.path.exists(a_source_filepath.parent):  # .parent is a modified version of the argument
        raise exception.DirectoryNotFoundError("")
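
The three argument check rules can be combined in one function. The following is a minimal, self-contained sketch. The exception classes are local stand-ins for the PySSA exceptions, and the function name `check_copy_arguments` is hypothetical.

```python
import logging
import pathlib

logger = logging.getLogger(__name__)


class IllegalArgumentError(Exception):
    """Stand-in for the project's IllegalArgumentError."""


class DirectoryNotFoundError(Exception):
    """Stand-in for the project's DirectoryNotFoundError."""


def check_copy_arguments(a_source_filepath: pathlib.Path, a_destination_filepath: pathlib.Path) -> None:
    """Applies the three argument check rules in order."""
    # Rule 1: always check for None.
    if a_source_filepath is None:
        logger.error(f"The argument 'a_source_filepath' is illegal: {a_source_filepath}!")
        raise IllegalArgumentError("An argument is illegal.")
    if a_destination_filepath is None:
        logger.error(f"The argument 'a_destination_filepath' is illegal: {a_destination_filepath}!")
        raise IllegalArgumentError("An argument is illegal.")
    # Rule 2: the unmodified argument is not usable -> IllegalArgumentError.
    if not a_source_filepath.exists():
        raise IllegalArgumentError("An argument is illegal.")
    # Rule 3: a modified version of the argument (.parent) is not usable -> custom exception.
    if not a_destination_filepath.parent.exists():
        raise DirectoryNotFoundError("The destination directory does not exist.")
```

Checking for None first keeps the later checks free of attribute errors, because `.exists()` and `.parent` are only reached with real path objects.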

try-except blocks

Always wrap cmd commands of the PyMOL API into a try-except block.

import pymol
from pymol import cmd

try:
    cmd.scene(
        f"{tmp_protein_pair.protein_1.get_molecule_object()}"
        f"{tmp_protein_pair.protein_2.get_molecule_object()}",
        action="recall",
    )
except pymol.CmdException:
    logger.error("...")
    raise ...

Communication

QMainWindow & QDialogs

The communication between any QMainWindow and QDialog is done with signals and slots. This ensures that no memory access violations occur.

How-to

  1. Define a custom pyqtsignal in the QDialog class:
...

class DialogAddModel(Qt.QtWidgets.QDialog):
    """Class for a dialog to add proteins to a project."""

    """
    A pyqtsignal that is used to hand-over the protein structure information.
    """
    return_value = pyqtSignal(tuple)  # this is a custom PyQt signal

    ...
  2. Emit the signal where communication should occur.
...

def add_model(self) -> None:
    """Emits a custom pyqtsignal and closes the dialog."""
    self.return_value.emit((self.ui.txt_add_protein.text(), True))
    self.close()

...
  3. Connect the signal in the QMainWindow with the QDialog object and the slot function.
...


def add_existing_protein(self) -> None:
    """Opens a dialog to add an existing protein structure to the project."""
    self.tmp_dialog = dialog_add_model.DialogAddModel()
    self.tmp_dialog.return_value.connect(self.post_add_existing_protein)  # here is the connection
    self.tmp_dialog.show()


...
  4. Make sure that the slot function accepts the value of the signal as a function argument.
...

def post_add_existing_protein(self, return_value: tuple):  # in this case the value is a tuple
    ...

Threading

Within PySSA, the custom Task class is used whenever the presenter needs multithreading. The Task class is located in the pyssa.internal.thread.tasks module. Do NOT use the _Action class directly; use only the Task class!

Usage

...


def opens_project(self):
    """Initiates the task to open an existing project."""
    self._active_task = tasks.LegacyTask(self.__async_open_project, post_func=self.__await_open_project)
    self._active_task.start()


def __async_open_project(self) -> tuple:
    """Runs in the separate QThread and does CPU-bound work."""
    tmp_project_path = pathlib.Path(f"{self._workspace_path}/{self._view.ui.txt_open_selected_project.text()}")
    return ("result", project.Project.deserialize_project(tmp_project_path, self._application_settings))


def __await_open_project(self, a_result: tuple):
    """Runs after the QThread finished."""
    ...

The Task class takes an "async" function and, optionally, an "await" function. The name of the function that runs in the QThread must start with __async (double underscore). The name of the function that runs after the QThread has finished must start with __await. The naming is meant to be intuitive: the __async function runs asynchronously in the QThread, and the __await function waits for the QThread (the __async function) to finish.
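
To make the async/await pairing concrete without Qt, here is a minimal sketch that uses the standard library's threading.Thread as a stand-in for the QThread. `SketchTask` and `ProjectOpener` are hypothetical names, and the sketch is not the PySSA Task class, whose implementation is not shown in this document.

```python
import threading
from typing import Callable, Optional


class SketchTask:
    """Stand-in task: runs the target in a thread and passes its result to post_func."""

    def __init__(self, target: Callable[[], tuple], post_func: Optional[Callable[[tuple], None]] = None) -> None:
        self._target = target
        self._post_func = post_func
        self._thread = threading.Thread(target=self._run)

    def _run(self) -> None:
        tmp_result: tuple = self._target()
        # Simplification: post_func is called in the worker thread right after the target.
        if self._post_func is not None:
            self._post_func(tmp_result)

    def start(self) -> None:
        self._thread.start()

    def wait(self) -> None:
        self._thread.join()


class ProjectOpener:
    """Hypothetical presenter that follows the __async/__await naming rule."""

    def __init__(self) -> None:
        self.opened_project_name: str = ""
        self._active_task: Optional[SketchTask] = None

    def open_project(self) -> None:
        self._active_task = SketchTask(self.__async_open_project, post_func=self.__await_open_project)
        self._active_task.start()

    def wait_for_task(self) -> None:
        if self._active_task is not None:
            self._active_task.wait()

    def __async_open_project(self) -> tuple:
        """Runs in the separate thread."""
        return ("result", "demo_project")

    def __await_open_project(self, a_result: tuple) -> None:
        """Runs after the threaded work has finished."""
        self.opened_project_name = a_result[1]
```

Note that the name passed as post_func must match the name of the defined __await function exactly. Because of the double underscore, Python mangles both names inside the class, so a mismatch only surfaces as an AttributeError at runtime.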

Database

PySSA uses a SQLite database for every single project.

Basic interaction with DatabaseX classes

The interaction is managed through the DatabaseManager class. The interaction with the manager from a controller is done through the DatabaseThread class. The DatabaseThread has a queue which accepts objects of the type DatabaseOperation.

To run an SQL statement from a controller, you have to create a DatabaseOperation object with the matching SQLQueryType and put it into the queue of the DatabaseThread. The following example uses DELETE_EXISTING_PROTEIN:

def _delete_protein(self):
    """Deletes an existing protein from the project."""
    tmp_protein: "protein.Protein" = self._view.ui.proteins_tree_view.currentIndex().data(enums.ModelEnum.OBJECT_ROLE)
    
    # Below is the creation of the DatabaseOperation object
    tmp_database_operation = database_operation.DatabaseOperation(enums.SQLQueryType.DELETE_EXISTING_PROTEIN, 
                                                                  (0, tmp_protein.get_id()))
    # Here the DatabaseOperation object will be put into the queue of the DatabaseThread
    self._database_thread.put_database_operation_into_queue(tmp_database_operation)
    
    # -- The rest of the function
    self._interface_manager.get_current_project().delete_specific_protein(tmp_protein.get_molecule_object())
    self._interface_manager.refresh_protein_model()
    self._interface_manager.refresh_main_view()

Adding new SQL queries to the DatabaseThread

Every SQL statement has to be implemented in the DatabaseManager class! For the DatabaseThread class to work properly, each SQL statement of the database manager must be wrapped in a wrapper function, and this function must be mapped to an appropriate SQLQueryType enum value.

An example of a wrapper function:

@staticmethod
def __wrapper_delete_existing_protein(the_db_manager, the_buffered_data: tuple):
    # It is important to unpack the first element of the tuple with an _ !
    _, tmp_protein_id = the_buffered_data  
    the_db_manager.delete_existing_protein(tmp_protein_id)

An example of the mapping process:

def _setup_operations_mapping(self):
    self._operations_mapping = {
        enums.SQLQueryType.INSERT_NEW_PROTEIN: self.__wrapper_insert_new_protein,
        enums.SQLQueryType.DELETE_EXISTING_PROTEIN: self.__wrapper_delete_existing_protein
    }

And the SQLQueryType enum class:

class SQLQueryType(enum.Enum):
    """An enum for all possible sql queries for the database thread."""
    INSERT_NEW_PROTEIN = 'insert_new_protein'
    DELETE_EXISTING_PROTEIN = 'delete_existing_protein'
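
The pieces above (operation object, queue, wrapper functions, and mapping) can be sketched end to end with the standard library. This is an illustration under assumptions, not the PySSA implementation: `SketchDatabaseThread`, the `CLOSE` sentinel, the `Protein` table, and the use of a raw sqlite3 connection in place of the DatabaseManager are all hypothetical.

```python
import enum
import queue
import sqlite3
import threading
from dataclasses import dataclass


class SQLQueryType(enum.Enum):
    INSERT_NEW_PROTEIN = "insert_new_protein"
    DELETE_EXISTING_PROTEIN = "delete_existing_protein"
    CLOSE = "close"  # hypothetical sentinel that stops the worker


@dataclass
class DatabaseOperation:
    sql_query_type: SQLQueryType
    buffered_data: tuple


class SketchDatabaseThread:
    """Consumes DatabaseOperation objects from a queue and dispatches them via a mapping."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[DatabaseOperation]" = queue.Queue()
        self.remaining_protein_names: list[str] = []
        self._operations_mapping = {
            SQLQueryType.INSERT_NEW_PROTEIN: self._wrapper_insert_new_protein,
            SQLQueryType.DELETE_EXISTING_PROTEIN: self._wrapper_delete_existing_protein,
        }
        self._thread = threading.Thread(target=self._run)

    def start(self) -> None:
        self._thread.start()

    def put_database_operation_into_queue(self, a_database_operation: DatabaseOperation) -> None:
        self._queue.put(a_database_operation)

    def stop(self) -> None:
        self._queue.put(DatabaseOperation(SQLQueryType.CLOSE, (0,)))
        self._thread.join()

    @staticmethod
    def _wrapper_insert_new_protein(the_connection: sqlite3.Connection, the_buffered_data: tuple) -> None:
        _, tmp_protein_name = the_buffered_data
        the_connection.execute("INSERT INTO Protein(name) VALUES (?)", (tmp_protein_name,))

    @staticmethod
    def _wrapper_delete_existing_protein(the_connection: sqlite3.Connection, the_buffered_data: tuple) -> None:
        _, tmp_protein_id = the_buffered_data
        the_connection.execute("DELETE FROM Protein WHERE id = ?", (tmp_protein_id,))

    def _run(self) -> None:
        # The connection is created and used only inside the worker thread.
        tmp_connection = sqlite3.connect(":memory:")
        tmp_connection.execute("CREATE TABLE Protein (id INTEGER PRIMARY KEY, name TEXT)")
        while True:
            tmp_operation = self._queue.get()
            if tmp_operation.sql_query_type is SQLQueryType.CLOSE:
                break
            self._operations_mapping[tmp_operation.sql_query_type](tmp_connection, tmp_operation.buffered_data)
        tmp_rows = tmp_connection.execute("SELECT name FROM Protein ORDER BY id").fetchall()
        self.remaining_protein_names = [tmp_row[0] for tmp_row in tmp_rows]
        tmp_connection.close()


tmp_database_thread = SketchDatabaseThread()
tmp_database_thread.start()
tmp_database_thread.put_database_operation_into_queue(DatabaseOperation(SQLQueryType.INSERT_NEW_PROTEIN, (0, "1abc")))
tmp_database_thread.put_database_operation_into_queue(DatabaseOperation(SQLQueryType.INSERT_NEW_PROTEIN, (0, "2xyz")))
tmp_database_thread.put_database_operation_into_queue(DatabaseOperation(SQLQueryType.DELETE_EXISTING_PROTEIN, (0, 1)))
tmp_database_thread.stop()
```

Because all statements pass through a single queue and a single worker, they are executed in the order in which the controller submitted them.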

Terminology

Path, dir, file & filepath

  • Always use path if a directory path is meant.
  • Always use dir if a directory name is meant.
  • Always use filepath if an absolute path to a file is meant.
  • Always use file if a name of a file is meant.
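
As a short illustration of the four terms, the following sketch takes one file location apart. It uses pathlib.PurePosixPath so that the result does not depend on the operating system, and the variable names are only examples.

```python
import pathlib

# filepath: absolute path to a file
tmp_filepath: pathlib.PurePosixPath = pathlib.PurePosixPath("/home/rhel_user/scratch/log.txt")
# file: name of the file
tmp_file: str = tmp_filepath.name
# path: directory path
tmp_path: pathlib.PurePosixPath = tmp_filepath.parent
# dir: directory name
tmp_dir: str = tmp_filepath.parent.name
```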

Difference between TODO and fixme

  • Add a # TODO if there is a task which needs to be done.
  • Add a # fixme if there is an important note which needs to be quickly found.

Code formatting

The overall code formatting is done with the auto-formatter black. It is applied automatically when the pre-commit hooks are run.

Editor folds

Always wrap argument checks in an editor-fold (Ctrl+Alt+T) and insert a blank line before and after the closing tag of the editor-fold. Example:

# <editor-fold desc="Checks">
if the_fasta_path is None:
    logger.error("The argument 'the_fasta_path' is illegal.")
    raise exception.IllegalArgumentError("")

# </editor-fold>

Code Documentation

The documentation for the PySSA codebase is generated with Sphinx. To regenerate the documentation, run the following commands from the codebase dir (PySSA/docs/codebase):

sphinx-apidoc -f -o .\source\ ..\..\pyssa\
sphinx-build -M html source/ build/