Written by: Dieter van der Westhuizen
Date: 2023
This script is designed for filtering mitochondrial genome data in a bioinformatics pipeline. Given an input .txt
file containing filter criteria and a .tsv
file containing the dataset, the script filters the dataset based on the specified criteria and outputs the filtered data to a new CSV file.
- User-friendly interface: Provides a list of available
.txt
and.tsv
files in the current working directory for the user to select. - Separator detection: Dynamically detects the separator used in the input files (such as comma, semicolon, pipe, or tab) for accurate reading.
- Filtering mechanism: Efficiently filters the
.tsv
dataset using criteria specified in the.txt
file and merges additional information from the filter list into the final result. - CSV output: Exports the filtered dataset to a new CSV file, which is named by combining the names of the
.txt
and.tsv
files.
- Place the script in a directory containing your
.txt
and.tsv
files. - Run the script. You'll first be prompted to select a
.txt
file containing your filter criteria. - Next, you'll be prompted to select a
.tsv
file containing the dataset. - The script will then detect the separators, filter the dataset, and save the result to a new CSV file in the same directory.
- Check the newly generated CSV file for your filtered results.
- Python environment with the
pandas
library installed. - Ensure that the
.txt
file contains a column named "Gene Symbol" and the.tsv
file has a column named "SYMBOL" for the filtering to work correctly.
- If you haven't installed Anaconda yet, download and install it from the official website.
- Open Anaconda Navigator or the Anaconda terminal.
- Create a new environment with the desired Python version (e.g.,
3.8
):conda create --name mito_filtering python=3.8
- Activate the environment:
conda activate mito_filtering
- Installing Dependencies
Install the required pandas library:
conda install pandas
- Running the Script
Navigate to the script directory:
cd path_to_your_script_directory
- Run the script:
python mitochondrial_analysis.py
- Closing Anaconda
Once done, you can deactivate the Anaconda environment:
conda deactivate
###Setting Up Jupyter Notebook in Anaconda Ensure you've activated the Anaconda environment where you have the required dependencies installed. If you haven't, refer to the instructions in the "Setting Up Anaconda" and "Creating and Activating a New Environment" sections above.
Install jupyter within the environment:
```bash
conda install jupyter
Launch Jupyter Notebook:
```bash
jupyter notebook
You can run each cell by selecting the cell and clicking the "Run" button or by pressing Shift + Enter. Make sure you run the cells in order, especially if they have dependencies on previous cells. You'll see the outputs (e.g., print statements) right below each cell after you run it. If you want to save the outputs, click on "File" and then "Save and Checkpoint". Once you're done, you can close the Jupyter Notebook browser tab and stop the Jupyter server in your terminal by pressing Ctrl + C twice.