HMST-Seq-Analyzer

Abstract:

DNA methylation (5mC) and hydroxymethylation (5hmC) are chemical modifications of cytosine bases that play a crucial role in epigenetic gene regulation. However, cost, data complexity, and the lack of comprehensive analytical tools are major challenges in exploring these epigenetic marks. Hydroxymethylation- and Methylation-Sensitive Tag sequencing (HMST-seq) is one of the most cost-effective techniques that enable simultaneous detection of 5mC and 5hmC at single-base-pair resolution. We present HMST-Seq-Analyzer, a comprehensive and robust method for simultaneous differential methylation analysis of 5mC and 5hmC data sets. HMST-Seq-Analyzer can detect Differentially Methylated Regions (DMRs), annotate them, give a visual overview of methylation status, and perform a preliminary quality check on the data. In addition to HMST-seq data, the tool can be used on whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data sets. The tool is written in Python, can process data in parallel, and is available at https://hmst-seq.github.io/hmst/.

How to start:

HMST-Seq-Analyzer is written in Python. It is installed and run from the command line and is available for both Linux and macOS. The package can be downloaded here.

Before installing the package, make sure the following dependencies are available:

bedtools
setuptools
itertools
pandas
numpy
argparse
os
shutil
multiprocessing
matplotlib
seaborn
datetime
scipy
tempfile
time
matlab.engine
numba

It is advised to install the dependencies using Miniconda.

The package contains a file, requirments.txt, which can be used to install the dependencies automatically with conda or pip.
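Before installing, it can help to see which of the listed dependencies are already present. The following is a minimal sketch, not part of the package: it only checks whether each listed Python module can be located and whether bedtools (a command-line program, not a Python module) is on the PATH. Several of the listed names (itertools, argparse, os, shutil, multiprocessing, datetime, tempfile, time) belong to the Python standard library and should always be found. The helper names module_available and missing_dependencies are hypothetical.

```python
import importlib.util
import shutil

# Module names copied from the dependency list above. bedtools is a
# command-line program rather than a Python module, so it is looked up on PATH.
PYTHON_MODULES = [
    "setuptools", "itertools", "pandas", "numpy", "argparse", "os",
    "shutil", "multiprocessing", "matplotlib", "seaborn", "datetime",
    "scipy", "tempfile", "time", "matlab.engine", "numba",
]
COMMAND_LINE_TOOLS = ["bedtools"]


def module_available(name):
    """Return True if the module can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. "matlab") is missing.
        return False


def missing_dependencies():
    """Return (missing_modules, missing_tools) as two lists of names."""
    missing_modules = [m for m in PYTHON_MODULES if not module_available(m)]
    missing_tools = [t for t in COMMAND_LINE_TOOLS if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    modules, tools = missing_dependencies()
    print("Missing Python modules:", ", ".join(modules) or "none")
    print("Missing command-line tools:", ", ".join(tools) or "none")
```

Anything reported as missing still has to be installed through conda or pip; this check does not install anything itself.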

To install the package, go to the HMST-Seq-Analyzer directory and type: python setup.py install

For more details, see the readme file in the package.

Contents of the package:

The package folder contains the following:

demo : Contains demo data sets.
hmst-seq-analyzer : Contains the Python source code of the pipeline.
readme.txt : Instructions on how to use the package.
requirments.txt : List of requirements. Can be used for automatic installation with Miniconda or pip.
setup.py : Setup file for the package.

Pipeline Tasks:

The pipeline consists of the following 8 tasks. To run a task, type: hmst_seq_analyzer <task> [<args>]. To see the options for each task of the pipeline, run: hmst_seq_analyzer -h

gene_annotation : Cleans the reference file and creates genomic region files (TSS, geneBody, TES, 5dist and intergenic) from the reference
data_preprocessing : Creates 5mC and 5hmC files and performs quantile normalization
find_MRs : Extracts genomic regions from 5mC/5hmC-files and finds methylated regions
prepare_for_DMR_finding : Finds overlapping methylated regions between MRs in WT condition samples and KO condition samples
DMR_search : Finds differentially methylated regions
prep4plot : Prepares files for plotting
plot_all : Plots hyper versus hypo differentially methylated regions, and relative density of significantly modified sites in MRs
clean_files : Removes some unwanted files. Use only after prep4plot has finished
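The task list above can be turned into command lines of the form hmst_seq_analyzer <task> [<args>]. The following is a minimal sketch under stated assumptions: the helper names build_command and plan_pipeline are hypothetical, running the tasks in the listed order is an assumption of the sketch, and the empty argument lists are placeholders only. The real options for each task should be taken from hmst_seq_analyzer -h.

```python
import shlex

# Task names as listed above. Running them in this listed order is an
# assumption of this sketch, not something the text states explicitly.
PIPELINE_TASKS = [
    "gene_annotation",
    "data_preprocessing",
    "find_MRs",
    "prepare_for_DMR_finding",
    "DMR_search",
    "prep4plot",
    "plot_all",
    "clean_files",
]


def build_command(task, args=()):
    """Return the argv list for one task: hmst_seq_analyzer <task> [<args>]."""
    if task not in PIPELINE_TASKS:
        raise ValueError(f"unknown task: {task}")
    return ["hmst_seq_analyzer", task, *args]


def plan_pipeline(task_args):
    """Return shell-quoted commands for the requested tasks, in listed order.

    task_args maps a task name to its list of argument strings.
    """
    unknown = set(task_args) - set(PIPELINE_TASKS)
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    return [
        shlex.join(build_command(task, task_args[task]))
        for task in PIPELINE_TASKS
        if task in task_args
    ]


if __name__ == "__main__":
    # Empty argument lists are placeholders; real runs need real options.
    for command in plan_pipeline({task: [] for task in PIPELINE_TASKS}):
        print(command)
```

The sketch only prints the commands. To execute one, the argv list from build_command could be passed to subprocess.run with check=True so that a failing task stops the run.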

Demo:

A test run is available on public hg19 data, located in the demo folder.

In the folder HMST-Seq-analyzer/demo there is an sbatch file, job_demo_HMST.sbatch, which runs the demo automatically when submitted from the command line: sbatch job_demo_HMST.sbatch

Tool Box:

The Tool Box can be used for: cleaning and sorting input methylation data; extracting the data for a single chromosome, or splitting the input methylation files chromosome-wise; and calculating the average read count within the ranges 1-5, 6-10, 11-15 and 16-20 as a quality check of the input data.
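The exact calculation performed by the Tool Box is not described here. As an illustration only, the sketch below shows one possible reading of the read-count check: per-site read counts are grouped into the four ranges named above, and the number of sites and the mean read count are reported for each range. The function name summarize_read_counts and the example counts are hypothetical.

```python
from statistics import mean

# Coverage ranges named in the text (inclusive bounds).
READ_COUNT_BINS = [(1, 5), (6, 10), (11, 15), (16, 20)]


def summarize_read_counts(read_counts):
    """Group per-site read counts into the ranges and average each group.

    Returns a dict mapping "low-high" to (number_of_sites, mean_read_count).
    The mean is None for an empty range; counts outside 1-20 are ignored.
    """
    summary = {}
    for low, high in READ_COUNT_BINS:
        in_bin = [count for count in read_counts if low <= count <= high]
        summary[f"{low}-{high}"] = (len(in_bin), mean(in_bin) if in_bin else None)
    return summary


if __name__ == "__main__":
    # Hypothetical per-site read counts, for illustration only.
    example_counts = [1, 3, 5, 7, 9, 12, 18, 20, 42]
    for label, (sites, average) in summarize_read_counts(example_counts).items():
        print(label, sites, average)
```

A data set in which most sites fall into the lowest range would suggest shallow coverage, which is the kind of problem a quality check like this is meant to surface.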

The Tool Box can be found in: HMST-Seq-Analyzer_amna/demo

For cleaning and sorting input: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum

For cleaning and sorting input and extracting data for one chromosome: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum --Chr 17

For cleaning and sorting input and splitting data chromosome-wise: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum --Chr all

For cleaning and sorting input and calculating average read count: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum --Avg_read y
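When the Tool Box is called from another script, the four invocations above can be assembled programmatically. The following is a minimal sketch: the wrapper function toolbox_command is hypothetical, it uses only the flags and values shown in the commands above (--In_path, --Out_path, --Org, --Chr, --Avg_read), and whether --Chr and --Avg_read may be combined in a single call is not stated here.

```python
import shlex


def toolbox_command(in_path, out_path, org, chromosome=None, avg_read=False):
    """Return the argv list for one avg_read_coverage.py call.

    Only flags shown in the commands above are used. Passing both
    chromosome and avg_read is not documented and is untested here.
    """
    command = [
        "python", "avg_read_coverage.py",
        "--In_path", in_path,
        "--Out_path", out_path,
        "--Org", org,
    ]
    if chromosome is not None:
        command += ["--Chr", str(chromosome)]  # e.g. 17 or "all"
    if avg_read:
        command += ["--Avg_read", "y"]
    return command


if __name__ == "__main__":
    # The four usage modes listed above, printed as shell command lines.
    modes = [
        {},
        {"chromosome": 17},
        {"chromosome": "all"},
        {"avg_read": True},
    ]
    for options in modes:
        argv = toolbox_command("data-sample.txt", "out", "hum", **options)
        print(shlex.join(argv))
```

The returned list can be handed to subprocess.run from inside the directory that contains avg_read_coverage.py.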

References:

Gao, F., et al., Integrated detection of both 5-mC and 5-hmC by high-throughput tag sequencing technology highlights methylation reprogramming of bivalent genes during cellular differentiation. Epigenetics, 2013. 8(4): p. 421-430.

HMST-Seq-Analyzer: A New Python Tool for Differential Methylation and Hydroxymethylation Analysis in Various DNA Methylation Sequencing Data