DNA methylation (5mC) and hydroxymethylation (5hmC) are chemical modifications of cytosine bases that play a crucial role in epigenetic gene regulation. However, cost, data complexity and the lack of comprehensive analytical tools are major challenges in exploring these epigenetic marks. Hydroxymethylation- and Methylation-Sensitive Tag sequencing (HMST-Seq) is one of the most cost-effective techniques that enables simultaneous detection of 5mC and 5hmC at single-base-pair resolution. We present HMST-Seq-Analyzer as a comprehensive and robust method for performing simultaneous differential methylation analysis on 5mC and 5hmC data sets. HMST-Seq-Analyzer can detect Differentially Methylated Regions (DMRs), annotate them, give a visual overview of methylation status and perform a preliminary quality check on the data. In addition to HMST-Seq, our tool can be used on whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data sets. The tool is written in Python, can process data in parallel, and is available at https://hmst-seq.github.io/hmst/.
HMST-Seq-Analyzer is written in Python. It can be installed and run from the command line and is available for both Linux and Mac operating systems. The package can be downloaded here
Before installing the package, its dependencies must be installed. The list of dependencies is as follows:
It is advised to install the dependencies using Miniconda.
The package contains a file, requirments.txt, which can be used to install the dependencies automatically with conda or pip.
To install the package, go to the HMST-Seq-Analyzer directory and type:
python setup.py install
For more details, follow the readme file in the package.
The package folder will contain the following:
demo: Contains demo data sets.
hmst-seq-analyzer: Contains the Python source code of the pipeline.
readme.txt: Instructions on using the package.
requirments.txt: List of requirements. Can be used for automatic installation with Miniconda or pip.
setup.py: Setup file for the package.
The pipeline consists of the following 8 tasks. To run a task, type hmst_seq_analyzer <task> [<args>]. To see the options for each task of the pipeline, please run: hmst_seq_analyzer
gene_annotation: Cleans the reference file and creates genomic region files (TSS, geneBody, TES, 5dist and intergenic) from the reference.
data_preprocessing: Creates the 5mC and 5hmC files and performs quantile normalization.
find_MRs: Extracts genomic regions from the 5mC/5hmC files and finds methylated regions (MRs).
prepare_for_DMR_finding: Finds overlapping methylated regions between MRs in WT condition samples and KO condition samples.
DMR_search: Finds differentially methylated regions.
prep4plot: Prepares files for plotting.
plot_all: Plots hyper- versus hypo-differentially methylated regions, and the relative density of significantly modified sites in MRs.
clean_files: Removes some unwanted files. Please use it only after prep4plot has finished.
A test run is available on public hg19 data, present in the demo folder. In the folder HMST-Seq-analyzer/demo, there is an sbatch file, job_demo_HMST.sbatch, which can be run by entering sbatch job_demo_HMST.sbatch on the command line to run the demo automatically.
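The data_preprocessing task listed above mentions quantile normalization. As a rough illustration of that general technique (not the code used by HMST-Seq-Analyzer, whose implementation is not shown here), the sketch below forces several samples onto a shared distribution: each value is replaced by the mean of the values holding the same rank across all samples. The function name quantile_normalize, the example numbers, and the choice to break ties by order of appearance are assumptions made for illustration.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (hypothetical helper).

    `samples` is a list of equal-length lists, one list per sample, where
    position i in every list refers to the same site. Ties are broken by
    order of appearance, which is a simplification.
    """
    n_sites = len(samples[0])
    if any(len(sample) != n_sites for sample in samples):
        raise ValueError("all samples must cover the same number of sites")

    # Reference distribution: mean of the k-th smallest value over all samples.
    sorted_samples = [sorted(sample) for sample in samples]
    reference = [
        sum(sample[k] for sample in sorted_samples) / len(samples)
        for k in range(n_sites)
    ]

    normalized = []
    for sample in samples:
        # Site indices ordered from smallest to largest value (stable sort).
        order = sorted(range(n_sites), key=lambda i: sample[i])
        result = [0.0] * n_sites
        for rank, site in enumerate(order):
            # The value at this site takes the reference value for its rank.
            result[site] = reference[rank]
        normalized.append(result)
    return normalized


# Made-up methylation levels for three samples over four sites.
example = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
for row in quantile_normalize(example):
    print([round(value, 3) for value in row])
```

After normalization every sample contains the same set of values, so differences between samples reflect only the ranking of sites, not sample-wide shifts in scale.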
The Tool Box can be used for: cleaning and sorting input methylation data; extracting data for a single chromosome, or splitting the data chromosome-wise, from input methylation files; and calculating the average read count within the ranges 1-5, 6-10, 11-15 and 16-20 as a quality check of the input data.
The Tool Box can be found in: HMST-Seq-Analyzer_amna/demo
For cleaning and sorting input: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum
For cleaning and sorting input and extracting data for one chromosome: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum --Chr 17
For cleaning and sorting input and splitting data chromosome-wise: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum --Chr all
For cleaning and sorting input and calculating the average read count: python avg_read_coverage.py --In_path data-sample.txt --Out_path out --Org hum --Avg_read y
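The read-count ranges used for the quality check (1-5, 6-10, 11-15, 16-20) can be pictured with a small sketch. This is only one possible reading of that summary, not the logic of avg_read_coverage.py: it groups sites by read count and reports, for each range, how many sites fall in it and their mean read count. The function name summarize_read_counts, the output layout, and the example counts are hypothetical.

```python
# Read-count ranges named in the Tool Box description (inclusive bounds).
READ_COUNT_RANGES = [(1, 5), (6, 10), (11, 15), (16, 20)]


def summarize_read_counts(read_counts, ranges=READ_COUNT_RANGES):
    """Summarize per-site read counts by range (hypothetical helper).

    Returns a dict mapping "low-high" to the number of sites in that range
    and their mean read count (None when no site falls in the range).
    Counts outside every range are ignored.
    """
    summary = {}
    for low, high in ranges:
        in_range = [count for count in read_counts if low <= count <= high]
        mean = sum(in_range) / len(in_range) if in_range else None
        summary[f"{low}-{high}"] = {"sites": len(in_range), "mean_reads": mean}
    return summary


# Made-up read counts for eight sites.
example_counts = [1, 3, 5, 7, 12, 14, 20, 35]
for label, stats in summarize_read_counts(example_counts).items():
    print(label, stats)
```

A data set in which most sites fall in the lowest range would suggest shallow coverage and may call for caution in downstream DMR analysis.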