Evaluation and methylation prediction
Evaluation types
single-read level
genomic-loci (group-reads) level
Evaluation metrics
ROC-AUC
PR-AUC
Methylation prediction
We provided independent trained models on each 5mC and 6mA datasets of different motifs and methyltransferases in the ./trained_model fold.:
MODEL="BERT_plus"
MODEL_SAVE_PATH=<model saved path>
REF=<reference genome fasta file>
FAST5_FOLD=<fast5 files to be analyzed>
OUTPUT=<output file>
time python detect.py --model ${MODEL} --model_dir ${MODEL_SAVE_PATH} \
--gpu cuda:0 --fast5_fold ${FAST5_FOLD} --num_worker 12 \
--motif ${MOTIF} --m_shift ${NUCLEOTIDE_LOC_IN_MOTIF} --evalMode test_mode --w_len ${W_LEN} --ref_genome ${REF} --output_file ${OUTPUT}
We generate the same output format as the deepSignal (https://github.com/bioinfomaticsCSU/deepsignal).:
# output example
NC_000913.3 4581829 + 4581829 43ea7b03-8d2b-4df3-b395-536b41872137 t 1.0 3.0398369e-06 0 TGCGGGTCTTCGCCATACACG
NC_000913.3 4581838 + 4581838 43ea7b03-8d2b-4df3-b395-536b41872137 t 0.9999996 0.00013372302 0 TCGCCATACACGCGCTCAAAC
NC_000913.3 4581840 + 4581840 43ea7b03-8d2b-4df3-b395-536b41872137 t 1.0 0.0 0 GCCATACACGCGCTCAAACGG
NC_000913.3 4581848 + 4581848 43ea7b03-8d2b-4df3-b395-536b41872137 t 1.0 0.0 0 CGCGCTCAAACGGCTGCAAAT
NC_000913.3 4581862 + 4581862 43ea7b03-8d2b-4df3-b395-536b41872137 t 1.0 0.0 0 TGCAAATGCTCGTCGGTAAAC