Hello everybody,
I am still new to the field of computational epigenetics, so I need some help with the following task(s):
I study applied bioinformatics, and for my master's thesis I need to compute methylation levels around splice junctions. The output has to be in a format that I have never seen before, and my own research turned up nothing about it. The format seems to be similar to FASTA, but instead of a sequence after the header line (which starts with >), it provides methylation levels in a tab-separated manner, and I honestly don't know what DSQ stands for. A small part of a methylation track is given below:
>chr1:142346773:142346881:+@chr1:142380702:142380810:+@chr1:142404277:142404426:+_expu=400_expd=200_bsz=20_part=0
DSQ 18.5594 18.5594 18.5594 8.22605 18.5594 31.9349 36.4521 36.4521 33.8659 18.5594 8.22605 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>chr1:58852214:58852582:+@chr1:58878691:58878806:+@chr1:58880759:58881091:+_expu=400_expd=200_bsz=20_part=0
DSQ 0 0 0 0 0 0 0 0 4.50575 ...
This format is recognized by a newly developed flexible self-organizing map for the analysis of DNA methylation (or other digitized epigenetic signals). The paper describing the software is freely accessible here. Unfortunately, apart from the paper, the authors provide only a 3-page quick-start manual, which says little about the format shown above. Maybe someone here has seen this format before and can explain its anatomy to me.
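To make the question more concrete, this is how I currently read the format. It is only a minimal Python sketch based on the two records above, not on any official specification: the function names `parse_dsq_records` and `parse_header` are my own, and treating the header as `@`-separated exon coordinates followed by `_key=value` options is my assumption.

```python
import re

# Assumed option syntax: "_name=value" (e.g. "_expu=400"); not from any spec.
_PARAM = re.compile(r"_([A-Za-z]+)=([^_]*)")


def parse_dsq_records(lines):
    """Yield (header, values) pairs from an iterable of text lines."""
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif line.startswith("DSQ") and header is not None:
            # split() without arguments accepts tabs and spaces alike
            values = [float(tok) for tok in line.split()[1:]]
            yield header, values
            header = None


def parse_header(header):
    """Split a header into exon coordinates and key=value options."""
    first = _PARAM.search(header)
    coords = header[:first.start()] if first else header
    options = dict(_PARAM.findall(header))
    exons = []
    for part in coords.split("@"):
        chrom, start, end, strand = part.split(":")
        exons.append((chrom, int(start), int(end), strand))
    return exons, options
```

The options are matched with a regular expression instead of a plain split on `_` because some hg19 sequence names contain underscores themselves. What `expu`, `expd`, `bsz` and `part` actually mean is exactly what I would like to have confirmed.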
What I have done so far:
- I downloaded RNA-Seq runs from a human spleen sample provided by the NIH Roadmap Epigenomics Project. The GEO accession is GSM1010976.
- I used the TopHat splice junction mapper to determine splice junctions, with hg19 as the reference genome.
I need to compute:
- The methylation levels in the range -200nt/+200nt, i.e. 200nt to the left and 200nt to the right of each of these splice junctions
- I need them in 20nt intervals. As far as I can tell, each DSQ value in the above example represents the (normalized?) methylation level within one 20nt bin
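The binning I have in mind can be sketched as follows. This is a minimal Python sketch under my own assumptions: `meth` is a hypothetical dictionary mapping genomic positions on one chromosome to methylation levels, each bin value is the mean of the covered positions inside it, and a bin without any covered position is written as 0. I do not know whether the software expects means, sums, or some other normalization.

```python
def binned_levels(meth, junction, flank=200, bin_size=20, empty=0.0):
    """Mean methylation per bin in [junction - flank, junction + flank).

    meth     -- dict {position: level} for one chromosome (assumed input)
    junction -- genomic coordinate of the splice junction
    empty    -- value used for bins without any covered position (assumption)
    """
    start = junction - flank
    n_bins = (2 * flank) // bin_size
    levels = []
    for b in range(n_bins):
        lo = start + b * bin_size
        vals = [meth[p] for p in range(lo, lo + bin_size) if p in meth]
        levels.append(sum(vals) / len(vals) if vals else empty)
    return levels


# Toy example with made-up positions and levels
toy = {100: 50.0, 105: 100.0, 299: 20.0}
profile = binned_levels(toy, junction=200)
```

With the defaults this gives 20 values per junction. The sketch ignores the strand; for junctions on the minus strand the list would probably have to be reversed so that the first bin is always upstream.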
I also found the data from a whole-genome BS-Seq experiment that was done on the same spleen sample. The GEO accession is GSM983652. I considered the following possibilities:
- If I understand correctly, the provided wig file already contains methylation data. If that is the case, I would like to use this existing data. Is there a tool to extract methylation data from a wig file? As I said before, I need the cytosine methylation levels near splice junctions, and I need them to be exported in the format shown above.
- If option 1 doesn't work, which tool should I use to analyse the provided BS-Seq data? And again: How can I export them in the format shown above?
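If no ready-made tool exists for the first option, I would fall back on something like the sketch below. It assumes the file follows the standard WIG layout with `variableStep` or `fixedStep` declaration lines (I have not yet checked which one the GEO file uses), and the names `read_wig` and `format_record` are my own. The number formatting in `format_record` is only a guess based on the example records.

```python
def read_wig(lines):
    """Return {chrom: {position: value}} from WIG text lines (1-based)."""
    data = {}
    mode = chrom = pos = step = None
    span = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", 1))
            continue
        if mode == "variableStep":
            p, v = line.split()
            first, value = int(p), float(v)
        elif mode == "fixedStep":
            first, value = pos, float(line)
            pos += step
        else:
            raise ValueError("data line before any declaration line")
        track = data.setdefault(chrom, {})
        for q in range(first, first + span):
            track[q] = value
    return data


def format_record(header, values):
    """Render one record: '>' header line, then a tab-separated DSQ line."""
    return ">" + header + "\nDSQ\t" + "\t".join(f"{v:g}" for v in values)
```

Holding a whole-genome track in nested dictionaries will need a lot of memory, so this is meant to illustrate the idea on a single chromosome and not as a finished solution.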
I hope that somebody can help me with these tasks.
Best regards
thefirstrealace