Repetitive element coverage of consensus
10.1 years ago
jmack159 ▴ 30

I have a consensus sequence for a class of repetitive elements (from Repbase) and I want to find the depth of coverage of the real genomic instances (taken from the UCSC rmsk table) against this consensus sequence. The elements tend to become 5' truncated over evolutionary time, so some instances may be quite short while others are full length. In the end I would like something like a wiggle (WIG) file describing the "depth" of instances at each position in the consensus, but I am not sure how to go about it. Any help is greatly appreciated.

EDIT:

One approach I was considering is to break up my instance sequences into k-mers of, say, 50 bp, align them to the consensus reference with bowtie to produce a BAM file, and then use samtools to get the windowed coverage. Thoughts?
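A minimal sketch of the fragmenting step, to make the idea concrete. The function names `tile_kmers` and `kmers_to_fasta`, the `name_offset` header scheme, and the default `step` are illustrative assumptions, not part of any existing tool. With non-overlapping tiles (`step == k`) each consensus base is hit by at most one fragment per instance, so depth roughly tracks the number of instances; overlapping tiles would inflate it.

```python
from typing import Iterator, Tuple


def tile_kmers(seq: str, k: int = 50, step: int = 50) -> Iterator[Tuple[int, str]]:
    """Yield (0-based offset, k-mer) pairs along seq.

    A trailing fragment shorter than k is dropped, so up to k - 1 bases
    at the 3' end of each instance are not represented.
    """
    for offset in range(0, len(seq) - k + 1, step):
        yield offset, seq[offset:offset + k]


def kmers_to_fasta(name: str, seq: str, k: int = 50, step: int = 50) -> str:
    """Format the tiles of one instance as FASTA records (hypothetical header scheme)."""
    return "".join(
        f">{name}_{offset}\n{kmer}\n" for offset, kmer in tile_kmers(seq, k, step)
    )
```

Instances shorter than `k` produce no fragments at all, which matters here because heavily truncated copies would silently disappear from the coverage.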

ChIP-Seq alignment genome blast • 2.3k views
10.1 years ago
Manvendra Singh ★ 2.2k

I think it's doable.

For example:

Suppose you have the L1Hs element (just assuming that this is the family of repetitive elements your consensus belongs to):

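# Fetch all L1Hs entries from the rmsk table (you should get around 1,500 sequences)
grep -wi 'L1Hs' rmsk.bed > L1Hs.bed

# Fetch the genomic sequences of those L1Hs instances
# (if the BED file has strand in column 6, adding -s reverse-complements minus-strand copies)
bedtools getfasta -fi hg19.fa -bed L1Hs.bed -fo L1Hs.fa.out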

Now align the sequences in L1Hs.fa.out to your consensus.

Then count, at single-nucleotide resolution, how many instances cover each position of the consensus. A simple Perl script would do, or whichever language you prefer.
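The counting step could be sketched as follows, in Python rather than Perl. It assumes you have already pulled one 1-based, inclusive `(start, end)` interval on the consensus per aligned instance out of whatever aligner you used; the function names `consensus_depth` and `to_wig` and the `chrom` label are hypothetical. The output uses the fixedStep flavour of the wiggle format.

```python
from typing import Iterable, List, Tuple


def consensus_depth(intervals: Iterable[Tuple[int, int]], consensus_len: int) -> List[int]:
    """Per-base depth from 1-based inclusive (start, end) intervals on the consensus."""
    # Difference array: +1 where an interval starts, -1 just past where it ends.
    diff = [0] * (consensus_len + 2)
    for start, end in intervals:
        start = max(start, 1)            # clip to the consensus boundaries
        end = min(end, consensus_len)
        if start > end:                  # interval lies entirely outside the consensus
            continue
        diff[start] += 1
        diff[end + 1] -= 1

    depth = []
    running = 0
    for pos in range(1, consensus_len + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def to_wig(depth: List[int], chrom: str = "consensus") -> str:
    """Render the depth vector as a fixedStep wiggle track with one value per base."""
    lines = [f"fixedStep chrom={chrom} start=1 step=1"]
    lines.extend(str(d) for d in depth)
    return "\n".join(lines) + "\n"
```

For three instances on a 10 bp consensus that are progressively more 5' truncated, `consensus_depth([(1, 10), (4, 10), (8, 10)], 10)` gives `[1, 1, 1, 2, 2, 2, 2, 3, 3, 3]`, i.e. depth rising toward the 3' end as you would expect for these elements.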
