Remove patches from gtf file?
8.7 years ago
Michelle M. ▴ 70

Hi there,

I'm using an Ensembl GTF file (GRCh37) for RNA-seq analysis and am wondering about the patches.

I know what the annotation patches are and why they're there, but should I exclude them when generating my count matrix in HTSeq or Cufflinks? That is, if I leave them in, won't I get multi-reads mapping to both the patch and the original region, thereby skewing the true counts?

Thanks for your input, much appreciated.

Cheers,

M

gtf ensembl RNA-Seq patch • 3.1k views

I went through a similar conundrum. This doesn't exactly answer your question, but I can share my experience: my libraries are very large, and I couldn't believe how long the calculations were taking. So I will remove the patches and restart the analysis. I feel more comfortable about this decision since I came across (this morning) a line in the STAR aligner manual: "Generally, patches and alternative haplotypes should not be included in the genome", which suggests using only the primary assembly.

https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf (page 5)

You do raise a valid point, though, and I would be very curious to see a definitive answer.


Thanks Joel, that helps a lot. I'll be interested to see if anyone can confirm this, but in the meantime I think I'll be removing the patches from the file.
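
For anyone wanting to do the same, here is a minimal Python sketch of dropping patch records from a GTF by sequence name. It assumes the Ensembl GRCh37 convention that primary chromosomes are named 1-22, X, Y and MT in column 1, and that everything else (patches, alternative haplotypes, unplaced scaffolds) can be discarded. The function names and the `PRIMARY` set are hypothetical, not part of any tool, so check the sequence names in your own file and adjust the set (for example, to keep unplaced scaffolds) before relying on it.

```python
# Sequence names treated as the primary assembly.
# Assumption: Ensembl-style names without a "chr" prefix.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def filter_gtf_lines(lines, keep=PRIMARY):
    """Yield header lines and records whose seqname (column 1) is in `keep`."""
    for line in lines:
        # Keep comment/header lines such as "#!genome-build ..." untouched.
        if line.startswith("#"):
            yield line
            continue
        # GTF is tab-separated; the first field is the sequence name.
        seqname = line.split("\t", 1)[0]
        if seqname in keep:
            yield line


def filter_gtf_file(src, dst, keep=PRIMARY):
    """Write a copy of the GTF at `src` to `dst` without non-primary records."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filter_gtf_lines(fin, keep))
```

Calling `filter_gtf_file("in.gtf", "primary_only.gtf")` (placeholder file names) writes the filtered copy. The genome FASTA used for alignment should be restricted to the same sequences so that the annotation and the reference stay consistent.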


I just came across this, which was helpful: http://seqanswers.com/forums/archive/index.php/t-4459.html

Cheers
