Extract exons using GFF from a SAM file
8.5 years ago
IsmailM ▴ 110

I have a SAM file that I created by aligning raw Illumina reads from one species to the reference assembly of a closely related species.

I am now trying to extract the sequence of every gene covered by my SAM file (i.e. just the exons joined together), with any indels relative to the reference applied to the extracted sequence. A GFF annotation for the reference assembly is available that I can use.
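To make the goal concrete, here is a minimal sketch of the "exons joined together" step only. It assumes a GFF3-style file in which each exon line carries a `Parent=` attribute, and a consensus sequence that still shares the reference coordinates (so insertions, which shift coordinates, are exactly what it does not handle). The names `parse_exons` and `spliced_sequence` are hypothetical helpers, not part of any existing tool.

```python
from collections import defaultdict

# Translation table used to reverse-complement minus-strand features.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_exons(gff_lines):
    """Group exon records by their Parent attribute (assumed GFF3-style)."""
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep only well-formed exon features
        seqid, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is not None:
            exons[parent].append((seqid, start, end, strand))
    return exons


def spliced_sequence(exon_list, sequences):
    """Join exon sequences in transcript order.

    GFF coordinates are 1-based and inclusive, hence the start - 1 slice.
    """
    strand = exon_list[0][3]
    ordered = sorted(exon_list, key=lambda e: e[1])
    joined = "".join(sequences[seqid][start - 1:end] for seqid, start, end, _ in ordered)
    if strand == "-":
        joined = joined.translate(_COMPLEMENT)[::-1]  # reverse complement
    return joined


if __name__ == "__main__":
    # Toy data, purely illustrative.
    sequences = {"chr1": "AAACCCGGGTTTAAACCC"}
    gff = [
        "chr1\t.\texon\t1\t3\t.\t+\t.\tParent=tx1",
        "chr1\t.\texon\t7\t9\t.\t+\t.\tParent=tx1",
    ]
    for parent, exon_list in parse_exons(gff).items():
        print(parent, spliced_sequence(exon_list, sequences))
```

The sequences passed in would have to come from a consensus built from the alignments by some other step, which is the part I am unsure how to do correctly when indels are present.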

There are a number of similar questions on Biostars; however, the suggested methods do not fix indels (they simply convert them to lower case).

It has been suggested that "genometools extractfeat" might work. Has anyone used this, or another tool, to do something similar?

Many Thanks

samtools alignment sequencing genome sequence • 2.1k views

Bag O' Questions - please don't take it as me picking on you though!

  • Is this RNA or DNA sequencing?
  • If DNA, is it paired-end or single-end sequencing?
  • When you talk about indels being fixed, do you mean that you realigned your reads around indels, or that you edited your GFF file to reflect the indels present in your SAM file? You mention lower-case letters, but a SAM file does not contain lower-case letters (or at least it shouldn't).
  • I presume you want to do this to get the sequenced DNA from exons only, but what would you do with a read that spans an exon/intron boundary?
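
To illustrate the last two points: in a SAM record, indels are encoded in the CIGAR string (the `I` and `D` operations) rather than in the letter case of the sequence. Below is a rough sketch, not a recommended pipeline, of projecting one read onto reference coordinates and trimming it to an exon. The function names `project_read` and `clip_to_exon` are hypothetical, and the choice to simply trim a boundary-spanning read to the exon is only one possible answer to the question above.

```python
import re

# One CIGAR element: a length followed by an operation code.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def project_read(pos, cigar, seq):
    """Project a read onto reference coordinates.

    pos is the 1-based leftmost reference position (the SAM POS field).
    Returns (aligned, insertions):
      aligned    maps reference position -> read base ('-' for a deleted base)
      insertions maps the reference position just before an insertion
                 -> the inserted read bases
    """
    aligned, insertions = {}, {}
    ref, query = pos, 0
    for length, op in _CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # match/mismatch: consumes read and reference
            for i in range(n):
                aligned[ref + i] = seq[query + i]
            ref += n
            query += n
        elif op == "I":  # insertion: consumes read only
            insertions[ref - 1] = seq[query:query + n]
            query += n
        elif op == "S":  # soft clip: consumes read only, bases ignored
            query += n
        elif op == "D":  # deletion: consumes reference only
            for i in range(n):
                aligned[ref + i] = "-"
            ref += n
        elif op == "N":  # skipped region (e.g. intron): reference only
            ref += n
        # H and P consume neither read nor reference
    return aligned, insertions


def clip_to_exon(aligned, exon_start, exon_end):
    """Keep only the aligned bases inside a 1-based, inclusive exon interval."""
    return "".join(aligned[p] for p in range(exon_start, exon_end + 1) if p in aligned)


if __name__ == "__main__":
    # Toy read: 3 matches, 1 inserted base, 2 matches, 2 deleted bases, 2 matches.
    aligned, insertions = project_read(5, "3M1I2M2D2M", "ACGTTAGG")
    print(clip_to_exon(aligned, 7, 11))
    print(insertions)
```

Whether inserted bases should be added to the extracted exon sequence, and whether deleted positions should be dropped or padded, depends on what "fixed" means in your case.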
