10.1 years ago
M K
Hi All,
I mapped my strand-specific RNA-seq data to the human genome (hg19), and I want to extract sense and antisense read counts from the mapping file (.bam). Can anyone help me do that?
So how can I compare the results with the GTF annotation file?
Huh? Why would you need to and what would that even mean?
I want to detect antisense transcription using an R package called NASTI-seq, and they use a modified GTF file to compare the results with the mapped .bam file to count reads.
Always mention in the beginning when you intend to use a particular package, especially when it's an obscure one.
If you look at the NASTIseq documentation, you'll see that the input is a list with the following 4 items:
smat: A matrix of sense counts (these counts can be produced via my answer above).
asmat: A matrix of antisense counts (these counts can be produced via my answer above).
genepos: Your GTF file, presumably as a GRanges object or maybe a data.frame, but they don't specify this. It appears that they just need gene bounds (this is actually poorly thought out, but doable with GenomicFeatures).
pospairs: A 2-column data.frame with known sense-antisense pairs. The values should match row.names from genepos.
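To make the shape of that input list concrete, here is a minimal base-R sketch with toy data. The gene names, sample names, counts and coordinates are all invented, and the column names of genepos and pospairs, as well as the genes-in-rows layout of the matrices, are assumptions for illustration, since the package does not document them clearly.

```r
# Toy data only: every name and number below is invented for illustration.
genes   <- c("geneA", "geneB", "geneC")
samples <- c("sample1", "sample2")

# smat / asmat: sense and antisense counts (assumed: genes in rows, samples in columns).
smat  <- matrix(c(120L, 30L, 55L, 140L, 25L, 60L), nrow = 3,
                dimnames = list(genes, samples))
asmat <- matrix(c(4L, 18L, 0L, 6L, 21L, 1L), nrow = 3,
                dimnames = list(genes, samples))

# genepos: gene bounds, one row per gene (column names are assumptions).
genepos <- data.frame(chr    = c("chr1", "chr1", "chr2"),
                      start  = c(1000L, 1800L, 500L),
                      end    = c(2500L, 3200L, 900L),
                      strand = c("+", "-", "+"),
                      row.names = genes,
                      stringsAsFactors = FALSE)

# pospairs: known sense-antisense pairs, given as row.names of genepos.
pospairs <- data.frame(sense = "geneA", antisense = "geneB",
                       stringsAsFactors = FALSE)

input <- list(smat = smat, asmat = asmat,
              genepos = genepos, pospairs = pospairs)

# Consistency checks worth running before handing the list to any package.
stopifnot(identical(dim(smat), dim(asmat)),
          identical(rownames(smat), rownames(asmat)),
          all(rownames(smat) %in% row.names(genepos)),
          all(unlist(pospairs) %in% row.names(genepos)))
```

The checks at the end only verify that the four pieces agree with each other; they say nothing about what NASTIseq itself accepts.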
Thanks Ryan for helping me. I am working with featureCounts to get the sense and antisense matrices, but I didn't understand how they get the genepos file. Do you have any idea about that?
The documentation for that package is absolutely terrible (please feel encouraged to contact the author and complain... when a paper about an R or other software package is in review, we really need to include the documentation/tutorial in the review process), so I'm not surprised you're stumped. It turns out that you have to look through the source code to see what's actually used from genepos and what things need to be named.
Assuming you have a reasonably well-formatted GTF for your organism (i.e., not from UCSC), then something like the following R code should work to create genepos.
You'll need to change the import.gff(...) part to match your own GTF file. For what it's worth, this method is rather slow with large GTF files, but you'll only need to do it once.
Dear Devon, I am trying to extract the sense and antisense read percentages for globin genes. I used htseq-count to obtain the stats. I have posted a question on comparing the strand settings yes, no and reverse: "How to interpret the difference among these three options in strandedness from HTSeq-count". I need some help interpreting the results. Can you help?
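Returning to the genepos step discussed above: as a rough illustration of deriving gene bounds from a GTF, here is a minimal base-R sketch. It is not the code from the answer, it uses read.delim rather than import.gff, and the GTF lines, gene IDs and output column names are hypothetical. It also assumes each gene_id sits on a single chromosome and strand; check the package source for the names genepos actually needs.

```r
# Hypothetical GTF excerpt; for a real file, replace `text = gtf_lines`
# with the path to your own GTF.
gtf_lines <- c(
  "chr1\tsrc\texon\t1000\t1200\t.\t+\t.\tgene_id \"geneA\"; transcript_id \"tA1\";",
  "chr1\tsrc\texon\t2000\t2500\t.\t+\t.\tgene_id \"geneA\"; transcript_id \"tA1\";",
  "chr1\tsrc\texon\t1800\t3200\t.\t-\t.\tgene_id \"geneB\"; transcript_id \"tB1\";"
)

gtf <- read.delim(text = gtf_lines, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE,
                  col.names = c("chr", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attributes"),
                  colClasses = c("character", "character", "character",
                                 "integer", "integer", "character",
                                 "character", "character", "character"))

# Keep exon records and pull gene_id out of the attributes column.
exons <- gtf[gtf$feature == "exon", ]
exons$gene_id <- sub('.*gene_id "([^"]+)".*', "\\1", exons$attributes)

# Gene bounds: smallest exon start and largest exon end per gene.
starts <- tapply(exons$start, exons$gene_id, min)
ends   <- tapply(exons$end,   exons$gene_id, max)

# Chromosome and strand taken from the first exon seen for each gene.
info <- exons[!duplicated(exons$gene_id), c("gene_id", "chr", "strand")]

genepos <- data.frame(chr    = info$chr,
                      start  = as.integer(starts[info$gene_id]),
                      end    = as.integer(ends[info$gene_id]),
                      strand = info$strand,
                      row.names = info$gene_id,
                      stringsAsFactors = FALSE)
```

Using gene IDs as row names keeps genepos compatible with a pospairs table that refers to genes by those same IDs.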