Question

how to export RNA-seq reads which mapped to genes?

0

Entering edit mode

10.5 years ago

trakhtenberg ▴ 160

Hello, I have mapped reads in bam format from tophat, and I would like to filter it in a way that only those paired reads which mapped to genes (in GTF) would remain. Would appreciate advice on how I can accomplish this. thanks

RNA-Seq • 3.5k views

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

Ram · Answer 1 · 2014-10-28

1

Entering edit mode

10.5 years ago

Devon Ryan 105k

Use htseq-count with the -o option and filter the resulting SAM file.

Edit: Removed mention of featureCounts, apparently it doesn't produce a SAM file with -R.

ADD COMMENT • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

thank you, I actually tried that but the output sam did not have reads, only lines like this:

XF:Z:XLOC_040566
XF:Z:XLOC_040566
XF:Z:XLOC_015449
XF:Z:XLOC_015449
XF:Z:__no_feature
XF:Z:__no_feature
XF:Z:XLOC_028412
XF:Z:XLOC_028412
ETC.....

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

0

Entering edit mode

Then something got broken in the release you're using. Try a different version.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

which version did you use? I used HTSeq-0.6.1 which had -f options that allows accepting bam (my input was bam). The counts though made sense. thank you

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

0

Entering edit mode

I don't have handy a version known to work correctly, sorry.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

instead of using the new -f option for bam input, I just tried the old way, using sam input. it did generate sam output with reads. but when I tried converting to bam

samtools view -bS in.sam > out.bam

I got error:

[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

do you know what the problem might be?

thank you

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

0

Entering edit mode

which command would you recommend for filtering the resulting SAM file? I tried converting to BAM without filtering first using this command

samtools view -bS in.sam > out.bam

but got this error:

[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

I then tried samtools view -bT genome.fasta in.sam > out.bam but it just restored all the reads that were reduced by the htseq-count. So I guess it is important to filter out the reduced reads from SAM prior to converting to BAM. Based on the description on htseq website it seems that the reads in outputted SAM are marked as unmapped to GTF but are not actually deleted, so I would appreciate if you could recommend command for actually deleting them. thank you.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

Ram · Answer 2 · 2014-10-28

1

Entering edit mode

10.5 years ago

Manvendra Singh ★ 2.2k

I think that it should be following

convert your gtf to bed (gtf2bed)
intersectBed -abam <your bam> -b <gtf to bed file>

have your samheader too for downstream analysis

ADD COMMENT • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Manvendra Singh ★ 2.2k

0

Entering edit mode

Manu you got lucky this time :-)

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Ashutosh Pandey 12k

0

Entering edit mode

:) :) , Now I know, we have synchronized answering time.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Manvendra Singh ★ 2.2k

0

Entering edit mode

I did steps 1 and 2, and now have bam output. could you please clarify what you mean by "have your samheader too for downstream analysis"? thank you

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

0

Entering edit mode

Try samtools view -H on your new bam and if it prints some lines then you are fine.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by Ashutosh Pandey 12k

0

Entering edit mode

samtools view -H printed this:

@HD     VN:1.0  SO:queryname
@SQ     SN:chr1 LN:195471971
@SQ     SN:chr10        LN:130694993
@SQ     SN:chr11        LN:122082543
@SQ     SN:chr12        LN:120129022
@SQ     SN:chr13        LN:120421639
@SQ     SN:chr14        LN:124902244
@SQ     SN:chr15        LN:104043685
@SQ     SN:chr16        LN:98207768
@SQ     SN:chr17        LN:94987271
@SQ     SN:chr18        LN:90702639
@SQ     SN:chr19        LN:61431566
@SQ     SN:chr2 LN:182113224
@SQ     SN:chr3 LN:160039680
@SQ     SN:chr4 LN:156508116
@SQ     SN:chr5 LN:151834684
@SQ     SN:chr6 LN:149736546
@SQ     SN:chr7 LN:145441459
@SQ     SN:chr8 LN:129401213
@SQ     SN:chr9 LN:124595110
@SQ     SN:chrM LN:16299
@SQ     SN:chrX LN:171031299
@SQ     SN:chrY LN:91744698
@PG     ID:TopHat  [original command]

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

0

Entering edit mode

I did steps 1 and 2 but when I tried extracting the reads, I got 0.5% fewer properly paired reads than were counted by samtools flagstat in the resulting bam. Picard SamToFastq, for some reason, considered 0.5% of the reads as unpaired mates and so they had to be excluded, although they were for sure paired before the filtering and even after the filtering in the resulting bam (based on flagstat). Losing 0.5% is not too bad, but if you have a solution for rescuing them, please let me know. thank you.

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160

Ram · Answer 3 · 2014-10-28

1

Entering edit mode

10.5 years ago

Ashutosh Pandey 12k

Assuming your sam file is coordinate sorted, you can use tabix.

bgzip Input_sorted.sam
tabix -p sam Input_sorted.sam.gz
tabix Input_sorted.sam.gz chrA:start-end chrB:start-end chrC:start-end > Input_gtf.sam (provide the string containing GTF coordinates in a sorted order. Unfortunately tabix doesn't take bed file containing region of interest as an input)

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 10.5 years ago by Ashutosh Pandey 12k

0

Entering edit mode

thank you for responding!

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 10.5 years ago by trakhtenberg ▴ 160