Get gene counts from *.sam file and annotation
0
1
Entering edit mode
6.2 years ago

I have a list of alignment (.sam/.bam formats) from RNAseq experiment and i want to get gene counts from them. The problem is that these alignments have chromosomes as reference, not genes:

NB501288_407_HGYJKBGX7:1:11101:22149:1027#GAGATTGTCAGT  0       chr8    141541926       37      70M     *       0       0       ATTTCNTATGACATTGGGTTCTCATACAGGTCTGCAGTTCAAAATCCAAGTTTGAAATCTGGGACGGAAG  AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE  XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:5C64
NB501288_407_HGYJKBGX7:1:11101:6394:1028#GAGATTGTCAGT   0       chr1    33790402        37      70M     *       0       0       GTCTGNCTGGCTCTGCTCAGTCGGGAGGAGGCGCCCTGGGCCAGAATCCTGGCTGCCACCAGCCCTAGGA  AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEE/EEAEEEEEEEE/  XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:5T64
NB501288_407_HGYJKBGX7:1:11101:19502:1031#GAGATTGTCAGT  16      chr2    97474290        25      70M     *       0       0       CCCCTCCGACCGTTCCCCAGCACACCCCACCCCACTCAGCCGCTCAGCCTCCCTCAGTTACCCANACCGC  <EEEEEEAEAEEEEEEEEEAE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE#AAAAA  XT:A:U  NM:i:4  X0:i:1  X1:i:0  XM:i:4  XO:i:0  XG:i:0  MD:Z:3T0G1A57G5

Is there a tool from samtools package to intersect them with gtf annotation to get gene counts, not chromosome?

RNA-Seq alignment samtools • 3.9k views
ADD COMMENT
1
Entering edit mode

featureCounts and htseq-count are standard software programs to get gene counts from SAM/BAM files.

ADD REPLY
0
Entering edit mode

thats not really solving my problem, i first need to use something like bedtools intersect

ADD REPLY
0
Entering edit mode

To get the gene counts you need to annotate your aligned reads first (based on the coordinates in the 3rd and the 4th column!)

ADD REPLY
0
Entering edit mode

yeah, thats what my goal is the beast i've found so far is bedtools intersect

ADD REPLY
0
Entering edit mode

You are on track!

1) Download a bed file containing the coordinates of all Gencode genes (You can download it from the UCSC table browser).

2) Convert your bam to bed:

samtools view -bq 20 -F 1796 "XXX.bam" | bamToBed -i stdin >  XXX.bed

3) In bedtools run:

bedtools intersect -a XXX.bed -b Genes.bed  -wa  >  intersect.bed

4) Get the total number of overlapped baits:

wc -l  intersect.bed
ADD REPLY

Login before adding your answer.

Traffic: 2132 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6