Entering edit mode
6.2 years ago
tim.ivanov.92
▴
40
I have a list of alignment (.sam/.bam formats) from RNAseq experiment and i want to get gene counts from them. The problem is that these alignments have chromosomes as reference, not genes:
NB501288_407_HGYJKBGX7:1:11101:22149:1027#GAGATTGTCAGT 0 chr8 141541926 37 70M * 0 0 ATTTCNTATGACATTGGGTTCTCATACAGGTCTGCAGTTCAAAATCCAAGTTTGAAATCTGGGACGGAAG AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:5C64
NB501288_407_HGYJKBGX7:1:11101:6394:1028#GAGATTGTCAGT 0 chr1 33790402 37 70M * 0 0 GTCTGNCTGGCTCTGCTCAGTCGGGAGGAGGCGCCCTGGGCCAGAATCCTGGCTGCCACCAGCCCTAGGA AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEE/EEAEEEEEEEE/ XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:5T64
NB501288_407_HGYJKBGX7:1:11101:19502:1031#GAGATTGTCAGT 16 chr2 97474290 25 70M * 0 0 CCCCTCCGACCGTTCCCCAGCACACCCCACCCCACTCAGCCGCTCAGCCTCCCTCAGTTACCCANACCGC <EEEEEEAEAEEEEEEEEEAE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE#AAAAA XT:A:U NM:i:4 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:3T0G1A57G5
Is there a tool from samtools package to intersect them with gtf annotation to get gene counts, not chromosome?
featureCounts
andhtseq-count
are standard software programs to get gene counts from SAM/BAM files.thats not really solving my problem, i first need to use something like bedtools intersect
To get the gene counts you need to annotate your aligned reads first (based on the coordinates in the 3rd and the 4th column!)
yeah, thats what my goal is the beast i've found so far is bedtools intersect
You are on track!
1) Download a bed file containing the coordinates of all Gencode genes (You can download it from the UCSC table browser).
2) Convert your bam to bed:
3) In bedtools run:
4) Get the total number of overlapped baits: