Question

featureCounts with pseudobam

1

Entering edit mode

7.9 years ago

mccormack ▴ 90

Has anyone tried using featureCounts with the pseudobam output of Salmon or kallisto ?

I have tried this command:

featureCounts(files=c("/Array/RNA_Seq/TF_B1_kallisto_Feb_14/kallisto_bams/m7-1a.fastq_ka.bam", "/Array/RNA_Seq/TF_B1_Salmon_Feb_14/mappings/m7-1a.bam"), annot.ext="/home/genomes/Araport/Araport11_GFF3_genes_transposons.201606.gtf", isGTFAnnotationFile=TRUE, GTF.featureType="exon", GTF.attrType="transcript_id", useMetaFeatures=TRUE, allowMultiOverlap=TRUE, minOverlap=1, largestOverlap=TRUE, countMultiMappingReads=TRUE, fraction=TRUE, nthreads=10)

I am using the Rsubread package.

It reads the annotation file:

|| Load annotation file /home/genomes/Araport/Araport11_GFF3_ge ... ||
||    Features : 322385                                                       ||
||    Meta-features : 58699                                                   ||
||    Chromosomes/contigs : 7

I get a lot of output like this:

1633                                                                                                            -    153

1634                       +;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+   3820

1635                       +;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+   3942

1636                                                                                                    +;+;+   1858

1637                                                                                                    +;+;+   1894

But reads do not get mapped to genes. Everything is 'Unnassigned_NoFeatures' or 'Unassigned_unmapped'.

My .bam file looks like this:

HISEQ:140:C2GCEACXX:1:1103:1620:2170    16  AT3G03960.1 1833    255 50M *   0   0   TGCATGCACTGTACTTCGTGTAGACCAGATTATAATGGCGAAACCAGCAG  JJJJJIIHJIJIIJJJJJJJJJIHJJJJJJJJJJJJJHHHHHDFFFFCCC  NH:i:1
HISEQ:140:C2GCEACXX:1:1103:1886:2020    16  AT4G18170.1 1058    255 50M *   0   0   CACATGATATGTTTAGGACGGCAGCTTATACTAACGGCGGTTCTGTGGCG  JJJJJJJIJJJJJJJJJHJJJJJJJJJJIIJJJJJJJHHHHHFFFFFCCC  NH:i:1

The identifier in the .bam file is transcript id (e.g. AT3G03960.1) which is included in the gtf file as 'transcript_id'. I have tried various combinations of GTF.featureType and GTF.attrType as well as deleting useMeataFeatures, but reads do not get assigned to transcripts.

RNA-Seq pseudobam featureCounts kallisto Salmon • 2.8k views

ADD COMMENT • link updated 4.7 years ago by ATpoint 86k • written 7.9 years ago by mccormack ▴ 90

score 1 · Answer 1 · 2020-04-07

This is only possible if you have a bam file that is based on the genome coordinate system. Remember that kallisto and salmon quantify reads against the transcriptome and therefore output bam files with coordinates relative to the transcripts. This is completely different from genome coordinates and you would first need to lift that bam file to genome coordinates, respecting splice sites etc. I am not a kallisto user but for salmon I can say that featureCounts on its bam files using a GTF/GFF as reference makes no sense at all.

score 0 · Answer 2 · 2020-04-06

0

Entering edit mode

4.7 years ago

kanika.151 ▴ 160

Hi,

I have used the 'psuedobam' from Kallisto to see if it gives me similar results or not. I was using the Linux based version of featureCounts and it worked. I speculate it has something to with the fasta file you have used for indexing its similarity with the gtf file you use for counting.

I had used Ensembl cdna fasta file for indexing and Ensembl gtf file for counting.