Has anyone tried using featureCounts with the pseudobam output of Salmon or kallisto ?
I have tried this command:
featureCounts(files=c("/Array/RNA_Seq/TF_B1_kallisto_Feb_14/kallisto_bams/m7-1a.fastq_ka.bam", "/Array/RNA_Seq/TF_B1_Salmon_Feb_14/mappings/m7-1a.bam"), annot.ext="/home/genomes/Araport/Araport11_GFF3_genes_transposons.201606.gtf", isGTFAnnotationFile=TRUE, GTF.featureType="exon", GTF.attrType="transcript_id", useMetaFeatures=TRUE, allowMultiOverlap=TRUE, minOverlap=1, largestOverlap=TRUE, countMultiMappingReads=TRUE, fraction=TRUE, nthreads=10)
I am using the Rsubread package.
It reads the annotation file:
|| Load annotation file /home/genomes/Araport/Araport11_GFF3_ge ... ||
|| Features : 322385 ||
|| Meta-features : 58699 ||
|| Chromosomes/contigs : 7
I get a lot of output like this:
1633 - 153
1634 +;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+ 3820
1635 +;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+ 3942
1636 +;+;+ 1858
1637 +;+;+ 1894
But reads do not get mapped to genes. Everything is 'Unnassigned_NoFeatures' or 'Unassigned_unmapped'.
My .bam file looks like this:
HISEQ:140:C2GCEACXX:1:1103:1620:2170 16 AT3G03960.1 1833 255 50M * 0 0 TGCATGCACTGTACTTCGTGTAGACCAGATTATAATGGCGAAACCAGCAG JJJJJIIHJIJIIJJJJJJJJJIHJJJJJJJJJJJJJHHHHHDFFFFCCC NH:i:1
HISEQ:140:C2GCEACXX:1:1103:1886:2020 16 AT4G18170.1 1058 255 50M * 0 0 CACATGATATGTTTAGGACGGCAGCTTATACTAACGGCGGTTCTGTGGCG JJJJJJJIJJJJJJJJJHJJJJJJJJJJIIJJJJJJJHHHHHFFFFFCCC NH:i:1
The identifier in the .bam file is transcript id (e.g. AT3G03960.1) which is included in the gtf file as 'transcript_id'. I have tried various combinations of GTF.featureType and GTF.attrType as well as deleting useMeataFeatures, but reads do not get assigned to transcripts.
Is the kallisto bam file using genome coordinates or transcriptome coordinates?