Entering edit mode
5 months ago
2406691063
•
0
I used featureCounts to get exon counts. And my codes are following:
'''./featureCounts -t exon --countReadPairs -M -O -f -s 0 -p -T 4 -F GTF -a /line/WJ/project/NY_Transcriptome2Genome/featurecount/Subread_to_DEXSeq/NY_for_dexseq.gtf -o /line/WJ/project/NY_Transcriptome2Genome/featurecount/DT_16vs0.txt /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_16_01.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_16_02.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_16_03.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_16_06.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_16_08.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_0_04.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_0_05.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_0_06.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_0_07.sorted.bam /line/WJ/project/NY_Transcriptome2Genome/04_samtools_SNPcalling/sort/DT_0_08.sorted.bam
''' But the output in terminals showed that the successfully assigned alignment scores were only around 40%.
> //========================== featureCounts setting ===========================\\
|| ||
|| Input files : 10 BAM files ||
|| ||
|| DT_16_01.sorted.bam ||
|| DT_16_02.sorted.bam ||
|| DT_16_03.sorted.bam ||
|| DT_16_06.sorted.bam ||
|| DT_16_08.sorted.bam ||
|| DT_0_04.sorted.bam ||
|| DT_0_05.sorted.bam ||
|| DT_0_06.sorted.bam ||
|| DT_0_07.sorted.bam ||
|| DT_0_08.sorted.bam ||
|| ||
|| Output file : DT_16vs0.txt ||
|| Summary : DT_16vs0.txt.summary ||
|| Paired-end : yes ||
|| Count read pairs : yes ||
|| Annotation : NY_for_dexseq.gtf (GTF) ||
|| Dir for temp files : /line/WJ/project/NY_Transcriptome2Genome/fea ... ||
|| ||
|| Threads : 4 ||
|| Level : feature level ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file NY_for_dexseq.gtf ... ||
|| Features : 215200 ||
|| Meta-features : 20538 ||
|| Chromosomes/contigs : 335 ||
|| ||
|| Process BAM file DT_16_01.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 31522099 ||
|| Successfully assigned alignments : 11715686 (37.2%) ||
|| Running time : 0.83 minutes ||
|| ||
|| Process BAM file DT_16_02.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 29776568 ||
|| Successfully assigned alignments : 12311903 (41.3%) ||
|| Running time : 0.76 minutes ||
|| ||
|| Process BAM file DT_16_03.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 28298923 ||
|| Successfully assigned alignments : 12146956 (42.9%) ||
|| Running time : 0.64 minutes ||
|| ||
|| Process BAM file DT_16_06.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 31025669 ||
|| Successfully assigned alignments : 9508592 (30.6%) ||
|| Running time : 0.97 minutes ||
|| ||
|| Process BAM file DT_16_08.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 28444412 ||
|| Successfully assigned alignments : 9160038 (32.2%) ||
|| Running time : 0.93 minutes ||
|| ||
|| Process BAM file DT_0_04.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 27512167 ||
|| Successfully assigned alignments : 12262121 (44.6%) ||
|| Running time : 0.63 minutes ||
|| ||
|| Process BAM file DT_0_05.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 38093207 ||
|| Successfully assigned alignments : 14982222 (39.3%) ||
|| Running time : 0.81 minutes ||
|| ||
|| Process BAM file DT_0_06.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 86329285 ||
|| Successfully assigned alignments : 26742040 (31.0%) ||
|| Running time : 2.83 minutes ||
|| ||
|| Process BAM file DT_0_07.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 58327939 ||
|| Successfully assigned alignments : 21884010 (37.5%) ||
|| Running time : 1.60 minutes ||
|| ||
|| Process BAM file DT_0_08.sorted.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 20276198 ||
|| Successfully assigned alignments : 6942758 (34.2%) ||
|| Running time : 0.46 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/line/WJ/project/NY_Tra ||
|| nscriptome2Genome/featurecount/DT_16vs0.txt.summary" ||
|| ||
\\============================================================================//
I have also found very high "Unassigned_NoFeatures" in summary.
Assigned 11715686 12311903 12146956 9508592 9160038 12262121 14982222 26742040 21884010 6942758
Unassigned_Unmapped 1434907 1376340 1462155 1588518 1277436 1343727 1724046 4257775 2630628 934777
Unassigned_Read_Type 0 0 0 0 0 0 0 0 0 0
Unassigned_Singleton 0 0 0 0 0 0 0 0 0 0
Unassigned_MappingQuality 0 0 0 0 0 0 0 0 0 0
Unassigned_Chimera 0 0 0 0 0 0 0 0 0 0
Unassigned_FragmentLength 0 0 0 0 0 0 0 0 0 0
Unassigned_Duplicate 0 0 0 0 0 0 0 0 0 0
Unassigned_MultiMapping 0 0 0 0 0 0 0 0 0 0
Unassigned_Secondary 0 0 0 0 0 0 0 0 0 0
Unassigned_NonSplit 0 0 0 0 0 0 0 0 0 0
Unassigned_NoFeatures 18371506 16088325 14689812 19928559 18006938 13906319 21386939 55329470 33813301 12398663
Unassigned_Overlapping_Length 0 0 0 0 0 0 0 0 0 0
Unassigned_Ambiguity 0 0 0 0 0 0 0 0 0 0
I have no ideas what to do. Any help in this will really appreciated.
What percentage were you expecting and why?
I ask because unassigned no features means the mapped read simply doesn't overlap your GTF file, which might be expected. For example, I have ribo-depleted total RNA, and it turns out we capture a fair bit of nascent RNAs. So when I do counts using a regular GTF, I get percentages comparable to yours. If I included counts over intronic regions, that percentage would increase because now the nascent pre-mRNA reads get counted. There's also non-gene genomic regions that have RNA signal that I don't count with a typical GTF.
Could there be information missing from your GTF file, such as specific chromosomes?