Very few assigned reads during featurecount function of bioconductor in bacterial RNAseq analysis
0
0
Entering edit mode
6.8 years ago
basucsmcri • 0

Hi everybody, I am new to analyzing RNAseq data. Kindly give your valuable suggestions on the bacterial RNAseq issue. I have successfully mapped the RNAseq reads against the reference genome, after that, I am trying to do read count using the feature-count function of Bioconductor to proceed with differential gene expression. But the problem is only very small percentage (2.4%) of reads are assigned and majority of sequences remain unassigned viz

> library(Rsubread)
> featureCounts(files = "CBC1_1_1.bam", annot.ext = "corrected.gff", isGTFAnnotationFile = TRUE)

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.28.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S CBC1_1_1.bam                                   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file corrected.gff ...                                     ||
||    Features : 124                                                          ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file CBC1_1_1.bam...                                           ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 4892776                                                   ||
||    Successfully assigned reads : 116593 (2.4%)                             ||
||    Running time : 0.24 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

$counts
              CBC1_1_1.bam
transcript_id       116593

$annotation
         GeneID
1 transcript_id
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Chr
1 NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Start
1 62611;321838;323588;326618;326743;326825;331182;331318;331745;331866;344966;452121;452231;452312;452395;452503;452586;452669;452777;452868;467092;467464;468234;473561;475216;475295;475477;478458;478580;478661;504421;640573;724864;724967;725156;725240;725341;725442;725524;725603;725690;725771;725862;725967;726057;726177;726264;726343;726434;726530;726651;742097;742194;742537;742983;743065;743144;743230;745977;747630;747813;750795;815945;816037;816130;816214;816306;816400;816482;840689;846331;1146456;1159998;1161798;1164859;1294963;1389306;1396035;2897591;3174053;66210;66292;84564;84644;84744;85035;88742;90378;90600;93868;99435;99921;100004;100266;103410;115028;115359;118628;147582;175236;211354;236279;236609;239754;242156;242486;245631;248032;248292;251341;251549;276479;276562;276690;276791;276874;276957;277084;277185;277268;277351;277479;1833054;2972855
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                End
1 62686;323359;326504;326734;326819;326900;331272;331408;331835;331956;345040;452209;452306;452388;452483;452578;452662;452757;452852;452956;467166;467538;468308;475082;475291;475371;478395;478574;478655;478735;504496;640657;724939;725040;725232;725315;725414;725518;725599;725677;725765;725855;725936;726040;726132;726251;726339;726427;726508;726603;726727;742173;742267;742610;743059;743140;743218;743305;747498;747706;750729;750869;816028;816112;816204;816297;816381;816474;816565;840763;846405;1146802;1161519;1164714;1164975;1295318;1389380;1396120;2897677;3174127;66285;66408;84639;84720;84860;87951;90265;90494;93516;95389;99510;99996;100120;103182;104931;115144;118275;120149;147848;175325;211428;236395;239525;241275;242272;245402;247152;248148;251208;251416;253070;276555;276637;276764;276865;276950;277032;277158;277259;277344;277426;277553;1833128;2973046
                                                                                                                                                                                                                                                   Strand
1 +;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-
  Length
1  58070

$targets
[1] "CBC1_1_1.bam"

$stat
                          Status CBC1_1_1.bam
1                       Assigned       116593
2            Unassigned_Unmapped        33087
3      Unassigned_MappingQuality            0
4             Unassigned_Chimera            0
5      Unassigned_FragmentLength            0
6           Unassigned_Duplicate            0
7        Unassigned_MultiMapping            0
8           Unassigned_Secondary            0
9         Unassigned_Nonjunction            0
10         Unassigned_NoFeatures      4743096
11 Unassigned_Overlapping_Length            0
12          Unassigned_Ambiguity            0

Kindly suggest me what could be the problem.

RNA-Seq R • 3.0k views
ADD COMMENT
1
Entering edit mode

Have a look at the BAM file in IGV. You'll need to visually determine if these results are sensible.

ADD REPLY
0
Entering edit mode

Dear Ryan, Thank you very much for your inputs, I looked into BAM file in IGV and the results are attached herewith for your kind perusal!

IGV image. Mapping image.

Also, I have attached mapping results

Actually, in .gff file, I am getting following warning, can it be the reason for low assignment to features

"Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id' The attributes included in your GTF annotation are 'ID=id1;Parent=rna0;Dbxref=GeneID:29414814;anticodon=(pos:62644..62646);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Pro' "

$stat Status CBC1_1_1n.bam 1 Assigned 118940 2 Unassigned_Unmapped 14097 3 Unassigned_MappingQuality 0 4 Unassigned_Chimera 0 5 Unassigned_FragmentLength 0 6 Unassigned_Duplicate 0 7 Unassigned_MultiMapping 0 8 Unassigned_Secondary 0 9 Unassigned_Nonjunction 0 10 Unassigned_NoFeatures 4759739 11 Unassigned_Overlapping_Length 0 12 Unassigned_Ambiguity 0

I tried using .gtf file,

but i got the following error "Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file './.Rsubread_featureCounts_pid60181': No such file or directory"

Can you kindly help in solving these issues, I will be grateful. Many Thanks In Advance

ADD REPLY
1
Entering edit mode

GFF files are always problematic. I strongly encourage you to directly use featureCounts from the command line rather than via R. You'll have less to debug that way, though in this case I presume there's a typo somewhere in the file name or path.

ADD REPLY
0
Entering edit mode

Dear Ryan, Thank you very much for your inputs.

I could solve the problem by putting the following command line

fc <- featureCounts(files = "CBC1_1_1.bam, CBC2_1.bam, CBT2_1_1.bam, CBT2_1.bam, CBC2_1_1.bam, CBC1_1_1b.bam, CBT1_1_1.bam, CBT1_1.bam", annot.ext = "annotate.gff",isGTFAnnotationFile = TRUE,GTF.featureType = "CDS", isPairedEnd = TRUE, GTF.attrType = "ID")"

Also, I determined feature counts using HTSeq, But now the issue is since I am analyzing bacterial RNA seq and I am focussing on small genes which are a part of a larger operon. How feature count Rsubread and HTSeq count take care of that. Which one will be strict in taking count of small genes also??

Although I came across "count_PE_fragments.py [options]" which takes care of small genes, but the problem is I could not clone it in ubuntu. I could clone it in Mac but the command "count_PE_fragments.py [options]" (https://github.com/asafpr/count_PE_fragments) does not work. How can I run this command in ubuntu or Mac? Your response will be much appreciated. Thanks a lot in advance

ADD REPLY

Login before adding your answer.

Traffic: 1671 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6