Question

Very few assigned reads during featurecount function of bioconductor in bacterial RNAseq analysis

0

Entering edit mode

6.8 years ago

basucsmcri • 0

Hi everybody, I am new to analyzing RNAseq data. Kindly give your valuable suggestions on the bacterial RNAseq issue. I have successfully mapped the RNAseq reads against the reference genome, after that, I am trying to do read count using the feature-count function of Bioconductor to proceed with differential gene expression. But the problem is only very small percentage (2.4%) of reads are assigned and majority of sequences remain unassigned viz

> library(Rsubread)
> featureCounts(files = "CBC1_1_1.bam", annot.ext = "corrected.gff", isGTFAnnotationFile = TRUE)

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.28.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S CBC1_1_1.bam                                   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file corrected.gff ...                                     ||
||    Features : 124                                                          ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file CBC1_1_1.bam...                                           ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 4892776                                                   ||
||    Successfully assigned reads : 116593 (2.4%)                             ||
||    Running time : 0.24 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

$counts
              CBC1_1_1.bam
transcript_id       116593

$annotation
         GeneID
1 transcript_id
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Chr
1 NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1;NZ_CP013252.1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Start
1 62611;321838;323588;326618;326743;326825;331182;331318;331745;331866;344966;452121;452231;452312;452395;452503;452586;452669;452777;452868;467092;467464;468234;473561;475216;475295;475477;478458;478580;478661;504421;640573;724864;724967;725156;725240;725341;725442;725524;725603;725690;725771;725862;725967;726057;726177;726264;726343;726434;726530;726651;742097;742194;742537;742983;743065;743144;743230;745977;747630;747813;750795;815945;816037;816130;816214;816306;816400;816482;840689;846331;1146456;1159998;1161798;1164859;1294963;1389306;1396035;2897591;3174053;66210;66292;84564;84644;84744;85035;88742;90378;90600;93868;99435;99921;100004;100266;103410;115028;115359;118628;147582;175236;211354;236279;236609;239754;242156;242486;245631;248032;248292;251341;251549;276479;276562;276690;276791;276874;276957;277084;277185;277268;277351;277479;1833054;2972855
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                End
1 62686;323359;326504;326734;326819;326900;331272;331408;331835;331956;345040;452209;452306;452388;452483;452578;452662;452757;452852;452956;467166;467538;468308;475082;475291;475371;478395;478574;478655;478735;504496;640657;724939;725040;725232;725315;725414;725518;725599;725677;725765;725855;725936;726040;726132;726251;726339;726427;726508;726603;726727;742173;742267;742610;743059;743140;743218;743305;747498;747706;750729;750869;816028;816112;816204;816297;816381;816474;816565;840763;846405;1146802;1161519;1164714;1164975;1295318;1389380;1396120;2897677;3174127;66285;66408;84639;84720;84860;87951;90265;90494;93516;95389;99510;99996;100120;103182;104931;115144;118275;120149;147848;175325;211428;236395;239525;241275;242272;245402;247152;248148;251208;251416;253070;276555;276637;276764;276865;276950;277032;277158;277259;277344;277426;277553;1833128;2973046
                                                                                                                                                                                                                                                   Strand
1 +;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;+;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-
  Length
1  58070

$targets
[1] "CBC1_1_1.bam"

$stat
                          Status CBC1_1_1.bam
1                       Assigned       116593
2            Unassigned_Unmapped        33087
3      Unassigned_MappingQuality            0
4             Unassigned_Chimera            0
5      Unassigned_FragmentLength            0
6           Unassigned_Duplicate            0
7        Unassigned_MultiMapping            0
8           Unassigned_Secondary            0
9         Unassigned_Nonjunction            0
10         Unassigned_NoFeatures      4743096
11 Unassigned_Overlapping_Length            0
12          Unassigned_Ambiguity            0

Kindly suggest me what could be the problem.

RNA-Seq R • 3.0k views

ADD COMMENT • link updated 6.8 years ago by Devon Ryan 104k • written 6.8 years ago by basucsmcri • 0

1

Entering edit mode

Have a look at the BAM file in IGV. You'll need to visually determine if these results are sensible.

ADD REPLY • link 6.8 years ago by Devon Ryan 104k

0

Entering edit mode

Dear Ryan, Thank you very much for your inputs, I looked into BAM file in IGV and the results are attached herewith for your kind perusal!

IGV image. Mapping image.

Also, I have attached mapping results

Actually, in .gff file, I am getting following warning, can it be the reason for low assignment to features

"Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id' The attributes included in your GTF annotation are 'ID=id1;Parent=rna0;Dbxref=GeneID:29414814;anticodon=(pos:62644..62646);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Pro' "

$stat Status CBC1_1_1n.bam 1 Assigned 118940 2 Unassigned_Unmapped 14097 3 Unassigned_MappingQuality 0 4 Unassigned_Chimera 0 5 Unassigned_FragmentLength 0 6 Unassigned_Duplicate 0 7 Unassigned_MultiMapping 0 8 Unassigned_Secondary 0 9 Unassigned_Nonjunction 0 10 Unassigned_NoFeatures 4759739 11 Unassigned_Overlapping_Length 0 12 Unassigned_Ambiguity 0

I tried using .gtf file,

but i got the following error "Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file './.Rsubread_featureCounts_pid60181': No such file or directory"

Can you kindly help in solving these issues, I will be grateful. Many Thanks In Advance

ADD REPLY • link 6.8 years ago by basucsmcri • 0

1

Entering edit mode

GFF files are always problematic. I strongly encourage you to directly use featureCounts from the command line rather than via R. You'll have less to debug that way, though in this case I presume there's a typo somewhere in the file name or path.

ADD REPLY • link 6.8 years ago by Devon Ryan 104k

0

Entering edit mode

Dear Ryan, Thank you very much for your inputs.

I could solve the problem by putting the following command line

fc <- featureCounts(files = "CBC1_1_1.bam, CBC2_1.bam, CBT2_1_1.bam, CBT2_1.bam, CBC2_1_1.bam, CBC1_1_1b.bam, CBT1_1_1.bam, CBT1_1.bam", annot.ext = "annotate.gff",isGTFAnnotationFile = TRUE,GTF.featureType = "CDS", isPairedEnd = TRUE, GTF.attrType = "ID")"

Also, I determined feature counts using HTSeq, But now the issue is since I am analyzing bacterial RNA seq and I am focussing on small genes which are a part of a larger operon. How feature count Rsubread and HTSeq count take care of that. Which one will be strict in taking count of small genes also??

Although I came across "count_PE_fragments.py [options]" which takes care of small genes, but the problem is I could not clone it in ubuntu. I could clone it in Mac but the command "count_PE_fragments.py [options]" (https://github.com/asafpr/count_PE_fragments) does not work. How can I run this command in ubuntu or Mac? Your response will be much appreciated. Thanks a lot in advance

ADD REPLY • link 6.8 years ago by basucsmcri • 0