2.3 years ago
Texx
I'm doing an RNA-seq analysis and get an error when running featureCounts from the Subread package.
The error is the following:
ERROR: No paired-end reads were detected in paired-end read library.
and this is the command and the parameters:
featureCounts -T 8 -p -a Salmonella.gff3 -t CDS,exon,rRNA,tmRNA,tRNA -g ID -o counts.txt 1.bam
However, if I omit the -p parameter the data is processed, but I'm not sure whether the result is correct (e.g. could reads be double-counted?).
A bit of background:
- the sequencing data was paired-end
- the sequenced reads were mapped to the respective genome using bowtie2; 1 output BAM file was produced (I suppose paired-end reads were merged into 1 file) from 2 input files containing paired-end reads
So my question is whether it is correct to omit the -p parameter and process the data as single-end reads. If not, what should I do?
You should find out why that happens. Please show the bowtie command, the error is probably in there.
Actually, I have now realized that even the single-end processing didn't work: there were 0 successfully assigned alignments.
Here's the bowtie command:
1.fq.gz already contains merged paired-end reads and was obtained before via BBsplit (because of slight contamination of sequencing reads), i.e. decon_sal.fq.gz was renamed to 1.fq.gz and fed to bowtie2.
And that's the counts.txt.summary from single-end processing
Assuming you are using a new version of featureCounts, you should also add the second option below to make sure reads are counted as pairs.
I still receive the same error, please check my comment above. Could it be that there's something wrong with the BAM file?
Are the chromosome names matching in your BAM file/GTF etc? That is generally the first thing to check.
You say that you decontaminated the reads, but what was the contamination from? Bacterial genomes can be very similar, so trying to bbsplit the data may not work that well. Looking at the command above, out_sal=decon_sal.fq.gz would actually contain reads that aligned to the Salmonella genome. The clean* reads are actually reads that did not align to either of the Sal/Enterobacter genomes. I don't think that is what you wanted, correct?
"merged" as in actually merged to create a long single read, or interleaved to create a single file from R1/R2 reads?
How many reads did you lose in the process? What kind of alignment % did you get from your alignment to the reference?
The sample was composed mainly of Salmonella and was slightly contaminated with Enterobacter; I wanted only the Salmonella genome, so I kept only the reads that aligned to it. That should be fine?
Interleaved would be the correct term, at least that's what I intended. From what I understood, this command produces an interleaved decon_sal.fq.gz file from the R1 and R2 reads (in1=trimmed_1.fq.gz in2=trimmed_2.fq.gz).
Here's the log from BBSplit, so I guess the majority of reads were preserved.
Regarding the alignment to the reference I seem to have lost the stats file somewhere but it was very high (the reference Salmonella genome and the employed strain of Salmonella bacteria are the same).
I also have a feeling that the names might not be matching, I did this:
and this is the start of the GFF3 file
But that still doesn't explain why featureCounts doesn't detect the bam file as paired-end, any idea about that?
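One way to look into this is to inspect the FLAG field of the alignment records: a record from a paired library has bit 0x1 set, and a file where no record has that bit is single-end as far as downstream tools can tell. The following is only a minimal sketch in Python; it assumes the records are available as SAM text (for example from `samtools view -h`), and the function name and the example records are hypothetical, not taken from the BAM file in this thread.

```python
def count_paired_flags(sam_text):
    """Count alignment records with and without the 'paired' FLAG bit (0x1)."""
    paired = unpaired = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        if flag & 0x1:
            paired += 1
        else:
            unpaired += 1
    return paired, unpaired


# Hypothetical records for illustration only.
example = "\n".join([
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t99\tchr1\t20\t42\t5M\t=\t60\t45\tACGTA\tIIIII",
    "r2\t147\tchr1\t60\t42\t5M\t=\t20\t-45\tACGTA\tIIIII",
])
paired, unpaired = count_paired_flags(example)
print("paired:", paired, "unpaired:", unpaired)
```

If the paired count is zero for the real file, the alignment step most likely treated the input as single-end reads.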
If you did not tell bowtie2 that the reads were interleaved, then they were simply treated as single-end reads. So the BAM file that you currently have is incorrect.
You can either re-do the alignments with the correct flag that ATPoint mentions (it exists) or split the interleaved reads into two files by doing
You can then use these files as the -1 and -2 inputs to bowtie.
You could have simply aligned the data using bbmap.sh instead of moving to bowtie (as long as you have samtools available, BBMap will directly create BAM files).
We still can't see what the chromosome name is. What does the following show?
The chromosome name you see there, is it the same in your GTF file and your BAM?
Alright, I'll try to fix the problem regarding the alignment of paired-end data.
Regarding the
this is what I get:
Those IDs are pretty ugly and they likely do not match your annotation file. They are likely to break something in this long chain of programs.
If you are going to redo the alignments then make sure your chromosome names are simplified and match what is in the annotation.
The IDs were indeed a problem, the processing worked after changing them. Thanks for all the help!
I do not know bbsplit or how it returns reads, but in general, if you input a paired-end FASTQ set with two files (R1 and R2), make sure you get two files with the cleaned reads in return. I do not know what "merged" means here. If it means a single file with alternating R1-R2-R1-R2 reads (called interleaved), then you have to tell bowtie that it is interleaved. It has a flag to read that format; check its manual for it. Then, as genomax says, make sure the chromosome names in the BAM and GTF match.
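To illustrate what "interleaved" means in practice, here is a rough Python sketch that splits an interleaved FASTQ into separate R1 and R2 record lists. It assumes plain four-line FASTQ records with mates strictly alternating (R1, R2, R1, R2, ...); the function name and the example reads are hypothetical, and for real data a dedicated tool is the safer choice.

```python
def deinterleave(lines):
    """Split interleaved FASTQ lines (R1 record, R2 record, ...) into two lists."""
    if len(lines) % 8 != 0:
        # Each pair contributes two 4-line records, so anything else is malformed.
        raise ValueError("expected a multiple of 8 lines (two 4-line records per pair)")
    r1, r2 = [], []
    for i in range(0, len(lines), 8):
        r1.extend(lines[i:i + 4])      # first record of the pair goes to R1
        r2.extend(lines[i + 4:i + 8])  # second record of the pair goes to R2
    return r1, r2


# Hypothetical reads for illustration only.
interleaved = [
    "@pair1/1", "ACGT", "+", "IIII",
    "@pair1/2", "TTGA", "+", "IIII",
    "@pair2/1", "GGCC", "+", "IIII",
    "@pair2/2", "AATT", "+", "IIII",
]
r1, r2 = deinterleave(interleaved)
print(r1[0], r2[0])
```

The two resulting lists correspond to the two files an aligner expects for paired input; an aligner that is handed the interleaved file as ordinary unpaired input has no way to know which reads are mates.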
I added another comment above in case it helps further. But you would then suggest that the problem with featureCounts is because bowtie2 didn't have "interleaved" specified but -U instead (unpaired reads)?
How to verify that the names in BAM and GFF3/GTF match?
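A simple way to check this is to compare the sequence names in the BAM header (the SN: fields of the @SQ lines) with the first column of the annotation file. The sketch below does that in Python on text input; it assumes the header has been dumped as text (for example with `samtools view -H`), and the function names and example names (chrA, chrB, plasmid1) are made up for illustration.

```python
def sam_header_names(header_text):
    """Collect reference sequence names from the @SQ lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gff_seqids(gff_text):
    """Collect the seqid column (column 1) from GFF3/GTF feature lines."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        ids.add(line.split("\t")[0])
    return ids


# Hypothetical header and annotation for illustration only.
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrA\tLN:5000\n@SQ\tSN:plasmid1\tLN:900\n"
gff = (
    "##gff-version 3\n"
    "chrA\tsrc\tCDS\t1\t300\t.\t+\t0\tID=cds1\n"
    "chrB\tsrc\tCDS\t5\t90\t.\t-\t0\tID=cds2\n"
)
bam_names = sam_header_names(header)
gff_names = gff_seqids(gff)
print("shared:", sorted(bam_names & gff_names))
print("only in BAM:", sorted(bam_names - gff_names))
print("only in GFF3:", sorted(gff_names - bam_names))
```

Features on a sequence name that appears only in the annotation cannot receive any counts, because no alignment refers to that name; an empty "shared" set would be consistent with 0 assigned alignments.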