Entering edit mode
8.2 years ago
shg018
▴
10
Hi I have been trying to run featureCounts on my mapped bam extension files with the command: featureCounts -s 2 -t exon -g gene_id -a /oasis/.../genes_l2.gtf -o /oasis/.../CompleteRNA.txt file.bam
and have been getting:
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v1.5.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| P FF1_1_R.bam ||
|| ||
|| Output file : /oasis/tscc/scratch/shagupta/L2_oocyte/Final ... ||
|| Summary : /oasis/tscc/scratch/shagupta/L2_oocyte/Final ... ||
|| Annotation : /oasis/tscc/scratch/shagupta/Genome_new/gene ... ||
|| Dir for temp files : /oasis/tscc/scratch/shagupta/L2_oocyte/Final ... ||
|| ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Strand specific : reversely stranded ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Overlapping bases : 0.0% ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| Load annotation file /oasis/.../genes_l2. ... ||
|| Features : 690739 ||
|| Meta-features : 46170 ||
|| Chromosomes/contigs : 165 ||
|| ||
|| Process BAM file FF1_1_R.bam... ||
|| Paired-end reads are included. ||
|| Assign reads to features... ||
featureCounts: readSummary.c:1578: process_pairer_header: Assertion `l_name < 100' failed.
/var/spool/torque/mom_priv/jobs/6660772.tscc-mgr.local.SC: line 18: 668 Aborted featureCounts -s 2 -t exon -g gene_id -a /oasis/.../genes_l2.gtf -o /oasis/.../CompleteRun1.txt ${file}
--------------------------------------------------------------------------------------------------------------------------------------------------------------
I was wondering about the possible reason for said error. Can it be due to low mapping percentage in bam file.
This suggests that you have a read whose name is at least 100 character long. Apparently the buffer used by featureCounts can't handle longer. That's not really a problem with you BAM file, though I've never seen a real file with a read name that long.
Thanks!! This really helped! :)
Would you recommend any way of fixing this, without going and manually editing gene name in the annotation file?
You mean the name in the BAM file...that and finding where this is coded in featureCounts and changing it are the only solutions. You might just give htseq-count a try (it's much slower, so keep that in mind).
So, I used htseq-counts to read counts this time, using the sam files this time, and I get a file with 0 reads:
and a warning during runtime:
and I was wondering if the output is because of this warning, (if you could also tell me what this warning means?). Or could the probable cause be that my gtf file was wrong (I used long-noncoding RNA specific gtf file which I also used in the genome build).
The warning isn't a big deal, the buffer that it uses to store a read while looking for its mate isn't that big.
If you alignments matched a feature then presumably your BAM and GTF files use different chromosome names (featureCounts can handle that, htseq-count can't).
But if I mapped using the same gtf file, that is I included the annotation file while building the genome, then how would my BAM/ SAM(in this case), have different names? Shouldn't it be continuous across all the files?
What matters is the fasta file chromosome names, if those match the GTF then the BAM will too. If not then you'll have issues.
If they do match, have a look at things in IGV.
Hi, sorry to keep bugging you, but what exactly do I look for in IGV?
Also, I used my gtf and fa files from http://www.gencodegenes.org/mouse_releases/current.html since I needed specific ones for lon-non coding rna's, and I don't observe this problem when I map to a complete genome! Would you recommend any other site to get lnc-specific gtf and fa files from to build my genome?
Hello shg018!
It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/87502/
This is typically not recommended as it runs the risk of annoying people in both communities.