Entering edit mode
3.5 years ago
aranyak111
•
0
I have 4 micro RNA fast Q files in question which have been trimmed using BBDuk for adaptor contamination. I have used human hairpin sequences from mirbase 22 to build the reference and human miRNA GTF file from the mirbase22 website. After the alignment, I have used feature counts from the Rusbread package to count the reads. The script I have used is
featureCounts -t miRNA -g Name -O -s 1 -M -a /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata/hsa.gff3 -o /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata//SRR326279_R1__feature_count.txt /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata/SRR326279_R1.sorted.bam
The feature count summary file looks like this
Assigned 0
Unassigned_Unmapped 151820
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 164217
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 0.
The first few lines of the feature count file look like this
Geneid Chr Start End Strand Length /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata/SRR326279_R1.sorted.bam
hsa-miR-6859-5p chr1;chr1;chr15;chr16 17409;187931;101973529;17092 17431;187953;101973551;17114 -;-;+;- 92 0
hsa-miR-6859-3p chr1;chr1;chr15;chr16 17369;187891;101973569;17052 17391;187913;101973591;17074 -;-;+;- 92 0
hsa-miR-1302 chr1;chr12;chr15;chr19;chr2;chr2;chr20;chr7;chr8;chr9;chr9 30438;112695082;101960504;72045;113583004;207269328;50614689;18127234;141786255;30216;97363596 30458;112695102;101960524;72065;113583024;207269348;50614709;18127254;141786275;30236;97363616 +;-;-;+;-;-;-;-;-;+;- 231 0
hsa-miR-12136 chr1 632668 632685 - 18 0
hsa-miR-200b-5p chr1 1167124 1167145 + 22 0
hsa-miR-200b-3p chr1 1167160 1167181 + 22 0
hsa-miR-200a-5p chr1 1167878 1167899 + 22 0
Although I can see that the coordinate files are clearly mentioned, I still cannot get the counts of the alignment files. Any help will be useful.
I assume you added
"
to somehow show us parts of your command line? They should not be there in an actual command line. I suggest you edit the post and clean them out. Use101010
button to format code.Some options to try
Unassigned_NoFeatures: The fragment mapped to a region that is not annotated in the annotation file.
this post may help
Yes even with -s 0/1/2 the alignment mapped is 0.
did you check contig/chrom names are same between your ref and annotation file and annotation comes from same build of your reference?
1) The build is the same as both the sequence file used for reference and the gtf file has taken from the latest version of miRBase. The sequence file used for building the reference is the Human_mature_DNA.fa downloaded and extracted from mirBase. The first few lines of the sequence file used for building the reference is
There is no chromosomal information there. It will be easier for me to understand if you explain wherein the sequence file you hope to see the chromosomal position or names.
Since your reference is so short you could simply use
samtools idxstats
to get counts for these miRNA.I have tried this tool and I have one follow-up question. Some of the lines of the output file look like this
the documentation for samtools id stats says that the first column is the name of the reference miRNA, the second column is the length of reads, the third column is the mapped sequence and the fourth one is the unmapped sequence.
So my count file, in this case, would be the third column which is 10 reads for the first miRNA and 0 for the second?