Counting reads with feature counts.
0
0
Entering edit mode
3.5 years ago
aranyak111 • 0

I have 4 micro RNA fast Q files in question which have been trimmed using BBDuk for adaptor contamination. I have used human hairpin sequences from mirbase 22 to build the reference and human miRNA GTF file from the mirbase22 website. After the alignment, I have used feature counts from the Rusbread package to count the reads. The script I have used is

featureCounts -t miRNA -g Name -O -s 1 -M -a /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata/hsa.gff3 -o /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata//SRR326279_R1__feature_count.txt /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata/SRR326279_R1.sorted.bam

The feature count summary file looks like this

Assigned    0
Unassigned_Unmapped 151820
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   164217
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    0.

The first few lines of the feature count file look like this

  Geneid    Chr Start   End Strand  Length  /gpfs/ycga/scratch60/nicoli/ag2646/Capmirseqtestdata/SRR326279_R1.sorted.bam
    hsa-miR-6859-5p chr1;chr1;chr15;chr16   17409;187931;101973529;17092    17431;187953;101973551;17114    -;-;+;- 92  0
    hsa-miR-6859-3p chr1;chr1;chr15;chr16   17369;187891;101973569;17052    17391;187913;101973591;17074    -;-;+;- 92  0
    hsa-miR-1302    chr1;chr12;chr15;chr19;chr2;chr2;chr20;chr7;chr8;chr9;chr9  30438;112695082;101960504;72045;113583004;207269328;50614689;18127234;141786255;30216;97363596  30458;112695102;101960524;72065;113583024;207269348;50614709;18127254;141786275;30236;97363616  +;-;-;+;-;-;-;-;-;+;-   231 0
    hsa-miR-12136   chr1    632668  632685  -   18  0
    hsa-miR-200b-5p chr1    1167124 1167145 +   22  0
    hsa-miR-200b-3p chr1    1167160 1167181 +   22  0
    hsa-miR-200a-5p chr1    1167878 1167899 +   22  0

Although I can see that the coordinate files are clearly mentioned, I still cannot get the counts of the alignment files. Any help will be useful.

Genomics • 1.7k views
ADD COMMENT
0
Entering edit mode

I assume you added " to somehow show us parts of your command line? They should not be there in an actual command line. I suggest you edit the post and clean them out. Use 101010 button to format code.

ADD REPLY
0
Entering edit mode

Some options to try

  • -s 0/1/2 [Do u see same result?]

Unassigned_NoFeatures: The fragment mapped to a region that is not annotated in the annotation file.

this post may help

ADD REPLY
0
Entering edit mode

Yes even with -s 0/1/2 the alignment mapped is 0.

ADD REPLY
0
Entering edit mode

did you check contig/chrom names are same between your ref and annotation file and annotation comes from same build of your reference?

ADD REPLY
0
Entering edit mode

1) The build is the same as both the sequence file used for reference and the gtf file has taken from the latest version of miRBase. The sequence file used for building the reference is the Human_mature_DNA.fa downloaded and extracted from mirBase. The first few lines of the sequence file used for building the reference is

hsa-miR-15a-5p MIMAT0000068 Homo sapiens miR-15a-5p TAGCAGCACATAATGGTTTGTG hsa-miR-15a-3p MIMAT0004488 Homo sapiens miR-15a-3p CAGGCCATATTGTGCTGCCTCA hsa-miR-16-5p MIMAT0000069 Homo sapiens miR-16-5p TAGCAGCACGTAAATATTGGCG hsa-miR-16-1-3p MIMAT0004489 Homo sapiens miR-16-1-3p. Default scripts given in the bowtie website is given for building the reference file from there.

There is no chromosomal information there. It will be easier for me to understand if you explain wherein the sequence file you hope to see the chromosomal position or names.

ADD REPLY
0
Entering edit mode

Since your reference is so short you could simply use samtools idxstats to get counts for these miRNA.

ADD REPLY
0
Entering edit mode

I have tried this tool and I have one follow-up question. Some of the lines of the output file look like this

hsa-miR-26a-5p  22  10  0
hsa-miR-26a-1-3p    22  0   0

the documentation for samtools id stats says that the first column is the name of the reference miRNA, the second column is the length of reads, the third column is the mapped sequence and the fourth one is the unmapped sequence.

So my count file, in this case, would be the third column which is 10 reads for the first miRNA and 0 for the second?

ADD REPLY

Login before adding your answer.

Traffic: 1594 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6