3' tsRNAs are a class of tRNA-derived small RNAs (tsRNAs) with implications in cancer, which our group at Stanford is investigating.
Alignment strategy: I have reference sequences that are 25 nucleotides long (22 nucleotides of tRNA sequence plus the CCA tail). I use bowtie with default parameters to build an index from these reference sequences and then align my reads against it. A representative reference sequence is:
>Homo_sapiens_tRNA-Lys-TTT-13-1-p
GGTTCAATCCCTTGCTGGGGCGCCA
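To sanity-check the reference FASTA, each record can be compared against the 22 + 3 = 25 nt layout described above. This is a minimal Python sketch, assuming a plain (uncompressed) FASTA file; read_fasta and check_reference are hypothetical helper names, and the 22-nt body and CCA tail are taken from the description above.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep the header up to the first whitespace as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def check_reference(seq, body_len=22, tail="CCA"):
    """True if seq is body_len nt of tRNA sequence followed by the CCA tail (assumed layout)."""
    return len(seq) == body_len + len(tail) and seq.endswith(tail)


example = [">Homo_sapiens_tRNA-Lys-TTT-13-1-p", "GGTTCAATCCCTTGCTGGGGCGCCA"]
for name, seq in read_fasta(example):
    print(name, len(seq), check_reference(seq))  # Homo_sapiens_tRNA-Lys-TTT-13-1-p 25 True
```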
The FASTQ files I want to align contain small RNA reads of varying lengths, from 1 to 76 nucleotides, with most reads between 15 and 25 nucleotides. I demultiplex the reads by barcode and then trim adapters from both the 3' and 5' ends.
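Because the trimmed reads span 1 to 76 nt, it can help to look at the length histogram and, optionally, keep only reads in a length window before alignment. This is a minimal standard-library sketch, assuming uncompressed FASTQ with complete 4-line records; fastq_records, length_histogram, filter_by_length and the 14-25 defaults are illustrative choices, not part of any tool.

```python
from collections import Counter


def fastq_records(lines):
    """Yield (header, seq, qual) tuples from 4-line FASTQ records.

    Assumes complete records; a truncated file raises an error.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def length_histogram(lines):
    """Count reads by sequence length."""
    return Counter(len(seq) for _, seq, _ in fastq_records(lines))


def filter_by_length(lines, min_len=14, max_len=25):
    """Yield FASTQ records whose length is within [min_len, max_len]."""
    for header, seq, qual in fastq_records(lines):
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual


# Hypothetical reads of 4, 17 and 25 nt.
demo = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "CCCTTGCTGGGGCGCCA", "+", "I" * 17,
    "@r3", "GGTTCAATCCCTTGCTGGGGCGCCA", "+", "I" * 25,
]
print(sorted(length_histogram(demo).items()))  # [(4, 1), (17, 1), (25, 1)]
print([h for h, _, _ in filter_by_length(demo)])  # ['@r2', '@r3']
```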
Index building step (Bowtie2): bowtie2-build miRNA.3tsRNA25bp.mt.nmt-tRNA.fa miRNA.3tsRNA25bp.mt.nmt-tRNA
Alignment step (Bowtie2): bowtie2 -x microRNA_data -U tsRNA_filtered_1.fastq -S tsRNA_filtered_1.sam
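bowtie2 can only read indexes written by bowtie2-build (files ending in .bt2), while bowtie-build from Bowtie 1 writes .ebwt files. The prefix passed to -x also has to match the basename given to the indexer. This is a minimal sketch that checks which kind of index exists for a prefix; index_type is a hypothetical helper, and the two prefixes in the usage lines are the ones from the commands above.

```python
from pathlib import Path

# Files bowtie2-build writes for a standard (small) index; very large references use .bt2l.
BOWTIE2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")
# Files written by Bowtie 1's bowtie-build, which bowtie2 cannot read.
BOWTIE1_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt")


def index_type(prefix):
    """Return 'bowtie2', 'bowtie1', or None depending on which index files exist for prefix."""
    def has_all(suffixes):
        return all(Path(prefix + suffix).is_file() for suffix in suffixes)

    if has_all(BOWTIE2_SUFFIXES):
        return "bowtie2"
    if has_all(BOWTIE1_SUFFIXES):
        return "bowtie1"
    return None


# Check the prefixes from the commands before aligning.
for prefix in ("microRNA_data", "miRNA.3tsRNA25bp.mt.nmt-tRNA"):
    print(prefix, index_type(prefix))
```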
I use samtools idxstats to count the number of alignments, and the alignment length it reports is always 25 nucleotides. I have two related questions:
1) If I use a 25-nt reference sequence, will bowtie perform only exact matching and detect only 25-nt reads, even though the reads in my FASTQ file vary in length?
2) I am interested in 3' tsRNA reads that are 14-25 nucleotides long. When aligning against the 25-nt reference sequences, is there a way to capture and count all reads whose lengths fall within that range? Any information will be helpful.
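Two facts bear on both questions. In Bowtie2's default end-to-end mode the entire read must align, but the read does not have to cover the whole reference, so a 17-nt read can align to a sub-span of a 25-nt reference (and alignments are scored rather than restricted to exact matches). Also, samtools idxstats prints, for each reference, its name, its sequence length, and the numbers of mapped and unmapped reads, so the 25 in that output is the length of the reference record, not of the aligned reads. Read lengths therefore have to come from the SAM records themselves. This is a minimal standard-library sketch, assuming an uncompressed SAM file such as tsRNA_filtered_1.sam with CIGAR strings present; read_length, count_reads_in_range and the demo records are hypothetical.

```python
import re
from collections import Counter

# CIGAR operations and the subset that consumes read (query) bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")


def read_length(cigar):
    """Length of the read implied by its CIGAR string (e.g. '17M' -> 17)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_OPS)


def count_reads_in_range(sam_lines, min_len=14, max_len=25):
    """Count primary mapped reads per reference whose length is within [min_len, max_len]."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records;
        # bowtie2 writes unmapped reads to the SAM unless --no-unal is given.
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue
        if min_len <= read_length(cigar) <= max_len:
            counts[rname] += 1
    return counts


# Hypothetical records against the example reference (QUAL strings are placeholders).
REF = "Homo_sapiens_tRNA-Lys-TTT-13-1-p"
demo_sam = [
    f"@SQ\tSN:{REF}\tLN:25",
    f"r1\t0\t{REF}\t9\t42\t17M\t*\t0\t0\tCCCTTGCTGGGGCGCCA\t{'I' * 17}",
    f"r2\t0\t{REF}\t1\t42\t25M\t*\t0\t0\tGGTTCAATCCCTTGCTGGGGCGCCA\t{'I' * 25}",
    f"r3\t0\t{REF}\t16\t42\t10M\t*\t0\t0\tTGGGGCGCCA\t{'I' * 10}",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
# r1 (17 nt) and r2 (25 nt) are counted; r3 (10 nt) and r4 (unmapped) are not.
print(count_reads_in_range(demo_sam))
```

Counting after alignment keeps bowtie2's assignment of each read to a reference, and the same loop can be adapted to write the passing SAM lines back out instead of counting them.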