Hello!
I am working with small RNA data. I analyzed the FASTQ file with FastQC and saw that it contained the Illumina small RNA 3' adapter; in fact, my sequence length distribution was centered on 51. I therefore used cutadapt to remove that adapter, and my sequence length distribution changed: https://drive.google.com/file/d/0B4m6-7p8GFwIa3B5VGRmT3FjUDA/view?usp=sharing
After that I aligned my reads against the reference genome (hg38) with Bowtie, using default parameters, to see how it performed. I obtained a very low percentage of mapped reads (0.30%).
I checked my FASTQ file again with FastQC and saw that there were several k-mers at the end of the reads. Is that normal?
I have uploaded all the FastQC images at this link: https://drive.google.com/open?id=0B4m6-7p8GFwIQmppNjNQVm5BRVU Are there other adapters that I should trim? At the beginning of the reads I saw that in some cases there is an N; should I trim those?
Here is an extract of my FASTQ file:
@HISEQ2500:231:C9L77ACXX:1:2316:21153:100286 1:N:0:NTAGCT
AAGCCGCCAGTTGAAGAACTGT
+
<7<B00<<0<BFBFFIIIIIII
@HISEQ2500:231:C9L77ACXX:1:2316:21183:100346 1:N:0:NTAGCT
CTCCAGGCCGAGGAC
+
<B<<<0<<BB<0<BB
@HISEQ2500:231:C9L77ACXX:1:1101:1376:1894 1:N:0:CTAGCT
NAGCTTATCAGACTGATGTTGA
+
#00BBFFFFFFFFFFFIIIIBF
@HISEQ2500:231:C9L77ACXX:1:1101:1314:1913 1:N:0:CTAGCT
NGCTACATCTGGCTACTGGGTCT
+
#0<FFFFFFFFFFIIIIIIIIII
As you can see, the first read has no N at the beginning, but there is an N in its index. In the last read exactly the opposite happens: there is an N at the beginning of the read, but not in the index.
How should I fix that issue?
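To check whether this pattern holds across the whole file rather than just these four reads, something like the rough Python sketch below could be used. It assumes plain four-line FASTQ records and that the index is the text after the last colon of the header line; the function names (`parse_fastq`, `index_of`, `n_report`) are made up for illustration and do not come from any tool.

```python
from typing import Iterator, List, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


# The four records from the extract above.
EXAMPLE = """@HISEQ2500:231:C9L77ACXX:1:2316:21153:100286 1:N:0:NTAGCT
AAGCCGCCAGTTGAAGAACTGT
+
<7<B00<<0<BFBFFIIIIIII
@HISEQ2500:231:C9L77ACXX:1:2316:21183:100346 1:N:0:NTAGCT
CTCCAGGCCGAGGAC
+
<B<<<0<<BB<0<BB
@HISEQ2500:231:C9L77ACXX:1:1101:1376:1894 1:N:0:CTAGCT
NAGCTTATCAGACTGATGTTGA
+
#00BBFFFFFFFFFFFIIIIBF
@HISEQ2500:231:C9L77ACXX:1:1101:1314:1913 1:N:0:CTAGCT
NGCTACATCTGGCTACTGGGTCT
+
#0<FFFFFFFFFFIIIIIIIIII
"""


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly four lines per record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        # Line i is the header, i+1 the bases, i+2 the '+' separator, i+3 the qualities.
        yield FastqRecord(lines[i], lines[i + 1], lines[i + 3])


def index_of(header: str) -> str:
    """Return the index (barcode) field, assumed to follow the last colon."""
    return header.rsplit(":", 1)[-1]


def n_report(text: str) -> List[Tuple[bool, bool]]:
    """For each record: (read starts with N, index contains N)."""
    return [
        (rec.sequence.startswith("N"), "N" in index_of(rec.header))
        for rec in parse_fastq(text)
    ]


if __name__ == "__main__":
    for starts_with_n, index_has_n in n_report(EXAMPLE):
        print(starts_with_n, index_has_n)
```

For a real file, the text would be read from disk (or through `gzip.open` for a compressed file) instead of the `EXAMPLE` string.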
Thank you in advance
Best
Small RNA data analysis requires pre-processing the data in specific ways (depending on the kit used, etc.). You may want to try a dedicated pipeline (e.g. miRquant or miRDeep2) for this purpose.
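If you want to experiment by hand before moving to a pipeline, the Python sketch below shows one possible pre-processing step: stripping uncalled bases (N) from the 5' end of a read and discarding reads that end up very short. This is only an illustration, not what any particular pipeline does. The helper names and the 18-nt minimum length are assumptions, and the right threshold depends on your kit and on the RNA species you are interested in.

```python
from typing import Tuple

# Assumed threshold for illustration only; adjust it to your protocol.
MIN_LENGTH = 18


def trim_leading_n(sequence: str, quality: str) -> Tuple[str, str]:
    """Strip N bases from the 5' end and drop the matching quality values."""
    trimmed = sequence.lstrip("N")
    removed = len(sequence) - len(trimmed)
    return trimmed, quality[removed:]


def keep_read(sequence: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if the read is long enough to be worth aligning."""
    return len(sequence) >= min_length


if __name__ == "__main__":
    seq, qual = trim_leading_n("NAGCTTATCAGACTGATGTTGA", "#00BBFFFFFFFFFFFIIIIBF")
    print(seq, qual, keep_read(seq))
    # The 15-nt read from the extract would be dropped at this threshold.
    print(keep_read("CTCCAGGCCGAGGAC"))
```

Sequence and quality are trimmed together so the two strings stay the same length, which FASTQ readers and aligners expect.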