Hi,
I am relatively new to bioinformatics and analysing sequencing data. I have come across this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2765267/) that uses Illumina to perform small RNA sequencing. I am currently struggling to successfully remove the adapters from the raw reads.
Following advice in other posts, I ran FastQC on my FASTQ files to look for overrepresented sequences. Among others, it reported the following (sequence, count, percentage of reads, possible source):
TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA 578279 3.6543888098715436 Illumina Single End Adapter 1 (100% over 21bp)
TTACAAAGGTCGTATGCCGTCTTCTGCTTGAAAAAA 86817 0.548633226014809 Illumina PCR Primer Index 1 (95% over 22bp)
AAACTCTGAATTCTTCTATCGTATGCCGTCTTCTGC 82130 0.5190140969233706 TruSeq Adapter, Index 13 (95% over 21bp)
GTAGTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA 81426 0.5145652241091243 Illumina Single End Adapter 1 (95% over 22bp)
TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAATA 66058 0.41744835278904197 Illumina Single End Adapter 1 (100% over 21bp)
GCTACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAA 60402 0.38170570415640365 TruSeq Adapter, Index 22 (95% over 24bp)
TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAT 55195 0.3488004758271696 Illumina Single End Adapter 1 (100% over 21bp)
I used these sequences to create a FASTA file and trimmed the reads with cutadapt as follows:
cutadapt file.fastq.gz -a file:file.fasta -o trimmed_file.fastq
The main issue is that most of the reads in the output are empty or shorter than 18 nt, which suggests that the trimming has not been successful: the reads should be around 25 nt, since the paper investigates miRNAs, which are about that size. I was hoping someone would have suggestions for resolving this. Thanks
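One thing that may be worth checking is whether the FastQC sequences are really adapters on their own. Several of them start with a few bases that look like insert and end in a poly-A run, so passing them whole to -a could trim more than intended. Here is a minimal Python sketch, assuming only the seven sequences listed above, that extracts the longest stretch shared by all of them (the function name is just illustrative):

```python
# The seven overrepresented sequences reported by FastQC, copied from the list above.
overrepresented = [
    "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA",
    "TTACAAAGGTCGTATGCCGTCTTCTGCTTGAAAAAA",
    "AAACTCTGAATTCTTCTATCGTATGCCGTCTTCTGC",
    "GTAGTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA",
    "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAATA",
    "GCTACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAA",
    "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAT",
]


def longest_common_substring(seqs):
    """Return the longest substring present in every sequence (brute force)."""
    ref = min(seqs, key=len)  # a common substring cannot be longer than the shortest sequence
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(len(ref), 0, -1):
        for start in range(len(ref) - length + 1):
            candidate = ref[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


core = longest_common_substring(overrepresented)
print(core, len(core))  # TCGTATGCCGTCTTCTGC 18
```

The shared stretch is a prefix of TCGTATGCCGTCTTCTGCTTG, which three of the listed sequences start with and which, as far as I can tell, is the Illumina small RNA 3' adapter. If that assumption holds for this dataset, passing that single sequence to cutadapt's -a option, possibly together with -m 18 to drop very short reads, may be a better starting point than the FASTA file of overrepresented sequences. Reads that consist of adapter only (adapter dimers) would still come out empty.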
Thanks a lot!! I have solved it now :)