Question

Small RNA-Seq Data: Removing Adapters (Illumina GA2 Instrument)

9 weeks ago

jspe • 0

Hi,

I am relatively new to bioinformatics and to analysing sequencing data. I have come across this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2765267/), which uses Illumina sequencing for small RNAs, and I am struggling to remove the adapters from the raw reads.

Following advice in other posts, I ran FastQC to check for overrepresented sequences. Among others, it reported the following for my FASTQ files:

Sequence                                Count   Percentage          Possible Source
TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA    578279  3.6543888098715436  Illumina Single End Adapter 1 (100% over 21bp)
TTACAAAGGTCGTATGCCGTCTTCTGCTTGAAAAAA    86817   0.548633226014809   Illumina PCR Primer Index 1 (95% over 22bp)
AAACTCTGAATTCTTCTATCGTATGCCGTCTTCTGC    82130   0.5190140969233706  TruSeq Adapter, Index 13 (95% over 21bp)
GTAGTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA    81426   0.5145652241091243  Illumina Single End Adapter 1 (95% over 22bp)
TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAATA    66058   0.41744835278904197 Illumina Single End Adapter 1 (100% over 21bp)
GCTACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAA    60402   0.38170570415640365 TruSeq Adapter, Index 22 (95% over 24bp)
TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAT    55195   0.3488004758271696  Illumina Single End Adapter 1 (100% over 21bp)

I used these sequences to create a FASTA file and trimmed them with cutadapt as follows:

cutadapt file.fastq.gz -a file:file.fasta -o trimmed_file.fastq

The main issue is that most of the reads in the output are empty or shorter than 18 nt, which suggests that the trimming has not been successful: the reads should be around 25 nt, since the study investigates miRNAs, which are about that size. I was hoping someone might have suggestions for resolving this. Thanks

Sequencing • 309 views

score 0 · Answer 1 · 2024-07-16

9 weeks ago

GenoMax 145k

See my answer here for options: How to figure out adapter sequence to use for trimming old miRNAseq data
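
As a quick sanity check on the FastQC table in the question (this is not taken from the linked answer), the sketch below looks for the longest stretch shared by all seven overrepresented sequences and reports where it starts in each one. The list `READS` is copied from the question; the function name `longest_common_substring` is just an illustrative helper, and the brute-force search is only meant for a handful of short sequences like these.

```python
# Overrepresented sequences copied from the FastQC table in the question.
READS = [
    "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA",
    "TTACAAAGGTCGTATGCCGTCTTCTGCTTGAAAAAA",
    "AAACTCTGAATTCTTCTATCGTATGCCGTCTTCTGC",
    "GTAGTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA",
    "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAATA",
    "GCTACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAA",
    "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAT",
]


def longest_common_substring(seqs):
    """Return the longest substring found in every sequence (first hit on ties)."""
    shortest = min(seqs, key=len)
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


core = longest_common_substring(READS)
print(f"shared core: {core} ({len(core)} nt)")

# The offset of the core in each sequence is the number of bases that would
# be left after 3' trimming at that point, i.e. the apparent insert length.
for read in READS:
    print(read.find(core), read)
```

On these seven sequences the shared core is the 18 nt stretch TCGTATGCCGTCTTCTGC, and six of the seven continue to TCGTATGCCGTCTTCTGCTTG before the poly-A run. The offsets come out as 0, 9, 18, 4, 0, 5 and 0, so the three sequences that start with the core would trim to empty reads whatever settings are used. That would be consistent with adapter-only reads rather than failed trimming, although this is an interpretation and not something the table proves. The entries with a non-zero offset also carry leading bases that may belong to the insert rather than the adapter, so it may be worth passing only the adapter itself to cutadapt's `-a` option and filtering short reads with `-m`/`--minimum-length`, rather than supplying the full overrepresented sequences.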