Hi guys, I have a question. After running FastQC, I discovered that some of my reads have overrepresented sequences, as follows:
fastq.R1
Sequence	Count	Percentage	Possible Source
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAA	141860	0.4972976921930607	TruSeq Adapter, Index 13 (97% over 40bp)
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	55712	0.19530134659142676	No Hit
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTAT	50886	0.17838354973167975	TruSeq Adapter, Index 13 (97% over 40bp)
fastq.R2
Sequence	Count	Percentage	Possible Source
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG	72457	0.2540018249205738	No Hit
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	61284	0.21483428569265142	No Hit
However, the adapter content in the FastQC report is perfect. I would like to remove those adapters or overrepresented sequences. I've never done this with paired-end (PE) data before, so I'm trying to figure it out. At the moment I'm trying:
cutadapt -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAA -a NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTAT -A GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG -A NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN -o out.1.fastq -p out.2.fastq R1.fastq R2.fastq
Could anyone with experience tell me if this is right?
Thank you in advance!
For anyone else with the same issue, here is some more information. If you plan to map the trimmed reads with STAR, for example, you may get an error like
EXITING because of FATAL ERROR in reads input: short read sequence line
It can be solved by adding the parameter -m N to the cutadapt command, which discards reads shorter than N bases after trimming. In my case I chose N based on the minimum sequence length reported by FastQC. I hope this helps!
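To illustrate what the minimum-length filter does to paired reads, here is a minimal Python sketch. It is not cutadapt's actual code: the function name filter_pairs_by_min_length and the toy reads are made up for illustration. It assumes a pair is dropped when either mate is shorter than N, which as far as I know matches cutadapt's default pair filter (--pair-filter=any), so R1 and R2 stay in sync.

```python
from typing import Iterable, Iterator, Tuple

# A read is modelled as (header, sequence, quality); purely illustrative.
Read = Tuple[str, str, str]
Pair = Tuple[Read, Read]


def filter_pairs_by_min_length(pairs: Iterable[Pair], min_len: int) -> Iterator[Pair]:
    """Yield only pairs in which BOTH mates have at least min_len bases.

    Assumption: this mimics a "discard the pair if any mate is too short"
    policy, so the two output files keep the same number of reads.
    """
    for r1, r2 in pairs:
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            yield r1, r2


# Toy data: the second pair has an R1 that was trimmed down to nothing,
# which is the kind of read that can trigger the STAR error above.
pairs = [
    (("@read1/1", "ACGTACGTAC", "IIIIIIIIII"), ("@read1/2", "TTGCAAGGCT", "IIIIIIIIII")),
    (("@read2/1", "", ""), ("@read2/2", "GGCATTACGA", "IIIIIIIIII")),
    (("@read3/1", "ACG", "III"), ("@read3/2", "TGCATG", "IIIIII")),
]

kept = list(filter_pairs_by_min_length(pairs, min_len=5))
print([r1[0] for r1, _ in kept])
```

With min_len=5 only the first pair survives: the second is dropped because its R1 is empty, and the third because its R1 has only 3 bases.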