Hi guys,
I have a question about adapter identification.
I have the raw fastq.tar.gz files from an RNAseq experiment.
I am trying to replicate a service provider's pre-processing as a learning experience. I have run FastQC and the quality is good; generally everything seems correct. The over-represented sequences section does not list any sequences, so all looks well. Now I am trying to trim off the adapters, but I do not know which adapters were used, so I thought I would provide illumina_adapters.fa to cutadapt (version 1.4.1) using the following command line:
cutadapt -b file:illumina_adapters.fa -m 15 -O 10 -e 0.1 Sample_file.fastq -o trimmed_Sample_file.fastq
I am using the same parameters as the service provider. However, when I run it this way I trim approximately 3,000 reads, whereas the service provider trims only 500.
Using the following sequences:
P5 - AATGATACGGCGACCACCGA
Reverse complement P7 - TCGTATGCCGTCTTCTGCTTG
I am able to pull out what I think are adapter dimers. If this is the case, am I correct in thinking I should be able to find the adapter sequence?
grep 'P5 Sequence' Sample_file.fastq
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACAAAGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACAGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACTGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACTGT
CTAAAGCTTCACACTTGATC-AATGATACGGCGACCACCGA-ACCCACTTTGC
grep 'P7 Sequence' Sample_file.fastq
CAAATGTATTTTAATAAGGTGATG-TCGTATGCCGTCTTCTGCTTG-AAAAAA
CTAAAGCTTCACACTTGATCAGGGATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAC
GTCGATGAGAGCCCAGAAATGTGAGAAAA-TCGTATGCCGTCTTCTGCTTG-A
Which part would be the adapter in this case?
Thanks in advance
It's always best to ask your service provider which kit was used to prepare the libraries.
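Until you hear back, it may help to tabulate where a candidate sequence sits in the reads rather than eyeballing grep output. The following is only a rough sketch in Python, not what cutadapt does: it looks for exact matches only, whereas cutadapt with -e 0.1 and -O 10 also accepts mismatches and partial overlaps, so the counts will not agree with either trimming run. The function names are made up for illustration, and the demo read is the second P7 hit from the question with the hyphens removed (I am assuming they were added by hand for readability).

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def split_on_adapter(seq: str, adapter: str) -> Optional[Tuple[str, str]]:
    """Return (upstream, downstream) around the first exact hit, or None."""
    pos = seq.find(adapter)
    if pos == -1:
        return None
    return seq[:pos], seq[pos + len(adapter):]


def summarise_hits(lines: Iterable[str], adapter: str) -> Tuple[int, Counter]:
    """Count reads with an exact match and tally the match start positions."""
    positions: Counter = Counter()
    for seq in fastq_sequences(lines):
        pos = seq.find(adapter)
        if pos != -1:
            positions[pos] += 1
    return sum(positions.values()), positions


# Small in-memory demo; for a real file, pass an open file handle instead.
P7_RC = "TCGTATGCCGTCTTCTGCTTG"  # reverse complement P7 from the question
demo_read = "CTAAAGCTTCACACTTGATCAGGGATC" + P7_RC + "AAA"
demo_fastq = ["@read1", demo_read, "+", "I" * len(demo_read)]

total, positions = summarise_hits(demo_fastq, P7_RC)
print("reads with exact hit:", total)
print("start positions:", dict(positions))
print("upstream / downstream:", split_on_adapter(demo_read, P7_RC))
```

To run it on the real data, something like `with open("Sample_file.fastq") as handle: summarise_hits(handle, P7_RC)` should work, assuming the file is an uncompressed FASTQ with four lines per record. Looking at the start positions and at what sits on either side of the hit is a quick way to see which flanking sequence is constant across reads.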