Hi,
I have sent some samples to a company for Illumina whole genome sequencing. I have two questions:
1) They told me that adaptors and barcodes have been trimmed from the raw data and that only primers were left. The info given is: "To trim the primers use the trimLeft argument in the filterAndTrim function of dada2. The size of the V3-V4 primer used for the project are 16 for forward and 24 for reverse."
But it seems to me that dada2's filterAndTrim function also needs the original untrimmed files, which I do not have. They only sent me the trimmed files.
out <- filterAndTrim(fwd, filt.fwd, rev, filt.rev, trimLeft=c(16,24))
Am I wrong? Can you recommend another tool to remove primers?
2) I am also wondering if the primers have actually been removed (the person from the company could not answer this question). If the primers were still there, I would have expected to see the same (16 bp or 24 bp) sequence at the left end of each read, but I cannot see it. Below is an example of the forward reads I got from them:
@A00197:374:HH5YWDSX2:3:1101:2067:1000 1:N:0:CTTGTACACC+AAGCGCGCTT
GTTTTCAACCAACACTGGTTCGGGCCTCCATACGGTGTTACCCGTACTTCACCCTGGCCAAGGGTAGATCACCTCGCTTCGCGTCTATTCCCAGCGACTTGTCGCCCGTTTCGCACTCGCTAACGCTACGGCTCGCTAACGCTTAACCTC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF:,FF:F,,,,FFFFF:::,FF,:F,FFF:,,FFF,:F:,F,F,F,,:F,:F,FFFF,F
@A00197:374:HH5YWDSX2:3:1101:7925:1000 1:N:0:CTTGTACACC+TAGCGCGCTT
AGACAAACCTGTCGAGTATGCGGTCCACATGCGGCGCCTACCTGCCGATCGAATGATGGACCGTCTGCTCGCCCGCGGACAGGTCACTGCGCCCATGGTCCGTCGGCTGGCGGAGAAGATGGCTCGCTTCCATGAGACGGCTGAGACGAG
+
FF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00197:374:HH5YWDSX2:3:1101:9281:1000 1:N:0:CTTGTACACC+AAGCGCGCTT
ATGCAGGCTGATTGTCTGCTTACGGCGATCAAATCCGCCCACAGACGATACGCCATACTTGGGATGACGCACCAGCGTCCCCCGTTTGAGACCCAGCGACCGCGTACCACCCTGCGGTCGTCTGATGCCGCCGTGTGCGAACTGCAGCAT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFF
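The "same sequence at the left end of each read" check can be roughly sketched in Python by counting how often each k-base prefix occurs. The helper names `prefix_counts` and `top_prefix_fraction` are made up for this example, and the three sequences are just the first 20 bases of the forward reads shown above. This is only a heuristic sketch: a primer with degenerate bases would show up as a handful of closely related prefixes rather than a single one.

```python
from collections import Counter
from typing import Iterable, List


def prefix_counts(sequences: Iterable[str], k: int) -> Counter:
    """Count the first k bases of every read that is at least k bases long."""
    return Counter(seq[:k] for seq in sequences if len(seq) >= k)


def top_prefix_fraction(sequences: List[str], k: int) -> float:
    """Fraction of reads that share the single most common k-base prefix."""
    counts = prefix_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


# First 20 bases of the three forward reads shown above.
reads = [
    "GTTTTCAACCAACACTGGTT",
    "AGACAAACCTGTCGAGTATG",
    "ATGCAGGCTGATTGTCTGCT",
]

# k = 16 is the forward primer length quoted by the company.
for prefix, n in prefix_counts(reads, 16).most_common():
    print(prefix, n)
print(top_prefix_fraction(reads, 16))
```

If a fixed primer were still attached, one prefix (or a few near-identical ones) would be expected to account for most reads in a file; here the three prefixes are all different, although three reads are far too few to conclude anything.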
Thanks in advance
Thank you Istvan for replying.
I forgot to mention that I did run a fastQC analysis, and all my FASTQ files fail the "adapter content" module because of Nextera Transposase sequences. In addition, some files, but not all of them, also fail the "overrepresented sequences" module. I also ran a fastp analysis searching only for "overrepresented sequences"; I get a very long list for each file and I am not sure which ones are my primers.
1) I guess that I can remove the overrepresented sequences from the FASTQ files that failed the "overrepresented sequences" module in the fastQC analysis.
2) But I am not sure how to get rid of the Nextera Transposase sequences that have been found in all my files.
Thanks again
If you run fastp, it will find, report and trim common adapters on its own. The Nextera adapter will likely be:
but it might be different with other sample preps.