Questions about how to read a FASTQ file and trim primers
2.7 years ago
valentinavan ▴ 50

Hi,

I have sent some samples to a company for Illumina whole genome sequencing. I have two questions:

1) They told me that adaptors and barcodes have been trimmed from the raw data and that only the primers were left. The info given is: "To trim the primers use the trimLeft argument in the filterAndTrim function of dada2. The size of the V3-V4 primer used for the project are 16 for forward and 24 for reverse." But it seems to me that the dada2 filterAndTrim function also needs the original untrimmed files, which I do not have. They only sent me the trimmed files.

out <- filterAndTrim(fwd, filt.fwd, rev, filt.rev, trimLeft=c(16,24))

Am I wrong? Can you recommend another tool to remove primers?

2) I am also wondering whether the primers have actually been removed already (the person from the company could not answer this question). If the primers were still there, I would have expected to see the same (16 bp or 24 bp) sequence at the left end of each read, but I cannot see it. Below are three of the forward reads I got from them:

@A00197:374:HH5YWDSX2:3:1101:2067:1000 1:N:0:CTTGTACACC+AAGCGCGCTT
GTTTTCAACCAACACTGGTTCGGGCCTCCATACGGTGTTACCCGTACTTCACCCTGGCCAAGGGTAGATCACCTCGCTTCGCGTCTATTCCCAGCGACTTGTCGCCCGTTTCGCACTCGCTAACGCTACGGCTCGCTAACGCTTAACCTC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF:,FF:F,,,,FFFFF:::,FF,:F,FFF:,,FFF,:F:,F,F,F,,:F,:F,FFFF,F
@A00197:374:HH5YWDSX2:3:1101:7925:1000 1:N:0:CTTGTACACC+TAGCGCGCTT
AGACAAACCTGTCGAGTATGCGGTCCACATGCGGCGCCTACCTGCCGATCGAATGATGGACCGTCTGCTCGCCCGCGGACAGGTCACTGCGCCCATGGTCCGTCGGCTGGCGGAGAAGATGGCTCGCTTCCATGAGACGGCTGAGACGAG
+
FF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00197:374:HH5YWDSX2:3:1101:9281:1000 1:N:0:CTTGTACACC+AAGCGCGCTT
ATGCAGGCTGATTGTCTGCTTACGGCGATCAAATCCGCCCACAGACGATACGCCATACTTGGGATGACGCACCAGCGTCCCCCGTTTGAGACCCAGCGACCGCGTACCACCCTGCGGTCGTCTGATGCCGCCGTGTGCGAACTGCAGCAT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFF

Thanks in advance

primers illumina fastq trimming • 3.2k views
2.7 years ago

Run your data through FastQC; it will detect common adapters in the data. Then look at the overrepresented sequences.

You can also run fastp; it will generate an HTML report in which you can inspect k-mers for hints.

Finally, you can count k-mers in the data yourself, though that is usually a last resort.
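
For the k-mer counting mentioned above, a minimal Python sketch (standard library only) might look like the following. It is an illustration, not what FastQC or fastp do internally: the function names `read_fastq` and `count_kmers`, the `starts_only` flag, and the toy reads are all made up for this example. Counting only the k-mer at the start of each read is the relevant case here, because an untrimmed primer should show up as one dominant prefix.

```python
from collections import Counter
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes the plain 4-lines-per-record layout (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual


def count_kmers(seqs, k, starts_only=False):
    """Count k-mers over all reads, or only the k-mer at each read start."""
    counts = Counter()
    for seq in seqs:
        if len(seq) < k:
            continue  # read too short to contain a full k-mer
        positions = 1 if starts_only else len(seq) - k + 1
        for i in range(positions):
            counts[seq[i:i + k]] += 1
    return counts


# Toy input (hypothetical reads, not taken from the thread).
toy = [
    "@r1", "ACGTACGTAA", "+", "FFFFFFFFFF",
    "@r2", "ACGTTTGCAA", "+", "FFFFFFFFFF",
    "@r3", "GGCATTGCAA", "+", "FFFFFFFFFF",
]
seqs = [seq for _header, seq, _qual in read_fastq(toy)]
prefix_counts = count_kmers(seqs, 4, starts_only=True)
all_counts = count_kmers(seqs, 4)
print(prefix_counts.most_common(3))
```

For a real file, the same generator can be fed an open handle, for example `gzip.open(path, "rt")` for a `.fastq.gz` file. With k set to 16 for the forward reads, a primer that is still present should dominate `prefix_counts`; many different prefixes with low counts each suggest there is no shared primer at the read start.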


Thank you Istvan for replying.

I forgot to mention that I did run a FastQC analysis, and all my FASTQ files fail the "Adapter Content" module because of Nextera transposase sequences. In addition, some files (but not all of them) also fail the "Overrepresented sequences" module. I also ran fastp, searching only for overrepresented sequences, and I get a very long list for each file; I am not sure which ones are my primers.

1) I guess that I can remove the overrepresented sequences from the FASTQ files that failed the "Overrepresented sequences" module in FastQC.

2) But I am not sure how to get rid of the Nextera transposase sequences that were found in all my files.

Thanks again


If you run fastp, it will find, report and trim common adapters on its own. The Nextera adapter will likely be:

>nextera
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

but it might be different with other sample preps.
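
To make concrete what 3' adapter trimming does, here is a rough Python sketch. It assumes exact matching only, whereas real trimmers such as fastp tolerate mismatches and use quality information, so treat it as an illustration rather than a replacement. The function name `trim_adapter` and the `min_overlap` parameter are invented for this example; the adapter string is the one quoted above.

```python
NEXTERA = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"  # sequence quoted in the post above


def trim_adapter(seq, qual, adapter=NEXTERA, min_overlap=8):
    """Cut a read at the adapter and return (trimmed_seq, trimmed_qual).

    Exact matching only (an assumption of this sketch). Two cases:
    1. the full adapter occurs somewhere in the read;
    2. the read ends partway into the adapter, so only a prefix of the
       adapter (at least `min_overlap` bases) sits at the 3' end.
    """
    pos = seq.find(adapter)
    if pos == -1:
        # Look for the longest adapter prefix that is a suffix of the read.
        longest = min(len(adapter) - 1, len(seq))
        for n in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:n]):
                pos = len(seq) - n
                break
    if pos == -1:
        return seq, qual  # no adapter found, leave the read untouched
    return seq[:pos], qual[:pos]


# Toy reads (hypothetical insert sequence).
insert = "ACGTACGTAC"
full = insert + NEXTERA + "AAAA"
partial = insert + NEXTERA[:10]
clean = "ACGTACGTACGG"

trimmed_full = trim_adapter(full, "F" * len(full))
trimmed_partial = trim_adapter(partial, "F" * len(partial))
trimmed_clean = trim_adapter(clean, "F" * len(clean))
print(trimmed_full, trimmed_partial, trimmed_clean)
```

The quality string is cut at the same position as the sequence so the record stays valid FASTQ. Short partial matches can occur by chance, which is why `min_overlap` exists; the value 8 is an arbitrary choice for this sketch.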

2.7 years ago
AfinaM ▴ 30

Did they give you the primers that they used for sequencing? Once you have them, you can add them to the list of adapters before you run FastQC, so that you can also check whether the primers are still in your sequence data. This is how I check after pre-processing, so I hope it helps.
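
If the primer sequences do become available, the same check can also be done directly on the reads. The sketch below is one possible way, not part of FastQC: it reports the fraction of reads that begin with a given primer, treating IUPAC degenerate codes in the primer as wildcards and allowing a small number of mismatches. The primer `ACGTNGGW` and the reads are placeholders, and the function names are invented for this example.

```python
# IUPAC nucleotide codes: which bases each primer symbol may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(seq, primer, max_mismatches=1):
    """True if the read begins with the primer, within max_mismatches.

    An 'N' base call in the read counts as a mismatch in this sketch.
    """
    if len(seq) < len(primer):
        return False
    mismatches = sum(
        base not in IUPAC[code] for base, code in zip(seq, primer)
    )
    return mismatches <= max_mismatches


def primer_fraction(seqs, primer, max_mismatches=1):
    """Fraction of reads whose 5' end matches the primer."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(starts_with_primer(s, primer, max_mismatches) for s in seqs)
    return hits / len(seqs)


# Placeholder primer and reads, purely for illustration.
primer = "ACGTNGGW"
reads = ["ACGTAGGATTTT", "ACGTCGGTCCCC", "TTTTTTTTTTTT"]
fraction = primer_fraction(reads, primer)
print(f"{fraction:.2f} of reads start with the primer")
```

A fraction close to 1 would indicate that the primers are still present; a fraction close to 0 would indicate that they were already removed, or that the reads do not start at the primer at all.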


I wish! They did not want to tell me the primer sequences; they said they cannot give them out!


Huh, that is weird. They should provide you with the sequences so that you can rerun the analysis/processing on your side for validation. By the way, for your second question, you can use bbduk to remove any other adapters. Check out this link: bbduk guide


Some of the sequencing primers may technically be considered trade secrets, but these are usually not present in the data; they get recognized and cut off before the data is reported.

Standard adapters are not secret and are listed in many sources:

https://github.com/usadellab/Trimmomatic/tree/main/adapters
