If you look a bit closer, you can see that there is a pair of linkers in each of the reads.
The linkers are:
A - CCGCGATAT CTTA TCCAAC
B - CCGCGATAT ACAT TCCAAC
The linkers differ in only four bases in the middle.
I would like to split each read in the FASTQ file into two, or trim either side of each read, cutting exactly between these two linkers (while still keeping the quality values for the reads).
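For illustration, here is a minimal sketch of the splitting step on a single four-line FASTQ record, assuming the cut position between the two linkers is already known. The name split_record and the /L and /R header suffixes are hypothetical choices, not part of any existing tool. The point is that the sequence and the quality string are cut at the same offset, so every base keeps its own quality value.

```perl
use strict;
use warnings;

# Split one FASTQ record at a 0-based offset $cut. The sequence and the
# quality string are cut at the same position so they stay in step.
# The "/L" and "/R" header suffixes are a hypothetical naming convention.
sub split_record {
    my ($header, $seq, $qual, $cut) = @_;
    die "sequence and quality lengths differ\n"
        unless length($seq) == length($qual);
    my $left  = join "\n", "$header/L", substr($seq, 0, $cut), '+', substr($qual, 0, $cut);
    my $right = join "\n", "$header/R", substr($seq, $cut),    '+', substr($qual, $cut);
    return ($left, $right);
}

# Toy record: 8 bases, cut after the fourth base.
my ($left, $right) = split_record('@read1', 'AAAACCCC', 'IIIIHHHH', 4);
print "$left\n$right\n";
```

In a real script the two records would be printed to two separate output files, one per half.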
Thanks for the help.
Unfortunately I was wrong in my description.
The combination linkAlinkA is not correct. What I need is revcomp(linkA)linkA, and the same for linker B. I have already found a reverse complement function:
sub reverse_complement {
    my $dna = shift;
    # reverse the DNA sequence
    my $revcomp = reverse($dna);
    # complement the reversed DNA sequence
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;
    return $revcomp;
}
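As a rough sketch of how that function could be used, the snippet below builds the revcomp(linker)linker junction for both linkers and looks for an exact match in a read. It assumes the two halves are directly adjacent with no bases in between. The names %linker, %junction and find_junction are hypothetical, and the read is made up for the example.

```perl
use strict;
use warnings;

sub reverse_complement {
    my ($dna) = @_;
    my $revcomp = scalar reverse $dna;      # reverse the sequence
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;      # complement each base
    return $revcomp;
}

# Linker sequences from the question, written without the spaces.
my %linker = (
    A => 'CCGCGATATCTTATCCAAC',
    B => 'CCGCGATATACATTCCAAC',
);

# Junction = revcomp(linker) immediately followed by the linker itself.
my %junction = map { $_ => reverse_complement($linker{$_}) . $linker{$_} } keys %linker;

# Return (linker name, 0-based cut position between the two halves),
# or an empty list if no junction matches exactly.
sub find_junction {
    my ($seq) = @_;
    for my $name (sort keys %junction) {
        my $pos = index($seq, $junction{$name});
        return ($name, $pos + length $linker{$name}) if $pos >= 0;
    }
    return;
}

# Made-up read: 4 bases, junction A, 4 bases.
my $read = 'ACGT' . $junction{A} . 'TTTT';
my ($name, $cut) = find_junction($read);
print "linker $name, cut at $cut\n";
```

The returned cut position is the offset where the read would be split, so it can be applied to the sequence and the quality string alike.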
To make the script work better, I would also like to add two things (if possible):
Is it possible not only to split the reads in the middle, but also to trim them?
For the left linker I would like to keep the 20 bases before the linker as well, and for the right linker I would like to keep the 20 bases after the linker sequence.
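One way to read this request is sketched below: the left piece is 20 flanking bases plus the left linker, and the right piece is the right linker plus 20 flanking bases. This is only a sketch under that assumption. The function name trim_around_junction and its parameters are hypothetical, and the toy read uses 4-base stand-ins instead of the real linkers to keep the example short.

```perl
use strict;
use warnings;

# $start      0-based start of the left linker in the read
# $linker_len length of one linker (both halves have the same length)
# $flank      number of bases to keep outside each linker
# Returns two array refs, each holding [sequence, quality].
sub trim_around_junction {
    my ($seq, $qual, $start, $linker_len, $flank) = @_;
    my $cut       = $start + $linker_len;   # boundary between the two linkers
    my $left_from = $start - $flank;
    $left_from    = 0 if $left_from < 0;    # fewer than $flank bases upstream
    my $left_len  = $cut - $left_from;
    my $right_len = $linker_len + $flank;   # substr stops at the read end if shorter
    return (
        [ substr($seq, $left_from, $left_len), substr($qual, $left_from, $left_len) ],
        [ substr($seq, $cut, $right_len),      substr($qual, $cut, $right_len) ],
    );
}

# Toy read: 25 bases, left "linker" GGGG, right "linker" CCCC, 30 bases.
my $seq  = ('A' x 25) . 'GGGG' . 'CCCC' . ('T' x 30);
my $qual = 'I' x length $seq;
my ($left, $right) = trim_around_junction($seq, $qual, 25, 4, 20);
print "$left->[0]\n$right->[0]\n";
```

Reads with fewer than 20 bases on one side simply yield a shorter piece here. Whether such reads should be kept or discarded is a separate decision.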
Is it possible to add an option for allowing mismatches?
Sometimes, due to sequencing errors, the linker sequence in a read does not exactly match the pattern given for linker A or B. Could the script accept a setting for the number of mismatches to tolerate?
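A simple way to tolerate mismatches is a sliding-window comparison that counts differing positions (Hamming distance) instead of an exact match. The sketch below assumes only substitutions need to be handled, not insertions or deletions. The names hamming and fuzzy_index are hypothetical, and the read contains linker A with one base changed on purpose.

```perl
use strict;
use warnings;

# Count the positions at which two equal-length strings differ.
sub hamming {
    my ($x, $y) = @_;
    my $d = 0;
    for my $i (0 .. length($x) - 1) {
        $d++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $d;
}

# Return the 0-based start of the first window of $seq that is within
# $max_mm substitutions of $pattern, or -1 if there is no such window.
sub fuzzy_index {
    my ($seq, $pattern, $max_mm) = @_;
    my $len = length $pattern;
    for my $i (0 .. length($seq) - $len) {
        return $i if hamming(substr($seq, $i, $len), $pattern) <= $max_mm;
    }
    return -1;
}

my $linker_a = 'CCGCGATATCTTATCCAAC';
# Linker A with one substitution (C -> G at position 15), flanked by 4 bases.
my $read = 'TTTT' . 'CCGCGATATCTTATCGAAC' . 'GGGG';

my $exact = fuzzy_index($read, $linker_a, 0);
my $fuzzy = fuzzy_index($read, $linker_a, 1);
print "exact: $exact, one mismatch allowed: $fuzzy\n";
```

Because linkers A and B differ at all four middle bases, allowing at most one mismatch per linker still keeps the two distinguishable. With two or more, a single window could match both.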
This will remove your linker sequence completely and split each sequence into two, both with the same detail lines. The result for your query will be the following:
Hi, thanks for that. However, I do need the linker sequence for further downstream analysis. I can still do it the same way, just without losing the linkers.
Thanks, Assa