5.2 years ago
zack.saud
Hi all,
I'm looking for help or suggestions on a technique for extracting the variable regions shown in the image. Does anyone know of any tools (or a script) I could use to extract both variable regions from each read, using the constant regions that flank them? Ideally, I would like a tool that extracts both variable regions from each read and either places them in Excel or creates another FASTA file in which the two variable regions from each read are linked together (i.e. with an x in between).
Many thanks in advance
Can you clarify if these are amplicons or whole genome reads? What format were they in originally, fastq/fasta? What does the image represent? BAM or fasta multiple sequence alignment?
If this is a BAM alignment, it would be easy to extract reads that span the intervals you are looking for by using samtools view, but depending on what kind of reads these are, it would be tricky to extract the specific nucleotides that represent 350-375 bp and 450-475 bp in the image above.
Hi genomax, I have files with both amplicon (450 bp) AND whole plasmid reads (4800 bp) from the same sample. I have each file in both fastq and fasta. The image is an alignment of my reads to the backbone sequence with the variable region deleted (minimap2); it has no significance, I just hoped it might help explain the problem I am attempting to solve. The alignment produced a BAM file. I'll give SAMtools view a try, thank you.
Have you looked up "regular expressions"?
Hi Swabarnes,
I'd never heard of regular expressions, but upon reading up on it, it seems to be exactly what I need! Do you know of any tools that support it for FASTA/FASTQ files? Or do any of the Linux tools (grep, sed, awk) support regular expressions?
Many thanks
A fastq file is a plain text file, often just gzipped. You don't need special software to handle it. You can apply any programming language, like python or perl or R, or perhaps you could string a bunch of unix commands together starting with grep.
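To make the regular-expression idea concrete, here is a minimal Python sketch using only the standard library. The three flanking sequences (LEFT, MID, RIGHT) are made-up placeholders, not the real constant regions, and the function names are just for illustration. It assumes the flanks match exactly and that reads are in the forward orientation; reads with sequencing errors in the flanks, or reverse-complemented reads, would be missed.

```python
import re

# Hypothetical constant regions; replace with the real flanking sequences.
LEFT = "ACGTAC"    # constant region before variable region 1
MID = "GGATCC"     # constant region between the two variable regions
RIGHT = "TTGACA"   # constant region after variable region 2

# Two lazy capture groups, one per variable region.
PATTERN = re.compile(f"{LEFT}([ACGTN]+?){MID}([ACGTN]+?){RIGHT}")

def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)

def extract_variable_regions(records):
    """Yield (name, region1, region2) for every read where the pattern matches."""
    for name, seq in records:
        match = PATTERN.search(seq)
        if match:
            yield name, match.group(1), match.group(2)

def to_fasta(hits, joiner="x"):
    """Format hits as FASTA text with the two regions linked by `joiner`."""
    return "".join(f">{name}\n{v1}{joiner}{v2}\n" for name, v1, v2 in hits)
```

To run it on a real file, something like `to_fasta(extract_variable_regions(read_fasta(open("reads.fasta"))))` would do, where reads.fasta is a placeholder name. Writing tab-separated lines instead of FASTA would give a file that opens directly in Excel.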
Though now that I see your image, your "constant" region is not very constant. I'd rather parse the sam file, because the aligner will have already handled the fact that the flanking sequence isn't perfect, and you can eyeball the coordinates you want.
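A rough sketch of what parsing the SAM file could look like in plain Python, assuming SAM text produced by something like samtools view on the BAM file. The function names are invented for illustration, and the intervals 350-375 and 450-475 are only the approximate coordinates mentioned earlier in the thread, so check them against your own alignment. The code walks each read's CIGAR string and collects the read bases that fall inside each reference interval; insertions lying inside an interval are kept, deletions contribute nothing.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def bases_in_interval(pos, cigar, seq, start, end):
    """Return read bases aligned to the 1-based inclusive reference interval.

    pos is the 1-based leftmost mapping position (SAM column 4).
    """
    ref = pos   # next reference position to be consumed
    i = 0       # next index into the read sequence
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":                 # consumes read and reference
            for k in range(length):
                if start <= ref + k <= end:
                    out.append(seq[i + k])
            ref += length
            i += length
        elif op == "I":                 # consumes read only
            # The insertion sits between reference positions ref-1 and ref.
            if start < ref <= end:
                out.append(seq[i:i + length])
            i += length
        elif op == "S":                 # soft clip: consumes read only
            i += length
        elif op in "DN":                # consumes reference only
            ref += length
        # H and P consume neither read nor reference
    return "".join(out)

def extract_from_sam(lines, intervals):
    """Yield (read_name, [bases per interval]) for each mapped SAM record."""
    for line in lines:
        if line.startswith("@"):
            continue                    # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, pos = fields[0], int(fields[1]), int(fields[3])
        cigar, seq = fields[5], fields[9]
        if flag & 4 or cigar == "*" or seq == "*":
            continue                    # unmapped or no sequence stored
        yield name, [bases_in_interval(pos, cigar, seq, s, e)
                     for s, e in intervals]

# Approximate coordinates from the thread; adjust to the real alignment.
INTERVALS = [(350, 375), (450, 475)]
```

Because SAM stores the sequence of reverse-strand reads already reverse-complemented, the extracted bases come out in reference orientation. Reads that do not cover an interval simply give an empty string for it, so those can be filtered out afterwards.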
There is no image...