A method for extracting two variable regions from a FASTA file containing long reads (see image)
5.2 years ago
zack.saud ▴ 50

Hi all,

I'm looking for help or suggestions regarding a technique for extracting the variable regions in the image. Does anyone know of any tools (or a script) I could use to extract both of the variable regions from each read, using the constant regions that flank them? Ideally I would like a tool that could extract both variable regions from each read and either place them in Excel or create another FASTA file where the two variable regions from each read are linked together (i.e. with an x in between).

Many thanks in advance

enter image description here

sequencing next-gen • 1.3k views

Can you clarify if these are amplicons or whole genome reads? What format were they in originally, fastq/fasta? What does the image represent? BAM or fasta multiple sequence alignment?

If this is a BAM alignment, it would be easy to extract reads that span the intervals you are looking for by using samtools view, but depending on what kind of reads these are, it would be tricky to extract the specific nucleotides that represent 350-375 bp and 450-475 bp in the image above.


Hi genomax, I have files with both amplicon (450 bp) AND whole plasmid reads (4800 bp) from the same sample. I have each file in both fastq and fasta. The image is an alignment of my reads to the backbone sequence with the variable region deleted (minimap2). It has no particular significance; I just hoped it might help explain the problem I am attempting to solve. The alignment produced a BAM file. I'll give SAMtools view a try, thank you.


Have you looked up "regular expressions"?


Hi Swabarnes,

I'd never heard of regular expressions, but upon reading up on it, it seems to be exactly what I need! Do you know of any tools that incorporate it for Fasta/Fastq files? Or do any of the Linux tools (grep, sed, awk) support regular expressions?

Many thanks


A fastq file is a plain text file, often just gzipped. You don't need special software to handle it. You can apply any programming language, like python or perl or R, or perhaps you could string a bunch of unix commands together starting with grep.
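
As a rough illustration of the regular-expression route in Python, here is a minimal sketch, not a tested pipeline. The three flanking sequences (`LEFT_FLANK`, `MIDDLE`, `RIGHT_FLANK`) and all function names are made-up placeholders; the real constant regions would have to be substituted. It assumes exact matches to the flanks and forward-strand reads only, and it writes the two variable regions joined by an `x`, as asked in the question.

```python
import re

# Hypothetical constant regions; replace with the real flanking sequences.
LEFT_FLANK = "ACGTAC"    # constant region before variable region 1 (assumed)
MIDDLE = "GGATCC"        # constant region between the two variable regions (assumed)
RIGHT_FLANK = "TTGCAA"   # constant region after variable region 2 (assumed)

# Two non-greedy capture groups, one per variable region.
PATTERN = re.compile(
    re.escape(LEFT_FLANK) + "([ACGTN]+?)"
    + re.escape(MIDDLE) + "([ACGTN]+?)"
    + re.escape(RIGHT_FLANK)
)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def extract_variable_regions(lines, pattern=PATTERN):
    """Yield (header, region1, region2) for every read that matches the pattern."""
    for name, seq in read_fasta(lines):
        match = pattern.search(seq)
        if match:
            yield name, match.group(1), match.group(2)


def to_linked_fasta(records):
    """Format records as FASTA text with the two regions joined by 'x'."""
    return "".join(f">{name}\n{v1}x{v2}\n" for name, v1, v2 in records)


if __name__ == "__main__":
    demo = [">read1", "TTTACGTACAAAAGGATCCCCCCTTGCAAGG", ">read2", "ACGTTTT"]
    print(to_linked_fasta(extract_variable_regions(demo)), end="")
```

For a real file, an open file handle can be passed in place of the `demo` list. Reads sequenced from the opposite strand would not match unless they are reverse-complemented first, and reads with sequencing errors inside the flanks would be skipped.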

Though now that I see your image, your "constant" region is not very constant. I'd rather parse the sam file, because the aligner will have already handled the fact that the flanking sequence isn't perfect, and you can eyeball the coordinates you want.
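
To make the SAM-parsing idea concrete, here is a minimal Python sketch under stated assumptions. It takes one SAM alignment line (for example, text output from `samtools view`), walks the CIGAR string, and returns the read bases that align to a chosen reference interval. The function names and the example coordinates are hypothetical; insertions that fall strictly inside the interval are included, which may matter if the variable region is absent from the reference and therefore shows up as an insertion.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line):
    """Return (pos, cigar, seq) from one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    # SAM columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    return int(fields[3]), fields[5], fields[9]


def extract_ref_interval(pos, cigar, seq, ref_start, ref_end):
    """Return read bases aligned to the 1-based inclusive interval [ref_start, ref_end].

    pos is the 1-based leftmost mapping position (SAM POS).
    """
    ref = pos   # next reference position to be consumed (1-based)
    qi = 0      # next read index to be consumed (0-based)
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned block: keep the part overlapping the interval.
            lo = max(ref, ref_start)
            hi = min(ref + length - 1, ref_end)
            if lo <= hi:
                out.append(seq[qi + (lo - ref): qi + (hi - ref) + 1])
            ref += length
            qi += length
        elif op == "I":
            # Insertion sits between ref-1 and ref; keep it if both are in the interval.
            if ref_start < ref <= ref_end:
                out.append(seq[qi: qi + length])
            qi += length
        elif op == "S":
            qi += length    # soft clip consumes read bases only
        elif op in "DN":
            ref += length   # deletion/skip consumes reference only
        # H and P consume neither read nor reference
    return "".join(out)


if __name__ == "__main__":
    # Made-up alignment line for illustration only.
    seq = "N" * 5 + "A" * 10 + "GGG" + "C" * 10 + "T" * 5
    line = "\t".join(["read1", "0", "backbone", "101", "60",
                      "5S10M3I10M2D5M", "*", "0", "0", seq, "*"])
    pos, cigar, read_seq = parse_sam_line(line)
    print(extract_ref_interval(pos, cigar, read_seq, 109, 112))
```

Note that SAM stores `SEQ` in reference-forward orientation, so the extracted bases come out on the reference strand regardless of how the read was sequenced. Unmapped reads (CIGAR `*`) would return an empty string here and should be filtered out beforehand.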


There is no image...
