How to cut/extract/splice different regions from a single FASTA sequence
7.6 years ago

Hello,

I have a long nucleotide FASTA sequence, and I want to extract the following regions from that single sequence:

Bradi2g06210.1.p    198 414
Bradi2g50510.3.p    85  321
Bradi2g52160.4.p    15  236
Bradi2g31600.1.p    262 495
Bradi4g35760.1.p    329 561
Bradi4g21025.1.p    35  250

The IDs and regions above are tab-separated, and my FASTA sequence looks like this:

>Chr_1
CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
AACCCTAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC
CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAAACCCTAAACCCTAACCCTAAACCCTAAACCCTAAACCCT
AAACCCTAACTTTTTAATCCGAGTTGAACCTGACCCACAAGCTCAAGTCTTTGGCCTAGTCCTCAAGTTGTTGCATTCTT
CAGTGTCTTTGCATTGGGGCCTCCAAGTGTTGCCGATCCTCATCCTCGAGTAGGAAGATATTTACAAAGAGCCTTCTATT
AAATGTTTATCGTGAGATCATTAGGATATTCTACAAGCTTGAGAGGTTGCATGAATATGGATGATATTAACCTTGGAAGA
CACTAGAAAAATATCTTTACGGAGAATTAGGAAAACTCTAAAAGTAAGATATTTGTGTAGAGTATTCTAGAATAAACATA
TTTATAGGTATAAGTTTTGAATGGTCGTCACTTGTCACTAATCTAACAGTAATGTAATCTTATAAATAAGAGTGTTCTCC
TTTCCCATGAGTTGTACAAACAAAACACCAAGAAAAAATATCCCTATTAGAAGAGTTGCGAGCTACTCCCTTCTTCCCAT
TTTATTCAGCTTTTGCTATTCGTTCTGCTAGTTGTTCATCATGGATTCGTTGATGAAGAGATTTATTGTAACTCCAACCT
TCCATGTAAAGTTTTAGGACCTGTTGTGTTGTTGTGTAATTTAATTTAGAGTTACATAAAAACGCGAAGGCAAGAAAATT
AAATCACCAAAAACTTACATCCTTCGTATTTTCATCATCAACTGCACGAAGATTGCCTGTCACAGCAATTCCATCTTCAT
CAACTTAAACAATATTGATTGGAAAATAAGTAATTGTTCCATTATATGAAATGCAGTCAGTCGAATTTTATAATAATGTC
TTACATTACACAATGTCTTCCATACACAAAGTCTATCGAACTCAACATTCTCGAAGTCACCAGCTTCAGTTTGCAAAATA
TGAATAATTCCATGATGCTGCTCAAGAAATACAGCAAAGTATAAGAAAAAAAATACGGACATGCAAAACCTTCATGAACA
TCAAAATGTAGATAAAATACATACCAAGCCCAGCAGCTTAAACTCTCCGTTTCTGTATAAAACGGCAAACTTCTGGTTGT

How can I extract the regions defined above from this sequence?

Thank you

Tags: shell, perl

Hello sanjeet00001992!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!


Dear Sir, that is not a similar question: that user wanted to extract different regions from different sequences, but I want to extract different regions from a single FASTA sequence.

Thank you.


But I want to extract different regions from a single FASTA sequence.

This is a generalization of the problem in Extract Fasta Sequences Sub Sets, isn't it?


No sir, that question deals with different IDs and regions, each printed separately, but here I have to extract multiple regions, along with their corresponding IDs, from a single sequence.

I want to print each ID along with its sequence, like:

Bradi2g06210.1.p CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAAACCCTAAACCCTAACCCTAAACCCTAAACCCTAAACCCT AAACCCTAACTTTTTAATCCGAGTTGAACCTGACCCACAAGCTCAAGTCTTTGGCCTAGTCCTCAAGTTGTTGCATTCTT CAGTGTCTTTGCATTGGGGCCTCCAAGTGTTGCCGATCCTCATCCTCG

Bradi2g50510.3.p CAGTGTCTTTGCATTGGGGCCTCCAAGTGTTGCCGATCCTCATCCTCGAGTAGGAAGATATTTACAAAGAGCCTTCTATT AAATGTTTATCGTGAGATCATTAGGATATTCTACAAGCTTGAGAGGTTGCATGAATATGGATGATATTAACCTTGGAAGA CACTAGAAAAATATCTTTACGGAGAATT

like this.

Thank you.

7.6 years ago

This task can be performed with bedtools and the getfasta command

http://bedtools.readthedocs.io/en/latest/

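where you would specify the ranges that you want to extract as a BED file. Note that BED start coordinates are 0-based and end coordinates are exclusive, that the first column must hold the sequence name (here Chr_1), and that the -name option takes the FASTA header from the fourth (name) column.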
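A minimal sketch of building that BED file from the table in the question, assuming the listed coordinates are 1-based and inclusive and that every region lies on the single sequence named `Chr_1`. Because BED is 0-based with an exclusive end, only the start changes. The function name `regions_to_bed` and the script name `make_bed.py` are hypothetical.

```python
def regions_to_bed(region_text, chrom="Chr_1"):
    """Turn 'ID start end' lines into BED lines 'chrom start-1 end ID'.

    Assumes the input coordinates are 1-based and inclusive; BED is
    0-based with an exclusive end, so only the start is shifted.
    """
    bed_lines = []
    for line in region_text.splitlines():
        fields = line.split()  # accepts tabs or runs of spaces
        if not fields:
            continue  # ignore blank lines
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        bed_lines.append(f"{chrom}\t{start - 1}\t{end}\t{name}")
    return "\n".join(bed_lines) + "\n"


if __name__ == "__main__":
    regions = """\
Bradi2g06210.1.p\t198\t414
Bradi2g50510.3.p\t85\t321
Bradi2g52160.4.p\t15\t236
Bradi2g31600.1.p\t262\t495
Bradi4g35760.1.p\t329\t561
Bradi4g21025.1.p\t35\t250
"""
    # Redirect to a file, e.g.: python make_bed.py > regions.bed
    print(regions_to_bed(regions), end="")
```

The resulting file could then be passed to something like `bedtools getfasta -fi genome.fa -bed regions.bed -name -tab` (file names are placeholders), which should write one ID/sequence pair per line; depending on the bedtools version, the header may also carry the coordinates after the name.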

bedtools getfasta

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.26.0
Summary: Extract DNA sequences from a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options: 
    -fi Input FASTA file
    -bed    BED/GFF/VCF file of ranges to extract from -fi
    -name   Use the name field for the FASTA header
    -split  given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
    -tab    Write output in TAB delimited format.
        - Default is FASTA format.

    -s  Force strandedness. If the feature occupies the antisense,
        strand, the sequence will be reverse complemented.
        - By default, strand information is ignored.

    -fullHeader Use full fasta header.
        - By default, only the word before the first space or tab is used.
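If bedtools is not available, the same extraction can be sketched in plain Python. This assumes a FASTA file whose single record may be wrapped over many lines, the same 1-based inclusive coordinates as in the question, and prints one `ID<TAB>sequence` line per region, mirroring the `-name -tab` output format. The helper names `read_single_fasta`, `extract_regions` and the script name `extract_regions.py` are hypothetical.

```python
import sys


def read_single_fasta(text):
    """Return (name, sequence) of the first FASTA record, joining wrapped lines."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                break  # stop at a second record; only one is expected
            # Like bedtools' default, keep only the first word of the header.
            name = (line[1:].split() or [""])[0]
        else:
            chunks.append(line)
    return name, "".join(chunks)


def extract_regions(seq, region_text):
    """Yield (ID, subsequence) for 'ID start end' lines (1-based, inclusive)."""
    for line in region_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        if not 1 <= start <= end <= len(seq):
            print(f"warning: {name} {start}-{end} is outside 1-{len(seq)}",
                  file=sys.stderr)
            continue
        # Python slices are 0-based and end-exclusive.
        yield name, seq[start - 1:end]


def main(argv):
    if len(argv) != 2:
        print("usage: extract_regions.py genome.fa regions.txt", file=sys.stderr)
        return
    with open(argv[0]) as fa, open(argv[1]) as reg:
        _, sequence = read_single_fasta(fa.read())
        for name, sub in extract_regions(sequence, reg.read()):
            print(f"{name}\t{sub}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python extract_regions.py genome.fa regions.txt > out.tsv`. It holds the whole sequence in memory, which is usually acceptable for one chromosome; an indexed tool avoids that for very large files.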

bedtools is definitely a great choice.

BioPerl also works, and it lets you do more complex operations on the sequence, for example reverse complementing it. samtools has this function too (samtools faidx), and it is quicker for large chromosome-scale FASTA files because, as I remember, it reads regions through an index of the file instead of scanning the whole sequence.
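A small Python sketch of the two operations mentioned here, under stated assumptions: reverse complementing an extracted region (what BioPerl's `revcom` method or bedtools' `-s` option provide), handling only A/C/G/T/N in either case, and building the `chrom:start-end` region string that `samtools faidx` accepts, which uses 1-based inclusive coordinates. The helper names `reverse_complement` and `faidx_region` are hypothetical.

```python
# Complement map for upper- and lower-case DNA; N maps to N.
# Any other character (e.g. IUPAC ambiguity codes) is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def faidx_region(chrom, start, end):
    """Build a samtools faidx region string (1-based, inclusive coordinates)."""
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    print(faidx_region("Chr_1", 198, 414))  # Chr_1:198-414
    print(reverse_complement("CTAAACCC"))   # GGGTTTAG
```

With such a region string, `samtools faidx genome.fa Chr_1:198-414` (file name is a placeholder) prints that region in FASTA format.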
