To create a multi fasta file which contains each gene's exons
2
0
Entering edit mode
3.1 years ago
Berk • 0

enter image description here

Can you do this , this is beyond my learnings.

orf Python fasta • 2.6k views
ADD COMMENT
3
Entering edit mode

I was gonna post 'sounds like a homework assignment' ... but this is beyond obvious apparently :)

ADD REPLY
0
Entering edit mode

Yeah , good instinct.😂

ADD REPLY
0
Entering edit mode

So the question is ,can someone do it? İ cant do this with my knowledge 😁

ADD REPLY
0
Entering edit mode

perhaps one of your team mates ? (I see it's a group assignment ;) )

ADD REPLY
1
Entering edit mode

Hi Berk

this is beyond my learnings

I guess they are the objectives of the assignment itself;

I'd also think that -as good start for a fruitful discussion- you could describe up to where your learnings brings you along the task.

ADD REPLY
1
Entering edit mode
3.1 years ago

The bed2faidxsta.pl script will convert your ORFs to FASTA. This gets you halfway there; you'll need to stitch exons together afterwards:

You'll need to install samtools and likely use the --fastaIsUncompressed option, i.e.:

$ bed2faidxsta.pl --fastaIsUncompressed < ORFs.bed > ORFs.fa

You'll need a file called NC_007605.1.fa in the same directory as ORFs.bed, which contains the sequence data for that chromosome.

You'll also need to edit line 67 to get FASTA headers to look the way you want. I won't explain how, but it should be straightforward from the code. As a further hint, how you process $id can help you stitch exons together further down the line.

Note: In the future, try to avoid posting graphics to questions. Graphics can't be indexed for search engines, the way text can. This question may get deleted by a moderator, for that reason.

ADD COMMENT
0
Entering edit mode

I tried this on python after installing samtools ,tested samtools for if its working and its ok. But code errored for some reason , even i edited line67.How can i fix it ?

ADD REPLY
1
Entering edit mode
#!/usr/bin/env perl 

is not python

ADD REPLY
0
Entering edit mode
3.1 years ago
Berk • 0
handle = open("new_file.txt", "w")
handle.write("My FIRST line of text.\n")
handle.write("My SECOND line of text.\n")
handle.write("My THIRD line of text.\n")

handle.close()
handle = open("new_file.txt","r")
handle.readline()
saved_line = handle.readline()
print(saved_line+ "is my favorite.")
print(saved_line.strip() + "is my favorite.")
for line in handle:

print (line.strip())

handle.close()
import sys

GeneBedfile = sys.argv[1]

with open(GeneBedfile) as bfile:
for line in bfile:
start = int(line.strip().split("\t")[1])
end = int(line.strip().split("\t")[2])
geneName = str(line.strip().split("\t")[3])
print(geneName, "gene", "starts at", start, "ends at", end)
ADD COMMENT
1
Entering edit mode

fyi we have markdown support. Select the code and press the 10101 button or simply use the markdown backticks like below to highlight your code:

```python
python...code
```

Or alternatively paste the code into a Gist, and then simply copy/paste the link here as Alex Reynolds does above, the code will then be displayed automatically.

ADD REPLY
0
Entering edit mode

İ think i didint understand it , can you explain it again , i need to do what alex did on python 😅.If you guys make me do this correctly , you ll do a great thing for.someone

ADD REPLY
1
Entering edit mode

You won't learn much if people write out the full solution for you. You have been given about 80% of the solution, both in Python and Perl. Work out the rest.

ADD REPLY
1
Entering edit mode

You'll need the chromosome as well. If you want to use Python, instead of Perl, here's one thought:

with open(geneBedFilename, 'r') as bedFileHandle:
    for line in bedFileHandle:
        (chromosome, start, end, geneName) = line.rstrip().split('\t')
        start = int(start)
        end = int(end)

Once you have this, you can build a samtools query with the values of chromosome, start, and end. You could then run that query with the subprocess Python library. You then take the output sequence and write that out with the header, via geneName. Remember that you still need to stitch exons, but this gets you most of the way there.

Another way to do this — without using samtools — is to read the genome FASTA into memory, into a Python dictionary.

In a dictionary key-value pair, the chromosome name is the key and the chromosome's sequence is the value:

currentChromosome = ''
genomeFasta = {}
with open(genomeFastaFilename, 'r') as genomeFastaFileHandle:
    for line in genomeFastaFileHandle:
        line = line.rstrip()
        if line.startswith('>'):
            currentChromosome = line.lstrip('>').split(' ')[0]
            genomeFasta[currentChromosome] = ''
        else:
            genomeFasta[currentChromosome] += line

Once you have read the viral genome into memory, you can then use the start and end values as offsets into the chromosome's sequence:

with open(geneBedFilename, 'r') as bedFileHandle:
    for line in bedFileHandle:
        (chromosome, start, end, geneName) = line.rstrip().split('\t')
        start = int(start)
        end = int(end)
        sequence = genomeFasta[chromosome][start:end-1]
        sys.stdout.write('>{}\n{}\n'.format(geneName, sequence))

This approach is okay for a viral genome as they don't tend to be very long (the longest known virus is about 1.3Mb). But this approach is probably not so practical for larger genomes, like human or mouse, where an indexed search is preferred. You may be better off learning how this is done via samtools.

ADD REPLY

Login before adding your answer.

Traffic: 2748 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6