Question

To create a multi fasta file which contains each gene's exons

0

Entering edit mode

2.9 years ago

Berk • 0

enter image description here

Can you do this , this is beyond my learnings.

orf Python fasta • 2.5k views

ADD COMMENT • link updated 2.9 years ago by Alex Reynolds 36k • written 2.9 years ago by Berk • 0

3

Entering edit mode

I was gonna post 'sounds like a homework assignment' ... but this is beyond obvious apparently :)

ADD REPLY • link 2.9 years ago by lieven.sterck 15k

0

Entering edit mode

Yeah , good instinct.😂

ADD REPLY • link 2.9 years ago by Berk • 0

0

Entering edit mode

So the question is ,can someone do it? İ cant do this with my knowledge 😁

ADD REPLY • link 2.9 years ago by Berk • 0

0

Entering edit mode

perhaps one of your team mates ? (I see it's a group assignment ;) )

ADD REPLY • link 2.9 years ago by lieven.sterck 15k

1

Entering edit mode

Hi Berk

this is beyond my learnings

I guess they are the objectives of the assignment itself;

I'd also think that -as good start for a fruitful discussion- you could describe up to where your learnings brings you along the task.

ADD REPLY • link 2.9 years ago by patrickdm ▴ 240

0

Entering edit mode

2.9 years ago

Berk • 0

handle = open("new_file.txt", "w")
handle.write("My FIRST line of text.\n")
handle.write("My SECOND line of text.\n")
handle.write("My THIRD line of text.\n")

handle.close()
handle = open("new_file.txt","r")
handle.readline()
saved_line = handle.readline()
print(saved_line+ "is my favorite.")
print(saved_line.strip() + "is my favorite.")
for line in handle:

print (line.strip())

handle.close()
import sys

GeneBedfile = sys.argv[1]

with open(GeneBedfile) as bfile:
for line in bfile:
start = int(line.strip().split("\t")[1])
end = int(line.strip().split("\t")[2])
geneName = str(line.strip().split("\t")[3])
print(geneName, "gene", "starts at", start, "ends at", end)

ADD COMMENT • link updated 2.9 years ago by ATpoint 85k • written 2.9 years ago by Berk • 0

1

Entering edit mode

fyi we have markdown support. Select the code and press the 10101 button or simply use the markdown backticks like below to highlight your code:

```python
python...code
```

Or alternatively paste the code into a Gist, and then simply copy/paste the link here as Alex Reynolds does above, the code will then be displayed automatically.

ADD REPLY • link 2.9 years ago by ATpoint 85k

0

Entering edit mode

İ think i didint understand it , can you explain it again , i need to do what alex did on python 😅.If you guys make me do this correctly , you ll do a great thing for.someone

ADD REPLY • link 2.9 years ago by Berk • 0

1

Entering edit mode

You won't learn much if people write out the full solution for you. You have been given about 80% of the solution, both in Python and Perl. Work out the rest.

ADD REPLY • link 2.9 years ago by Alex Reynolds 36k

1

Entering edit mode

You'll need the chromosome as well. If you want to use Python, instead of Perl, here's one thought:

with open(geneBedFilename, 'r') as bedFileHandle:
    for line in bedFileHandle:
        (chromosome, start, end, geneName) = line.rstrip().split('\t')
        start = int(start)
        end = int(end)

Once you have this, you can build a samtools query with the values of chromosome, start, and end. You could then run that query with the subprocess Python library. You then take the output sequence and write that out with the header, via geneName. Remember that you still need to stitch exons, but this gets you most of the way there.

Another way to do this — without using samtools — is to read the genome FASTA into memory, into a Python dictionary.

In a dictionary key-value pair, the chromosome name is the key and the chromosome's sequence is the value:

currentChromosome = ''
genomeFasta = {}
with open(genomeFastaFilename, 'r') as genomeFastaFileHandle:
    for line in genomeFastaFileHandle:
        line = line.rstrip()
        if line.startswith('>'):
            currentChromosome = line.lstrip('>').split(' ')[0]
            genomeFasta[currentChromosome] = ''
        else:
            genomeFasta[currentChromosome] += line

Once you have read the viral genome into memory, you can then use the start and end values as offsets into the chromosome's sequence:

with open(geneBedFilename, 'r') as bedFileHandle:
    for line in bedFileHandle:
        (chromosome, start, end, geneName) = line.rstrip().split('\t')
        start = int(start)
        end = int(end)
        sequence = genomeFasta[chromosome][start:end-1]
        sys.stdout.write('>{}\n{}\n'.format(geneName, sequence))

This approach is okay for a viral genome as they don't tend to be very long (the longest known virus is about 1.3Mb). But this approach is probably not so practical for larger genomes, like human or mouse, where an indexed search is preferred. You may be better off learning how this is done via samtools.

ADD REPLY • link 2.9 years ago by Alex Reynolds 36k

score 1 · Accepted Answer · 2021-12-19

1

Entering edit mode

2.9 years ago

Alex Reynolds 36k

The bed2faidxsta.pl script will convert your ORFs to FASTA. This gets you halfway there; you'll need to stitch exons together afterwards:

You'll need to install samtools and likely use the --fastaIsUncompressed option, i.e.:

$ bed2faidxsta.pl --fastaIsUncompressed < ORFs.bed > ORFs.fa

You'll need a file called NC_007605.1.fa in the same directory as ORFs.bed, which contains the sequence data for that chromosome.

You'll also need to edit line 67 to get FASTA headers to look the way you want. I won't explain how, but it should be straightforward from the code. As a further hint, how you process $id can help you stitch exons together further down the line.

Note: In the future, try to avoid posting graphics to questions. Graphics can't be indexed for search engines, the way text can. This question may get deleted by a moderator, for that reason.

ADD COMMENT • link 2.9 years ago by Alex Reynolds 36k

0

Entering edit mode

I tried this on python after installing samtools ,tested samtools for if its working and its ok. But code errored for some reason , even i edited line67.How can i fix it ?

ADD REPLY • link 2.9 years ago by Berk • 0

1

Entering edit mode

#!/usr/bin/env perl

is not python

ADD REPLY • link 2.9 years ago by patrickdm ▴ 240