Can you do this , this is beyond my learnings.
Can you do this , this is beyond my learnings.
The bed2faidxsta.pl
script will convert your ORFs to FASTA. This gets you halfway there; you'll need to stitch exons together afterwards:
You'll need to install samtools
and likely use the --fastaIsUncompressed
option, i.e.:
$ bed2faidxsta.pl --fastaIsUncompressed < ORFs.bed > ORFs.fa
You'll need a file called NC_007605.1.fa
in the same directory as ORFs.bed
, which contains the sequence data for that chromosome.
You'll also need to edit line 67 to get FASTA headers to look the way you want. I won't explain how, but it should be straightforward from the code. As a further hint, how you process $id
can help you stitch exons together further down the line.
Note: In the future, try to avoid posting graphics to questions. Graphics can't be indexed for search engines, the way text can. This question may get deleted by a moderator, for that reason.
handle = open("new_file.txt", "w")
handle.write("My FIRST line of text.\n")
handle.write("My SECOND line of text.\n")
handle.write("My THIRD line of text.\n")
handle.close()
handle = open("new_file.txt","r")
handle.readline()
saved_line = handle.readline()
print(saved_line+ "is my favorite.")
print(saved_line.strip() + "is my favorite.")
for line in handle:
print (line.strip())
handle.close()
import sys
GeneBedfile = sys.argv[1]
with open(GeneBedfile) as bfile:
for line in bfile:
start = int(line.strip().split("\t")[1])
end = int(line.strip().split("\t")[2])
geneName = str(line.strip().split("\t")[3])
print(geneName, "gene", "starts at", start, "ends at", end)
fyi we have markdown support. Select the code and press the 10101
button or simply use the markdown backticks like below to highlight your code:
```python
python...code
```
Or alternatively paste the code into a Gist, and then simply copy/paste the link here as Alex Reynolds does above, the code will then be displayed automatically.
You'll need the chromosome as well. If you want to use Python, instead of Perl, here's one thought:
with open(geneBedFilename, 'r') as bedFileHandle:
for line in bedFileHandle:
(chromosome, start, end, geneName) = line.rstrip().split('\t')
start = int(start)
end = int(end)
Once you have this, you can build a samtools
query with the values of chromosome
, start
, and end
. You could then run that query with the subprocess
Python library. You then take the output sequence and write that out with the header, via geneName
. Remember that you still need to stitch exons, but this gets you most of the way there.
Another way to do this — without using samtools
— is to read the genome FASTA into memory, into a Python dictionary.
In a dictionary key-value pair, the chromosome name is the key and the chromosome's sequence is the value:
currentChromosome = ''
genomeFasta = {}
with open(genomeFastaFilename, 'r') as genomeFastaFileHandle:
for line in genomeFastaFileHandle:
line = line.rstrip()
if line.startswith('>'):
currentChromosome = line.lstrip('>').split(' ')[0]
genomeFasta[currentChromosome] = ''
else:
genomeFasta[currentChromosome] += line
Once you have read the viral genome into memory, you can then use the start
and end
values as offsets into the chromosome's sequence:
with open(geneBedFilename, 'r') as bedFileHandle:
for line in bedFileHandle:
(chromosome, start, end, geneName) = line.rstrip().split('\t')
start = int(start)
end = int(end)
sequence = genomeFasta[chromosome][start:end-1]
sys.stdout.write('>{}\n{}\n'.format(geneName, sequence))
This approach is okay for a viral genome as they don't tend to be very long (the longest known virus is about 1.3Mb). But this approach is probably not so practical for larger genomes, like human or mouse, where an indexed search is preferred. You may be better off learning how this is done via samtools
.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I was gonna post 'sounds like a homework assignment' ... but this is beyond obvious apparently :)
Yeah , good instinct.😂
So the question is ,can someone do it? İ cant do this with my knowledge 😁
perhaps one of your team mates ? (I see it's a group assignment ;) )
Hi Berk
I guess they are the objectives of the assignment itself;
I'd also think that -as good start for a fruitful discussion- you could describe up to where your learnings brings you along the task.