Mapping peptide to the source genomic region
9.7 years ago
genie66 ▴ 30

I have a list of peptide sequences, their protein names, and their start and end coordinates within the protein sequences. I want to map them back to the genome and get the genomic start and end coordinates (preferably per exon). I have tried several proteogenomic mapping tools, but with no luck. PeptideAtlas can provide the exonic coordinates, but only for one peptide at a time, and I have hundreds of peptides. Is there any other way to do this? Please help me out! Thanks!

peptide mapping • 4.0k views
8.0 years ago
microbe77 ▴ 30

Might be too late, but this is how to do it:

  1. Make a six-frame peptide library from your genome (all possible peptides). I keep peptides of 10 aa or longer; for a 4.5 Mbp bacterial genome this gives about 0.25M peptides.
  2. Use this library as a reference to find which of your peptides match the possible peptides.
  3. Get a FASTA file that contains the whole genome nucleotide sequence (a single-entry FASTA file containing ALL nucleotides).
  4. Make a nucleotide BLAST database with the makeblastdb command from a local BLAST installation.
  5. Align your peptides to the genome database using tblastn:

     tblastn -query <your peptide fasta file> -db <your genome database> -out <name of the out file you want> -outfmt 6 -max_target_seqs <1 or more, use 1> -evalue 0.001

     The database consists of three files; just use the name without the extension. -outfmt 6 gives tabular results. I am not sure about the -max_target_seqs option, so double-check it. -evalue 0.001 is there to eliminate partial alignments.
  6. Open the output file in Excel and keep only the genome name (usually NC_xxxx), start and stop. Save this file as .bed, which is readable in almost all genome browsers (I use IGB).
  7. The code that makes the six frames is in Python. It is best to get it here: https://github.com/microbe777/fasta2six_frames
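
If you would rather not do the Excel step by hand, here is a minimal Python sketch of the same conversion. It assumes the default -outfmt 6 column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name blast6_to_bed and the example rows (including the NC_example name) are made up for illustration. One thing to watch: BLAST coordinates are 1-based, and for hits on the reverse strand sstart is larger than send, whereas BED wants a 0-based start that is smaller than the end.

```python
import csv


def blast6_to_bed(lines):
    """Convert default tblastn -outfmt 6 rows into BED6 tuples.

    Returns (chrom, start, end, name, score, strand) with a 0-based,
    half-open interval, as the BED format expects.
    """
    bed_rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # tblastn reports reverse-strand hits with sstart > send.
        strand = "+" if sstart <= send else "-"
        start = min(sstart, send) - 1  # 1-based -> 0-based
        end = max(sstart, send)        # end is exclusive in BED
        bed_rows.append((sseqid, start, end, qseqid, 0, strand))
    return bed_rows


# Hypothetical tblastn output rows, for illustration only.
example = [
    "pep1\tNC_example\t100.0\t10\t0\t0\t1\t10\t101\t130\t1e-5\t30.0",
    "pep2\tNC_example\t100.0\t10\t0\t0\t1\t10\t530\t501\t1e-5\t30.0",
]

for row in blast6_to_bed(example):
    print("\t".join(map(str, row)))
```

To use it on a real file, pass an open file handle instead of the example list and write the printed lines to a .bed file.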

9.7 years ago
raunakms ★ 1.1k

A tool like tBLASTn could be a good starting point: it compares a protein query sequence against a nucleotide sequence database that is dynamically translated in all six reading frames (both strands).
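
To make the six-frame idea concrete, here is a small Python sketch that translates a sequence in all six frames and reports where an exact peptide match sits on the forward strand. It is only an illustration of the principle, not what tBLASTn does internally (tBLASTn scores inexact alignments). The function names are made up, the standard codon table is assumed, and it only works when the peptide is encoded contiguously, i.e. it does not span an intron.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_peptide(genome, peptide):
    """Return (start, end, strand) hits, 0-based half-open, forward-strand coords."""
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Flip coordinates back onto the forward strand.
                    start, end = n - end, n - start
                hits.append((start, end, strand))
                pos = protein.find(peptide, pos + 1)
    return hits


toy_genome = "CCATGAAATGGTAAG"  # made-up sequence; ATG AAA TGG encodes MKW
print(find_peptide(toy_genome, "MKW"))  # [(2, 11, '+')]
```

For hundreds of peptides against a real genome, tblastn itself will be much faster and will tolerate mismatches; the sketch is only meant to show how the frame and strand translate into genomic coordinates.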

9.7 years ago
Siva ★ 1.9k

You could try Scipio, which uses BLAT to align a query protein sequence against a genome. It outputs the intron/exon boundaries and splice sites.
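
Once the exon structure of each protein is known (from a tool like this or from an annotation file), the peptide coordinates can be projected onto the genome with a little arithmetic. Below is a rough Python sketch of that step; it is not Scipio's algorithm. The function name and the toy exon coordinates are made up, and it assumes that the exon intervals cover only the coding sequence, that they are 1-based inclusive (as in GFF/GTF), and that the protein starts at the first base of the CDS.

```python
def peptide_to_genome(cds_exons, strand, pep_start, pep_end):
    """Project a peptide's protein coordinates onto the genome.

    cds_exons: (start, end) genomic intervals of the coding exon parts,
               1-based inclusive, in any order.
    strand:    "+" or "-".
    pep_start, pep_end: 1-based inclusive residue positions in the protein.
    Returns genomic (start, end) segments, 1-based inclusive, in
    transcript order; a peptide spanning an intron yields two or more.
    """
    # Peptide span as 0-based half-open offsets within the CDS.
    cds_from = 3 * (pep_start - 1)
    cds_to = 3 * pep_end
    # Walk exons in transcript order (descending on the minus strand).
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    segments = []
    consumed = 0  # CDS bases contained in exons already walked
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start + 1
        lo = max(cds_from, consumed)
        hi = min(cds_to, consumed + ex_len)
        if lo < hi:  # the peptide overlaps this exon
            if strand == "+":
                segments.append(
                    (ex_start + lo - consumed, ex_start + hi - consumed - 1)
                )
            else:
                segments.append(
                    (ex_end - (hi - consumed) + 1, ex_end - (lo - consumed))
                )
        consumed += ex_len
    return segments


# Toy gene with two coding exons (30 bp of CDS, i.e. 10 residues).
toy_exons = [(101, 110), (201, 220)]
print(peptide_to_genome(toy_exons, "+", 3, 5))  # [(107, 110), (201, 205)]
print(peptide_to_genome(toy_exons, "-", 3, 5))  # [(206, 214)]
```

Looping this over a table of peptides and their proteins gives exon-level genomic coordinates for the whole list in one go.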
