Question

Chromosome Location of Sequence in Fasta File using Biopython

5.1 years ago

muhammad.khizerkiet • 0

I need to build an STR profile from an individual's genome sequence data, and I am using Biopython. When I search for an STR allele or sequence, I want to know where that sequence lies: which chromosome it is on, and the base-pair positions where it starts and ends. I have not been able to do this with Biopython. I am told that FASTA files do not contain chromosome information, so what else can I do? I have also tried CRAM/BAM files, but Biopython does not accept them. I simply want to print the location of every sequence in that FASTA file.

My working code is here:

from Bio import SeqIO

# Sequence to look for (a record must match it exactly, end to end)
target = "CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAACAGAATATATGATCGAGTGAATCTGGAGGACCTGTGGTAACTCAGCTCGTCGTGGCACTGCTTTTGTCGTGACCCTGCTTTGTTGTTGGGCCTCCTCAAGAGCTTTCATGGCAGGTTTGAACTTTAGTACGGTGCAGTTTGCGCCAAGTCATATAAAGCATCACTGATGAATGACATTATTGTCAGAAAAAATCAGAGGGGCAGTATGCTACTGAGCATGCCAGTGAATTTTTATGACTCTCGCAACGGATATCTTGGCTCTAACATCGATGAAGAACGCAGCTAAATGCGATAAGTGGTGTGAATTGCAGAATCCCGTGAACCATCGAGTCTTTGAACGCAAGTTGCGCTCGAGGCCATCAGGCTAAGGGCACGCCTGCCTGGGCGTCGTGTGTTGCGTCTCTCCTACCAATGCTTGCTTGGCATATCGCTAAGCTGGCATTATACGGATGTGAATGATTGGCCCCTTGTGCCTAGGTGCGGTGGGTCTAAGGATTGTTGCTTTGATGGGTAGGAATGTGGCACGAGGTGGAGAATGCTAACAGTCATAAGGCTGCTATTTGAATCCCCCATGTTGTTGTATTTTTTCGAACCTACACAAGAACCTAATTGAACCCCAATGGAGCTAAAATAACCATTGGGCAGTTGATTTCCATTCAGATGCGACCCCAGGTCAGGCGGGGCCACCCGCTGAGTTGAGGC"
for record in SeqIO.parse("MyFastaFile.fasta", "fasta"):
    if record.seq == target:
        print(record.id)
        print(record.seq)
        # print(record.location)  # it does not work: SeqRecord has no "location" attribute

biopython python • 2.2k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 5.1 years ago by muhammad.khizerkiet • 0

How you do this depends on what your input data is. If it's a multi-FASTA of chromosome sequences, you will need to either align your query against it or use the gene name to find it. If the latter, you will need GenBank input instead of FASTA.
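For the case where the FASTA really does hold whole chromosome sequences, an exact-match search can be sketched in plain Python. This is only an illustration under assumptions: the helper names `read_fasta` and `find_motif` are made up here, the record IDs are assumed to be chromosome names, the toy sequences are invented, and only the forward strand is searched. It does not apply to short reads, which have to be aligned instead.

```python
import io

def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The ID is the first whitespace-delimited token after ">"
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def find_motif(handle, motif):
    """Return (record_id, start, end) for every exact forward-strand hit.

    Coordinates are 1-based and inclusive; overlapping hits are reported.
    """
    hits = []
    motif = motif.upper()
    for name, seq in read_fasta(handle):
        seq = seq.upper()
        pos = seq.find(motif)
        while pos != -1:
            hits.append((name, pos + 1, pos + len(motif)))
            pos = seq.find(motif, pos + 1)
    return hits

# Toy input standing in for a chromosome-level FASTA file (invented sequences)
example = io.StringIO(">chr1 toy\nACGTAATGAATG\nCCC\n>chr2 toy\nGGAATGTT\n")
for hit in find_motif(example, "AATG"):
    print(hit)
```

Holding each chromosome in memory as one string is fine for a sketch, but for a full human reference an indexed approach or an aligner is the more realistic choice.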

ADD REPLY • link 5.1 years ago by Joe 21k

No, I have a simple .fasta file. I downloaded the data for one 1000 Genomes individual from

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00140/sequence_read/

I downloaded the data in FASTA format, in which all sequences look like this:

>ERR251013.2 FCC1GTEACXX:7:1101:1213:2194/1
ATATGTTACCTGGGTTCTAAGCCCACCTGTTATTAGCTTTGAGACATTGGTAAGTCACTTAACCTCTTTGCAGACTGACTGCACAGTTTGTGCTCTACAC

Now I need to extract the genomic location of every sequence in that FASTA file.

ADD REPLY • link updated 5.1 years ago by zx8754 12k • written 5.1 years ago by muhammad.khizerkiet • 0

I still don't follow.

According to that link, you don't have fasta data, you have fastq data. If you want to get the location of all those sequences you need to map the reads to obtain a BAM/SAM file.

This isn't a task for BioPython. It will simply be too slow.
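To illustrate why the mapped file answers the question: each alignment line in a SAM file already names the reference sequence and the 1-based start position, and the end follows from the CIGAR string. The sketch below parses one SAM line with the standard library only. The function name `sam_location` and the example records are invented for illustration, and in practice a dedicated tool or library for SAM/BAM would be used rather than hand parsing.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification)
REF_CONSUMING = set("MDN=X")

def sam_location(line):
    """Return (read_name, chromosome, start, end) for one SAM alignment line.

    Coordinates are 1-based and inclusive. Returns None for unmapped reads.
    """
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    # Flag bit 0x4 marks an unmapped read; "*" means the field is unavailable
    if flag & 0x4 or rname == "*" or cigar == "*":
        return None
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return qname, rname, pos, pos + ref_len - 1

# Invented example records: one mapped read with a 2-base deletion, one unmapped
mapped = "read1\t0\tchr2\t1000\t60\t50M2D48M\t*\t0\t0\t*\t*"
unmapped = "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*"
print(sam_location(mapped))
print(sam_location(unmapped))
```

Insertions and soft clips do not advance the reference position, which is why only the M, D, N, = and X operations are summed.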

ADD REPLY • link 5.1 years ago by Joe 21k

Yes, I downloaded the data as FASTQ files and then converted them to FASTA format. I have also tried a BAM file, but Biopython doesn't recognize the BAM format. Where can I download SAM files for the 1000 Genomes Project data?

ADD REPLY • link 5.1 years ago by muhammad.khizerkiet • 0

"Yes, I downloaded the data as FASTQ files and then converted them to FASTA format."

Why? There's no reason to do this.

If you have a BAM, you don't need the fasta/fastq data. The BAM/SAM contains the information you asked for. You don't need BioPython.

What are you actually trying to do?

ADD REPLY • link 5.1 years ago by Joe 21k

I want to build an STR marker profile by searching for all CODIS STR loci in the genome data. To do that I am trying to search for the different alleles in the sequence, but I need the chromosome location in order to search for a particular allele of an STR locus. For example, I am searching for TPOX, which is located on chromosome 2. I need to get to chromosome 2 to check whether any TPOX allele is present in the data.
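Once the reads covering a locus are known, checking for an allele comes down to counting consecutive copies of the repeat unit. A minimal sketch of that counting step follows, assuming the repeat unit AATG (the unit commonly listed for TPOX; verify it for the strand and reference you use). The helper name `longest_repeat_run` and the toy read are invented. Real STR genotyping also has to handle flanking sequence, partial repeats and sequencing errors, which this ignores.

```python
import re

def longest_repeat_run(sequence, unit):
    """Find the longest uninterrupted tandem run of `unit` in `sequence`.

    Returns (copies, start, end) with 1-based inclusive coordinates within
    the sequence, or (0, None, None) if the unit does not occur.
    """
    pattern = re.compile("(?:%s)+" % re.escape(unit.upper()))
    best = (0, None, None)
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(unit)
        if copies > best[0]:
            best = (copies, match.start() + 1, match.end())
    return best

# Invented read: made-up flanks around 8 copies of the assumed repeat unit
read = "CCGTC" + "AATG" * 8 + "TTCGA"
print(longest_repeat_run(read, "AATG"))
```

The number of copies is what an STR allele designation is based on, so the same function can be run over every read that maps across the locus.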

ADD REPLY • link 5.1 years ago by muhammad.khizerkiet • 0

No, I downloaded the individual's data from the 1000 Genomes Project at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00140/sequence_read/ in FASTQ format. How can I align my FASTQ files? Please guide me.

ADD REPLY • link 5.0 years ago by muhammad.khizerkiet • 0

This is a slightly different question. You need to google any standard tutorial on read alignment. Check out the user manuals for BWA or Bowtie 2.
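As a rough outline of what such a tutorial covers, the sketch below assembles a typical BWA-MEM plus samtools command sequence from Python. The file names (`reference.fa`, the two FASTQ files, the `HG00140` prefix) and the helper names are placeholders, not paths from the FTP site, and the thread count is arbitrary. It assumes `bwa` and `samtools` are installed and that the reads are paired-end; consult the manuals for the options that suit your data.

```python
import subprocess

def alignment_steps(reference, reads1, reads2, prefix, threads=4):
    """Return (argv, stdout_path) steps for index, align, sort, index.

    stdout_path is None when the tool writes its own output files.
    """
    sam = prefix + ".sam"
    bam = prefix + ".sorted.bam"
    return [
        (["bwa", "index", reference], None),                    # build the reference index
        (["bwa", "mem", "-t", str(threads), reference, reads1, reads2], sam),  # SAM goes to stdout
        (["samtools", "sort", "-o", bam, sam], None),           # coordinate-sort into BAM
        (["samtools", "index", bam], None),                     # index the sorted BAM
    ]

def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)

# Print the commands instead of running them (placeholder file names)
steps = alignment_steps("reference.fa", "reads_1.fastq.gz", "reads_2.fastq.gz", "HG00140")
for argv, stdout_path in steps:
    print(" ".join(argv) + (" > " + stdout_path if stdout_path else ""))
```

Calling `run_steps(steps)` would execute the commands; the resulting sorted BAM is the file that holds the chromosome and position of each read.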

ADD REPLY • link 5.0 years ago by Joe 21k