I need to build an STR profile from an individual's genome sequence data, and I am using Biopython. When I search for an STR allele or sequence, I want to know where that sequence is located in the FASTA file (the chromosome it lies on, and the base-pair positions where it starts and ends), but I have not been able to do this with Biopython. I have been told that FASTA files do not contain chromosome information, so what else can I do? I have also tried CRAM/BAM files, but Biopython does not accept them. I simply want to print the location of every sequence in that FASTA file.
My current code is here:
from Bio import SeqIO

for sequences in SeqIO.parse("MyFastaFile.fasta", "fasta"):
    # Look for the one record whose sequence matches the query exactly
    if sequences.seq == "CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAACAGAATATATGATCGAGTGAATCTGGAGGACCTGTGGTAACTCAGCTCGTCGTGGCACTGCTTTTGTCGTGACCCTGCTTTGTTGTTGGGCCTCCTCAAGAGCTTTCATGGCAGGTTTGAACTTTAGTACGGTGCAGTTTGCGCCAAGTCATATAAAGCATCACTGATGAATGACATTATTGTCAGAAAAAATCAGAGGGGCAGTATGCTACTGAGCATGCCAGTGAATTTTTATGACTCTCGCAACGGATATCTTGGCTCTAACATCGATGAAGAACGCAGCTAAATGCGATAAGTGGTGTGAATTGCAGAATCCCGTGAACCATCGAGTCTTTGAACGCAAGTTGCGCTCGAGGCCATCAGGCTAAGGGCACGCCTGCCTGGGCGTCGTGTGTTGCGTCTCTCCTACCAATGCTTGCTTGGCATATCGCTAAGCTGGCATTATACGGATGTGAATGATTGGCCCCTTGTGCCTAGGTGCGGTGGGTCTAAGGATTGTTGCTTTGATGGGTAGGAATGTGGCACGAGGTGGAGAATGCTAACAGTCATAAGGCTGCTATTTGAATCCCCCATGTTGTTGTATTTTTTCGAACCTACACAAGAACCTAATTGAACCCCAATGGAGCTAAAATAACCATTGGGCAGTTGATTTCCATTCAGATGCGACCCCAGGTCAGGCGGGGCCACCCGCTGAGTTGAGGC":
        print(sequences.id)
        print(sequences.seq)
        print(sequences.location)  # it does not work: SeqRecord has no "location" attribute
How you do this depends on what your input data is. If it's a multi-FASTA of chromosome sequences, you will need to either align your sequence against it or use the gene name to find it. If the latter, you will need a GenBank input instead of FASTA.
No, I have a simple .fasta file. I downloaded individual data from the 1000 Genomes Project from
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00140/sequence_read/
I downloaded the data in FASTA format, in which all sequences look like this:
ACAC
Now I need to extract the genomic location of every sequence in that FASTA file.
I still don't follow.
According to that link, you don't have FASTA data, you have FASTQ data. If you want to get the location of all those sequences, you need to map the reads to a reference genome to obtain a BAM/SAM file.
This isn't a task for Biopython. It will simply be too slow.
Yes, I downloaded the data as a FASTQ file and then converted it to FASTA format. I have also tried a BAM file, but Biopython doesn't recognize the BAM format. Where can I download a SAM file for the 1000 Genomes Project data?
Why? There's no reason to do this.
If you have a BAM, you don't need the fasta/fastq data. The BAM/SAM contains the information you asked for. You don't need BioPython.
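To illustrate what "contains the information you asked for" means: each mapped record in a SAM file already carries the reference name (chromosome), the 1-based start position, and a CIGAR string from which the end position follows. Below is a minimal sketch in plain Python, assuming an uncompressed, tab-separated SAM line; the function name `parse_sam_line` and the example record are made up for illustration. For real BAM files you would use a dedicated tool or library rather than hand-parsing.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING_OPS = set("MDN=X")


def parse_sam_line(line):
    """Return (read_name, chromosome, start, end) for a mapped record, else None.

    start and end are 1-based, inclusive reference coordinates.
    """
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    if flag & 0x4 or rname == "*" or cigar == "*":
        return None  # unmapped read: there is no location to report
    # Sum the lengths of the CIGAR operations that advance along the reference.
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )
    return qname, rname, pos, pos + ref_len - 1


# Hypothetical SAM record, invented purely for illustration.
example = "read1\t0\t2\t1000\t60\t5M2D3M\t*\t0\t0\tACGTACGT\tIIIIIIII"
print(parse_sam_line(example))  # ('read1', '2', 1000, 1009)
```

Header lines in a SAM file start with `@` and would need to be skipped before calling this function.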
What are you actually trying to do?
I want to build an STR marker profile by searching for all CODIS STR loci in the genome data. To do that, I am trying to search for the different alleles in the sequence, but I need the chromosome location in order to search for a particular allele of an STR locus. For example, TPOX is located on chromosome 2, so I need to get to chromosome 2 to check whether any TPOX allele is present in the data.
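For what it's worth, the allele search itself (counting how many times a repeat motif occurs back to back) can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper `longest_repeat_run` and the toy read are hypothetical, and AATG is used as an example motif that you should verify for each locus. It says nothing about which chromosome a read came from, which is why the reads still need to be mapped first.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return (repeat_count, start, end) for the longest uninterrupted run of motif.

    start and end are 0-based, end-exclusive offsets within sequence.
    Returns (0, -1, -1) when the motif does not occur at all.
    """
    best = (0, -1, -1)
    # Each match is one maximal stretch of back-to-back copies of the motif.
    for match in re.finditer(f"(?:{re.escape(motif)})+", sequence):
        count = (match.end() - match.start()) // len(motif)
        if count > best[0]:
            best = (count, match.start(), match.end())
    return best


# Toy read invented for illustration: 8 copies of the motif between two flanks.
toy_read = "GGCACTT" + "AATG" * 8 + "TTTGGG"
print(longest_repeat_run(toy_read, "AATG"))  # (8, 7, 39)
```

A repeat count found this way is only meaningful if the read actually comes from the locus of interest, so in practice you would restrict the search to reads mapped to that region.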
No, I downloaded individual data from the 1000 Genomes Project from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00140/sequence_read/ in FASTQ format. How can I align my FASTQ file? Please guide me.
This is a slightly different question. You need to google any standard tutorial about read alignment. Check out the user manuals for BWA or bowtie2.
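As a rough outline of what such a tutorial will walk you through, here is a sketch that builds a typical BWA plus samtools command sequence from Python. It assumes paired-end reads, that `bwa` and `samtools` are installed and on your PATH, and that you have a reference genome FASTA; all file names and the helper names are placeholders, and the manuals remain the authority on options.

```python
import subprocess


def build_alignment_commands(reference, reads_1, reads_2, prefix):
    """Return a list of (argv, stdout_path) steps; stdout_path is None if unused."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        (["bwa", "index", reference], None),                 # index the reference once
        (["bwa", "mem", reference, reads_1, reads_2], sam),  # align; SAM goes to stdout
        (["samtools", "sort", "-o", bam, sam], None),        # coordinate-sort into BAM
        (["samtools", "index", bam], None),                  # index the sorted BAM
    ]


def run_commands(commands):
    """Run each step in order, redirecting stdout to a file where requested."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Placeholder file names; substitute your own reference and FASTQ files.
commands = build_alignment_commands(
    "reference.fa", "reads_1.fastq", "reads_2.fastq", "HG00140"
)
for argv, stdout_path in commands:
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
# run_commands(commands)  # uncomment once bwa and samtools are installed
```

The sorted, indexed BAM that comes out of the last step is the file that holds the chromosome and position of every mapped read.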