Retrieving nucleotide sequence based on chromosome location offline
3
2
7.8 years ago
Mask ▴ 180

I have downloaded the whole-genome file from this link (ftp://ftp.broadinstitute.org/pub/genepattern/rna_seq/whole_genomes/Homo_sapiens_UCSC_hg19.fa). Now I need a program in Perl or Python to retrieve the nucleotide sequence at a user-defined position from the whole genome. I need to do this offline. The user input would be:

  1. The chromosome number
  2. The start position
  3. The end position

The genome file I downloaded looks like this:

chrM

GATCGGTCTGACGTGCTgaTGATGATA GATCGGTCTGACGTGCTGATGATGATA

chr1

NNNNNNNNNNNNNNNNNNNNNNNNNNtggGGAATTttaag

genome sequence gene Python Perl • 1.6k views
4
7.8 years ago

You can simply use samtools to index the FASTA file, and then query the indexed FASTA file with your interval of interest.

  1. Generate the index: samtools faidx in.fasta
  2. Query via: samtools faidx in.fasta chrN:X-Y

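Replace chrN:X-Y with the chromosome name (chrN), start position (X) and stop position (Y) of interest. Coordinates are 1-based and inclusive, and the chromosome name must match the FASTA header exactly (for this file, chr1 rather than 1). Once the index exists, both steps run entirely offline.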
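If samtools is not available, the same index-then-seek idea can be sketched in plain Python. This is only an illustration, not how samtools itself is implemented: the function names (build_index, parse_region, fetch) are made up for this example, and it assumes a well-formed FASTA file in which every sequence line of a record, except possibly the last, has the same length.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan the FASTA once; map name -> (length, offset, bases_per_line, bytes_per_line)."""
    index = {}
    name = None
    with open(fasta_path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Record name is the first word after '>'.
                name = line[1:].split()[0].decode()
                # fh.tell() is now the byte offset of the first base.
                index[name] = [0, fh.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def parse_region(region):
    """Split a 'chrN:X-Y' string into (chrom, start, end)."""
    chrom, sep, span = region.rpartition(":")
    if not sep:
        raise ValueError("expected a region such as chr1:100-200")
    start, end = span.replace(",", "").split("-")
    return chrom, int(start), int(end)


def fetch(fasta_path, index, chrom, start, end):
    """Return bases start..end (1-based, inclusive) without reading the whole file."""
    length, offset, bases_per_line, bytes_per_line = index[chrom]
    if not 1 <= start <= end <= length:
        raise ValueError("region lies outside %s (length %d)" % (chrom, length))

    def byte_pos(i):
        # i is a 0-based base index; skip the newline bytes of the full lines before it.
        return offset + (i // bases_per_line) * bytes_per_line + i % bases_per_line

    first, last = byte_pos(start - 1), byte_pos(end - 1)
    with open(fasta_path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Small demonstration on a made-up two-record file (not the real hg19 data).
demo = ">chrM\nGATCGGTCTG\nACGTGCTgaT\nGATGATA\n>chr1\nNNNNNNNNNN\nNNNNtggGGA\nATTttaag\n"
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(demo)
demo_index = build_index(tmp.name)
print(fetch(tmp.name, demo_index, *parse_region("chr1:15-20")))  # tggGGA
os.remove(tmp.name)
```

Building the index reads the whole genome once; after that, each query only seeks to the relevant bytes, which is why the indexed approach stays fast on a file the size of hg19. In practice you would probably save the index to disk rather than rebuild it for every query.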

1
7.8 years ago
Asaf 10k

Look at the Biopython cookbook, especially Sections 2.4 and 3.3.
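Assuming those sections cover parsing sequence files and slicing sequences, the overall recipe is: read the FASTA records, pick the requested chromosome, and slice it. The sketch below shows that recipe in plain Python so it runs without any extra install; it is not Biopython code, and read_fasta and slice_region are hypothetical helper names. With Biopython installed, Bio.SeqIO.parse(handle, "fasta") would take the place of the hand-written parser.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Record name is the first word after '>'.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def slice_region(handle, chrom, start, end):
    """Return bases start..end (1-based, inclusive) of the record named chrom."""
    for name, seq in read_fasta(handle):
        if name == chrom:
            if not 1 <= start <= end <= len(seq):
                raise ValueError("region lies outside %s (length %d)" % (chrom, len(seq)))
            # Python slices are 0-based and end-exclusive, hence start - 1.
            return seq[start - 1:end]
    raise KeyError("no record named %r" % chrom)


# Made-up sample data standing in for the real genome file.
sample = io.StringIO(">chrM\nGATCGGTCTG\nACGTGCTgaT\n>chr1\nNNNNtggGGA\nATTttaag\n")
print(slice_region(sample, "chr1", 5, 13))  # tggGGAATT
```

For the real file, open it with open("Homo_sapiens_UCSC_hg19.fa") and pass int(input(...)) values for the start and end. Note that this approach holds one whole chromosome in memory at a time and rescans the file on every query, so it is simpler but slower than an indexed lookup.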

0
7.8 years ago
gangireddy ▴ 160

Look at the third answer here.
