How To Get Ref.Fasta
13.4 years ago
Zhshqzyc ▴ 520

Hi,

I want to use the samtools command

samtools faidx <ref.fasta> [region1 [...]]

My question: where can I get ref.fasta, or how can I create it with some command? I already have a BAM file.

Thanks.

samtools sequence • 11k views

Which genome was used to create your BAM file? By that I mean, to which genome were the reads aligned?

Human genome. dbGaP phenotype release

Which human genome? hg18, hg19, another one? Normally you can download hgXX as single chromosomes and merge them into hgXX.fasta, which is then your ref.fasta.

hg18 genome. Where can I download it?

Just a warning: if you already have a BAM file, the reads have already been mapped, so the reference file must have been available at that point. You should try to retrieve that exact reference file, because if you download a different one you may end up with sequence-name or coordinate mismatches that won't be easy to deal with.
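A rough way to check for the naming mismatches mentioned above is to compare the sequence names in the BAM header against the names in the candidate FASTA. This is only a sketch: `missing_in_fasta` is a made-up helper name, and it assumes you have saved the output of `samtools view -H my.bam` to a text file first.

```shell
# Hypothetical helper: print sequence names that are in the BAM header
# (SN: tag of @SQ lines) but absent from the candidate FASTA.
#   $1 = text file with the output of `samtools view -H my.bam`
#   $2 = candidate reference FASTA
missing_in_fasta() {
    header_names=$(mktemp)
    fasta_names=$(mktemp)
    # SAM header fields are tab-separated; keep the value after "SN:".
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1" \
        | sort -u > "$header_names"
    # A FASTA name is the first word after ">" on each header line.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$2" | sort -u > "$fasta_names"
    # comm -23 prints the lines that occur only in the first file.
    comm -23 "$header_names" "$fasta_names"
    rm -f "$header_names" "$fasta_names"
}
```

If `missing_in_fasta header.txt ref.fasta` prints nothing, every name in the BAM header exists in the FASTA. That does not prove the sequences themselves are identical, only that the names line up.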

13.4 years ago
Mdeng ▴ 530

Get it here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

Download hg18.2bit, which is hg18 in the binary 2bit encoding. You can convert it to FASTA format with twoBitToFa. You can download this tool here:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
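For illustration, the conversion step could be wrapped like this. `two_bit_to_fasta` is a hypothetical wrapper name; the only real tool call is `twoBitToFa input.2bit output.fa`. The sketch assumes the downloaded binary has been made executable (`chmod +x twoBitToFa`) and is on your PATH.

```shell
# Hypothetical wrapper around UCSC's twoBitToFa.
#   $1 = input .2bit file (e.g. hg18.2bit)
#   $2 = output FASTA file (e.g. hg18.fa)
two_bit_to_fasta() {
    # Fail early with a hint if the tool cannot be found.
    if ! command -v twoBitToFa >/dev/null 2>&1; then
        echo "twoBitToFa not found on PATH; download it and run chmod +x on it" >&2
        return 127
    fi
    twoBitToFa "$1" "$2"
}
```

With the tool installed, `two_bit_to_fasta hg18.2bit hg18.fa` should leave you with a FASTA file that `samtools faidx` can index.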

//Edit:

Ok, then you can download the single chromosomes here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/

Skip the ones with random in the name and take just chr1.fa.gz to chrY.fa.gz (it depends on what you need; you may want to ask whoever did the alignment).

After that, unzip them and merge them. How to do this is explained here:

Fasta File Vs Fa File

(.fa is the same as .fasta.)
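The unzip-and-merge step might look roughly like this. `merge_chromosomes` is a made-up helper name, and the sketch assumes the chr*.fa.gz files all sit in one directory. Files are appended in shell glob order (chr1, chr10, chr11, ...), which may differ from the order in your BAM header; some downstream tools care about that order.

```shell
# Hypothetical helper: concatenate per-chromosome files into one FASTA.
#   $1 = directory holding the chr*.fa.gz downloads
#   $2 = output FASTA (e.g. hg18.fa)
merge_chromosomes() {
    : > "$2"    # start from an empty output file
    for f in "$1"/chr*.fa.gz; do
        [ -e "$f" ] || continue      # nothing matched the glob
        case "$f" in
            *random*) continue ;;    # skip the *_random files
        esac
        # Decompress to stdout and append, leaving the download untouched.
        gunzip -c "$f" >> "$2"
    done
}
```

Usage would be something like `merge_chromosomes ./chromosomes hg18.fa`, followed by `samtools faidx hg18.fa` to build the index.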

mdeng, many many thanks.

Let us know if it works. If not, try the link you posted. It also seems to be hg18 as a FASTA file, so you wouldn't have to convert formats.

twoBitToFa is a corrupt text file. If you have a direct download for hg18.fasta, please let me know. Thanks.

I edited my post...

13.4 years ago
Drio ▴ 920

The BAM file will not contain the reference genome (if that is what you are asking). Check the header:

samtools view -H my.bam

You may find some information about the exact version that was used to align the data. If you can't find anything I'd suggest you contact the person that generated the alignments.
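To pick out the relevant hints from the header, something like the following could help. It is a sketch only: `reference_hints` is a hypothetical helper, and the AS (assembly) and UR (reference URI) tags are optional in @SQ lines, so your header may not carry them at all.

```shell
# Hypothetical helper: read a SAM header on stdin and print the distinct
# AS: (assembly) and UR: (reference URI) values found on @SQ lines.
reference_hints() {
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^(AS|UR):/) print $i }' \
        | sort -u
}

# Typical use (needs samtools and a real BAM file):
#   samtools view -H my.bam | reference_hints
```

If this prints a single AS value and a single UR value, that is a good sign the whole file was aligned against one reference.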

There is a header. I found something like this:

@SQ SN:chr1_random  LN:1663265  AS:HG18 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly18.fasta   M5:cc05cb1554258add2eb62e88c0746394 SP:Homo sapiens

So should I download this file as the reference FASTA?

Yes, that's exactly what you want to do.
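Once the file is downloaded, the M5 tag in the header gives a way to confirm it really is the same reference. As far as I know, M5 is the MD5 of the sequence with whitespace stripped and letters uppercased. A minimal sketch, assuming GNU `md5sum` is available; `seq_md5` is a made-up helper name:

```shell
# Hypothetical helper: MD5 of one sequence in a FASTA, computed the way
# the SAM M5 tag is: whitespace stripped, letters uppercased.
#   $1 = FASTA file
#   $2 = sequence name (e.g. chr1_random)
seq_md5() {
    awk -v name="$2" '
        /^>/ { keep = (substr($1, 2) == name); next }   # switch on at the wanted record
        keep { printf "%s", toupper($0) }               # emit sequence lines without newlines
    ' "$1" | tr -d ' \t\r' | md5sum | cut -d' ' -f1
}
```

For the header line quoted above, `seq_md5 Homo_sapiens_assembly18.fasta chr1_random` should print cc05cb1554258add2eb62e88c0746394 if the download matches the reference the BAM was aligned to.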
