Which hg38 file?
2.4 years ago
amy__ ▴ 220

Hi,

I need the hg38 reference FASTA file. Does anyone know which of the download links on this page it would be? https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/


Or are these even the correct files?

Thanks, Amy

reference hg38 NCBI • 6.9k views

Hello,

Choose the 5th sequence from top.

Read the README file for your reference.


Thanks @sunnykev97, I did think it was that one! Thanks, Amy


Someone has told me to use GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz.

So I’m still unsure! I have read the README, but I am still not sure which file to use for WES germline analysis.
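For anyone who wants to script the download, here is a minimal sketch of how the full link for that file can be put together from the directory URL in the question. The function name `analysis_set_url` is made up for illustration, and the sketch only builds the URL; it does not download anything or check that the file exists.

```python
from urllib.parse import urljoin

# Directory URL as given in the question (note the trailing slash).
BASE_URL = (
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/"
    "GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/"
)

# File name as mentioned in this comment.
NO_ALT_FILE = "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz"


def analysis_set_url(filename: str, base_url: str = BASE_URL) -> str:
    """Join the directory URL and a file name (hypothetical helper)."""
    return urljoin(base_url, filename)


if __name__ == "__main__":
    print(analysis_set_url(NO_ALT_FILE))
```

The resulting URL can then be passed to whatever download tool you normally use.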


Oh wait, they may be correct. According to the README, the no_alt_analysis_set contains the sequences, in FASTA format, of the chromosomes, mitochondrial genome, unlocalized scaffolds, and unplaced scaffolds. The alternate locus scaffolds are omitted because many Next Generation Sequence read alignment pipelines are incompatible with the full assembly model.
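To make the README description concrete, here is a rough sketch that sorts sequence names into the categories it lists. It assumes the UCSC-style naming used in this directory, where alternate locus scaffolds end in `_alt`, unlocalized scaffolds end in `_random`, and unplaced scaffolds start with `chrUn_`; those conventions and the example names are assumptions to check against the FASTA headers of the file you actually download.

```python
def classify_sequence(name: str) -> str:
    """Classify a UCSC-style GRCh38 sequence name (assumed naming rules)."""
    if name == "chrM":
        return "mitochondrial"
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    return "chromosome"


def drop_alt(names):
    """Keep every sequence except alternate locus scaffolds."""
    return [n for n in names if classify_sequence(n) != "alt"]


if __name__ == "__main__":
    # Illustrative names only; read the real ones from the FASTA headers.
    example = [
        "chr1",
        "chrM",
        "chr1_KI270706v1_random",
        "chrUn_KI270302v1",
        "chr1_KI270762v1_alt",
    ]
    for n in example:
        print(n, classify_sequence(n))
    print(drop_alt(example))
```

Running the same check over the header lines of a downloaded file is a quick way to confirm that no `_alt` sequences are present before indexing it.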


Well, there are two types of genome assembly:

  1. Primary assembly - the assembly at the chromosome level only (23 + 1 mitogenome) in humans.
  2. Secondary assembly - alternate loci information and some unplaced scaffolds. It's good to choose the alternate assembly for more information.

If you like the post, upvote.


No, it’s not 'good', as this information requires special alignment procedures that are not trivial and are not implemented in most aligners. It even leads to false alignment results with standard aligners, because reads from these loci would come out as multimappers. For most applications, use the one without ALT.
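If you want to see how much of an existing alignment is affected, one rough check is to count mapped reads with mapping quality 0, which aligners such as BWA-MEM typically assign to reads that fit equally well in more than one place. The sketch below reads plain SAM text; the function name and the toy records are made up for illustration, and MAPQ 0 is only a proxy for multimapping, not a definition of it.

```python
def fraction_mapq_zero(sam_lines):
    """Return the fraction of mapped SAM records that have MAPQ 0."""
    mapped = 0
    mapq_zero = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        mapped += 1
        if int(fields[4]) == 0:  # column 5 is MAPQ
            mapq_zero += 1
    return mapq_zero / mapped if mapped else 0.0


if __name__ == "__main__":
    # Toy records for illustration, not real data.
    toy = [
        "@SQ\tSN:chr1\tLN:1000",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(fraction_mapq_zero(toy))
```

For a BAM file, the same logic can be fed from the text output of `samtools view`.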


So would you not recommend this tutorial as it is using GRCh38 with alternate contigs to map reads?


In the GATK tutorial you referenced, they are using the BWA aligner, and according to this post, BWA should be able to handle the ALTs...


I think Devon's answer could be misinterpreted here. There is a wrapper around bwa-mem called bwakit which can handle ALT haplotypes, but this is not the same as bwa-mem and, to my knowledge, is not extensively used. The take-home message is: do not use ALT haplotypes unless you are running a dedicated ALT-aware analysis, which is almost never the case. It only makes sense if you are specifically interested in genes with known ALTs; otherwise it just adds unnecessary complexity.

Edit: I added a suggested edit to Devon's answer, but I do not have enough reputation on SE, so it first must be approved by others.


Thanks for the clarification! I came to update my original comment from BWA to BWA-MEM (as a README under this FTP link says that BWA-MEM is the ALT-aware version of BWA), but reading your comment I learned that there is actually a wrapper called bwakit which is ALT-aware! :)

2.4 years ago

As ATpoint said, ALT information is tricky to deal with. This blog post elaborates nicely on this issue of choosing a good reference genome.


Thank you all, I appreciate the help!
