Ref Genome Conundrum
5.5 years ago
aalith ▴ 20

I am stumped as to which reference genome has these contigs... Any help? I've looked through the common ones (GRCh37/38, hg19/38, hs37d5).

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, M, 1_gl000191_random, 1_gl000192_random, 4_ctg9_hap1, 4_gl000193_random, 4_gl000194_random, 6_apd_hap1, 6_cox_hap2, 6_dbb_hap3, 6_mann_hap4, 6_mcf_hap5, 6_qbl_hap6, 6_ssto_hap7, 7_gl000195_random, 8_gl000196_random, 8_gl000197_random, 9_gl000198_random, 9_gl000199_random, 9_gl000200_random, 9_gl000201_random, 11_gl000202_random, 17_ctg5_hap1, 17_gl000203_random, 17_gl000204_random, 17_gl000205_random, 17_gl000206_random, 18_gl000207_random, 19_gl000208_random, 19_gl000209_random, 21_gl000210_random, Un_gl000211, Un_gl000212, Un_gl000213, Un_gl000214, Un_gl000215, Un_gl000216, Un_gl000217, Un_gl000218, Un_gl000219, Un_gl000220, Un_gl000221, Un_gl000222, Un_gl000223, Un_gl000224, Un_gl000225, Un_gl000226, Un_gl000227, Un_gl000228, Un_gl000229, Un_gl000230, Un_gl000231, Un_gl000232, Un_gl000233, Un_gl000234, Un_gl000235, Un_gl000236, Un_gl000237, Un_gl000238, Un_gl000239, Un_gl000240, Un_gl000241, Un_gl000242, Un_gl000243, Un_gl000244, Un_gl000245, Un_gl000246, Un_gl000247, Un_gl000248, Un_gl000249, AC_000005.1, AC_000006.1, AC_000007.1, AC_000008.1, AC_000017.1, AC_000018.1, AC_000019.1, NC_000883.2, NC_000898.1, NC_001348.1, NC_001352.1, NC_001354.1, NC_001355.1, NC_001356.1, NC_001357.1, NC_001405.1, NC_001430.1, NC_001434.1, NC_001436.1, NC_001454.1, NC_001457.1, NC_001458.1, NC_001460.1, NC_001472.1, NC_001488.1, NC_001489.1, NC_001490.1, NC_001526.2, NC_001531.1, NC_001576.1, NC_001583.1, NC_001586.1, NC_001587.1, NC_001591.1, NC_001593.1, NC_001595.1, NC_001596.1, NC_001612.1, NC_001617.1, NC_001653.2, NC_001655.1, NC_001664.2, NC_001676.1, NC_001690.1, NC_001691.1, NC_001693.1, NC_001694.1, NC_001710.1, NC_001716.2, NC_001722.1, NC_001781.1, NC_001796.2, NC_001798.1, NC_001802.1, NC_001806.1, NC_001837.1, NC_001897.1, NC_001943.1, NC_002645.1, NC_003266.2, NC_003443.1, NC_003461.1, NC_003977.1, NC_004102.1, NC_004104.1, NC_004148.2, NC_004295.1, NC_004500.1, NC_005134.2, 
NC_005147.1, NC_005831.2, NC_006273.2, NC_006577.2, NC_007018.1, NC_007026.1, NC_007027.1, NC_007455.1, NC_007605.1, NC_008188.1, NC_008189.1, NC_009333.1, NC_009334.1, NC_009823.1, NC_009824.1, NC_009825.1, NC_009826.1, NC_009827.1, NC_009887.1, NC_009996.1, NC_010329.1, NC_010810.1, NC_010956.1, NC_011202.1, NC_011203.1, NC_011800.1, NC_012042.1, NC_012213.1, NC_012485.1, NC_012486.1, NC_012564.1, NC_012729.2, NC_012798.1, NC_012800.1, NC_012801.1, NC_012802.1, NC_012950.1, NC_012959.1, NC_012986.1, NC_013035.1, NC_013114.1, NC_013115.1, NC_014185.1, NC_014952.1, NC_014953.1, NC_014954.1, NC_014955.1, NC_014956.1, NC_015150.1, NC_015630.1, NC_016157.1, NC_017993.1, NC_017994.1, NC_017995.1, NC_017996.1, NC_017997.1, NC_019023.1, NC_019026.1, NC_019027.1, NC_019028.1

DNA Reference Genome alignment • 1.7k views

Side note: the BAM files I was given have no helpful info in the header. The contig names listed above come from the header and from GATK error messages.
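
For anyone trying to match a BAM header against candidate references, here is a minimal sketch of one way to do it. It assumes the header text has already been dumped (for example with `samtools view -H`) and that a `.fai` index exists for each candidate reference. The function names are made up for illustration and are not part of any library.

```python
def parse_sq_lines(header_text):
    """Return (name, length, md5) tuples from the @SQ lines of a SAM header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        # M5 (sequence MD5) is optional, so it may come back as None.
        contigs.append((fields.get("SN"), int(fields.get("LN", 0)), fields.get("M5")))
    return contigs


def parse_fai(fai_text):
    """Return {name: length} from the text of a FASTA .fai index."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def mismatches(contigs, ref_lengths):
    """List (name, length) for contigs absent from the reference or with a different length."""
    return [(name, length) for name, length, _ in contigs
            if ref_lengths.get(name) != length]
```

An empty result from `mismatches` means every contig name and length in the header agrees with that candidate. It does not prove the sequences are identical; comparing M5 values, when the header carries them, is a stronger check.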

A quick Google search of those accessions at NCBI suggests that this isn't a single genome. One of those accessions is a complete coronavirus, another is a complete papillomavirus.

I would guess, based on the 2 I checked, they're all viral genome records.
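
To get a quick overview of a contig list like this one, a rough sketch along these lines could bucket the names by pattern. The regular expressions are assumptions tuned to the names in this thread, not an official naming scheme, and a name that looks like a RefSeq accession still needs to be looked up at NCBI to confirm what organism it belongs to.

```python
import re
from collections import Counter

# Patterns are checked in order; the first one that matches wins.
PATTERNS = [
    ("primary", re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")),
    ("unplaced_or_alt", re.compile(r"^(chr)?(\w+_)?gl\d{6}(_random)?$|_hap\d+$",
                                   re.IGNORECASE)),
    ("refseq_accession", re.compile(r"^[A-Z]{2}_\d+\.\d+$")),
]


def classify_contig(name):
    """Return a coarse label for a contig name, or 'other' if nothing matches."""
    for label, pattern in PATTERNS:
        if pattern.search(name):
            return label
    return "other"


def summarize(names):
    """Count how many contig names fall into each label."""
    return Counter(classify_contig(n) for n in names)
```

Calling `summarize` on the full list from the question would show how many entries are ordinary chromosomes, how many are unplaced or alternate-haplotype contigs, and how many are accession-style names worth checking individually.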


Sorry, I'm new to this stuff - so this means the ref genome used to align isn't one of the common ones? Is it some proprietary/concatenated reference?


To me, it looked like hs37d5 plus more viral genomes. I just can't find an updated version.


I don't think you will. I was discussing this with someone else a couple of weeks back here. See: Where can I download GRCh38-lite.fa file and all_sequences.fa file for hg38 version

You will need to get GRCh38 and append viral genomes yourself if you need an updated version.
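
If you go that route, a minimal sketch of the concatenation step might look like the following. It assumes plain, uncompressed FASTA inputs, and `concat_fasta` is a hypothetical helper written for this post. It refuses duplicate sequence names, since those would cause trouble for downstream indexing.

```python
def concat_fasta(paths, out_path):
    """Concatenate FASTA files into out_path and return the sequence names in order.

    Raises ValueError if the same sequence name appears more than once.
    """
    names = []
    seen = set()
    with open(out_path, "w") as out:
        for path in paths:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first whitespace-delimited token.
                        parts = line[1:].split()
                        name = parts[0] if parts else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        names.append(name)
                    out.write(line)
                    last = line
            # Keep the next file's first header on its own line.
            if not last.endswith("\n"):
                out.write("\n")
    return names
```

The combined file would then need a fresh index and sequence dictionary (for example with `samtools faidx` and `samtools dict`), and the reads would have to be realigned against it before calling variants.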


Thanks so much!! I was also given a GVCF file and was originally just using that. However, I wanted to do my own preprocessing and variant calling and compare the result to the given GVCF. Another point of confusion is that when filtering the GVCF, I used hg38 as the reference. How can the reference genome differ between the BAM files and the GVCF?

Sorry, reached my 5 post limit as a newbie :(


I'm not sure I really follow. What has you under the impression that that is a reference genome?

Anything can be a reference genome. It's not so common to see multiple genomes concatenated together like that, unless someone was trying to make a 'viral database' or something.

Where did this file originate?
