How to determine the exact version of hg38 if I have only the FASTA file
18 months ago
mps • 0

I have a FASTA file that contains the hg38 assembly: the primary contigs, alt contigs, decoy, HLA, and mitochondrial sequences.

How do I determine the exact version of hg38 based on the FASTA?

Here are some of the headers:

>chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
>chr2  AC:CM000664.2  gi:568336022  LN:242193529  rl:Chromosome  M5:f98db672eb0993dcfdabafe2a882905c  AS:GRCh38
>chr3  AC:CM000665.2  gi:568336021  LN:198295559  rl:Chromosome  M5:76635a41ea913a405ded820447d067b0  AS:GRCh38
>chr4  AC:CM000666.2  gi:568336020  LN:190214555  rl:Chromosome  M5:3210fecf1eb92d5489da4346b3fddc6e  AS:GRCh38
>chr5  AC:CM000667.2  gi:568336019  LN:181538259  rl:Chromosome  M5:a811b3dc9fe66af729dc0dddf7fa4f13  AS:GRCh38  hm:47309185-49591369
>chr6  AC:CM000668.2  gi:568336018  LN:170805979  rl:Chromosome  M5:5691468a67c7e7a7b5f2a3a683792c29  AS:GRCh38
>chr7  AC:CM000669.2  gi:568336017  LN:159345973  rl:Chromosome  M5:cc044cc2256a1141212660fb07b6171e  AS:GRCh38
>chr8  AC:CM000670.2  gi:568336016  LN:145138636  rl:Chromosome  M5:c67955b5f7815a9a1edfaa15893d3616  AS:GRCh38
>chr9  AC:CM000671.2  gi:568336015  LN:138394717  rl:Chromosome  M5:6c198acf68b5af7b9d676dfdd531b5de  AS:GRCh38
>chr10  AC:CM000672.2  gi:568336014  LN:133797422  rl:Chromosome  M5:c0eeee7acfdaf31b770a509bdaa6e51a  AS:GRCh38
>chr11  AC:CM000673.2  gi:568336013  LN:135086622  rl:Chromosome  M5:1511375dc2dd1b633af8cf439ae90cec  AS:GRCh38
>chr12  AC:CM000674.2  gi:568336012  LN:133275309  rl:Chromosome  M5:96e414eace405d8c27a6d35ba19df56f  AS:GRCh38
>chr13  AC:CM000675.2  gi:568336011  LN:114364328  rl:Chromosome  M5:a5437debe2ef9c9ef8f3ea2874ae1d82  AS:GRCh38
>chr14  AC:CM000676.2  gi:568336010  LN:107043718  rl:Chromosome  M5:e0f0eecc3bcab6178c62b6211565c807  AS:GRCh38  hm:multiple
>chr15  AC:CM000677.2  gi:568336009  LN:101991189  rl:Chromosome  M5:f036bd11158407596ca6bf3581454706  AS:GRCh38
>chr16  AC:CM000678.2  gi:568336008  LN:90338345  rl:Chromosome  M5:db2d37c8b7d019caaf2dd64ba3a6f33a  AS:GRCh38
>chr17  AC:CM000679.2  gi:568336007  LN:83257441  rl:Chromosome  M5:f9a0fb01553adb183568e3eb9d8626db  AS:GRCh38
>chr18  AC:CM000680.2  gi:568336006  LN:80373285  rl:Chromosome  M5:11eeaa801f6b0e2e36a1138616b8ee9a  AS:GRCh38
>chr19  AC:CM000681.2  gi:568336005  LN:58617616  rl:Chromosome  M5:85f9f4fc152c58cb7913c06d6b98573a  AS:GRCh38  hm:multiple
>chr20  AC:CM000682.2  gi:568336004  LN:64444167  rl:Chromosome  M5:b18e6c531b0bd70e949a7fc20859cb01  AS:GRCh38
>chr21  AC:CM000683.2  gi:568336003  LN:46709983  rl:Chromosome  M5:974dc7aec0b755b19f031418fdedf293  AS:GRCh38  hm:multiple
>chr22  AC:CM000684.2  gi:568336002  LN:50818468  rl:Chromosome  M5:ac37ec46683600f808cdd41eac1d55cd  AS:GRCh38  hm:multiple
>chrX  AC:CM000685.2  gi:568336001  LN:156040895  rl:Chromosome  M5:2b3a55ff7f58eb308420c8a9b11cac50  AS:GRCh38
>chrY  AC:CM000686.2  gi:568336000  LN:57227415  rl:Chromosome  M5:ce3e31103314a704255f3cd90369ecce  AS:GRCh38  hm:10001-2781479,56887903-57217415
>chrM  AC:J01415.2  gi:113200490  LN:16569  rl:Mitochondrion  M5:c68f52674c9fb33aef52dcf399755519  AS:GRCh38  tp:circular
>chr1_KI270706v1_random  AC:KI270706.1  gi:568335410  LN:175055  rg:chr1  rl:unlocalized  M5:62def1a794b3e18192863d187af956e6  AS:GRCh38
>chr1_KI270707v1_random  AC:KI270707.1  gi:568335409  LN:32032  rg:chr1  rl:unlocalized  M5:78135804eb15220565483b7cdd02f3be  AS:GRCh38
>chr1_KI270708v1_random  AC:KI270708.1  gi:568335408  LN:127682  rg:chr1  rl:unlocalized  M5:1e95e047b98ed92148dd84d6c037158c  AS:GRCh38
>chr1_KI270709v1_random  AC:KI270709.1  gi:568335407  LN:66860  rg:chr1  rl:unlocalized  M5:4e2db2933ea96aee8dab54af60ecb37d  AS:GRCh38
>chr1_KI270710v1_random  AC:KI270710.1  gi:568335406  LN:40176  rg:chr1  rl:unlocalized  M5:9949f776680c6214512ee738ac5da289  AS:GRCh38
>chr1_KI270711v1_random  AC:KI270711.1  gi:568335405  LN:42210  rg:chr1  rl:unlocalized  M5:af383f98cf4492c1f1c4e750c26cbb40  AS:GRCh38
>chr1_KI270712v1_random  AC:KI270712.1  gi:568335404  LN:176043  rg:chr1  rl:unlocalized  M5:c38a0fecae6a1838a405406f724d6838  AS:GRCh38
>chr1_KI270713v1_random  AC:KI270713.1  gi:568335403  LN:40745  rg:chr1  rl:unlocalized  M5:cb78d48cc0adbc58822a1c6fe89e3569  AS:GRCh38
>chr1_KI270714v1_random  AC:KI270714.1  gi:568335402  LN:41717  rg:chr1  rl:unlocalized  M5:42f7a452b8b769d051ad738ee9f00631  AS:GRCh38
>chr2_KI270715v1_random  AC:KI270715.1  gi:568335401  LN:161471  rg:chr2  rl:unlocalized  M5:b65a8af1d7bbb7f3c77eea85423452bb  AS:GRCh38
>chr2_KI270716v1_random  AC:KI270716.1  gi:568335400  LN:153799  rg:chr2  rl:unlocalized  M5:2828e63b8edc5e845bf48e75fbad2926  AS:GRCh38
>chr3_GL000221v1_random  AC:GL000221.1  gi:224183270  LN:155397  rg:chr3  rl:unlocalized  M5:3238fb74ea87ae857f9c7508d315babb  AS:GRCh38
>chr4_GL000008v2_random  AC:GL000008.2  gi:568335399  LN:209709  rg:chr4  rl:unlocalized  M5:a999388c587908f80406444cebe80ba3  AS:GRCh38
>chr5_GL000208v1_random  AC:GL000208.1  gi:224183050  LN:92689  rg:chr5  rl:unlocalized  M5:aa81be49bf3fe63a79bdc6a6f279abf6  AS:GRCh38
>chr9_KI270717v1_random  AC:KI270717.1  gi:568335398  LN:40062  rg:chr9  rl:unlocalized  M5:796773a1ee67c988b4de887addbed9e7  AS:GRCh38
>chr9_KI270718v1_random  AC:KI270718.1  gi:568335397  LN:38054  rg:chr9  rl:unlocalized  M5:b0c463c8efa8d64442b48e936368dad5  AS:GRCh38
>chr9_KI270719v1_random  AC:KI270719.1  gi:568335396  LN:176845  rg:chr9  rl:unlocalized  M5:cd5e932cfc4c74d05bb64e2126873a3a  AS:GRCh38
>chr9_KI270720v1_random  AC:KI270720.1  gi:568335395  LN:39050  rg:chr9  rl:unlocalized  M5:8c2683400a4aeeb40abff96652b9b127  AS:GRCh38
>chr11_KI270721v1_random  AC:KI270721.1  gi:568335394  LN:100316  rg:chr11  rl:unlocalized  M5:9654b5d3f36845bb9d19a6dbd15d2f22  AS:GRCh38
>chr14_GL000009v2_random  AC:GL000009.2  gi:568335393  LN:201709  rg:chr14  rl:unlocalized  M5:862f555045546733591ff7ab15bcecbe  AS:GRCh38
>chr14_GL000225v1_random  AC:GL000225.1  gi:224183274  LN:211173  rg:chr14  rl:unlocalized  M5:63945c3e6962f28ffd469719a747e73c  AS:GRCh38
>chr14_KI270722v1_random  AC:KI270722.1  gi:568335392  LN:194050  rg:chr14  rl:unlocalized  M5:51f46c9093929e6edc3b4dfd50d803fc  AS:GRCh38
>chr14_GL000194v1_random  AC:GL000194.1  gi:224183213  LN:191469  rg:chr14  rl:unlocalized  M5:6ac8f815bf8e845bb3031b73f812c012  AS:GRCh38
>chr14_KI270723v1_random  AC:KI270723.1  gi:568335391  LN:38115  rg:chr14  rl:unlocalized  M5:74a4b480675592095fb0c577c515b5df  AS:GRCh38
>chr14_KI270724v1_random  AC:KI270724.1  gi:568335390  LN:39555  rg:chr14  rl:unlocalized  M5:c3fcb15dddf45f91ef7d94e2623ce13b  AS:GRCh38
>chr14_KI270725v1_random  AC:KI270725.1  gi:568335389  LN:172810  rg:chr14  rl:unlocalized  M5:edc6402e58396b90b8738a5e37bf773d  AS:GRCh38
>chr14_KI270726v1_random  AC:KI270726.1  gi:568335388  LN:43739  rg:chr14  rl:unlocalized  M5:fbe54a3197e2b469ccb2f4b161cfbe86  AS:GRCh38
>chr15_KI270727v1_random  AC:KI270727.1  gi:568335387  LN:448248  rg:chr15  rl:unlocalized  M5:84fe18a7bf03f3b7fc76cbac8eb583f1  AS:GRCh38
>chr16_KI270728v1_random  AC:KI270728.1  gi:568335386  LN:1872759  rg:chr16  rl:unlocalized  M5:369ff74cf36683b3066a2ca929d9c40d  AS:GRCh38
>chr17_GL000205v2_random  AC:GL000205.2  gi:568335385  LN:185591  rg:chr17  rl:unlocalized  M5:458e71cd53dd1df4083dc7983a6c82c4  AS:GRCh38
>chr17_KI270729v1_random  AC:KI270729.1  gi:568335384  LN:280839  rg:chr17  rl:unlocalized  M5:2756f6ee4f5780acce31e995443508b6  AS:GRCh38
>chr17_KI270730v1_random  AC:KI270730.1  gi:568335383  LN:112551  rg:chr17  rl:unlocalized  M5:48f98ede8e28a06d241ab2e946c15e07  AS:GRCh38
>chr22_KI270731v1_random  AC:KI270731.1  gi:568335382  LN:150754  rg:chr22  rl:unlocalized  M5:8176d9a20401e8d9f01b7ca8b51d9c08  AS:GRCh38
>chr22_KI270732v1_random  AC:KI270732.1  gi:568335381  LN:41543  rg:chr22  rl:unlocalized  M5:d837bab5e416450df6e1038ae6cd0817  AS:GRCh38
>chr22_KI270733v1_random  AC:KI270733.1  gi:568335380  LN:179772  rg:chr22  rl:unlocalized  M5:f1fa05d48bb0c1f87237a28b66f0be0b  AS:GRCh38
>chr22_KI270734v1_random  AC:KI270734.1  gi:568335379  LN:165050  rg:chr22  rl:unlocalized  M5:1d17410ae2569c758e6dd51616412d32  AS:GRCh38
>chr22_KI270735v1_random  AC:KI270735.1  gi:568335378  LN:42811  rg:chr22  rl:unlocalized  M5:eb6b07b73dd9a47252098ed3d9fb78b8  AS:GRCh38
>chr22_KI270736v1_random  AC:KI270736.1  gi:568335377  LN:181920  rg:chr22  rl:unlocalized  M5:2ff189f33cfa52f321accddf648c5616  AS:GRCh38
>chr22_KI270737v1_random  AC:KI270737.1  gi:568335376  LN:103838  rg:chr22  rl:unlocalized  M5:2ea8bc113a8193d1d700b584b2c5f42a  AS:GRCh38
>chr22_KI270738v1_random  AC:KI270738.1  gi:568335375  LN:99375  rg:chr22  rl:unlocalized  M5:854ec525c7b6a79e7268f515b6a9877c  AS:GRCh38
>chr22_KI270739v1_random  AC:KI270739.1  gi:568335374  LN:73985  rg:chr22  rl:unlocalized  M5:760fbd73515fedcc9f37737c4a722d6a  AS:GRCh38
>chrY_KI270740v1_random  AC:KI270740.1  gi:568335373  LN:37240  rg:chrY  rl:unlocalized  M5:69e42252aead509bf56f1ea6fda91405  AS:GRCh38
>chrUn_KI270302v1  AC:KI270302.1  gi:568335372  LN:2274  rl:unplaced  M5:ee6dff38036f7d03478c70717643196e  AS:GRCh38
>chrUn_KI270304v1  AC:KI270304.1  gi:568335371  LN:2165  rl:unplaced  M5:9423c1b46a48aa6331a77ab5c702ac9d  AS:GRCh38
>chrUn_KI270303v1  AC:KI270303.1  gi:568335370  LN:1942  rl:unplaced  M5:2cb746c78e0faa11e628603a4bc9bd58  AS:GRCh38
reference-genome FASTA • 726 views
18 months ago

You could use blastn to align your FASTA reference against each of the other common FASTA references and find out which one you have:

blastn -query file1.fasta -subject file2.fasta -outfmt 6 -out results.txt

If each pair of corresponding chr sequences aligns without gaps in the results file, you have found the correct reference. :)
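To make the "no gaps" check less manual, here is a minimal Python sketch that scans the tabular BLAST output. It assumes results.txt uses the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function names are made up for illustration, and the sketch also requires zero mismatches, which is stricter than only checking for gaps.

```python
import csv
import sys

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def perfect_hits(rows):
    """Yield (qseqid, sseqid, length) for hits with no mismatches and no gap openings."""
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if int(hit["mismatch"]) == 0 and int(hit["gapopen"]) == 0:
            yield hit["qseqid"], hit["sseqid"], int(hit["length"])


def longest_perfect_hit(rows):
    """Map each (query, subject) pair to the length of its longest perfect hit."""
    best = {}
    for query, subject, length in perfect_hits(rows):
        key = (query, subject)
        best[key] = max(best.get(key, 0), length)
    return best


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python check_blast.py results.txt
    with open(sys.argv[1], newline="") as handle:
        table = longest_perfect_hit(csv.reader(handle, delimiter="\t"))
    for (query, subject), length in sorted(table.items()):
        print(query, subject, length, sep="\t")
```

The default tabular output does not include sequence lengths, so compare the reported hit length with the known contig length (for example the LN: value in the FASTA header) to judge whether a hit spans the whole sequence.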

Another quick-and-dirty method is to check the number of contigs you have, the contig names, and the sequence lengths (e.g. chrM, chrY and chrX).
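The contig-level comparison could be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, and it assumes that the M5: tags in the headers posted in the question are MD5 checksums of the upper-cased sequence (the convention used for the M5 field of SAM @SQ lines), which should be verified for your file.

```python
import hashlib
import re
import sys


def summarize_fasta(lines):
    """Return {contig_name: (length, md5_hex)} from an iterable of FASTA lines."""
    summary = {}
    name, length, digest = None, 0, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                summary[name] = (length, digest.hexdigest())
            name = line[1:].split()[0]  # first word after '>' is the contig name
            length, digest = 0, hashlib.md5()
        elif name is not None:
            seq = line.upper()  # checksum is taken on the upper-cased sequence
            length += len(seq)
            digest.update(seq.encode("ascii"))
    if name is not None:
        summary[name] = (length, digest.hexdigest())
    return summary


def header_tags(header):
    """Parse 'KEY:value' tags such as LN: and M5: from one header line."""
    return dict(re.findall(r"(\w+):(\S+)", header))


def compare(a, b):
    """Return contigs that are identical, that differ, and that occur in only one summary."""
    shared = a.keys() & b.keys()
    same = sorted(n for n in shared if a[n] == b[n])
    differ = sorted(n for n in shared if a[n] != b[n])
    return same, differ, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python compare_fasta.py file1.fasta file2.fasta
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        same, differ, only_a, only_b = compare(summarize_fasta(fa), summarize_fasta(fb))
    print(f"identical: {len(same)}  different: {len(differ)}")
    print("only in first file:", ", ".join(only_a) or "none")
    print("only in second file:", ", ".join(only_b) or "none")
```

Comparing names alone can mislead when two distributions name the same sequence differently, so the length and checksum columns are the more reliable signal. If the computed checksum of a contig equals the M5: value in its own header, that also suggests the sequence was not modified after the header was written.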
