Hi,
I have a .bam file generated against the hg19 release of the human genome downloaded from the UCSC database. Now I'm trying to identify genomic variants using GATK UnifiedGenotyper, but it returns an error because my BAM file is lexicographically sorted. I tried Picard's ReorderSam, but I noticed that my reference genome file (hg19) is in the same lexicographic order.
Is there any way to convert my hg19 FASTA file into karyotypic order? Or is there anywhere I can download a version of the human reference genome sorted this way?
Thanks
Hi, thanks for your code. I have numerous scaffolds that begin with KI as well as GL, e.g.:
26 KI270728.1 1872759 3091464192
27 KI270727.1 448248 3093561344
28 KI270442.1 392061 3094085632
29 KI270729.1 280839 3094609920
30 GL000225.1 211173 3095134208
31 KI270743.1 210658 3095396352
32 GL000008.2 209709 3095658496
33 GL000009.2 201709 3095920640
34 KI270747.1 198735 3096182784
35 KI270722.1 194050 3096444928
36 GL000194.1 191469 3096707072
37 KI270742.1 186739 3096969216
38 GL000205.2 185591 3097231360
39 GL000195.1 182896 3097493504
40 KI270736.1 181920 3097755648
41 KI270733.1 179772 3098017792
42 GL000224.1 179693 3098279936
43 GL000219.1 179198 3098542080
How do I concatenate these in the correct order? I tried
print `cat $d/KI*&GL*`' > GRCh38.reordered.fa
but got an error: "GL*: not found".
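Anything within backticks is simply a shell command that is run, with its standard output returned for Perl to parse. In your attempt, the shell reads & as "run the preceding command in the background" and then tries to execute GL* as a separate command, hence the "not found" error. So in this case, you just add one more argument to the Unix cat command, like this:
cat $d/GL* $d/KI*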
Btw, I don't recommend using in-line Perl in a production workflow. Besides, good bioinformatics tools should be able to handle FASTAs without karyotypically ordered chromosomes.
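If you'd rather have a standalone script than in-line Perl, here is a rough Python sketch of the same idea: read the FASTA records, then sort them so the numbered chromosomes come first, followed by X, Y and M, with every other contig (GL*, KI*, etc.) after that in name order. The function names are my own, and putting M after Y and sorting scaffolds by name are assumptions; check which contig order your tool or reference bundle actually expects before relying on this.

```python
def karyotypic_key(name):
    """Sort key: 1..22 numerically, then X, Y, M/MT, then all other contigs by name."""
    # Accept both UCSC-style ("chr1") and Ensembl-style ("1") names.
    base = name[3:] if name.startswith("chr") else name
    if base.isdigit():
        return (0, int(base), "")
    # Assumption: M/MT goes after Y; some references put chrM first instead.
    special = {"X": 23, "Y": 24, "M": 25, "MT": 25}
    if base in special:
        return (0, special[base], "")
    # Unplaced scaffolds (GL*, KI*, ...) go last, in plain name order.
    return (1, 0, name)


def read_fasta(lines):
    """Yield (name, record_lines) for each record in an iterable of FASTA lines."""
    name, chunk = None, []
    for line in lines:
        if line.startswith(">"):
            if name is not None:
                yield name, chunk
            parts = line[1:].split()
            name = parts[0] if parts else ""  # name = first word of the header
            chunk = [line]
        elif name is not None:
            chunk.append(line)
    if name is not None:
        yield name, chunk


def reorder_fasta(lines):
    """Return all FASTA lines with the records rearranged in karyotypic order."""
    records = sorted(read_fasta(lines), key=lambda rec: karyotypic_key(rec[0]))
    return [line for _, chunk in records for line in chunk]
```

To use it, pass an open file handle for the input FASTA to reorder_fasta and write the returned lines to a new file with writelines. Note that this keeps the whole genome in memory, so for a full human reference it may be leaner to pull the contigs out one by one in the order you want, e.g. with samtools faidx. Either way, any BAM aligned against the old file still needs its header and contig order brought in line with the new reference.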