12 months ago
Chen
Hi,
I tried to add the prefix "chr" to the chromosome names in a FASTA file like this:
sed 's/^>/>chr/' human_g1k_v37.fasta > human_g1k_v37.annotate.fasta
However, after that, I could not view the header of the new FASTA file:
samtools view -H human_g1k_v37.annotate.fasta
>>> [main_samview] fail to read the header from "human_g1k_v37.annotate.fasta".
What might cause this error, and is there an alternative way to rename the sequences in a FASTA file? Thanks.
Hi, sorry for the confusion. I am referring to renaming the chromosomes in the FASTA file, for example converting >1 to >chr1. Thanks.
You renamed the fasta with sed...
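A likely reason for the error, offered as a sketch rather than a diagnosis: samtools view reads SAM/BAM/CRAM alignment files, and a FASTA file has no header of that kind, so the message would appear whether or not the sed step worked. The sequence names in a FASTA file are simply the lines starting with ">". The example below uses a tiny made-up file (toy.fasta and toy.chr.fasta are hypothetical names, not your real reference) to show the same sed rename and a way to list the resulting names:

```shell
# Toy FASTA standing in for human_g1k_v37.fasta (hypothetical file names and sequences).
tmpdir=$(mktemp -d)
printf '>1 dna:chromosome\nACGT\n>MT\nGGCC\n' > "$tmpdir/toy.fasta"

# Same substitution as in the question: prepend "chr" to every header line.
sed 's/^>/>chr/' "$tmpdir/toy.fasta" > "$tmpdir/toy.chr.fasta"

# FASTA has no SAM-style header; the sequence names are the lines starting with ">".
headers=$(grep '^>' "$tmpdir/toy.chr.fasta")
echo "$headers"
```

Note that the toy output contains ">chrMT", whereas hg19 names the mitochondrial sequence "chrM", so a blanket prefix does not by itself produce hg19-style names. On the real file, samtools faidx should also work: it writes a .fai index whose first column lists the sequence names.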
If you're doing that, it probably means you downloaded the wrong copy of the genome. E.g. 1000 Genomes-style GRCh37 files such as human_g1k_v37 use ">1", while UCSC-style files such as hg19 use ">chr1". Just editing the names may get you past the first hurdle, but it can cause vastly bigger problems downstream.
I'd go back to square one. Why do you think g1k_v37 is the correct reference? Is it actually that, or is it the almost-but-not-quite-identical hg19? If you don't know, go back to the source and ask them, or look at the metadata in the @SQ header lines of the alignment files. Maybe they'll give proper data provenance. (Although sadly many people consider basic things like keeping track of what they're doing to not be an integral part of science!)
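To make the @SQ suggestion concrete, here is a minimal sketch that pulls the sequence name, length, and assembly tag out of a SAM header. The header text is a made-up toy (the AS value "toy_assembly" is a placeholder); with real data it would come from running samtools view -H on a BAM file, not on the FASTA. SN, LN, and AS are standard @SQ tags, but many files carry no AS tag at all, in which case the lengths are the main clue.

```shell
# Toy SAM header; with real data this text would come from: samtools view -H your.bam
header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\tAS:toy_assembly\n@SQ\tSN:MT\tLN:16569\n')

# Keep only the @SQ lines and print name, length, and assembly tag ("-" if absent).
sq_summary=$(printf '%s\n' "$header" | awk -F'\t' '
  $1 == "@SQ" {
    name = "-"; seqlen = "-"; asm = "-"
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) name = substr($i, 4)
      if ($i ~ /^LN:/) seqlen = substr($i, 4)
      if ($i ~ /^AS:/) asm = substr($i, 4)
    }
    print name, seqlen, asm
  }')
echo "$sq_summary"
```

Comparing the names and lengths against the candidate references is usually enough to tell them apart. For instance, a mitochondrial sequence named MT with length 16569 points to the g1k_v37 style, while hg19 has chrM with length 16571.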