Hello everybody! Sorry for the post, but I have a silly question...
I am using FreeBayes and Manta to detect SVs in the genome of an individual woman. First of all, a coworker passed me the reads aligned with BWA. So my question is this: does the BAM file obtained with BWA contain any information about whether the sample is haploid or diploid?
Explanation: I have the diploid genome of a woman. I sequenced it and obtained many reads that contain sequences from both alleles (diploid), then I aligned them with BWA to the hg19 genome (which is haploid). So when I obtain the BAM file I have only the haploid genome, correct? However, when I use FreeBayes to detect SVs, I obtain a VCF file that reports the SVs with GT = 0/0, GT = 0/1 or GT = 1/1. I found that 0/0 means the sample is homozygous for the reference allele; 0/1 means it is heterozygous, with only one allele equal to the reference genome; and 1/1 means it is homozygous for the alternate allele. So... how does FreeBayes know that information? And if I try to annotate the SVs, do I need to specify whether the sample is haploid or diploid? Because I thought it was haploid.
Thank you very much!
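To make the GT values described above concrete, here is a minimal Python sketch that pulls the GT field out of one VCF data line and interprets it. It only assumes the standard tab-separated VCF layout (FORMAT in column 9, first sample in column 10); the function names `sample_genotype` and `describe_genotype` and the example line are made up for illustration and are not part of FreeBayes. The number of alleles listed in GT reflects the ploidy the caller assumed (FreeBayes has a `--ploidy` option, which defaults to 2), while the BAM itself just stores each read's alignment individually, so reads from both alleles are still present in it.

```python
def sample_genotype(vcf_line: str, sample_index: int = 0) -> str:
    """Return the raw GT string (e.g. '0/1') for one sample of a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")            # column 9: FORMAT keys, e.g. GT:DP
    sample_values = fields[9 + sample_index].split(":")
    return sample_values[format_keys.index("GT")]


def describe_genotype(gt: str) -> str:
    """Interpret a GT string; hypothetical helper for illustration only."""
    alleles = gt.replace("|", "/").split("/")     # treat phased and unphased alike
    if "." in alleles:
        return "missing"
    if len(alleles) == 1:                         # a single allele means a haploid call
        return "haploid reference" if alleles[0] == "0" else "haploid alternate"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"


# Made-up example record, not real data.
line = "chr1\t100\t.\tA\tG\t50\t.\tDP=20\tGT:DP\t0/1:20"
print(describe_genotype(sample_genotype(line)))
```

A GT with two alleles, as in the 0/0, 0/1 and 1/1 values above, means the call was made under a diploid model.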
Just a point on terminology:
SNV: single-nucleotide variant
SV: structural variant
These are very different variant types.
Good point; I assumed structural variants, but you're likely correct that the question was really about SNPs (or maybe small indels).
You are correct: I used FreeBayes to detect small indels. I also used Manta to detect SVs, so I mixed up the concepts.
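For anyone else mixing up these terms, the distinction can be sketched in a few lines of Python working on the REF, ALT and INFO columns of a VCF record. This is only a rough illustration: the function name `classify_variant` is hypothetical, and the 50 bp cutoff between indels and SVs is a common convention that I am assuming here, not something either tool enforces.

```python
def classify_variant(ref: str, alt: str, info: str = "", sv_min_len: int = 50) -> str:
    """Roughly classify one VCF allele as SNV, indel, MNV or SV.

    sv_min_len is an assumed, conventional size cutoff (50 bp).
    """
    # Symbolic alleles (<DEL>, <DUP>, ...), breakend notation, or an SVTYPE
    # tag in INFO all mark a structural variant record.
    if alt.startswith("<") or "[" in alt or "]" in alt or "SVTYPE=" in info:
        return "SV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if abs(len(ref) - len(alt)) >= sv_min_len:
        return "SV"
    if len(ref) != len(alt):
        return "indel"
    return "MNV"  # same length, more than one base substituted


print(classify_variant("A", "G"))                              # single-base substitution
print(classify_variant("AT", "A"))                             # 1 bp deletion
print(classify_variant("N", "<DEL>", "SVTYPE=DEL;END=5000"))   # symbolic SV record
```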
Hi Enrique,
If you are working with sequence files in FASTA format, you may be interested in our SEDA (http://www.sing-group.org/seda/) application for easily processing FASTA files (filtering, merging, modifying headers, and so on).
Regards,
Hugo.
Thank you Hugo, now I am using SeqTrimNext, but I will keep it in mind in the future!