Hello everyone,
I have a tissue sample for which I have sequencing data available in several formats - FASTQ, BAM, and VCF. The alignment has been done against the GRCh37 reference genome.
I am interested in finding out whether this sample contains sequences from Human Papillomavirus type 33 (HPV-33). I have been looking into various bioinformatics tools and methods, but I am a bit uncertain about the best way to proceed.
For the FASTQ files, I was considering using a tool like Bowtie2 or BWA to align the reads against the HPV-33 reference genome, but I am unsure if this is the optimal approach.
For the BAM and VCF files, would it be more appropriate to use a tool that can identify viral integration sites? I've heard of tools like VirStrain and VIcaller, but I haven't used them before.
I would be grateful for any advice or suggestions. In particular, I'm interested in:
- Recommended methods for detecting HPV-33 in FASTQ, BAM, and VCF files
- Any specific tools or databases that might be helpful
- Any quality control steps or considerations that I should keep in mind
Any specific workflow examples would be extremely helpful.
Thank you for your help!
You don't mention the source of your sequencing data - RNA? DNA? WES? Cell lines vs. tissue?
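In the meantime, here is a rough sketch of the alignment idea from the question, starting from the existing GRCh37 BAM: pull out the reads that did not map to human, align them to an HPV-33 reference with BWA, and count the primary hits. It only builds the command strings so you can inspect them before running anything. The file names (`sample.bam`, `hpv33.fa`), the `hpv_screen_commands` helper, and the choice to treat the unmapped reads as single-end are all my assumptions, not something from your setup, so adjust them to your data.

```python
import shlex


def hpv_screen_commands(bam, hpv_fasta, prefix, threads=4):
    """Build shell commands for a simple HPV-33 screen of an existing BAM.

    All paths are hypothetical placeholders. Reads are treated as
    single-end for simplicity, which is an assumption of this sketch.
    """
    q = shlex.quote
    unmapped_fq = f"{prefix}.unmapped.fastq"
    hpv_bam = f"{prefix}.hpv33.bam"
    return [
        # Index the viral reference for BWA.
        f"bwa index {q(hpv_fasta)}",
        # -f 4 keeps only reads flagged as unmapped against GRCh37.
        f"samtools view -b -f 4 {q(bam)} | samtools fastq - > {q(unmapped_fq)}",
        # Align the unmapped reads to HPV-33 and sort the result.
        f"bwa mem -t {threads} {q(hpv_fasta)} {q(unmapped_fq)}"
        f" | samtools sort -o {q(hpv_bam)} -",
        f"samtools index {q(hpv_bam)}",
        # -F 2308 drops unmapped, secondary, and supplementary records,
        # so the count reflects primary alignments to the viral genome.
        f"samtools view -c -F 2308 {q(hpv_bam)}",
    ]


if __name__ == "__main__":
    for cmd in hpv_screen_commands("sample.bam", "hpv33.fa", "sample"):
        print(cmd)
```

If you start from FASTQ instead, you can skip the extraction step and give the reads to `bwa mem` directly. Either way, a handful of aligned reads is not proof of infection on its own; how to interpret the count depends on the answers to the questions above.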