Hi,
I am very new to long-read sequencing data processing.
I have downloaded raw data from NCBI SRA from this paper: https://doi.org/10.1016/j.gene.2022.146438
My study also involves finding structural variants involved in alpha-thalassemia from long-read data, so I took this as an example dataset. However, the data processing steps are not clearly described in the paper.
Only the following details are given in the paper:
After purification and quantification, the pooled library was converted to a SMRTbell library with Sequel Binding and Internal Ctrl Kit 3.0 (Pacific Biosciences) and sequenced on the Sequel II platform (Pacific Biosciences) under CCS mode. Then raw subreads were analyzed by CCS software (Pacific Biosciences) to generate CCS reads, debarcoded by lima in the Pbbioconda package (Pacific Biosciences) and aligned to genome build hg38 by pbmm2 (Pacific Biosciences). Finally, structural variations were identified according to the HbVar, Ithanet and LOVD databases. SNVs and indels were identified by FreeBayes 1.3.4.
Now I have downloaded the data from SRA, which is in FASTQ format.
I would like to know which aligner and structural variant caller would be suitable for this data. I have already tried alignment with ngmlr and pbmm2, and variant calling with sniffles2 and pbsv, but I could not replicate the results.
Please suggest some methods.
Just to add to what michael said: you generally see at least 2 SV callers used, and the consensus between their calls is what is taken forward. Another option that I like for merging calls is SURVIVOR; I'm unfamiliar with combiSV. I am unsure what the output formats for long-read callers are, but I suspect it's also a VCF in most cases.
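
To make the consensus idea concrete, here is a rough Python sketch of keeping only the calls that two callers agree on. It is not how SURVIVOR or combiSV work internally, just an illustration of the principle. It assumes both callers write a VCF with an SVTYPE key in the INFO column, and the names (SVCall, parse_vcf_line, consensus), the 1000 bp tolerance and the coordinates in the usage note are all made up for the example.

```python
from typing import List, NamedTuple, Optional


class SVCall(NamedTuple):
    """Minimal SV record (hypothetical helper type for this sketch)."""
    chrom: str
    pos: int
    svtype: str


def parse_vcf_line(line: str) -> Optional[SVCall]:
    """Parse one VCF line; return None for headers or records lacking SVTYPE."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    # INFO is a ';'-separated list; only key=value entries matter here.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    svtype = info_map.get("SVTYPE")
    if svtype is None:
        return None
    return SVCall(chrom, pos, svtype)


def consensus(calls_a: List[SVCall], calls_b: List[SVCall],
              max_dist: int = 1000) -> List[SVCall]:
    """Keep calls from A that have a match in B: same chromosome, same SV type,
    and start positions within max_dist bp (an assumed tolerance)."""
    kept = []
    for a in calls_a:
        for b in calls_b:
            if (a.chrom == b.chrom and a.svtype == b.svtype
                    and abs(a.pos - b.pos) <= max_dist):
                kept.append(a)
                break  # one supporting call from B is enough
    return kept
```

For example, with made-up coordinates, consensus([SVCall("chr16", 165000, "DEL")], [SVCall("chr16", 165200, "DEL")]) keeps the deletion, because both callers report the same type within 1000 bp. A dedicated merging tool will handle breakpoint ends, SV length and strand much more carefully than this.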