I followed this workflow: https://unclineberger.org/vincentlab/wp-content/uploads/sites/1083/2020/10/hervquant-instructions.pdf (also tested here: https://github.com/leiwaaping/hervQuant_test)
Briefly, the workflow builds a custom .fa reference from hg19 plus curated HERV sequences. STAR is used as the aligner; reads aligned to hg19 are then removed with sed (in the .fa file, hg19 entries are named by UCSC gene IDs of the form uc###***.#, while HERV entries are named by chromosome location), and the remaining reads are quantified with salmon.
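To make the filtering step concrete, here is a minimal Python sketch of what I understand it to do. It is not the workflow's actual sed command. The regular expression is my assumption about what the UCSC-style names look like (e.g. `uc001aaa.3`), and `keep_sam_line` and `filter_sam` are hypothetical helper names.

```python
import re

# Assumption: hg19 entries are named like UCSC gene IDs, e.g. "uc001aaa.3".
# Adjust the pattern if the reference names in the .fa file differ.
HG19_NAME = re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")


def keep_sam_line(line, hg19_name=HG19_NAME):
    """Return True if a SAM line is not tied to an hg19 reference sequence."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@"):
        # Header: only @SQ lines name a reference sequence (SN: field).
        if line.startswith("@SQ"):
            for field in fields[1:]:
                if field.startswith("SN:"):
                    return not hg19_name.match(field[3:])
        return True
    if len(fields) < 3:
        return True  # malformed line; leave it for downstream tools to flag
    # Alignment record: column 3 (index 2) is RNAME, the reference name.
    return not hg19_name.match(fields[2])


def filter_sam(lines):
    """Keep headers and alignments that do not point at hg19 entries."""
    return [line for line in lines if keep_sam_line(line)]
```

Unmapped records (RNAME `*`) pass through this filter unchanged.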
My deviation from this pipeline is that I am starting from unmapped BAM files from single-cell RNA-seq, so I use bedtools to convert them to R1 and R2 FASTQ files and then align those to the .fa file available on the workflow website. I am not tagging or removing the bead barcodes and UMIs, so STAR does not know about them. In any case, the aligner output has a very low unique mapping rate, due to: 1) multi-mapping to ERV sequences; 2) R1 containing virtually no transcript sequence, since it holds the bead barcode and UMI; 3) other things I don't know about.
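For reference, this is roughly how the mapping categories could be tallied directly from the SAM file. It is only a sketch: it assumes the `NH:i:` tag (number of reported alignments) is present, which STAR writes with its standard attribute set, and `tally_mapping` is a hypothetical helper name. Counts are per SAM record (each mate separately), not per read pair.

```python
from collections import Counter


def tally_mapping(sam_lines):
    """Count primary SAM records as unique, multi-mapped, or unmapped."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Skip secondary (0x100) and supplementary (0x800) records so each
        # read mate is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        # Unmapped records only appear if the aligner was asked to write them.
        if flag & 0x4:
            counts["unmapped"] += 1
            continue
        # Optional tags start at column 12; NH is the number of alignments.
        nh = next(
            (int(f[5:]) for f in fields[11:] if f.startswith("NH:i:")), None
        )
        if nh is None:
            counts["no_NH_tag"] += 1
        elif nh == 1:
            counts["unique"] += 1
        else:
            counts["multi"] += 1
    return counts
```

Called as `tally_mapping(open("Aligned.out.sam"))` (the file name is a placeholder), the result can be compared with the summary STAR writes to its own log.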
My first question is how to sanity-check the outputs of this alignment. The SAM output looks normal-ish to me, and I can provide portions of it if needed. My second, related question is about IGV: I can't view these outputs there, since the reference sequences have no coordinates in them. Is there a workaround for this?