I'm working with some RNA-seq data. I have alignments done in STAR and the resulting BAM file. I'd like to annotate the alignments in this BAM file with custom tags, using data stored in a separate file. The second file contains a read ID, a barcode, and a UMI for each read. I want to add the barcode and UMI to every read in the BAM file whose read ID matches one in the second file.
To summarise:
First file: BAM output from STAR
Second file: Read ID (matching those in STAR BAM file), UMI, barcode.
How do I get the UMI and barcode in file 2 tagged onto the reads in the BAM file?
Intensive Google and forum searching have yielded little info about this but I have a feeling there's a simple answer. Can anyone help?
I don't think there's a simple answer for this question, but if you want to write a script you might find simplesam (https://github.com/mdshw5/simplesam) useful.
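For what it's worth, here is a rough sketch of the kind of script I mean, using only the Python standard library on SAM text rather than any particular BAM library. It makes several assumptions, since no example lines have been posted: the second file is tab-separated with columns in the order read ID, barcode, UMI; the read IDs match the BAM QNAME exactly (apart from an optional leading "@"); and the tags are named CB and UB, following the convention used by tools such as STARsolo. The function names and the script name are made up for illustration. Note that it holds the whole lookup table in memory.

```python
import csv
import sys


def load_tags(table_path):
    """Map read ID -> (barcode, UMI) from a tab-separated file.

    Assumed column order: read ID, barcode, UMI. Adjust to the real file.
    """
    tags = {}
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            read_id, barcode, umi = row[0], row[1], row[2]
            # FASTQ-style IDs may carry a leading "@"; SAM QNAMEs never do.
            tags[read_id.lstrip("@")] = (barcode, umi)
    return tags


def annotate_sam_line(line, tags, barcode_tag="CB", umi_tag="UB"):
    """Append barcode and UMI tags to one SAM line if its read ID is known.

    Header lines and reads missing from the table are returned unchanged.
    The tag names are assumptions; any valid two-character tags will do.
    """
    if line.startswith("@"):
        return line  # SAM header line
    body = line.rstrip("\n")
    qname = body.split("\t", 1)[0]
    if qname not in tags:
        return line
    barcode, umi = tags[qname]
    return f"{body}\t{barcode_tag}:Z:{barcode}\t{umi_tag}:Z:{umi}\n"


def main(table_path):
    """Read SAM from stdin and write the annotated SAM to stdout."""
    tags = load_tags(table_path)
    for line in sys.stdin:
        sys.stdout.write(annotate_sam_line(line, tags))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

Saved as, say, add_tags.py, it could be run as `samtools view -h aligned.bam | python add_tags.py tags.tsv | samtools view -b -o tagged.bam -`, which converts the BAM to SAM text, adds the tags, and converts back to BAM. The file names here are placeholders.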
Please post example lines from the two files.