4.5 years ago
naeem40thju
Hi, I have moved the random barcode (18 nt) from the 5' end of each read into the header, as shown below. Now I want to map the reads with Bowtie 2, skipping 18 nt from the 5' end and 10 nt from the 3' end. After aligning to the reference sequence, I want to use the SAM file to count the number of reads at each position and the number of unique barcodes among those reads. For example, I might get 100 reads at position 15, and those 100 reads might come from 2 unique barcodes.
Does anyone have a Python script that does this, or can anyone help me do it? Thanks in advance.
@ST-E00205:943:HCF3YCCX2:4:1101:11495:1678 1:N:0:NCCACGCG+NGATCTCG CCAGCCCAAAGCCACCCG
ACCGGATGGTAGACCTGGAGGAGGGGAAAGCCGAGGTGGTGACGGGAGCGGCTGGGGGGGGAGTCCGGGATGGTAGGCGGAGCGGGCAGAGCACAGCAGCTCGTGTAGAAATGG
+
7-<--7--7-7F-----77----7---7-------------------7----77-7-----7------7---------7-7------7--7----77----------77-7---
@ST-E00205:943:HCF3YCCX2:4:1101:1012:1696 1:N:0:NCTTGACC+NGATCTCG CANCCTCCCAAGGCGCCC
AATAAACAGTTGCAGCCCCAGATCGGAAGAGCGGTTCAGCAGGATGCCCGAAAACGATTTGGTTTGTCTTCTCAGCATTGAAAAAAAATAAGAATTAAGGCTTAATTCGGAACA
+
-FJ<JJ-JAFJ-F-AF<AJJJ<AFJFFFJFJFJJFJ-FFJ<JJF--777-7----7-----------7-7-7--7---7--7A-7---7--7-------7--7-----------
What do CCAGCCCAAAGCCACCCG and CANCCTCCCAAGGCGCCC in the fastq header represent? Is the barcode still in the reads? Or have you moved it to the header, and that is what the oligo above is?
The barcode is not in the read anymore. I moved it from the read to the header to keep a record of which barcode came from which read.
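Since the barcode sits after a space in the header, it may not survive alignment: aligners generally keep only the part of the read name before the first whitespace as the SAM QNAME. One possible workaround, sketched here, is to append the barcode to the read name before mapping. The function name `tag_read_name` and the `_` separator are assumptions for illustration (umi-tools uses a similar read-name suffix convention), and the sketch assumes the barcode is the last whitespace-separated field of the header, as in the records above. Note that it drops the Illumina comment field.

```python
def tag_read_name(header: str, sep: str = "_") -> str:
    """Append the last header field (assumed to be the barcode) to the read name."""
    fields = header.rstrip("\n").split()
    if len(fields) < 2:
        # No separate barcode field, so return the header unchanged.
        return header.rstrip("\n")
    read_name, barcode = fields[0], fields[-1]
    return f"{read_name}{sep}{barcode}"


header = ("@ST-E00205:943:HCF3YCCX2:4:1101:11495:1678 "
          "1:N:0:NCCACGCG+NGATCTCG CCAGCCCAAAGCCACCCG")
print(tag_read_name(header))
# @ST-E00205:943:HCF3YCCX2:4:1101:11495:1678_CCAGCCCAAAGCCACCCG
```

With the barcode in the read name, it should appear in the first column of every SAM record for that read.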
Did you use umi-tools to do that? It is going to be very tricky to handle those UMIs, since most aligners will simply drop or ignore them when they write the BAM files. Those would have to be transferred to the alignment using a custom SAM tag.

I don't understand this requirement. Most aligners are not going to have an option to let you do that. You would need to remove those bases before alignment if you don't want them considered for alignment.
bbduk.sh from the BBMap suite can hard crop reads like that.

Thanks for your advice. I've also found umi-tools. I am checking whether I can achieve my goal with it.
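For reference, hard cropping can also be sketched in plain Python. This is only an illustration of what "removing those bases before alignment" means, not a replacement for a dedicated trimmer. The names `crop_record` and `crop_fastq` are made up for this example, and it assumes simple four-line FASTQ records. Since the barcode has already been moved out of the read, presumably only the 3' crop (`trim3=10`) would be needed here.

```python
def crop_record(record, trim5=0, trim3=0):
    """Remove trim5 bases from the 5' end and trim3 bases from the 3' end.

    record is a (name, sequence, plus, quality) tuple of one FASTQ entry.
    """
    name, seq, plus, qual = record
    # Clamp so that over-trimming gives an empty read instead of a wrong slice.
    end = max(len(seq) - trim3, trim5)
    return (name, seq[trim5:end], plus, qual[trim5:end])


def crop_fastq(lines, trim5=0, trim3=0):
    """Yield cropped FASTQ lines from an iterable of raw FASTQ lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times groups the lines into records.
    for record in zip(stripped, stripped, stripped, stripped):
        yield from crop_record(record, trim5, trim3)


example = ["@r1\n", "ACGTACGTACGT\n", "+\n", "ABCDEFGHIJKL\n"]
print(list(crop_fastq(example, trim3=10)))
# ['@r1', 'AC', '+', 'AB']
```

Sequence and quality strings are cut with the same slice so they stay the same length, which aligners require.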
Hi there. As far as I understand umi-tools, it works very well for deduplicating the reads. After deduplication, I can count the number of mapped reads at each position of the reference sequence. My aim, however, is to count the number of UMIs at each position. For example, if I get 100 reads at a particular position and those 100 reads carry 1, 2, or 3 UMIs, I would like to get that UMI count for each position. Any ideas, please? Thanks.
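One way to get that number directly from the SAM file, without deduplicating first, is to collect the set of UMIs seen at each mapping position. The sketch below is a minimal, untested-on-real-data example with several assumptions: the UMI is the last `_`-separated part of the read name (which requires the barcode to have been appended to the read name before mapping), positions are the leftmost mapped coordinate from the SAM POS column regardless of strand, and UMIs are compared by exact match with no correction for sequencing errors. The function name `count_umis_per_position` is made up for this example.

```python
from collections import defaultdict


def count_umis_per_position(sam_lines, sep="_"):
    """Return {(reference, position): (read_count, unique_umi_count)}."""
    reads = defaultdict(int)
    umis = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        umi = qname.rsplit(sep, 1)[-1]  # assumed: UMI is the read-name suffix
        key = (rname, pos)
        reads[key] += 1
        umis[key].add(umi)
    return {key: (reads[key], len(umis[key])) for key in sorted(reads)}


sam = [
    "@SQ\tSN:ref\tLN:1000",
    "r1_AAAA\t0\tref\t15\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2_AAAA\t0\tref\t15\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3_CCCC\t0\tref\t15\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r4_GGGG\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r5_GGGG\t16\tref\t20\t42\t4M\t*\t0\t0\tACGT\tIIII",
]
for (ref, pos), (n_reads, n_umis) in count_umis_per_position(sam).items():
    print(ref, pos, n_reads, n_umis)
# ref 15 3 2
# ref 20 1 1
```

For a real file, the lines could come from `open("aligned.sam")` (a placeholder file name). For reverse-strand reads the leftmost coordinate is not the 5' end of the read, so the position key may need adjusting depending on what "position" should mean.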