Hi all,
A very basic question but I can't solve it easily. After a raw-read mapping against a list of references, how to make a list of reads that match with more than one reference (and which ones) to highlight them?
Thank you very much!
Hi all,
A very basic question but I can't solve it easily. After a raw-read mapping against a list of references, how to make a list of reads that match with more than one reference (and which ones) to highlight them?
Thank you very much!
extract the names of the first-in-pair + mapped reads and their reference, sort+unique to get the unique (name+REF), cut to get the uniq names, sort+uniq again to the duplicated names (sames read-name but different reference):
samtools view -f 64 -F 4 in.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > namesF.txt
same for 2nd in pair
samtools view -f 128 -F 4 in.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > namesR.txt
and with which references associated
re-use those files with grep -f
or ( join
if the list is large) to extract those reads
count that tell me that x reads match with references
it should be a simple sort | uniq -c
Hi Pierre,
Thank you for your answer, that's what I've tried but I wanted to be sure that I miss anything
I tried this:
samtools view -F 4 file.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > names.txt
samtools view file.bam | grep -f names.txt > sequences.sam
cat sequences.sam | sort | uniq -c > interm
output:
1 ABDC8:10138:09707 2048 ref1 1 60 23H77M185H * 0 0 ACATGGATGTTACGCAGCAGGGCAGTCGCCCTAAAACAAAGCACGATGCACTAAGCACATACCTGCTCACAGCCTAC :9858///484668::::::;4;:::666;3:<<<0;;;3;=5;;;::;=///6:99977717178899::994:=1 NM:i:0 MD:Z:77 AS:i:77 XS:i:0 SA:Z:ref2,1,+,64S36M1I18M1I19M1I97M1D13M35S,60,4;
to format the ouput I did
awk '{print $2,$4,$17}' interm >final
and I got:
ABDC8:10138:09707 ref1 SA:Z:ref2,1,+,64S36M1I18M1I19M1I97M1D13M35S,60,4;
question: how to put tabs between my selected columns and to select only ref2 instead of all column 17 (I have a lot of reference names with a lot of different character sizes)
Thanks again!
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
define highlight please.
I just meant to work with them after. Sorry if my question is not clear : I'm trying to get two different things: