Inside a .sam file, how do I know which read maps where to which scaffold in the assembly?
4.2 years ago

I have an assembly in a FASTA file. Mapping the raw reads against it produced a .bam file, which I converted to .sam.

The .sam information lines look like this:

A00321:42:HLLVYDSXX:2:2302:6153:3505    99      NODE_1_length_3415511_cov_137.721502    16      60      128M    =       607     742     CGATTAGTCCGGCCAAATCGCCGTCGAGCGCAATGAACATAACGGTCTTGCCCTCAGCGCGCAGCGCATCGGCCTTGGCGTCGATTGTGGAGTGCTCGACGCCCATGATGTCCATCATAGCACCATTG        FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        RX:Z:TTGAGGGTATAGTAGT   QX:Z:FFFFFFFFFFFFFFFF   TR:Z:GACACCG    TQ:Z:FFFFFFF    BC:Z:AGTTGCAG   QT:Z:FFFFFFFF   XS:i:-10        AS:i:0  XM:Z:0  AM:Z:0  XT:i:1  RG:Z:over_1kb:LibraryNotSpecified:1:unknown_fc:0        OM:i:60

Broken down into the mandatory fields, a record looks something like this (this is a different read from the same file):

QNAME: A00321:42:HLLVYDSXX:1:1644:2248:3881
FLAG: 99
RNAME: NODE_1_length_3415511_cov_137.721502
POS: 1
MAPQ: 60
CIGAR: 1S127M
RNEXT: =
PNEXT: 536
TLEN: 386
SEQ:  ATCGGGTCTGACACCGCGATTAGTCCGGCCAAATCGCCGTCGAGCGCAATGAACATAACGGTCTTGCCCTCAGCGCGCAGCGCATCGGCCTTGGCGTCGATTGTGGAGTGCTCGACGCCCATGATGTC
QUAL: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

I'm actually interested in the metadata. I want to know how the RX and BC tags are distributed across the scaffolds of the original assembly.

I assumed the .sam file already contains the information about the assembly used to produce it. If that's wrong, please correct me.

What I want to do is, for each read in the .sam file, find its position in the assembled scaffold and record: Read_ID, Scaffold_ID, Read_Position_Inside_Scaffold, RX, BC.

Then I want to use that database to analyse the distribution of RX and BC inside each scaffold.

That's what I want.

Ultimately what I'm trying to do is evaluate the quality of my assemblies based on the Barcode distribution.

I'm comfortable with programming and parsing. I'm just having trouble figuring out where, inside the .sam file, I can find the scaffold and the position within that scaffold for each read.

assembly sam fasta • 777 views
4.2 years ago

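The scaffold is the RNAME field (column 3), which in your example is NODE_1_length_3415511_cov_137.721502. The read's position on that scaffold is POS (column 4), the 1-based leftmost mapped position.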
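
To make the column layout concrete, here is a minimal Python sketch that splits one alignment line into the eleven mandatory fields and the optional tags. The function name `parse_sam_line` is just illustrative, and the example line is the record from the question with SEQ and QUAL replaced by `*` and most tags dropped for brevity.

```python
# Names of the 11 mandatory SAM columns, in order.
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one tab-separated SAM alignment line (hypothetical helper).

    Returns (record, tags): the mandatory fields as a dict, and the
    optional TAG:TYPE:VALUE fields as a second dict keyed by tag name.
    """
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(MANDATORY, cols[:11]))
    # These mandatory columns are integers in the SAM format.
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    tags = {}
    for item in cols[11:]:
        # Split at most twice: the value itself may contain ':' (e.g. RG).
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return record, tags


# Shortened version of the record from the question.
example = ("A00321:42:HLLVYDSXX:2:2302:6153:3505\t99\t"
           "NODE_1_length_3415511_cov_137.721502\t16\t60\t128M\t=\t607\t742\t"
           "*\t*\tRX:Z:TTGAGGGTATAGTAGT\tBC:Z:AGTTGCAG\tXS:i:-10")
record, tags = parse_sam_line(example)
print(record["RNAME"], record["POS"], tags["RX"], tags["BC"])
```

For this read, the scaffold is `record["RNAME"]` and the position on it is `record["POS"]`; the barcodes come from `tags["RX"]` and `tags["BC"]`.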

Check the SAM spec for further details! https://samtools.github.io/hts-specs/SAMv1.pdf
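
For the table you describe (Read_ID, Scaffold_ID, position, RX, BC), a rough sketch could look like the following. It assumes a plain-text SAM file, skips header lines (those starting with `@`) and unmapped reads (FLAG bit 0x4), and leaves RX or BC empty when a read lacks the tag. The function names `iter_rows`, `barcode_counts`, and `write_table` are made up for this example.

```python
import csv
from collections import Counter, defaultdict


def iter_rows(lines):
    """Yield (read_id, scaffold, pos, rx, bc) for each mapped read."""
    for line in lines:
        if line.startswith("@"):        # header line, not an alignment
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:              # not a complete alignment record
            continue
        if int(cols[1]) & 0x4:          # FLAG bit 0x4: read is unmapped
            continue
        tags = {}
        for item in cols[11:]:
            parts = item.split(":", 2)  # TAG:TYPE:VALUE
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        # RNAME is column 3, POS (1-based leftmost position) is column 4.
        yield cols[0], cols[2], int(cols[3]), tags.get("RX", ""), tags.get("BC", "")


def barcode_counts(rows):
    """Count how often each BC value occurs on each scaffold."""
    counts = defaultdict(Counter)
    for _read_id, scaffold, _pos, _rx, bc in rows:
        counts[scaffold][bc] += 1
    return counts


def write_table(sam_path, csv_path):
    """Write the per-read table to a CSV file (paths supplied by the caller)."""
    with open(sam_path) as sam, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Read_ID", "Scaffold_ID",
                         "Read_Position_Inside_Scaffold", "RX", "BC"])
        writer.writerows(iter_rows(sam))
```

Calling `write_table("reads.sam", "reads.csv")` (file names are placeholders) would produce the table, and `barcode_counts(iter_rows(open("reads.sam")))` gives a per-scaffold barcode tally. If you would rather not keep the large .sam file around, `samtools view` can stream the .bam as SAM text that can be fed to `iter_rows` line by line.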
