Question

Is there a limit for SAMTOOLS tview option?

0

Entering edit mode

7.6 years ago

bioplanet ▴ 60

Hi all,

I am new to the genomics field and I have recently started working with SAMTOOLS. My question is the following:

I have my SAM file with 2,5 M aligned reads.
I convert it to BAM file with SAMTOOLS
I make the index of the BAM and the reference file

What I want to do is use the tview command to study a region of interest (say position 150-160). So I write this:

samtools tview -d T -p ref_cons:145 BAM_FILE --reference REFERENCE_FILE

It works fine, BUT the problem is that it returns only ~8,000 lines (reads). My question then is if there is some limit of the number of alignments that are outputted and, if so, how can I make it so that it prints all of them? I attach a screen-shot of the output I am getting.

enter image description here

Thank you!

alignment • 6.9k views

ADD COMMENT • link updated 2.8 years ago by jkim ▴ 190 • written 7.6 years ago by bioplanet ▴ 60

1

Entering edit mode

ok, let's see this in a different way: WHY would you need to visualize more than 8K rows ?

ADD REPLY • link 7.6 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Maybe it is because of hard-coded depth limit in samtools. Please see here: https://biostar.usegalaxy.org/p/6650/

ADD REPLY • link 7.6 years ago by Sej Modha 5.3k

0

Entering edit mode

The thing is that I want to grep all this "sub-regions" in all my aligned reads. These are barcodes from an amplicon-seq experiment that have been mapped to my reference construct (which I use as reference "genome" in this case), so I need to know all the different barcodes that were generated...

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

so what about reading this information from the bam directly. How do you know which section of the reads is assigned to a barcode ?

BTW: it's a typical xy problem http://xyproblem.info/ :-)

ADD REPLY • link 7.6 years ago by Pierre Lindenbaum 166k

1

Entering edit mode

Thank you for the kind help. Unfortunately I just changed fields from studying proteins to genomics and all these are new to me - sorry if my question was not very clear. I don't know how to do as you suggest from the BAM file actually. Regarding the second question, I actually know it because the reads were mapped to the NNNNNNN on my reference construct (I used the STAR aligner that allows for mapping to NNNNN). Therefore I am confident that if, in theory, I could print all the 2,5M reads, I would be able to grep the respective region (say pos 150-160 that I have my NNNNs) in each of them and using that to get all the different possible barcodes. Does that make more sense know (at least what I was planning to do)?

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

If there is an inline barcode in your reads you could separate them first before aligning them to anything.

ADD REPLY • link 7.6 years ago by GenoMax 151k

0

Entering edit mode

Hello bioplanet!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=212153

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

0

Entering edit mode

Hi all and sorry for the cross-posting. The way I see it, this is not doable then. What genomax kindly suggested I am not sure I understand. The barcodes are random, i.e. I have to prior information about them, that is why I used STAR to align my sequencing reads against my reference construct, and then, based on the position of the NNNNNs in it, grep the respective regions of the aligned reads. The approach would work perfectly, if not for the (apparent) limitation of 8,000 reads to be outputted...

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

Hi Pierre!

Many thanks for your help! Yes, this is what I want to do..

I installed the bioalcidaejdk and I stored the code you wrote as script.js.

Then I try:

samtools view -h input.bam "reference:150-300" | java -jar dist/bioalcidaejdk.jar -f script.js -F bam

But I get lots of errors. Is this at least the correct way to do this or I misread your post?

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

But I get lots of errors

How can I know ?

ADD REPLY • link 7.6 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This response should have gone under @Pierre's answer.

ADD REPLY • link 7.6 years ago by GenoMax 151k

0

Entering edit mode

Hi Pierre and, again, many many thanks for the help!

Ok, here comes my script.js:

    final String chrom = "ref_cons";
final int start=145;
final int end=165;

stream().filter(R->!R.getReadUnmappedFlag()).
    filter(R->R.getContig().equals(chrom) && !(R.getEnd()<start || R.getStart()>end)).
    forEach(R->{

    for(int pos=start; pos< end ;++pos)
        {
        char base='.';
        int ref=R.getStart();
        int read=0;
        for(CigarElement ce:R.getCigar())
            {
            CigarOperator op=ce.getOperator();
            switch(op)
                {
                case P: break;
                case H: break;
                case S : read+=ce.getLength(); break;
                case D: case N: ref+=ce.getLength();break;
                case I: read+=ce.getLength();break;
                case EQ:case X: case M:
                    {
                    for(int i=0;i< ce.getLength() ;i++)
                        {
                        if(ref==pos)
                            {
                            base = R.getReadString().charAt(read);
                            break;
                            }
                        if(ref>pos) break;
                        read++;
                        ref++;
                        }
                    break;
                    }
                default:break;
                }
            if(base!='.')break;
            }
        print(base);
        }
    println();
});

ref_cons is the name of my construct (as I said, my "reference genome" is only 1 sequence), 145 start 165 end.

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

Errors:

         [DEBUG][BioAlcidaeJdk] Compiling :
         1  import java.util.*;
         2  import java.util.stream.*;
         3  import java.util.function.*;
         4  import htsjdk.samtools.*;
         5  import htsjdk.samtools.util.*;
         6  import htsjdk.variant.variantcontext.*;
         7  import htsjdk.variant.vcf.*;
         8  import com.github.lindenb.jvarkit.util.bio.fasta.FastaSequence;
         9  import javax.annotation.Generated;
        10  import htsjdk.variant.vcf.*;
        11  @Generated(value="BioAlcidaeJdk",date="2017-10-26T16:35:29+0200")
        12  public class BioAlcidaeJdkCustom1850364251 extends com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk.SAMHandler {
        13    public BioAlcidaeJdkCustom1850364251() {
        14    }
        15    @Override
        16    public void execute() throws Exception {
        17     // user's code starts here 
        18     final String chrom = "ref_cons";
        19  final int start=145;
        20  final int end=165;
        21  
        22  stream().filter(R->!R.getReadUnmappedFlag()).
        23      filter(R->R.getContig().equals(chrom) && !(R.getEnd()<start || R.getStart()>end)).
        24      forEach(R->{
        25  
        26      for(int pos=start; pos< end ;++pos)
        27          {
        28          char base='.';
        29          int ref=R.getStart();
        30          int read=0;
        31          for(CigarElement ce:R.getCigar())
        32              {
        33              CigarOperator op=ce.getOperator();
        34              switch(op)
        35                  {
        36                  case P: break;
        37                  case H: break;
        38                  case S : read+=ce.getLength(); break;
        39                  case D: case N: ref+=ce.getLength();break;
        40                  case I: read+=ce.getLength();break;
        41                  case EQ:case X: case M:
        42                      {
        43                      for(int i=0;i< ce.getLength() ;i++)
        44                          {
        45                          if(ref==pos)
        46                              {
        47                              base = R.getReadString().charAt(read);
        48                              break;
        49                              }
        50                          if(ref>pos) break;
        51                          read++;
        52                          ref++;
        53                          }
        54                      break;
        55                      }
        56                  default:break;
        57                  }
        58              if(base!='.')break;
        59              }
        60          print(base);
        61          }
        62      println();
        63  });
        64  
        65      //user's code ends here 
        66     }
        67  }
[SEVERE][InMemoryCompiler]ToolProvider.getSystemJavaCompiler() failed.
java.lang.RuntimeException: ToolProvider.getSystemJavaCompiler() failed.
    at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:165)
    at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk$AbstractHandlerFactory.getConstructor(BioAlcidaeJdk.java:473)
    at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk$SAMHandlerFactory.execute(BioAlcidaeJdk.java:564)
    at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk.doWork(BioAlcidaeJdk.java:796)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1162)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1320)
    at com.github.lindenb.jvarkit.tools.bioalcidae.BioAlcidaeJdk.main(BioAlcidaeJdk.java:821)
[SEVERE][BioAlcidaeJdk]java.lang.RuntimeException: Cannot compile custom class BioAlcidaeJdkCustom1850364251

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

what is your version of java/javac ?

java -version

ADD REPLY • link 7.6 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

openjdk version "1.8.0_131" OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11) OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Is this a problem?

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

Hi Pierre and everyone!

I just wanted to let you know that it works! I tested it on another machine because I noticed you advise against using OpenJDK and it seems to be fetching the regions properly!

Many thanks for your help, time and code, much appreciated :)

ADD REPLY • link 7.6 years ago by bioplanet ▴ 60

0

Entering edit mode

cool. please mark my answer as "accepted" (green mark on the left)

ADD REPLY • link 7.6 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

I also wanted to load more reads for samtools tview and then I came across this post.

changed "maxcnt = 8000" to "maxcnt = 80000" in samtools-1.x.x/htslib-1.x.x/sam.c and it works for me.

ADD REPLY • link 2.8 years ago by jkim ▴ 190

score 2 · Accepted Answer · 2017-10-26

Ok, I'm not sure If I've understood but the followings script for bioalcidaejdk ( http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html ) should print the read sequences in a given region of the genome (indels are skipped!)

final String chrom = "rotavirus";
final int start=150;
final int end=200;

stream().filter(R->!R.getReadUnmappedFlag()).
    filter(R->R.getContig().equals(chrom) && !(R.getEnd()<start || R.getStart()>end)).
    forEach(R->{

    for(int pos=start; pos< end ;++pos)
        {
        char base='.';
        int ref=R.getStart();
        int read=0;
        for(CigarElement ce:R.getCigar())
            {
            CigarOperator op=ce.getOperator();
            switch(op)
                {
                case P: break;
                case H: break;
                case S : read+=ce.getLength(); break;
                case D: case N: ref+=ce.getLength();break;
                case I: read+=ce.getLength();break;
                case EQ:case X: case M:
                    {
                    for(int i=0;i< ce.getLength() ;i++)
                        {
                        if(ref==pos)
                            {
                            base = R.getReadString().charAt(read);
                            break;
                            }
                        if(ref>pos) break;
                        read++;
                        ref++;
                        }
                    break;
                    }
                default:break;
                }
            if(base!='.')break;
            }
        print(base);
        }
    println();
});

usage:

samtools view -h input.bam "rotavirus:150-300" | java -jar dist/bioalcidaejdk.jar -f script.js -F bam 

TAATTCCCAGAGTTCAAAGTAAAAATGATAATGTGATGGATGACTCTGTT
TATTTACCAGATTTAAAAGTAACTTTGATTATGTGATGGATGACTCTGGT
TATTAACCAGAGTTACAAGTAAATTTGATTATGTGCTGGATGACTCTGGT
TATTTCCCAGAGATAAACGTCAATTTGATTCTGTGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTCTGTGATGGATTCCTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTCTGTGATGGATGACACTGGT
TATTTACCAGAGATAAAAGTAAATTTGATTATGTGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTAGATTATTTGATGGATGACTCTGGT
TATTAACCAGAGTTAAAAGTCAATTTGATTCTGTGATGGATGACTCAGGT
TATTTACCAGAGTTAAAAGTAAATTTGATAATTTGATGGATGACTCAGGT
TATTTACCAGATTTAAAAGTAAATTTGAATATGTGAAGGCTGACTCTGGT
TATATACCAGAGATAAAAGTAAATTTGATAATGTGATGGATGACTCTTGT
TATTTAGCAGATTACAAAGTAAATTTGATTATGTTATGGAAGACTGTGTT
TATTTACCAGATTTAAAAGTAAATTTGAATATGAGATTGATGACTCTGGA
TATTTACCAGAGTTAAAAGTAAATTTGATTATGTTATGGATGACTCTGGT
TATTTACCATAGTTAAAAGTAAATTTGATTATGAGATGGATGACTCTGGT
TATTTACGCGAGTTAAAAGTAAATTTGATTATGAGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATAATGAGATGGATGACTCTTGT
TATTTACCATAGTAAAAAGTACATTTGATTATGTGATGGATGACTCTGGT
TATTAACCAGAGTTAAAAGTAAATTTGATTAAGAGATGGATGACTCTGGT
TATTTACCAGAGTTAAACTTAAATTTGATTATGTTATGGATGACTCTGGT
TATTTACCAGCGATACAAGTAACTTTTATTATGTGAAGGATGACTCTGGT
TATTAACCATAGTTAAAAGTCAATTTGATTATGTGAAGGCTGACTCTGGT
TATTTACCAGAGTTAAACGTAAATTTGATTATGTGATGGATGACTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTATGAGATGGATTACTCTGGT
TATTAACCAGAGTTAAAAGTAAATTTGATTCTGTGATGGATGACTGTGGT
TATTTACCAGAGTTAAAAGAAAATTAGATTATGTGCTGGCTGCCTCTGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTATGTGATGGATGACTCAGGT
TATTTACCAGAGTTAAAAGTAAATTTGATTATGTGATGGCTGACTCTGGT
TATTTACCAGATTTACAAGTAAATTTGATAATGTGATGGATGACTCTGGA
TATTTCCCATAGTTAAAAGTAAATTTGATTAAGTGATGGCTGACTCTGGT
(...)