I have paired-end exome data (X_R1.fastq and X_R2.fastq) for several samples. I would like to run quality checks and get coverage for these data.
I am wondering whether I should run fastx_quality_stats on the individual FASTQ files directly, or whether I need to merge the paired reads into a single file before doing QC.
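As a rough illustration of what a per-file QC statistic involves, here is a minimal Python sketch (not fastx_quality_stats itself) that computes the mean Phred quality at each read position for one FASTQ file. This is roughly the per-column mean that tools like fastx_quality_stats report. It assumes plain 4-line FASTQ records and Phred+33 encoding. Run on X_R1.fastq and X_R2.fastq separately, it gives one quality profile per mate file.

```python
from itertools import islice
from typing import Iterable, List


def per_cycle_mean_quality(lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred quality at each read position (cycle) of one FASTQ file.

    Assumes 4-line FASTQ records and Phred+33 encoding (offset=33).
    Reads of different lengths are allowed; each position is averaged
    over the reads long enough to have it.
    """
    totals: List[int] = []
    counts: List[int] = []
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', quality
        if len(record) < 4:
            break  # end of input (or a truncated last record, which is ignored)
        qual = record[3].rstrip("\r\n")
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Tiny in-memory example; for real data pass an open file, e.g.
    # with open("X_R1.fastq") as fh: per_cycle_mean_quality(fh)
    demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACG\n", "+\n", "!!5\n"]
    print(per_cycle_mean_quality(demo))  # [20.0, 20.0, 30.0, 40.0]
```

Because the function only streams lines, it works on files of any size without loading them into memory.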
What tool do you use to merge paired-end reads? I am not sure whether the solution in this question is suitable for all my data.
What is the best practice: performing QC on the individual read files or on the merged reads?
How can I get coverage from paired-end reads?
It depends on the organism. If it's available at UCSC (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), choose your organism and assembly version, select the knownGene table (or whichever gene table you prefer), and set the output format to BED. Then, on the next page, choose "Exons plus 0 bases".
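The "Exons plus 0 bases" option splits each transcript into one BED line per exon. If you instead download the raw table, a minimal Python sketch of the same split looks like the code below. It assumes the standard genePred layout used by knownGene, where exonStarts and exonEnds are comma-separated lists (with a trailing comma) of 0-based, half-open coordinates, which is the same convention as BED. The "_exon_N" naming and the "tx1" row are hypothetical and do not reproduce the Table Browser's exact names.

```python
from typing import List, Tuple

BedInterval = Tuple[str, int, int, str]  # chrom, start, end, name (BED4)


def genepred_to_exon_bed(row: str) -> List[BedInterval]:
    """Split one genePred-style line (e.g. a knownGene row) into exon intervals.

    Assumes the first 10 tab-separated columns are: name, chrom, strand,
    txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    Coordinates are copied unchanged (0-based, half-open, like BED).
    """
    fields = row.rstrip("\n").split("\t")
    name, chrom = fields[0], fields[1]
    # The lists end with a trailing comma, so drop empty pieces.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return [
        (chrom, s, e, f"{name}_exon_{i}")  # hypothetical naming scheme
        for i, (s, e) in enumerate(zip(starts, ends))
    ]


if __name__ == "__main__":
    row = "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,400,\t200,500,\n"
    for interval in genepred_to_exon_bed(row):
        print("\t".join(map(str, interval)))  # one tab-delimited BED line per exon
```

For everyday use the Table Browser output is simpler; the sketch only shows what that option does to each row.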
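Note that coverageBed now takes BAM input directly. For example, to keep only properly paired reads (-f 0x2) and stream them as uncompressed BAM (-u) into coverageBed:
samtools view -uf 0x2 reads.bam | coverageBed -abam stdin -b exons.bed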
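To make the pipeline above concrete, here is a minimal pure-Python sketch of what it computes. It is not bedtools itself. It assumes alignments are already parsed into hypothetical (chrom, start, end, flag) tuples with 0-based, half-open coordinates. It keeps only reads with the SAM proper-pair bit (0x2) set, as `samtools view -f 0x2` does, and reports per exon the read count, bases covered, exon length, and fraction covered. These are intended to mirror the summary columns coverageBed appends by default, but check the bedtools documentation for your version. The nested loop is O(reads × exons), so this is for illustration only.

```python
from typing import Iterable, List, Tuple

PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned per the aligner

Read = Tuple[str, int, int, int]  # chrom, start, end, flag (hypothetical pre-parsed)
Exon = Tuple[str, int, int]       # chrom, start, end
Row = Tuple[str, int, int, int, int, int, float]


def exon_coverage(reads: Iterable[Read], exons: Iterable[Exon]) -> List[Row]:
    """Per-exon (chrom, start, end, n_reads, bases_covered, length, fraction)."""
    # Same filter as `samtools view -f 0x2`: keep reads with the bit set.
    kept = [(c, s, e) for c, s, e, flag in reads if flag & PROPER_PAIR]
    result: List[Row] = []
    for chrom, start, end in exons:
        covered = set()  # exon positions hit by at least one read
        n_reads = 0
        for c, s, e in kept:
            lo, hi = max(s, start), min(e, end)
            if c == chrom and lo < hi:  # non-empty overlap with this exon
                n_reads += 1
                covered.update(range(lo, hi))
        length = end - start
        fraction = len(covered) / length if length else 0.0
        result.append((chrom, start, end, n_reads, len(covered), length, fraction))
    return result


if __name__ == "__main__":
    reads = [
        ("chr1", 90, 140, 99),    # flag 99 includes 0x2: kept
        ("chr1", 150, 170, 147),  # flag 147 includes 0x2: kept
        ("chr1", 120, 180, 0x1),  # paired but not proper pair: dropped
    ]
    print(exon_coverage(reads, [("chr1", 100, 200)]))
    # [('chr1', 100, 200, 2, 60, 100, 0.6)]
```

Note that reads are counted individually, so both mates of a pair that fall in the same exon each add to the count.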
I need to update the usage examples...
Thanks a lot for the details, Brent; this is what I was looking for. How do I get that exons.bed?
Many thanks! I am working with human data, so it's there; I am downloading it now.
I am trying the samtools command, but I get the following warning and the program exits: "It looks as though you have less than 3 columns at line: 1. Are you sure your files are tab-delimited?" Have you come across this before?
Which part is producing that error? The samtools view? Is your data in BAM format rather than SAM?
Oops, I think I used the wrong input file. It's running now. Thanks.
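The "less than 3 columns" message appears to come from the BED-parsing side of the pipeline, so a quick way to catch a wrong input file before running it is to check that every data line has at least three tab-separated columns. Below is a minimal Python sketch of such a check. It is a hypothetical helper, not part of samtools or bedtools. It skips blank lines and lines starting with "#", "track" or "browser", and it only checks the column count, not whether the coordinates are valid.

```python
from typing import Iterable, Optional, Tuple


def first_bad_bed_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, line) for the first data line with fewer than
    3 tab-separated columns, or None if every data line looks BED-like."""
    for lineno, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        # Blank lines and header-style lines are not BED records.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        if len(stripped.split("\t")) < 3:  # e.g. space-delimited or not BED at all
            return lineno, stripped
    return None


if __name__ == "__main__":
    print(first_bad_bed_line(["track name=exons\n", "chr1\t10\t20\n"]))  # None
    print(first_bad_bed_line(["chr1 10 20\n"]))  # (1, 'chr1 10 20')
```

For a file on disk, pass an open handle, e.g. first_bad_bed_line(open("exons.bed")).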
Thanks a lot, Aaron!