4.7 years ago
shrutidabral
I am trying to align reads to the hg19 reference genome using Bowtie2 2.4.1.
I have 10 samples in total, and 5 of the raw files give an error during alignment with the same command. This is the error I get for those files:
Command
/bowtie2-2.4.1/bowtie2 --no-discordant --no-mixed --local -p 20 -x /WXS_test/index/hg19 -1 /data/fastqfile/R_1.fastq.gz -2 /data/fastqfile/R_2.fastq.gz -S /WXS_test/alignment/R.sam
Error, fewer reads in file specified with -1 than in file specified with -2
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)
Action taken
* Quality check of the raw files with FastQC: all files pass the criteria.
* No memory issues.
Is there any other way to check the FASTQ files, in case the error is caused by the raw files?
Make sure the number of reads in R_1.fastq.gz and R_2.fastq.gz is identical. You can check this in the FastQC output ("Total Sequences" in the "Basic Statistics" section). What are the numbers?
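If you want to check the read counts without FastQC, a minimal Python sketch like the following counts the records in each file. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the function names are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a plain or gzipped FASTQ file.

    Assumes the standard 4-lines-per-record layout (no wrapped sequence lines).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A line count that is not a multiple of 4 hints at truncation or corruption.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def compare_read_counts(r1, r2):
    """Return (reads in R1, reads in R2, difference R2 - R1)."""
    n1, n2 = count_fastq_reads(r1), count_fastq_reads(r2)
    return n1, n2, n2 - n1
```

For example, compare_read_counts("/data/fastqfile/R_1.fastq.gz", "/data/fastqfile/R_2.fastq.gz") returns both counts and their difference; for the counts reported in the next reply, the difference would be 58168 reads. Decompressing tens of millions of reads in pure Python is slow, so this is only a fallback when FastQC output is not at hand.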
Number of reads: R1: 31988936, R2: 32047104
The files contain unequal numbers of reads. Is this your own data or published data? Did you trim the data without using a dedicated paired-end trimmer? Did you manipulate the files in any other way? Try to find out why the two files do not contain equal numbers of reads. You can try repair.sh from BBMap, but I would try to sniff out the reason first.
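One way to sniff out the reason is to list the read names that occur in only one of the two files: a non-empty result points to singletons rather than a file that was simply cut off at the end. The sketch below is an illustration under stated assumptions, not a replacement for repair.sh: it assumes 4-line FASTQ records, strips /1 and /2 mate suffixes (and .1/.2 only when the hypothetical dot_suffix flag is set, e.g. for fastq-dump -I output), and keeps all R1 names in memory, which for ~32 million reads can take several GB.

```python
import gzip
from itertools import islice


def mate_name(header, dot_suffix=False):
    """Read name without '@' and without a /1, /2 (or, optionally, .1, .2) mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")) or (dot_suffix and name.endswith((".1", ".2"))):
        return name[:-2]
    return name


def read_names(path, dot_suffix=False):
    """Yield normalized names from every 4th line (the headers) of a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for header in islice(handle, 0, None, 4):
            yield mate_name(header, dot_suffix)


def unpaired_names(r1, r2, dot_suffix=False):
    """Return (names only in R1, names only in R2).

    A name duplicated in R2 would be reported as R2-only on its second occurrence.
    """
    only_in_r1 = set(read_names(r1, dot_suffix))
    only_in_r2 = set()
    for name in read_names(r2, dot_suffix):
        if name in only_in_r1:
            only_in_r1.discard(name)  # matched: remove so only R1 orphans remain
        else:
            only_in_r2.add(name)
    return only_in_r1, only_in_r2
```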
I did not trim or manipulate the data. This is published data, available as SRA files. I intended to convert them directly into BAM files, but could not because of this error: [E::sam_hrecs_error] Malformed key:value pair at line 86: "@RG ID:PM164 PL:Illumina LB:GA LNID:L001 FCID:H9CB8ADXX DT:2014-04-21T00:00:00-0400 BCID:AGTACAAG SM:PM164_X1_1_Case"
Should I run Trimmomatic, or can you recommend another tool?
Please post all the commands you used to download the data and convert it from SRA to FASTQ. I guess the file contains singletons: did you use the --split-3 option with fastq-dump? If you used --split-files instead, I suggest simply running repair.sh from BBMap to remove the singletons. Of course, it could also be that the uploader messed things up by uploading corrupted files. Can you share an accession number?
I converted the SRR files to FASTQ with "-I --split-files", and as a result the two files contain different numbers of reads. When I used "--split-3", R1 and R2 contain the same number of reads.
I do not understand the difference between these two parameters. Out of the 10 raw files, --split-files does not produce correct FASTQ files for some of them. I have reported this to the author as well. Is this an upload issue or something else?
Could you paste the command here, along with one of the SRR numbers?
If these are paired-end reads, use --split-3. It saves the reads into separate files (read 1, read 2, and orphaned reads, if any exist), whereas --split-files does not set orphans aside. So try --split-3 if you can confirm that the SRA run contains paired-end reads.
Lastly, here is a nice tool: https://sra-explorer.info/. There you can find the links to both the SRA files and the raw FASTQ files (from both NCBI and EBI).
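To make the difference concrete, here is a toy Python model of how the two options route reads, assuming a spot holds at most two mates and None marks a missing mate. It only mimics the file assignment described in the comment above (complete pairs to _1/_2, orphans to a separate file with --split-3); it is not fastq-dump's actual code, and the spot names are hypothetical.

```python
def split_files(spots):
    """Toy model of --split-files: mate 1 -> _1, mate 2 -> _2, whichever exists."""
    out1, out2 = [], []
    for spot_id, (mate1, mate2) in spots:
        if mate1 is not None:
            out1.append(spot_id)
        if mate2 is not None:
            out2.append(spot_id)
    return out1, out2


def split_3(spots):
    """Toy model of --split-3: complete pairs -> _1/_2, lone reads -> separate file."""
    out1, out2, single = [], [], []
    for spot_id, (mate1, mate2) in spots:
        if mate1 is not None and mate2 is not None:
            out1.append(spot_id)
            out2.append(spot_id)
        elif mate1 is not None or mate2 is not None:
            single.append(spot_id)
    return out1, out2, single


# Hypothetical run: spot2 has only its second read.
spots = [
    ("spot1", ("ACGT", "TTGA")),
    ("spot2", (None, "GGCA")),
    ("spot3", ("CATG", "AACC")),
]
```

With split_files the lone read from spot2 lands in _2 only, so R1 ends up with fewer reads than R2 (2 vs 3), the same direction as the bowtie2 error in the question. With split_3, _1 and _2 stay in sync and the lone read goes to the separate file.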
Sorry, I can't share the SRR number.
This description is from https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump: "fastq-dump -I --split-files SRR390728 Produces two fastq files (--split-files) containing ".1" and ".2" read suffices (-I) for paired-end data."
Fine; the details of what you did help others reproduce your problem.
I guess one possible reason is that the SRA file downloaded by fastq-dump is truncated (not tested). In any case, it is recommended to download files with prefetch, not fastq-dump. If you ran the command exactly as in the example, fastq-dump downloads the .sra file first and then converts it to FASTQ, which is not the recommended way. Instead, download with prefetch (also suggested by NCBI), and then convert with fasterq-dump.
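As a sketch of that workflow, the Python helpers below build the two SRA Toolkit command lines and, if asked, run them with subprocess. The flags used (--split-3, --threads, --outdir) are fasterq-dump options, but the helper names, output directory and thread count are illustrative assumptions, and SRR390728 is only the example accession from the NCBI documentation quoted above. Depending on your SRA Toolkit version and configuration, you may need to pass the path of the downloaded .sra file to fasterq-dump instead of the accession.

```python
import shutil
import subprocess


def sra_to_fastq_commands(accession, outdir=".", threads=4):
    """Build the prefetch + fasterq-dump command lines for one SRA run."""
    return [
        ["prefetch", accession],
        # --split-3: mates to _1/_2, reads without a mate to a separate file.
        ["fasterq-dump", "--split-3", "--threads", str(threads),
         "--outdir", outdir, accession],
    ]


def run_commands(commands, dry_run=True):
    """Print the commands, or run them in order and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH (is SRA Toolkit installed?)")
        subprocess.run(cmd, check=True)
```

Calling run_commands(sra_to_fastq_commands("SRR390728", "fastq_out")) only prints the commands; pass dry_run=False to execute them. fasterq-dump writes uncompressed FASTQ, so compress the output afterwards if needed.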
The two FASTQ files might not be a pair of PE reads, or one of the files is truncated.
If the files are not a pair of PE reads
Check the names of the first reads (e.g. 10 reads) to see whether they are identical in the two files.
If the name lists are not identical, you should find the correct source of the reads.
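A minimal sketch of that name check in Python, assuming 4-line FASTQ records; the helper names are hypothetical. The mate suffixes /1 and /2 are always removed before comparing, and .1/.2 only with dot_suffix=True (for files written with fastq-dump -I), because without -I a trailing .1 can simply be a spot number.

```python
import gzip
from itertools import islice


def mate_name(header, dot_suffix=False):
    """Read name without '@' and without a /1, /2 (or, optionally, .1, .2) mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")) or (dot_suffix and name.endswith((".1", ".2"))):
        return name[:-2]
    return name


def first_names(path, n=10, dot_suffix=False):
    """Normalized names of the first n records of a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Every 4th line, starting at line 0, is a header in a 4-line FASTQ.
        return [mate_name(h, dot_suffix) for h in islice(handle, 0, 4 * n, 4)]


def compare_first_names(r1, r2, n=10, dot_suffix=False):
    """Return (position, R1 name, R2 name) for each of the first n records that differ."""
    names1 = first_names(r1, n, dot_suffix)
    names2 = first_names(r2, n, dot_suffix)
    return [(i, a, b) for i, (a, b) in enumerate(zip(names1, names2)) if a != b]
```

An empty list means the first n names agree. Agreement in the first 10 records does not rule out singletons further down the files.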
If the files are correct PE reads
You can subset R_2.fastq.gz to the same number of reads as R_1.fastq.gz, and then try R_1.fastq.gz and R_fixed_2.fastq.gz for alignment.
This advice is not correct. By doing this you are bound to mess up the order of reads in the files.
I am going to move this answer to a comment. Please explain if you feel your solution is correct.
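A toy example with hypothetical read names shows why truncation does not work: if R1 is missing a read somewhere in the middle, cutting R2 down to R1's read count only removes reads from the end, so every pair after the gap is shifted by one.

```python
def truncate_pairs(r1_names, r2_names):
    """Mimic 'subset R2 to R1's read count', then pair records by position."""
    r2_cut = r2_names[:len(r1_names)]
    return list(zip(r1_names, r2_cut))


# Hypothetical spots 1..6; the R1 read of spot3 is missing.
r1 = ["spot1", "spot2", "spot4", "spot5", "spot6"]
r2 = ["spot1", "spot2", "spot3", "spot4", "spot5", "spot6"]

pairs = truncate_pairs(r1, r2)
mispaired = [(a, b) for a, b in pairs if a != b]
```

Here three of the five resulting pairs join reads from different spots, and an aligner that pairs reads by position would treat them as mates.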
shrutidabral: Please use repair.sh from the BBMap suite to make sure your read order is set right and that singletons are moved to a separate file. This generally happens if you trim paired-end reads independently.
@genomax, thanks, I had not considered the situation you mention, where PE reads are trimmed separately. repair.sh is a good solution when the read names in the two files are out of sync. My previous comment was for the situation where the reads are not in correct pairs, or the files are truncated.
If the reads were indeed trimmed separately, I would redo the trimming in PE mode.
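The idea behind repair.sh (re-pairing mates by read name and moving singletons to their own file) can be illustrated with the simplified Python sketch below. It is not BBMap's implementation, only a stand-in for small files under several assumptions: 4-line FASTQ records, all R1 records held in memory, no handling of duplicate read names, and pairs written in R2's order. The function names are hypothetical; for data of this size, repair.sh itself is the better choice.

```python
import gzip


def _open(path, mode):
    """Open plain or gzipped files in text mode ('rt' or 'wt')."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(path):
    """Yield 4-line FASTQ records as tuples of lines without trailing newlines."""
    with _open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return
            if not header.strip():
                continue  # skip stray blank lines between or after records
            rest = [handle.readline() for _ in range(3)]
            if not rest[2]:
                raise ValueError(f"{path}: truncated record {header.strip()}")
            yield tuple(line.rstrip("\n") for line in [header, *rest])


def mate_name(header, dot_suffix=False):
    """Read name without '@' and without a /1, /2 (or, optionally, .1, .2) suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")) or (dot_suffix and name.endswith((".1", ".2"))):
        return name[:-2]
    return name


def write_record(handle, record):
    handle.write("\n".join(record) + "\n")


def repair_pairs(r1, r2, out1, out2, out_single, dot_suffix=False):
    """Re-pair mates by name; unmatched reads go to out_single.

    Returns (number of pairs written, number of singletons written).
    """
    pending = {mate_name(rec[0], dot_suffix): rec for rec in read_fastq(r1)}
    n_pairs = n_single = 0
    with _open(out1, "wt") as o1, _open(out2, "wt") as o2, \
            _open(out_single, "wt") as single_out:
        for rec2 in read_fastq(r2):
            rec1 = pending.pop(mate_name(rec2[0], dot_suffix), None)
            if rec1 is None:
                write_record(single_out, rec2)  # R2 read whose mate is missing from R1
                n_single += 1
            else:
                write_record(o1, rec1)  # pairs follow R2's order
                write_record(o2, rec2)
                n_pairs += 1
        for rec1 in pending.values():  # R1 reads whose mate never appeared in R2
            write_record(single_out, rec1)
            n_single += 1
    return n_pairs, n_single
```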
I did not perform any trimming on this data.
repair.sh from BBMap does all this automatically and is more reliable, since your solution will not correct any corrupted entries within the file. Maybe, for whatever reason, some reads are out of sync or are singletons. I assume that either something went wrong during the SRA-to-FASTQ conversion, or the file contains singletons, in which case fastq-dump was probably not used with --split-3.
OK, I will follow your suggestion and report back. Thank you for the clarification.