Hello Biostars Community,
I am trying to map NextSeq reads to a reference genome to create an assembly that I can use for variant analysis, but I am having trouble getting my raw data into a usable form.
My data come from a NextSeq paired-end (PE) run, and when I download the FASTQ files from BaseSpace I get 8 files in total. The file naming leads me to believe that the data from each lane are kept separate:
genome_S1_L001_R1_001.fastq.gz
genome_S1_L001_R2_001.fastq.gz
genome_S1_L002_R1_001.fastq.gz
genome_S1_L002_R2_001.fastq.gz
genome_S1_L003_R1_001.fastq.gz
genome_S1_L003_R2_001.fastq.gz
genome_S1_L004_R1_001.fastq.gz
genome_S1_L004_R2_001.fastq.gz
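These names appear to follow the usual Illumina pattern of sample name, sample number (S1), lane (L001 to L004), read (R1 or R2) and a trailing 001. Below is a minimal Python sketch, assuming that pattern holds, that groups such files by read direction in lane order and merges each group into one file. The regular expression, the helper names `group_by_read` and `concatenate`, and the output file names are illustrative assumptions and not part of any Galaxy or BWA tool. The merge relies on the fact that gzip files joined byte for byte still form a valid gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_by_read(filenames):
    """Return {(sample, read): [file names sorted by lane number]}."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_RE.match(Path(name).name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        key = (match["sample"], match["read"])
        groups[key].append((int(match["lane"]), name))
    # Sorting by lane keeps R1 and R2 in the same order, so read pairs stay in sync.
    return {key: [name for _, name in sorted(items)] for key, items in groups.items()}


def concatenate(parts, out_path):
    """Join gzip files byte for byte; the result is a multi-member gzip file."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    names = [
        f"genome_S1_L00{lane}_R{read}_001.fastq.gz"
        for lane in range(1, 5)
        for read in (1, 2)
    ]
    for (sample, read), parts in group_by_read(names).items():
        # Hypothetical output name; only printed here, nothing is written.
        print(f"{sample}_R{read}.fastq.gz <-", ", ".join(parts))
```

Calling `concatenate(parts, out_path)` for the R1 group and again for the R2 group would give two files that can be uploaded as an ordinary forward/reverse pair.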
I would like to use the Map with BWA tool in Galaxy, but I can't get the tool to accept the data in this format. In the past I've had simple _f and _r reads that have behaved nicely in my workflows, so this is throwing me for a loop. Any suggestions for tools to convert this type of data for easier analysis?
Thanks for your help!
Please post this over at the Galaxy Help site for a prompt response.
What is happening exactly? Are you getting an error? You can either concatenate all the R1 files and all the R2 files (in the same lane order) to create one R1 file and one R2 file (assuming this is the same sample run on all 4 lanes), or start 4 parallel mapping jobs and then merge the alignment files downstream.
I have tried running the R1 and R2 files from a single lane as paired reads and also as single reads. In either scenario the Map with BWA tool starts the job but eventually fails. In some cases I get this error: