Question

Mapping NextSeq reads to reference genome in Galaxy


5.2 years ago

WCG3 ▴ 10

Hello Biostars Community,

I am trying to map NextSeq reads to a reference genome to create an assembly that I can use for variant analysis, but I am having some trouble making sense of my raw data.

My data comes from a NextSeq paired-end (PE) run, and when I download the FASTQ files from BaseSpace I get 8 files in total. The file naming leads me to believe that the data from each lane is kept separate:

genome_S1_L001_R1_001.fastq.gz
genome_S1_L001_R2_001.fastq.gz
genome_S1_L002_R1_001.fastq.gz
genome_S1_L002_R2_001.fastq.gz
genome_S1_L003_R1_001.fastq.gz
genome_S1_L003_R2_001.fastq.gz
genome_S1_L004_R1_001.fastq.gz
genome_S1_L004_R2_001.fastq.gz

I would like to use the Map with BWA tool in Galaxy, but I can't get the tool to accept the data in this format. In the past I've had simple _f and _r read files that behaved nicely in my workflows, so this is throwing me for a loop. Any suggestions for tools to convert this type of data for easier analysis?

Thanks for your help!

Assembly nextseq • 1.2k views

updated 16 months ago by Ram 44k • written 5.2 years ago by WCG3 ▴ 10


Please post this on the Galaxy Help site for a prompt response.

5.2 years ago by GenoMax 147k


I can't get the tool to accept the data in this format.

What exactly is happening? Are you getting an error? Assuming this is the same sample run across all 4 lanes, you can either concatenate the four R1 files into a single R1 file and the four R2 files into a single R2 file (keeping the lanes in the same order in both), or start 4 parallel mapping jobs and then merge the alignment files downstream.
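
The concatenation option can be sketched in Python as below. This is only a minimal sketch: it assumes the BaseSpace naming shown in the question, the helper names `lane_sorted` and `concat_lanes` and the output file names are made up for illustration, and it relies on the fact that gzip files appended back to back still form a valid gzip stream.

```python
import shutil
from pathlib import Path


def lane_sorted(directory, read_tag):
    """Return the files for one read direction ("R1" or "R2"), sorted by name.

    Sorting by name puts L001..L004 in the same order for R1 and R2,
    which is what keeps the mates lined up after concatenation.
    """
    return sorted(Path(directory).glob(f"*_{read_tag}_001.fastq.gz"))


def concat_lanes(lane_files, out_path):
    """Append the gzip files byte for byte into one output file."""
    with open(out_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Usage would look something like `concat_lanes(lane_sorted(".", "R1"), "genome_S1_R1.fastq.gz")`, followed by the same call with `"R2"` and a matching output name. The two merged files can then be uploaded to Galaxy as one paired dataset.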

5.2 years ago by GenoMax 147k


I have tried running the R1 and R2 files from a single lane as paired reads and also as single reads. In either scenario the Map with BWA tool will start the job but will eventually fail. In some cases I get this error:

 paired reads have different names: "NS500623:145:HJWYVAFXX:1:11101:10756:1037", "NS500623:145:HJWYVAFXX:1:11101:24255:1101"
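
An error like this means that the read at some position in the R1 file and the read at the same position in the R2 file are not mates. A quick way to confirm that before mapping is to walk both files in parallel and compare read names. The following is a rough sketch, assuming well-formed 4-line FASTQ records in gzip files; the function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record.

    The leading "@", anything after the first whitespace, and an
    old-style /1 or /2 suffix are dropped so mates compare equal.
    """
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                name = line[1:].split()[0]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_index, r1_name, r2_name) for the first pair whose
    names differ, or None if the two files are in sync.

    A name of None means one file ran out of reads before the other.
    """
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

If `first_mismatch` returns `None` for a lane's R1/R2 pair, the files are in sync and the problem lies elsewhere. If it returns a record index, reads were most likely dropped from one file but not the other.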

5.2 years ago by WCG3 ▴ 10

score 0 · Answer 1 · 2019-09-30


5.2 years ago

GenoMax 147k

I can only speculate, but it is possible that the data files were trimmed individually (R1 alone, R2 alone), which caused the reads in the R1/R2 files to get out of sync.

If you did the trimming yourself, you need to go back to the original files and redo the trimming with the R1 and R2 files together for each lane. That way, if a read is removed from the R1 or R2 file, its mate is also removed from the other file, which keeps the remaining reads in sync.
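
To illustrate why trimming the two files together keeps them in sync, here is a toy sketch of pair-aware filtering. It is not how any particular trimming tool is implemented; `filter_pairs` and the `min_length` cutoff are made-up stand-ins for whatever criterion the trimmer applies. The point is only that the decision is made per pair, so both mates are kept or both are dropped.

```python
def filter_pairs(r1_records, r2_records, min_length=20):
    """Yield (r1, r2) pairs only when BOTH mates pass the length check.

    Each record is a (name, sequence, quality) tuple, and the two inputs
    are assumed to be in the same order. Because a failing read takes its
    mate with it, the two outputs always contain the same reads in the
    same order.
    """
    for rec1, rec2 in zip(r1_records, r2_records):
        if len(rec1[1]) >= min_length and len(rec2[1]) >= min_length:
            yield rec1, rec2
```

Filtering each file on its own would instead drop a short R2 read while leaving its R1 mate in place, and every read after that point would be paired with the wrong partner.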

If you did not trim the data yourself then ask your sequencing provider.

5.2 years ago by GenoMax 147k


Your instincts are good! After concatenating the R1/R2 files and retrimming I had no problem with the alignments. Thanks for the troubleshooting advice!

5.2 years ago by WCG3 ▴ 10