Question: Converting FASTQ to BAM by using Bowtie2

0

5.8 years ago

Sakhaa ▴ 10

Hello everyone,

I'm working on DNA data, and my task is to build a pipeline that generates a VCF from FASTQ files, testing different read aligners and different variant-calling tools.

Now I want to convert FASTQ to BAM using Bowtie2. I need an example that explains this step by step as shell commands.

Could you help me, if you know some useful resources or open-source pipelines?

The reference I'm using is GRCh37, and the data are from NIST.

alignment sequencing SNP genome • 4.9k views

ADD COMMENT • link updated 5.8 years ago by Istvan Albert 102k • written 5.8 years ago by Sakhaa ▴ 10

score 2 · Answer 1 · 2019-03-25

2

5.8 years ago

Istvan Albert 102k

On the Bioinformatics Recipe website I have an example for how to use bowtie2:

https://www.bioinformatics.recipes/recipe/view/recipe-bowtie/

In a nutshell, you have to build an index for the reference, then run the aligner.
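
Building the index, aligning, and then converting the alignment to BAM could be sketched roughly as follows. This is a minimal sketch, not the recipe's own code: it assumes paired-end reads and that bowtie2 and samtools are on the PATH. The function names (run, fastq_to_bam), the DRY_RUN switch, and all file names are made up for illustration.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: fastq_to_bam REF R1 R2 OUT_PREFIX
# The body runs in a subshell so its variables and 'set -eu' stay local.
fastq_to_bam() (
    set -eu
    ref=$1; r1=$2; r2=$3; out=$4

    # 1. Build the Bowtie2 index; the reference path doubles as the index prefix.
    run bowtie2-build "$ref" "$ref"

    # 2. Align the paired-end reads and write the alignments as SAM.
    run bowtie2 -x "$ref" -1 "$r1" -2 "$r2" -S "$out.sam"

    # 3. Sort the alignments by coordinate and write them as BAM.
    run samtools sort -o "$out.bam" "$out.sam"

    # 4. Index the sorted BAM so downstream tools can access it by region.
    run samtools index "$out.bam"
)
```

Calling fastq_to_bam GRCh37.fa reads_1.fq reads_2.fq sample would leave sample.bam and sample.bam.bai behind; setting DRY_RUN=1 first only prints the four commands. For single-end reads, bowtie2 takes -U reads.fq in place of -1 and -2.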

See other recipes here:

https://www.bioinformatics.recipes/recipe/list/bio-data-analysis/

ADD COMMENT • link 5.8 years ago by Istvan Albert 102k

0

@Istvan, unless you are handling this differently inside recipes, the Entrez Direct part of the recipe is not correct: efetch -db=nuccore -format=fasta. There should be no = signs, just spaces.
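
For illustration, the space-separated form might look like the sketch below. The accession, the function name fetch_fasta, the run helper, and the DRY_RUN switch are all hypothetical and not taken from the recipe; only the efetch options -db, -format, and -id are real.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: fetch_fasta ACCESSION
# Each option is separated from its value by a space, not by '='.
fetch_fasta() (
    acc=$1
    run efetch -db nuccore -format fasta -id "$acc"
)

# Example call (placeholder accession, needs Entrez Direct and network access):
#   fetch_fasta AF086833 > AF086833.fa
```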

ADD REPLY • link 5.8 years ago by GenoMax 148k

0

Heh, I don't even remember how I ended up with = signs. I believe they are valid to use, since the recipe does work. That's what's great about recipes: the code actually gets run by the website, so it is "alive", so to speak. Errors that crash it get caught right away, unlike in "static" code examples. See the results tab for each recipe, or this:

https://www.bioinformatics.recipes/job/view/39441b05/

But I do agree that we should not be using = signs at all, so I will fix that.

(PS: I think the usage might be a remnant of an old version of entrez-direct that did require the = to be present).

ADD REPLY • link 5.8 years ago by Istvan Albert 102k

0

Thank you very much @Istvan. I'm a CS student and a beginner in the bioinformatics field, and I want to ask some questions related to the human reference. I have heard that a reference genome, such as the human one, is generated by randomly choosing samples from a group of donors. But why do we call the resulting DNA sequence a reference? Why should we believe that those few samples can represent all humans, so that everything else is aligned against them?

Also, about building the index: how do we build the index for GRCh37?

ADD REPLY • link 5.8 years ago by Sakhaa ▴ 10

0

Also, about building the index: how do we build the index for GRCh37?

That is described in this step in the recipe that @Istvan linked to. You can download the reference fasta sequence for GRCh37 from NCBI here or from GENCODE here.

bowtie2-build $REF $REF  1>> log.txt 2>> log.txt
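
Since indexing a genome the size of GRCh37 can take a while, one option is to guard the command above so it only runs when no index is present. The sketch below is an assumption-laden illustration, not part of the recipe: the function name build_index_if_missing and the THREADS variable are hypothetical, while the .1.bt2 and .1.bt2l suffixes are the first of the files that bowtie2-build writes for small and large indexes.

```shell
# Hypothetical wrapper: build_index_if_missing REF
# Builds a Bowtie2 index with prefix REF unless one already exists.
build_index_if_missing() (
    ref=$1

    # bowtie2-build writes REF.1.bt2 (small index) or REF.1.bt2l (large index).
    if [ -e "$ref.1.bt2" ] || [ -e "$ref.1.bt2l" ]; then
        echo "index for $ref already exists, skipping"
    else
        # Append both stdout and stderr to the log, as in the command above.
        bowtie2-build --threads "${THREADS:-4}" "$ref" "$ref" >> log.txt 2>&1
    fi
)

# Example call (placeholder file name):
#   build_index_if_missing GRCh37.fa
```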

The validity of what we consider the human reference is a philosophical question. It is the best we have for now, and it has been used for years. Since most research is reported relative to that reference, you can continue using it. There are efforts underway to obtain more representative full genomes from diverse populations.

Also see:

Where Were The Human Genome Reference Samples Taken From?
Which human reference genome should I use?

ADD REPLY • link 5.8 years ago by GenoMax 148k

0

The way this site works is that we try to keep each page on a single topic, so we don't recommend asking new questions in the comment section: they won't be seen by other people unless they dig deep into the comments. Put another way, the site is not a back-and-forth discussion forum but a focused Q&A. I would suggest asking your question as a new, separate top-level question. But first, consult the topics that GenoMax has posted.

ADD REPLY • link 5.8 years ago by Istvan Albert 102k