Sakhaa, 5.8 years ago
Hello everyone,
I'm working with DNA data, and my task is to build a pipeline that goes from FASTQ files to a VCF, testing different read aligners and different variant-calling (VC) tools.
Right now I want to convert FASTQ to BAM using Bowtie2, and I need an example that explains this step by step as shell commands.
Could you point me to useful resources or open-source pipelines, if you know of any?
The reference I'm using is GRCh37, and the data are from NIST.
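A minimal sketch of the FASTQ to BAM step, assuming paired-end (optionally gzipped) FASTQ files, bowtie2 and samtools on PATH, and a Bowtie2 index for GRCh37 that has already been built with bowtie2-build. The function names (fastq_to_bam, sample_prefix), the _R1/_R2 file naming convention and the read-group values are hypothetical choices for illustration, not part of any official pipeline.

```shell
#!/usr/bin/env bash
# Minimal sketch: paired-end FASTQ -> coordinate-sorted, indexed BAM.
# Assumes bowtie2 and samtools are on PATH and that a Bowtie2 index for
# GRCh37 was already built with bowtie2-build. Function names, the _R1/_R2
# naming convention and the read-group values are hypothetical.
set -euo pipefail

# Derive a sample prefix from an R1 file name,
# e.g. /data/sampleA_R1.fastq.gz -> sampleA
sample_prefix() {
    local name
    name=$(basename "$1")
    name=${name%.gz}
    name=${name%.fastq}
    name=${name%.fq}
    echo "${name%_R1}"
}

# Usage: fastq_to_bam <index_prefix> <R1.fastq[.gz]> <R2.fastq[.gz]> [threads]
# Writes <prefix>.sorted.bam and <prefix>.sorted.bam.bai in the current directory.
fastq_to_bam() {
    local index=$1 r1=$2 r2=$3 threads=${4:-4}
    local prefix
    prefix=$(sample_prefix "$r1")

    # Step 1: align with Bowtie2; SAM is written to stdout.
    #   A read group (ID/SM/PL) is attached because many variant callers,
    #   e.g. GATK, expect one. PL:ILLUMINA assumes Illumina reads.
    # Step 2: pipe the SAM into samtools sort, producing a sorted BAM
    #   without an intermediate SAM file on disk.
    bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" \
            --rg-id "$prefix" --rg "SM:$prefix" --rg "PL:ILLUMINA" \
        | samtools sort -@ "$threads" -o "${prefix}.sorted.bam" -

    # Step 3: index the BAM so downstream tools can jump to any region.
    samtools index "${prefix}.sorted.bam"
}
```

For example, with hypothetical file names, fastq_to_bam GRCh37 sampleA_R1.fastq.gz sampleA_R2.fastq.gz 8 would write sampleA.sorted.bam and sampleA.sorted.bam.bai. Piping Bowtie2 straight into samtools sort avoids storing a large uncompressed SAM file, which matters for whole-genome data.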
@Istvan, unless you are handling this differently inside recipes, the Entrez Direct part of the recipe is not correct:
efetch -db=nuccore -format=fasta
There should be no = signs, just spaces: efetch -db nuccore -format fasta
Heh, I don't even remember how I ended up with = signs. I believe they are valid to use, since the recipe does work. That's what's great about recipes: the code is actually run by the website, so it is "alive", so to speak, and errors that crash it get caught right away, unlike "static" code examples. See the results tab for each recipe or this:
But I do agree that we should not be using = signs at all, so I will fix that. (PS: I think the usage might be a remnant of an old version of Entrez Direct that did require the = to be present.)
Thank you very much, @Istvan. I'm a CS student and a beginner in the bioinformatics field, and I want to ask some questions related to the human reference. I have heard that a reference genome, such as the human one, is generated by randomly choosing samples from a group of donors. But why do we call the resulting DNA sequence a reference? Why should we believe that those few samples can represent all the humans whose reads we align against it?
Also, about building the index: how do we build the index for GRCh37?
That is described in this step in the recipe that @Istvan linked to. You can download the reference FASTA sequence for GRCh37 from NCBI here or from GENCODE here.
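The index step can be sketched as follows, assuming an uncompressed GRCh37 FASTA (a hypothetical GRCh37.fa) downloaded from NCBI or GENCODE, with bowtie2-build and samtools on PATH. The helper names are hypothetical, and the skip check only verifies that the six .bt2 files exist and are non-empty, not that they are valid.

```shell
#!/usr/bin/env bash
# Minimal sketch: build a Bowtie2 index (plus a samtools .fai) for GRCh37.
# Assumes an uncompressed FASTA and that bowtie2-build and samtools are on
# PATH. File names and function names are hypothetical.
set -euo pipefail

# GRCh37 is smaller than 4 Gbp, so bowtie2-build writes a "small" index
# made of these six files (larger genomes get .bt2l files instead).
BT2_SUFFIXES="1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2"

# Return 0 if every expected index file exists and is non-empty.
index_is_complete() {
    local prefix=$1 suffix
    for suffix in $BT2_SUFFIXES; do
        [ -s "${prefix}.${suffix}" ] || return 1
    done
    return 0
}

# Usage: build_bowtie2_index <reference.fa> <index_prefix> [threads]
build_bowtie2_index() {
    local fasta=$1 prefix=$2 threads=${3:-4}
    if index_is_complete "$prefix"; then
        echo "Index '${prefix}' already exists; skipping bowtie2-build." >&2
    else
        # Slow step for a whole human genome (can take hours).
        bowtie2-build --threads "$threads" "$fasta" "$prefix"
    fi
    # Variant callers typically also need a FASTA index (.fai) of the same file.
    samtools faidx "$fasta"
}
```

For example, build_bowtie2_index GRCh37.fa GRCh37 8 would leave files such as GRCh37.1.bt2 next to the FASTA, and GRCh37 is then the prefix to pass to bowtie2 with -x. Skipping an already complete index is a convenience for re-running a pipeline, since rebuilding a human index is expensive.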
Whether what we consider the human reference is "valid" is a philosophical question. It is the best we have for now, and it has been used for years. Since most research is reported relative to that reference, you can continue to use it. There are efforts underway to assemble more representative full genomes from diverse populations.
Also see:
Where Were The Human Genome Reference Samples Taken From?
Which human reference genome should I use?
The way this site works is that we try to keep each page on a single topic, so we don't recommend asking new questions in the comment section: they won't be seen by other people unless they dig deep into the comments. Put another way, this site is not a back-and-forth discussion forum but a focused Q&A. I suggest that you ask your question as a new, separate top-level question, but first consult the topics that genomax has posted.