Mapping Illumina reads against one single gene
2
2
Entering edit mode
9.8 years ago
User000 ▴ 710

Hello,

I have approx. 250,000,000 Illumina paired-end reads of 100bp length from one organism. I have a gene of 5000bp length from another organism. I wanted to map reads to my sequence and create a consensus sequence. My question is: Is it OK mapping illumina DNA-seq or RNA-seq reads against one single gene? Cause in literature everything is mapped against the whole genome, which in my case is limited due to computer space. Any suggestions are appreciated.

next-gen • 11k views
ADD COMMENT
0
Entering edit mode

Just a thought: Presumably you don't really need to align *all* the 250M reads, unless you need a coverage of millions x (!!). So maybe better to get a sample of few hundred thousand, good quality reads and align them with a sensitive aligner like blast. I mean, NGS aligners are usually a trade off between speed, efficiency and sensitivity/specificity. In your case efficiency is not an issue as the reference is small and you might get away with few reads.

ADD REPLY
0
Entering edit mode

But that only works, if the complete genome is small. A few 100k of 250M is not enough, if we are talking about a big genome.

ADD REPLY
1
Entering edit mode

Ahhh... Sure I misunderstood the question, I thought the 250M reads came from that gene alone as if they sequenced just that amplicon... Apologies about the confusion.

ADD REPLY
1
Entering edit mode
9.8 years ago

There is nothing wrong in mapping the reads to a single gene. But try to use bwa-mem instead of bowtie2 as it is different organism. You can also try to do a local blast (convert fastq to fasta) against the gene of interest. But you should be really careful with reads that are generated partially from gene of interest.

ADD COMMENT
0
Entering edit mode

Thank you for the answer. Indeed, I used BWA. So in order to save time and space If I know the coordinates, I can just take my short sequence and map against it.

ADD REPLY
0
Entering edit mode

you can use bedtools getfasta to get fasta sequence from regions of interest (in a bed format).

ADD REPLY
0
Entering edit mode

Thanks, what do you mean by 'reads that are generated partially from gene of interest'?

ADD REPLY
1
Entering edit mode

if its a whole genome sequence data, the entire read might not have generated only from genic region. So first or last part of read is from gene and rest is from non-genic region. When you map this read only to gene, it will map partially, and hence you should look at CIGAR string and keep in mind about soft/hard clipping.

ADD REPLY
1
Entering edit mode
9.8 years ago

I don't know the question you want to answer, but perhaps it would make sense to add this one gene to the whole genome sequence of the species with which you did the experiment (assuming you have it). Just add it to the whole genome fasta as a new fasta entry in the end and create the index over the new fasta file. This way, you would assure that reads that have a better hit in the whole genome will not be mis-places (with more errors) to the one gene. But if you do that, you should allow multiple mapping loci, or turn of multiple mapping completely (depending on the tool you use).

ADD COMMENT
0
Entering edit mode

but if you add it to whole genome, definitely reads will map to two regions, one to the genome (region where actual gene is present) and to the extra added gene. SO I think we should definitely enable to report multiple mapping locations.

ADD REPLY
0
Entering edit mode

I also think that you should enable the multiple mappings.

But as I said: It highly depends on the question you want to answer.

Lets say User000 wants to find only reads that map uniquely to this one gene and nowhere else. Question would be: Do I have that gene present in my organism. Or: Was that gene introduced to my genome, or not. Then it would be good to turn of the multiple mapping option, since most tools need much more time with this parameter turned on.

But I have no clue, what the question might be! ;)

ADD REPLY
0
Entering edit mode

I want to see if that particular gene is present in sequenced organism, basically, this is why I am interested in doing mapping of reads against only one gene.

ADD REPLY
0
Entering edit mode

and if I do not have the whole genome sequence..? Basically, the question is, does it make sense to map illumina seq reads using as a reference genome just one single gene in fasta format..

ADD REPLY
1
Entering edit mode

It makes sense, but don't trust mappings with too many errors, since they have a high chance to be wrong mapping loci.

ADD REPLY
0
Entering edit mode

Yes. you can.

ADD REPLY

Login before adding your answer.

Traffic: 843 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6