Hi,
I am aligning a full paired-end sample to chromosome X only (human). The results show that less than 5 percent of the reads are uniquely aligned and more than 95 percent are not aligned.
Does this seem correct, or am I down-sampling my data the wrong way?
The way I down-sample is to first run wc -l on the forward sample and wc -l on the reverse sample,
then divide each number by two. Next, I run head -n <the result of dividing> full sample > sub_sample for each of the forward and reverse files. Then I align them to chromosome X.
Your immediate reply is appreciated.
Thanks
Haven't we discussed downsampling before in another thread? I think this is a duplicate and can therefore be closed.
Dear Wouter, based on that thread, these are the results. The question is about the results, which look weird.
You are aligning a full RNA-seq dataset to chrX only and are surprised that you get a low rate of alignment?
You are right. We had already discussed this in another thread: C: Sub_sampling Paired_end reads in Fastq.gz format.
I also entirely miss the point of all this downsampling, but that's perhaps not so important.
Dear Wouter,
I am aligning samples one by one. I got an error saying read ERR188257.7222000 HWI-962:71:D0PEYACXX:2:1205:3186:5646/1 has more read characters than quality values.
I divided the whole sample by four. The result ended in .5, so I added an extra 0.5 to round it up. The down-sample is now 14444000 in size. Is that causing the problem, or do I have to re-download the full sample?
Thanks
Downsampling done right should never break individual fastq records. You must have a corrupt fastq record/file.
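One way to test for a corrupt record is sketched below. It is only an illustration: the function name check_fastq and the demo files are made up here, and it assumes plain 4-line FASTQ records (no wrapped sequences). It reports the first record whose sequence and quality lines differ in length, or a file that ends in the middle of a record.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): exit status 0 if every record
# looks intact, 1 at the first problem found. Assumes 4 lines per record.
check_fastq() {
    awk '
        BEGIN { bad = 0 }
        NR % 4 == 1 { name = $0 }
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: expected a header starting with @\n", NR
            bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 0 && length($0) != seqlen {
            printf "%s: %d bases but %d quality values\n", name, seqlen, length($0)
            bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record after %d lines\n", NR
                bad = 1
            }
            exit bad
        }
    ' "$@"
}

# Tiny made-up demo files: one intact, one with a short quality line.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n' > "$tmp/good.fastq"
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIII\n' > "$tmp/bad.fastq"
check_fastq "$tmp/good.fastq" && echo "good.fastq looks intact"
check_fastq "$tmp/bad.fastq" || echo "bad.fastq has a broken record"
```

For a compressed file the function can read from a pipe, for example gzip -dc sample_1.fastq.gz | check_fastq (sample_1.fastq.gz is a placeholder name). Running it on both the downloaded file and the subset shows which of the two is broken.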
Dear Genomax,
I re-downloaded a sample, but I get the same issue again!
I downloaded via wget from ebi.ac.uk. I am unable to work with 4 of the 12 samples because of this issue. The samples are: ERR204916, ERR188428, ERR188401, ERR188257.
What do you suggest I do?
Thanks
This is hard to read and I don't understand it, but you probably created a corrupt fastq file. Remember that a fastq record has 4 lines. Why you selected this method of 'sampling' (it's not real sampling, you are just taking a subset of the reads) after the answer you got here: How to down-sample a full data, is beyond me.
Dear Wouter, I have already downloaded the whole data set, consisting of 12 paired-end samples. Then I divided each sample's length by 2, then by 4, and then by 8, so I now have 3 sub-samples: one half, one fourth and one eighth of the full sample. (I did so for both forward and reverse.) I did all this by first getting the length with wc -l and then using the head command in Ubuntu, and then I mapped them.
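If the head approach is kept, counting whole records instead of lines avoids fractional results like the .5 mentioned earlier. The sketch below is only an illustration under some assumptions: the function name first_fraction and the demo file are invented, the input is an uncompressed FASTQ with 4 lines per record, and both mates contain the same number of reads. As pointed out above, this takes the first reads of the file, not a random sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the first 1/DIVISOR of the reads to OUT.
# Integer division is done on the number of records, and the result is
# multiplied back by 4, so head can never cut a record in half.
first_fraction() {
    local src=$1 divisor=$2 out=$3
    local lines records keep
    lines=$(wc -l < "$src")
    records=$(( lines / 4 ))
    keep=$(( records / divisor ))
    head -n $(( keep * 4 )) "$src" > "$out"
}

# Made-up demo: 5 reads, halved -> 2 whole reads (8 lines).
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf '@read%d/1\nACGT\n+\nIIII\n' "$i"
done > "$tmp/demo_1.fastq"
first_fraction "$tmp/demo_1.fastq" 2 "$tmp/half_1.fastq"
wc -l < "$tmp/half_1.fastq"
```

Calling the function with the same divisor on the forward and the reverse file keeps the pairs in step. Note that wc -l and head work on the bytes they are given, so a .gz file has to be decompressed first (for example with gzip -dc) before its lines correspond to reads.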
What does
grep -A 3 "ERR188257.7222000 HWI-962:71:D0PEYACXX:2:1205:3186:5646/1"
on the file in question show before splitting it?
My system has been busy for minutes ... I will let you know as soon as it is finished.
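A plain grep keeps scanning to the end of the file even after it has found the read, which is one reason it can run for minutes on a large FASTQ. A possible variant that stops as soon as the record has been printed is sketched here; the function name find_read and the demo file are made up for illustration, and it assumes 4-line records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line containing NAME plus the three
# lines after it, then stop reading. NAME is matched as a literal string.
find_read() {
    local name=$1
    shift
    awk -v name="$name" '
        n == 0 && index($0, name) { n = 4 }   # start of the wanted record
        n > 0 { print; if (--n == 0) exit }   # print 4 lines, then quit
    ' "$@"
}

# Made-up demo file with two reads.
tmp=$(mktemp -d)
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n' > "$tmp/demo.fastq"
find_read "read2/1" "$tmp/demo.fastq"
```

On a compressed file it can be fed from a pipe, for example gzip -dc sample_1.fastq.gz | find_read "ERR188257.7222000" (sample_1.fastq.gz is a placeholder name). If the sequence line and the quality line it prints differ in length, the record is already broken in that file.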
I just stopped the process; it gave no result. Instead, I installed seqtk to down-sample. This is the command I wrote and got the result. I hope it is now fine and I can continue with mapping 0.1 of the sample to chromosome X only.