Hi, I'm an undergraduate student. Please help me.
I want to map the reads from an NCBI SRA dataset to produce BAM files, in order to analyze the heterogeneity of mouse ESCs.
The dataset (GSE60749) was generated in the following paper:
Roshan M. Kumar et al. Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature (2014).
Could you explain how to find out which adapter sequences were used in this experiment,
and which tool I should use for quality control (Prinseq? ShortRead?)?
For example, I want to try the SRA file for GEO Sample GSM1486817.
Can anyone walk me through the quality-control process for this SRA file?
Thank you for reading this through.
Any help will be appreciated.
You can also reproduce all these steps on the Genestack platform:
There are also other preprocessing apps you can use to improve the quality of your data.
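As a rough illustration of what a basic read quality-control step does (not how Genestack, Prinseq or ShortRead implement it), here is a minimal Python sketch that trims low-quality bases from the 3' end and then drops reads that are too short, have a low mean quality, or contain too many Ns. It assumes Phred+33 quality encoding; the function names and all thresholds are hypothetical choices for the example.

```python
# Minimal single-end read QC sketch: trim low-quality 3' bases, then drop reads
# that are too short, have low mean quality, or contain too many Ns.
# All thresholds are hypothetical defaults, not values taken from any specific tool.


def phred33(quality):
    """Convert a Phred+33 (Sanger / Illumina 1.8+) quality string to integer scores."""
    return [ord(char) - 33 for char in quality]


def trim_3prime(seq, qual, min_q=20):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq, qual, min_len=30, min_mean_q=25, max_n_fraction=0.05):
    """Keep a read only if it is long enough, good enough on average, and mostly non-N."""
    if not seq or len(seq) < min_len:
        return False
    scores = phred33(qual)
    if sum(scores) / len(scores) < min_mean_q:
        return False
    return seq.count("N") / len(seq) <= max_n_fraction


def quality_control(records, min_q=20, min_len=30, min_mean_q=25):
    """Yield (name, seq, qual) records that survive 3' trimming and filtering."""
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if passes_filter(seq, qual, min_len, min_mean_q):
            yield name, seq, qual
```

The records would typically come from a FASTQ file, e.g. one converted from the SRA run with fastq-dump from the SRA Toolkit. Dedicated tools do the same kind of work with more options (sliding windows, paired-end awareness, adapter clipping), so this sketch is only meant to show the logic.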
Actually, using FastQC to find adapters is not recommended, for the simple reason that it cannot find adapters it does not know a priori (i.e. adapters that are not in its database). Therefore, if GSE60749 uses adapters that FastQC does not know, FastQC will not find them.
Obviously, if those adapters are unknown, FastQC will not know them. But in general, the majority of experiments use well-known adapters. FastQC reports as over-represented sequences all sequences that occur more often than a certain threshold. These could be adapters, contaminants, poly-A stretches, and so on. If they are adapters present in FastQC's database, they will carry a "tag" naming the likely source (the "Possible Source" column of the report). If not, you will still see the sequence, but the column will say "No Hit", so you won't know whether it is an adapter or another type of repeated sequence. Just try it and see what happens.
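To make this mechanism concrete, here is a minimal Python sketch of the idea: count identical reads, report those above a frequency threshold, and attach a tag only when a known adapter motif occurs inside the sequence. The 0.1% default echoes the threshold FastQC documents for its warning, and the 100,000-record limit loosely follows its practice of tracking only the first 100,000 sequences; treat both, the function names, and the two-entry adapter list as assumptions, not FastQC's actual code or database.

```python
from collections import Counter

# Illustrative subset only: two well-known adapter prefixes. FastQC ships its own,
# much larger contaminant list; this dictionary does NOT reproduce it.
KNOWN_ADAPTERS = {
    "Illumina Universal Adapter": "AGATCGGAAGAG",
    "Nextera Transposase Sequence": "CTGTCTCTTATA",
}


def read_fastq_sequences(path, limit=100_000):
    """Yield the sequence line of the first `limit` 4-line FASTQ records."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= limit:
                break
            if line_number % 4 == 1:
                yield line.strip()


def overrepresented(reads, min_fraction=0.001):
    """Return (sequence, count, fraction, tag) for exact sequences that make up
    at least min_fraction of all reads, most frequent first. `tag` names a known
    adapter found inside the sequence, or is None when there is no hit."""
    reads = list(reads)
    counts = Counter(reads)
    total = len(reads)
    hits = []
    for seq, count in counts.most_common():
        fraction = count / total
        if fraction < min_fraction:
            break  # most_common() is sorted, so every later sequence is rarer
        tag = next((name for name, motif in KNOWN_ADAPTERS.items() if motif in seq), None)
        hits.append((seq, count, fraction, tag))
    return hits
```

An untagged hit (tag is None) is exactly the ambiguous case described above: the sequence is frequent, but nothing in the list explains it, so it may be an unlisted adapter, a contaminant, or a genuinely abundant transcript.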
Actually, in our experience FastQC fails to find adapters in most cases.
Our experience is also that in 99% of cases researchers do not validate FastQC's results and blindly trust the information it gives about adapters.
Incidentally, BBMerge can find unknown adapters if the reads are paired:
Indeed, BBMerge can find unknown adapters; I use it myself all the time. Another option is FusionCatcher's remove_adapter script.
There are even more tools than these for finding unknown adapters.
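Why pairing helps: when the sequenced fragment is shorter than the read length, read 1 and the reverse complement of read 2 cover the same insert, and whatever each read contains beyond the end of the insert must be adapter, whether or not it appears in any database. The Python sketch below illustrates that idea only. It is not BBMerge's algorithm or code, and the function names, the one-mismatch allowance and the 10-base minimum insert are hypothetical choices.

```python
from collections import Counter

# DNA complement table; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_insert_length(r1, r2, min_insert=10, max_mismatch=1):
    """Return the insert length if the pair overlaps completely (fragment shorter
    than the reads), else None.

    For a short fragment of length L, r1[:L] is the insert and r2[:L] is its
    reverse complement, so r1[:L] should match revcomp(r2[:L]).
    """
    read_len = min(len(r1), len(r2))
    # Try the longest candidate first so the true insert wins over short,
    # chance overlaps.
    for length in range(read_len - 1, min_insert - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[:length], revcomp(r2[:length])))
        if mismatches <= max_mismatch:
            return length
    return None


def adapter_consensus(pairs, min_support=0.5):
    """Majority-vote consensus of the read-1 tails beyond the inferred insert,
    which should be adapter sequence. Stops at the first position covered by
    fewer than min_support of the tails."""
    tails = []
    for r1, r2 in pairs:
        length = find_insert_length(r1, r2)
        if length is not None:
            tails.append(r1[length:])
    consensus = []
    position = 0
    while True:
        bases = [tail[position] for tail in tails if position < len(tail)]
        if not bases or len(bases) < min_support * len(tails):
            break
        consensus.append(Counter(bases).most_common(1)[0][0])
        position += 1
    return "".join(consensus)
```

Only the read-1 tails are used here; collecting `r2[length:]` in the same way would recover the read-2 adapter. This is also why the trick needs paired reads: with single-end data there is no second read to pin down where the insert ends.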
Thank you so much everyone!!
To Evgeniia Golovina
I didn't know about the Genestack platform. It may make my analysis much smoother, since I can skip installing the individual apps for RNA-seq analysis.
To enxxx23
Your advice is very helpful. I would like to try another approach.
To airan
I got some over-represented sequences in the FastQC report, and several samples share the same over-represented sequences.
I guess they are adapters, but I can't find the "tag" you mentioned...
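One way to check whether an untagged over-represented sequence is adapter read-through is to test whether it contains a candidate adapter, or ends with the beginning of one (the read ran into the adapter near its 3' end, so only part of it is visible). A minimal Python sketch, assuming you already have a candidate adapter sequence to test against; the function name and the 8-base minimum overlap are hypothetical choices.

```python
def find_adapter_start(read, adapter, min_overlap=8):
    """Return the index in `read` where `adapter` begins, or None.

    Handles two cases: the whole adapter occurs inside the read, or only the
    adapter's first bases are present at the read's 3' end (partial read-through).
    """
    full = read.find(adapter)
    if full != -1:
        return full
    # Longest suffix of the read that equals a prefix of the adapter.
    for k in range(min(len(read), len(adapter)), min_overlap - 1, -1):
        if read[-k:] == adapter[:k]:
            return len(read) - k
    return None
```

Exact matching keeps the sketch short; real adapter trimmers usually tolerate a few sequencing errors. A short minimum overlap finds more partial adapters but also more chance matches, which is why it is kept at several bases here.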
To Brian Bushnell
Thanks!! I'll try BBMerge right away.