I am searching for a human mRNA sample on SRA database that is untrimmed, but I do not know how to check if it's trimmed or not
You can use FastQC to check whether the sequences have been trimmed or whether you still need to remove adapters, etc.
FastQC can help. If the data are untrimmed, every read will be reported at full length, matching the stated read length of the sequencing run. After trimming, reads generally show a spread of lengths in FastQC's "Sequence Length Distribution" plot, since not all of them remain full length.
Note: There is a possibility that the data has NO extraneous sequence and thus would still remain full length after trimming.
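The read-length check can also be done without FastQC. Below is a minimal sketch, assuming a standard four-line FASTQ file (plain or gzipped) already downloaded from SRA; the function names and the `max_reads` sampling limit are made up for illustration. More than one read length suggests trimming, but a single length does not prove the data are untrimmed, for the reason given in the note above.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_counts(path, max_reads=100_000):
    """Count read lengths among the first `max_reads` records.

    Assumes the usual four-line record layout (sequence not wrapped).
    """
    counts = Counter()
    with open_fastq(path) as handle:
        # The sequence is the 2nd line of every 4-line record.
        seq_lines = islice(handle, 1, None, 4)
        for seq in islice(seq_lines, max_reads):
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts


def looks_length_trimmed(counts):
    """Return True if more than one read length is present."""
    return len(counts) > 1
```

Calling `read_length_counts("sample.fastq.gz")` (a hypothetical file name) returns a `Counter` mapping read length to number of reads, which you can print or pass to `looks_length_trimmed`.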
Basically, you can't. While it is convention (afaik) to upload the raw data as they come out of demultiplexing, the uploaded data are whatever the authors actually uploaded, and in theory that can be anything. There is no bulletproof way to know besides emailing them.
That said, trimming usually results in unequal read lengths within a file (adapter-containing reads get shortened, the others keep their full length), so this is something you can check. In the end it does not really matter, does it? If you want to use that public dataset, you have to work with what is provided, and good QC should always start with something like FastQC to assess whether adapter or quality trimming is needed. You have to do that anyway, regardless of how the uploader treated the data beforehand.
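As a rough complement to FastQC's adapter report, you can count how many reads still contain adapter sequence. The sketch below assumes an Illumina TruSeq-style library, for which `AGATCGGAAGAGC` is the common start of the adapter; other kits use different sequences, and the function name and sampling limit are hypothetical. It only looks for exact matches, so it will miss partial adapters at read ends and adapters with sequencing errors.

```python
import gzip
from itertools import islice

# Assumption: Illumina TruSeq-style adapters; change this for other kits.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(path, adapter=ILLUMINA_ADAPTER_PREFIX, max_reads=100_000):
    """Fraction of the first `max_reads` reads containing `adapter` exactly.

    Assumes a four-line-per-record FASTQ file, plain or gzip-compressed.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    total = 0
    hits = 0
    with opener(path, "rt") as handle:
        # Sequence lines are every 4th line, starting from the 2nd.
        for seq in islice(islice(handle, 1, None, 4), max_reads):
            total += 1
            if adapter in seq:
                hits += 1
    return hits / total if total else 0.0
```

A fraction clearly above zero means adapters are still present and the reads have probably not been adapter-trimmed; a fraction near zero is consistent with either trimmed data or a library with few short inserts.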
As I understand it, we submit raw data, not trimmed data, to the NCBI GEO database, along with md5sum information. Still, you can run a quality check to see whether the data have been trimmed.