Adaptor trimming for RNAseq
6.0 years ago
MAPK ★ 2.1k

I have a few samples of RNA-seq data. I was given this adaptor sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG (NNNNNN = 6 nt index). Can someone please tell me what "NNNNNN = 6 nt index" indicates here? I was using BBMap tools to do the trimming and created the adaptor file shown below, but I do not see any change in my new trimmed file (in terms of file size).

>adaptor1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
RNA-Seq • 1.4k views

Firstly, file size is not always the best indicator of success or failure ;-)

Where did you get that adaptor sequence from? Did the sequencing facility provide it to you? I think they simply want to indicate that a 6-nt index is present at the position of the Ns.
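To make the "Ns are a placeholder for the index" reading concrete, here is a minimal Python sketch that treats each N in the facility's adaptor as a wildcard base and reports which 6-nt index sits at that position when the full adaptor occurs in a read. The function names are made up for illustration, and exact matching of the flanking sequence is an assumption (real reads may carry sequencing errors).

```python
import re

# Adaptor as provided by the sequencing facility; N marks the 6-nt index.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"


def adapter_to_regex(adapter: str) -> re.Pattern:
    """Compile the adaptor into a regex in which every N matches any base."""
    return re.compile(adapter.replace("N", "[ACGTN]"))


def find_index(read: str):
    """Return the bases found at the N positions of a full adaptor match, or None."""
    match = adapter_to_regex(ADAPTER).search(read)
    if match is None:
        return None
    start = match.start() + ADAPTER.index("N")
    return read[start:start + ADAPTER.count("N")]
```

For a read that happens to contain the whole adaptor, `find_index` returns the six bases that replaced the Ns; for a read without it, the result is `None`.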


Yes, it was provided by the sequencing facility.


And they did not provide any additional information, e.g. what the index might be?

Long shot, but might the indexes be present in the read headers in the FASTQ file?


No, they did not provide any more information, except for that adaptor sequence. The FASTQ header only has this info:

@K00317:102:HKVG3BBXX:4:1101:15889:1349 1:N:0:TGGTGAAG+TGGTGAAG
GTAGACTTGATAGTGATACCACGCTCTTGTTCAACGGCACGAGTATCGGGAGCTCTGGCATCACCAGCCTTTGCGGCGGA
+
AAFFFJJJ7FJFJJFJAJ<JJJJJFJJJJ<JJF-FFJJJJ-<JJJ7JJJ7FA<-A7FJJJFJJJAJAAJFAJ7AA7--<F
@K00317:102:HKVG3BBXX:4:1101:16295:1349 1:N:0:TGGTGAAG+TGGTGAAG
GGACAGGTCTGGCTCGGTTTTTGATATCTTCAGTTTAAGGGGGCGAATATCGTTACCCATAGATACCGAAAGTGGCCGAC
+
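As a quick way to check what the headers carry, the sketch below pulls the index field out of a header line like the ones above. It assumes the usual Illumina layout, where the part after the space is `read:filtered:control:index`; the function name `parse_index` is hypothetical.

```python
def parse_index(header: str):
    """Return the index sequence(s) from an Illumina-style FASTQ header.

    Assumes the comment after the first space looks like
    '<read>:<is filtered>:<control number>:<index>' and that dual
    indexes are joined with '+'.
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line")
    parts = header.split(" ", 1)
    if len(parts) != 2:
        raise ValueError("header has no comment field")
    fields = parts[1].split(":")
    if len(fields) < 4:
        raise ValueError("comment field does not contain an index")
    return tuple(fields[3].split("+"))


header = "@K00317:102:HKVG3BBXX:4:1101:15889:1349 1:N:0:TGGTGAAG+TGGTGAAG"
indexes = parse_index(header)
print(indexes, [len(i) for i in indexes])
# ('TGGTGAAG', 'TGGTGAAG') [8, 8]
```

Note that the header in this thread reports a pair of 8-nt indexes, not a single 6-nt one, so the adaptor string from the facility may not describe these libraries exactly.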

What do the summary stats say about the trimming? Perhaps there are no adapters (always theoretically possible) or you were given an incorrect adapter sequence (also possible).
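Independently of the trimmer's own report, a rough sanity check is to count how many reads contain the start of the adaptor at all. The sketch below does that in plain Python; the 12-nt seed length and both function names are arbitrary choices for illustration, and it only finds exact, fully contained seeds, so it underestimates adaptor content near the 3' end.

```python
# Constant part of the adaptor that precedes the index (from the thread).
ADAPTER_CORE = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(reads, seed_len=12):
    """Return the fraction of reads that contain the first seed_len adaptor bases."""
    seed = ADAPTER_CORE[:seed_len]
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(seed in read for read in reads)
    return hits / len(reads)
```

Feeding it the first few thousand records, for example `adapter_fraction(read_fastq_sequences(handle))` on an opened FASTQ file, gives a quick idea of whether there is anything to trim. A value near zero would be consistent with either no adaptor read-through or a wrong adaptor sequence.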

6.0 years ago
GenoMax 147k

I would not worry about what NNNNNN is, as long as the core sequence you are trying to remove is common to all adapters. Use literal=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ktrim=r to catch all variations of NNNNNN.
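For reference, a full BBDuk call built around those two options might look like the sketch below, assembled in Python so it can be passed to `subprocess.run`. Only `literal=` and `ktrim=r` come from this answer; the file names are placeholders, and `k=23 mink=11 hdist=1` are commonly used adapter-trimming settings that are assumed here, not something stated in the thread.

```python
import shlex

# Constant part of the adaptor that precedes the index (from the thread).
ADAPTER_CORE = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def bbduk_command(fastq_in, fastq_out, k=23, mink=11, hdist=1):
    """Build an argument list for right-trimming the adaptor core with bbduk.sh."""
    return [
        "bbduk.sh",
        f"in={fastq_in}",
        f"out={fastq_out}",
        f"literal={ADAPTER_CORE}",  # adaptor given inline instead of a FASTA file
        "ktrim=r",                  # trim the match and everything to its right
        f"k={k}",                   # assumed k-mer length
        f"mink={mink}",             # assumed shorter k-mers at the read end
        f"hdist={hdist}",           # assumed allowed mismatches
    ]


# Hypothetical file names, for illustration only.
cmd = bbduk_command("reads.fastq.gz", "trimmed.fastq.gz")
print(shlex.join(cmd))
```

Running it would be `subprocess.run(cmd, check=True)` on a machine where `bbduk.sh` is on the PATH. BBDuk prints its trimming summary when it finishes, which is a better success indicator than output file size.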


Thanks for your answer. So just to be clear, should I just have

>adaptor1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

and omit NNNNNNATCTCGTATGCCGTCTTCTGCTTG?


Sure. As long as the leading sequence is identical for all adapters.


or the trailing? [confused myself here]


Once the first part is identified, one generally wants to remove everything from there to the right-hand end of the read, so the trailing part is removed automatically.
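The idea can be illustrated with a toy right-trimmer in Python: find where the adaptor starts and keep only what lies to its left. This is a simplified sketch using exact prefix matching, not BBDuk's k-mer based algorithm; `trim_right` and the `min_overlap` threshold are made up for the example.

```python
# Constant part of the adaptor that precedes the index (from the thread).
ADAPTER_CORE = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def trim_right(read, adapter=ADAPTER_CORE, min_overlap=6):
    """Cut the read at the first position where the adaptor begins.

    Everything from that position to the right end is dropped, so the
    index and the trailing adaptor part go with it. A partial adaptor at
    the very end of the read is also trimmed if at least min_overlap
    bases match.
    """
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tail is too short to call an adaptor
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read
```

With a read made of an insert followed by the adaptor core, any index, and the rest of the adaptor, the function returns just the insert, which is why the part after NNNNNN does not need to be listed separately.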


OK, so that was my confusion: the right side is the end of the read, i.e. the part that is not next to the actual sequence?
