The **rnaseq** workflow follows the FAIR principles. It is written in the Nextflow DSL2 language, uses a Singularity container for the required software, and is optimized for computing resources (CPU, memory) and for use on a computing cluster with a Slurm scheduler.
## Install rnaseq flow and build singularity image
Clone the rnaseq repository and build a local Singularity image (this requires system administrator rights) from the provided Singularity definition file.
```bash
git clone https://forgemia.inra.fr/lpgp/rnaseq.git
sudo singularity build ./rnaseq/singularity/rnaseq.sif ./rnaseq/singularity/rnaseq.def
```
## Usage example with STAR for bulkRNAseq
### paired end
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "star_htseq-count" \
--genome "genome.fa.gz" \
--sjdbOverhang 80 \
--gff_gtf "annot.gtf.gz" \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--clip_r2 12 \
--three_prime_clip_r2 2 \
--out_dir "${PWD}/results"
```
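The STAR manual recommends setting `--sjdbOverhang` to the read length minus 1. The sketch below derives a candidate value from the first read of a FASTQ file. The function name `suggest_overhang` is hypothetical, and subtracting the clipped bases assumes that the `--clip_*` and `--three_prime_clip_*` options shorten every read by exactly that amount before alignment, which may not hold once adapter and quality trimming are applied.

```bash
#!/bin/bash
# Print a candidate --sjdbOverhang value: (read length - clipped bases) - 1.
# Usage: suggest_overhang FASTQ_GZ [CLIP_5PRIME] [CLIP_3PRIME]
# Hypothetical helper, not part of the rnaseq workflow.
suggest_overhang() {
    local fastq="$1" clip5="${2:-0}" clip3="${3:-0}" len
    # The sequence is the second line of the first FASTQ record.
    len=$(gzip -dc "$fastq" | awk 'NR == 2 { print length($0); exit }')
    echo $(( len - clip5 - clip3 - 1 ))
}

# Example call (file name is illustrative):
# suggest_overhang ./raw/sample_1.fastq.gz 12 2
```

Only the first read is inspected, so the result is a starting point for data with uniform read lengths rather than a definitive setting.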
## Usage example with Salmon for bulkRNAseq
### paired end
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "salmon_saf" \
--genome "genome.fa.gz" \
--transcriptome "transcriptome.fa.gz" \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--clip_r2 12 \
--three_prime_clip_r2 2 \
--out_dir "${PWD}/results"
```
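The `--input` pattern used in the paired-end examples expects each sample to provide both a `_1.fastq.gz` and a `_2.fastq.gz` file. A quick pre-flight check, sketched here under the assumption that this naming convention is followed exactly, can flag samples with a missing mate before the job is submitted. The function name `check_pairs` is hypothetical.

```bash
#!/bin/bash
# Report every *_1.fastq.gz file in a directory that has no *_2.fastq.gz mate.
# Returns 0 when all pairs are complete, 1 otherwise.
# Hypothetical helper, not part of the rnaseq workflow.
check_pairs() {
    local dir="$1" r1 r2 missing=0
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"     # expected mate file
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $(basename "$r1")"
            missing=1
        fi
    done
    return "$missing"
}

# Example call:
# check_pairs ./raw && sbatch run_rnaseq.sh   # script name is illustrative
```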
## Usage example with STAR SOLO for BRB bulkRNAseq
As explained by the BRB-seq authors, the [BRB-seqTools](https://github.com/DeplanckeLab/BRB-seqTools) suite can be replaced by STARsolo:
*BRB-seq tools suite was created in the early days of multiplexed libraries, when there was not many other alternatives to analyze BRB-seq libraries. Now, this is not the case anymore, so we would recommend using STARsolo instead, which should produce similar results in a single command.*
*In particular, do NOT use the output of the Trimming step of BRB-seq Tools as input for STARsolo, as this will not produce correct UMI values (without displaying an error message). STARsolo can provide read trimming that matches BRB-seq specificities using the --clipAdapterType CellRanger4 option*
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--genome "genome.fa.gz" \
--sjdbOverhang 99 \
--gff_gtf "annot.gtf.gz" \
--method "star_solo" \
--soloCBstart 1 \
--soloCBlen 12 \
--soloUMIstart 13 \
--soloUMIlen 16
```
## Usage example with STAR SOLO for scRNAseq
```bash
# Download the 10x Chromium v3 barcode whitelist
wget https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
# Decompress it
gunzip 3M-february-2018.txt.gz
```
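Before launching the workflow, it can be worth confirming that the decompressed whitelist looks as expected. The sketch below counts the barcodes and flags any line that is not exactly 16 characters of A, C, G or T, which is the barcode length assumed for 10x Chromium v3 here. The function name `check_whitelist` is hypothetical.

```bash
#!/bin/bash
# Count barcodes in a whitelist and flag malformed entries.
# Usage: check_whitelist FILE [BARCODE_LENGTH]   (length defaults to 16)
# Exits non-zero if at least one line is malformed.
# Hypothetical helper, not part of the rnaseq workflow.
check_whitelist() {
    local file="$1" cb_len="${2:-16}"
    awk -v n="$cb_len" '
        length($0) != n || $0 !~ /^[ACGT]+$/ { bad++ }
        END { printf "%d barcodes, %d malformed\n", NR, bad; exit (bad > 0) }
    ' "$file"
}

# Example call:
# check_whitelist 3M-february-2018.txt 16
```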
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--genome "genome.fa.gz" \
--gff_gtf "annot.gtf.gz" \
--method "star_solo" \
--whitelist "3M-february-2018.txt" \
--soloCBstart 1 \
--soloCBlen 16 \
--soloUMIstart 17 \
--soloUMIlen 12 \
--out_dir "${PWD}/results"
```
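The `--soloCBstart`, `--soloCBlen`, `--soloUMIstart` and `--soloUMIlen` options describe where the cell barcode (CB) and the UMI sit in the barcode read, using 1-based positions. The BRB-seq example uses a 12-base barcode followed by a 16-base UMI, while the 10x example uses 16 and 12. The following sketch illustrates how such a layout splits a read. The function name `split_barcode_read` is hypothetical and the code only mimics the coordinate convention; it is not how STARsolo is implemented.

```bash
#!/bin/bash
# Split a barcode read into cell barcode and UMI from 1-based start
# positions and lengths, mirroring the --soloCB*/--soloUMI* options.
# Usage: split_barcode_read READ CB_START CB_LEN UMI_START UMI_LEN
# Hypothetical helper for illustration only.
split_barcode_read() {
    local read="$1" cb_start="$2" cb_len="$3" umi_start="$4" umi_len="$5"
    # Bash substring offsets are 0-based, hence the "- 1".
    echo "CB=${read:cb_start-1:cb_len} UMI=${read:umi_start-1:umi_len}"
}

# Example calls on a 28-base barcode read stored in $read:
# split_barcode_read "$read" 1 12 13 16   # BRB-seq layout from this README
# split_barcode_read "$read" 1 16 17 12   # 10x Chromium v3 layout
```

In both layouts the UMI starts immediately after the barcode, so `soloUMIstart` equals `soloCBstart + soloCBlen`.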
## Usage example with ALEVIN-fry (through simpleaf) for scRNAseq
See the [simpleaf](https://combine-lab.github.io/alevin-fry-tutorials/2023/simpleaf-piscem/) tutorial for details.
### splici index (splice + intron)
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "alevin_fry" \
--genome "genome.fa.gz" \
--gff_gtf "annot.gtf.gz" \
--rlen 91 \
--out_dir "${PWD}/results"
```
### spliceu index (splice + unsplice)
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "alevin_fry" \
--genome "genome.fa.gz" \
--gff_gtf "annot.gtf.gz" \
--spliceu \
--out_dir "${PWD}/results"
```
## Usage example with SALMON ALEVIN for scRNA-seq (deprecated)
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "alevin_saf" \
--transcriptome "transcriptome.fa.gz"
```
Please refer to [Trim Galore](https://github.com/FelixKrueger/TrimGalore), [STAR](https://github.com/alexdobin/STAR), [htseq-count](https://htseq.readthedocs.io/en/master/), and [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) for a complete explanation of the arguments.
```
# sequences
input = false
genome = false
transcriptome = false
# fastqc
skip_fastqc = false
# trimming
skip_trimming = false
clip_r1 = 0
clip_r2 = 0
three_prime_clip_r1 = 0
three_prime_clip_r2 = 0
# alignment mode should be star_htseq-count and/or salmon_saf for bulk-RNAseq
# alignment mode should be star_solo and/or alevin_saf and/or alevin_fry for BRBseq or scRNAseq
method = false
# STAR options
star_index = false
gff_gtf = false
keep_star_index = false
htseq_count_multimapped = false
# SALMON options
salmon_index = false
keep_salmon_index = false
writeMappings = false
alevin_fry_index = false
keep_alevin_fry_index = false
rlen = false
spliceu = false
# STAR SOLO options
star_index = false
gff_gtf = false
keep_star_index = false
barcodes = false
soloCBstart = 1
soloCBlen = 12
soloUMIstart = 13
soloUMIlen = 16
```
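The settings listed above appear to be the workflow's default parameter values. Instead of passing each override on the command line, Nextflow can also read parameters from a YAML or JSON file through its `-params-file` option. The sketch below writes such a file; the file name `params.yml` and the chosen values are illustrative, and it assumes the workflow reads these keys as ordinary Nextflow `params`.

```bash
#!/bin/bash
# Write a parameter file whose keys mirror the workflow option names.
# File name and values are illustrative assumptions.
cat > params.yml <<'EOF'
input: "./raw/*_{1,2}.fastq.gz"
method: "star_htseq-count"
genome: "genome.fa.gz"
gff_gtf: "annot.gtf.gz"
sjdbOverhang: 80
out_dir: "./results"
EOF

# Then, inside the sbatch script:
# nextflow run /work/project/lpgp/Nextflow/rnaseq/ -profile slurm -params-file params.yml
```

Keeping the parameter file next to the results records which settings produced a given run.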
## References
1. Krueger F. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files [Internet]. Available from: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
2. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
3. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.
4. Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 2020;21:239.
5. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417–9.