
# rnaseq v1.1

The **rnaseq** workflow follows the FAIR principles. It is written in Nextflow DSL2, runs its software from a Singularity container, and is optimized for computing resources (CPU, memory) on a compute cluster managed by the Slurm scheduler.

## Install the rnaseq workflow and build the Singularity image

Clone the rnaseq Git repository and build a local Singularity image from the provided Singularity definition file. Building the image requires system administrator rights.

```bash
git clone https://forgemia.inra.fr/lpgp/rnaseq.git
sudo singularity build ./rnaseq/singularity/rnaseq.sif ./rnaseq/singularity/rnaseq.def
```

## Usage example with STAR for bulkRNAseq

### paired end
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "star_htseq-count" \
--genome "genome.fa.gz" \
--sjdbOverhang 80 \
--gff_gtf "annot.gtf.gz" \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--clip_r2 12 \
--three_prime_clip_r2 2 \
--out_dir "${PWD}/results"
```
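
The STAR documentation recommends setting `--sjdbOverhang` to the read (mate) length minus 1. When reads are hard-clipped before alignment, as with the `--clip_r1` and `--three_prime_clip_r1` options above, the relevant length is the one remaining after clipping. The helper below is a minimal sketch of that arithmetic; `sjdb_overhang` is a hypothetical name, not part of the workflow, and the 100 bp read length is an assumed example value.

```bash
#!/bin/bash
# Hypothetical helper (not part of the workflow): estimate --sjdbOverhang
# as (read length left after hard clipping) - 1.
sjdb_overhang() {
    local read_len=$1   # raw read length in bases
    local clip5=$2      # bases removed from the 5' end (--clip_r1)
    local clip3=$3      # bases removed from the 3' end (--three_prime_clip_r1)
    echo $(( read_len - clip5 - clip3 - 1 ))
}

# Assumed example: 100 bp reads, 12 bases clipped at 5', 2 bases at 3'.
sjdb_overhang 100 12 2   # prints 85
```

Adapter and quality trimming can shorten individual reads further, so this value is only an upper estimate based on the fixed clipping.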

## Usage example with Salmon for bulkRNAseq

### single end
```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*.fastq.gz" \
--method "salmon_saf" \
--genome "genome.fa.gz" \
--transcriptome "transcriptome.fa.gz" \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--out_dir "${PWD}/results"
```

### paired end

```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "salmon_saf" \
--genome "genome.fa.gz" \
--transcriptome "transcriptome.fa.gz" \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--clip_r2 12 \
--three_prime_clip_r2 2 \
--out_dir "${PWD}/results"
```
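
The paired-end examples select files with the glob `*_{1,2}.fastq.gz`, so every sample needs both a `_1` and a `_2` file. A quick pre-flight check along the following lines can catch an incomplete pair before the job is submitted. This is a sketch only: `missing_mates` is a hypothetical helper, not part of the workflow, and it assumes the `_1`/`_2` naming used in the examples.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the workflow): print the path
# of every *_2.fastq.gz mate that is missing for an existing *_1.fastq.gz.
missing_mates() {
    local dir=$1 r1 r2
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"     # expected mate file name
        [ -e "$r2" ] || echo "$r2"
    done
}

# Usage: prints nothing when every sample is complete.
# missing_mates ./raw
```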

## Usage example with STAR SOLO for BRB bulkRNAseq

As explained by the BRB-seq authors, the [BRB-seqTools](https://github.com/DeplanckeLab/BRB-seqTools) suite can be replaced by STARsolo:

*BRB-seq tools suite was created in the early days of multiplexed libraries, when there was not many other alternatives to analyze BRB-seq libraries. Now, this is not the case anymore, so we would recommend using STARsolo instead, which should produce similar results in a single command.*

The trimming step is automatically skipped when using STARsolo, for the reason given by the authors:

*In particular, do NOT use the output of the Trimming step of BRB-seq Tools as input for STARsolo, as this will not produce correct UMI values (without displaying an error message). STARsolo can provide read trimming that matches BRB-seq specificities using the --clipAdapterType CellRanger4 option*

```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--genome "genome.fa.gz" \
--sjdbOverhang 99 \
--gff_gtf "annot.gtf.gz" \
--method "star_solo" \
--barcodes  "barcodes.txt" \
--soloCBstart  1 \
--soloCBlen 12 \
--soloUMIstart 13 \
--soloUMIlen 16 \
--out_dir "${PWD}/results"
```
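
The four `--solo*` options describe where the cell barcode and the UMI sit in the barcode read, using 1-based start positions and lengths. With the values above, bases 1 to 12 are the barcode and bases 13 to 28 are the UMI. The snippet below illustrates that layout on a made-up 28-base sequence; `slice_read` is a hypothetical helper for illustration and plays no role in the workflow.

```bash
#!/bin/bash
# Illustration only: cut a substring out of a read using a 1-based start
# position and a length, the convention used by --soloCBstart/--soloCBlen
# and --soloUMIstart/--soloUMIlen.
slice_read() {
    local bases=$1 start=$2 len=$3
    echo "${bases:$((start - 1)):$len}"
}

# Made-up barcode read: 12-base cell barcode followed by a 16-base UMI.
barcode_read="AAAACCCCGGGGTTTTACGTACGTACGT"

slice_read "$barcode_read" 1 12    # cell barcode: AAAACCCCGGGG
slice_read "$barcode_read" 13 16   # UMI: TTTTACGTACGTACGT
```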

## Usage example with ALEVIN-fry (through simpleaf) for scRNAseq

[simpleaf](https://combine-lab.github.io/alevin-fry-tutorials/2023/simpleaf-piscem/)

### splici index (spliced + intronic)

```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "alevin_fry" \
--genome "genome.fa.gz" \
--gff_gtf "annot.gtf.gz" \
--rlen 91 \
--out_dir "${PWD}/results"
```

### spliceu index (spliced + unspliced)

```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "alevin_fry" \
--genome "genome.fa.gz" \
--gff_gtf "annot.gtf.gz" \
--spliceu \
--out_dir "${PWD}/results"
```


## Usage example with SALMON ALEVIN for scRNA-seq (deprecated)

```bash
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH -p unlimitq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/23.04.3
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "./raw/*_{1,2}.fastq.gz" \
--method "alevin_saf" \
--genome "genome.fa.gz" \
--transcriptome "transcriptome.fa.gz" \
--out_dir "${PWD}/results"
```

## Default parameters

Please refer to the [Trim Galore](https://github.com/FelixKrueger/TrimGalore), [STAR](https://github.com/alexdobin/STAR), [htseq-count](https://htseq.readthedocs.io/en/master/), and [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) documentation for a complete description of their arguments.

```bash
# sequences
input = false
genome = false
transcriptome = false

# fastqc
skip_fastqc = false

# trimming
skip_trimming = false
clip_r1 = 0
clip_r2 = 0
three_prime_clip_r1 = 0
three_prime_clip_r2 = 0

# alignment mode should be star_htseq-count and/or salmon_saf for bulk-RNAseq
# alignment mode should be star_solo and/or alevin_saf and/or alevin_fry for BRBseq or scRNAseq
method = false

# STAR options
star_index = false
gff_gtf = false
sjdbOverhang = 99
keep_star_index = false
htseq_count_multimapped = false
feature_type = "exon"

# SALMON options
salmon_index = false
keep_salmon_index = false
writeMappings = false

# ALEVIN options
alevin_fry_index = false
keep_alevin_fry_index = false
rlen = false
spliceu = false

# STAR SOLO options
star_index = false
gff_gtf = false
keep_star_index = false
barcodes = false
soloCBstart =  1
soloCBlen = 12
soloUMIstart = 13
soloUMIlen = 16

# save directory
out_dir = "${PWD}/results"
```
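
Any of these defaults can be overridden on the command line, as in the usage examples. Nextflow also accepts a YAML or JSON parameter file through its `-params-file` option, which keeps long parameter sets out of the submission script. The sketch below writes such a file with the values from the Salmon paired-end example; the file name `params.yaml` is arbitrary, and it is assumed that the workflow reads these settings as ordinary Nextflow parameters.

```bash
#!/bin/bash
# Sketch: store pipeline parameters in a YAML file instead of repeating
# them on the command line. Values come from the Salmon paired-end example;
# the file name params.yaml is an arbitrary choice.
cat > params.yaml <<'EOF'
input: "./raw/*_{1,2}.fastq.gz"
method: "salmon_saf"
genome: "genome.fa.gz"
transcriptome: "transcriptome.fa.gz"
clip_r1: 12
three_prime_clip_r1: 2
clip_r2: 12
three_prime_clip_r2: 2
EOF

# The file would then be passed to Nextflow (left as a comment here so the
# snippet only writes the file):
# nextflow run /work/project/lpgp/Nextflow/rnaseq/ -profile slurm -params-file params.yaml
```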

## References

1. Krueger F. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files [Internet]. Available from: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
2. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
3. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.
4. Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 2020;21:239.
5. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417–9.