
This repository hosts the bash and R scripts used for the data analysis in "Acclimation of Nodularia spumigena CCY9414 to inorganic phosphate limitation - Identification of the P-limitation stimulon via RNA-seq" (authors: Santoro M., Hassenrueck C., Labrenz M., Hagemann M.), in which we evaluated the physiological response of Nodularia spumigena CCY9414 to different phosphate concentrations in two independent cultivation experiments.

Measured parameters:

  • ammonium and phosphate concentrations in the growth medium (descriptive stats only)
  • dry weight and polyphosphate amount
  • gene expression (RNA-seq)

Data availability:

  • Transcriptomic reads and processed feature counts are available from the GEO database under accession number GSE213384.
  • Physiological data have been submitted to the PANGAEA database (DOI pending).

Bioinformatic sequence processing and RNA-seq analysis

Scripts: … and rnaseq_analysis.R and …

  • Quality-trimming and adapter-clipping of the reads with BBDuk using a sliding window approach with a window size of 4 bp and an average base quality of 15
  • Removal of poly-G repeats longer than 10 bp and discarding of reads shorter than 50 bp
  • Mapping of quality-trimmed reads against the reference genome of Nodularia CCY9414 (NCBI RefSeq accession: GCF_000340565.2) using the program bwa-mem
  • Exclusion of remaining hits to ribosomal RNA genes
  • Further filtering of the mapping results to remove secondary and supplementary alignments, alignments shorter than 50 bp, and alignments with less than 95% sequence identity to the reference across the whole read
  • Read counts per gene were then calculated with featureCounts and converted to transcript percentages accounting for variable gene length
  • Differential gene expression was assessed using DESeq2 between P-replete and P-deplete conditions at each sampling time point and between day 7 and day 14 in each P condition
  • Re-annotation of the reference genome of Nodularia CCY9414 against KEGG using diamond blastp (version …) in sensitive mode, supplemented by kofamscan version 1.3.0
  • Operons were predicted with OperonMapper
  • Functional enrichment analysis based on the KEGG pathway hierarchy, using a χ² goodness-of-fit test with the proportion of each pathway's genes among all genes in the genome as the expected frequencies
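The trimming steps above (sliding window of 4 bp with an average quality of 15, poly-G removal, 50 bp minimum length) can be illustrated with the following minimal sketch. It is not BBDuk's implementation and not the repository's script; it assumes Phred+33 quality encoding, cuts the read at the start of the first window whose mean quality drops below 15, and assumes poly-G runs are clipped from the 3' end (the location is not stated above). All function names are hypothetical.

```python
# Illustrative sketch only: not BBDuk's algorithm, just the idea of the listed steps.
WINDOW = 4        # sliding-window size in bp
MIN_AVG_Q = 15    # minimum average Phred quality inside a window
MAX_POLY_G = 10   # poly-G runs longer than this are clipped
MIN_LEN = 50      # reads shorter than this after trimming are discarded


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str) -> tuple[str, str]:
    """Cut the read at the first window whose mean quality is below MIN_AVG_Q."""
    scores = phred_scores(qual)
    for start in range(len(scores) - WINDOW + 1):
        if sum(scores[start:start + WINDOW]) / WINDOW < MIN_AVG_Q:
            return seq[:start], qual[:start]
    return seq, qual


def clip_trailing_poly_g(seq: str, qual: str) -> tuple[str, str]:
    """Remove a 3'-terminal G run longer than MAX_POLY_G (assumed location)."""
    stripped = seq.rstrip("G")
    if len(seq) - len(stripped) > MAX_POLY_G:
        return stripped, qual[:len(stripped)]
    return seq, qual


def process_read(seq: str, qual: str):
    """Return the trimmed (seq, qual) pair, or None if the read becomes too short."""
    seq, qual = sliding_window_trim(seq, qual)
    seq, qual = clip_trailing_poly_g(seq, qual)
    return (seq, qual) if len(seq) >= MIN_LEN else None
```

For example, a 65 bp read whose last 10 bases have quality 2 is cut to 54 bp and kept, while a 30 bp read is discarded.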
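The alignment filter can be sketched on plain SAM records as below. The SAM FLAG bits for unmapped (0x4), secondary (0x100) and supplementary (0x800) alignments come from the SAM specification, and `NM` is the standard edit-distance tag that bwa-mem writes. Two definitions are assumptions, since the text above does not spell them out: "alignment length" is taken as the number of read bases in M/=/X/I operations, and "identity across the whole read" is taken as matching bases divided by the full read length, so that clipped bases count against the read. Function names are hypothetical.

```python
import re

# SAM FLAG bits (SAM specification)
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_totals(cigar: str) -> dict[str, int]:
    """Sum the lengths per CIGAR operation, e.g. '50M2I48M' -> {'M': 98, 'I': 2}."""
    totals: dict[str, int] = {}
    for length, op in CIGAR_RE.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    return totals


def keep_alignment(flag: int, cigar: str, nm: int,
                   min_len: int = 50, min_identity: float = 0.95) -> bool:
    """Drop unmapped/secondary/supplementary hits, short hits and low-identity hits."""
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return False
    t = cigar_totals(cigar)
    aligned = t.get("M", 0) + t.get("=", 0) + t.get("X", 0)  # read bases opposite reference bases
    ins, dels = t.get("I", 0), t.get("D", 0)
    read_len = aligned + ins + t.get("S", 0) + t.get("H", 0)  # full read, clipped bases included
    if aligned + ins < min_len:  # assumed definition of alignment length
        return False
    mismatches = nm - ins - dels  # NM = mismatches + inserted + deleted bases
    return (aligned - mismatches) / read_len >= min_identity


def keep_sam_line(line: str) -> bool:
    """Parse one SAM alignment line (not a header) and apply keep_alignment."""
    fields = line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None:
        return False  # assumption: records without NM are dropped
    return keep_alignment(flag, cigar, nm)
```

Under these definitions a 100 bp read with 20 soft-clipped bases reaches at most 80% identity and is removed, even if the aligned part matches perfectly.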
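Converting read counts to transcript percentages that account for gene length can be sketched as follows: each count is divided by its gene length (featureCounts writes gene lengths in its `Length` column), and the length-normalized values are rescaled to sum to 100% per sample. This is a TPM-style calculation assumed for illustration; the exact formula used in rnaseq_analysis.R may differ.

```python
def transcript_percentages(counts: dict[str, float],
                           lengths: dict[str, float]) -> dict[str, float]:
    """Length-normalize read counts per gene and express them as % of the sample total."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}  # reads per bp
    total = sum(rates.values())
    return {gene: 100 * rate / total for gene, rate in rates.items()}


# Two genes with equal read counts: the shorter gene gets the larger share.
shares = transcript_percentages({"geneA": 100, "geneB": 100},
                                {"geneA": 1000, "geneB": 500})
```

Here geneA receives one third and geneB two thirds of the transcripts, because geneB yields the same number of reads from half the length.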
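Before testing for differential expression, DESeq2 corrects for differences in sequencing depth with median-of-ratios size factors. The sketch below reproduces only that normalization step, to show what the comparisons between P conditions and time points are built on; it does not reproduce DESeq2's negative binomial model or its tests, and the function name is hypothetical.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors, one per sample (list position = sample)."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with a zero count in any sample are skipped
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean, taken on the log scale
        log_ratios = [math.log(counts[g][j]) - m for g, m in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Sample 2 was sequenced twice as deeply as sample 1; gene "z" is ignored (zero count).
factors = size_factors({"a": [10, 20], "b": [30, 60], "c": [5, 10], "z": [0, 4]})
```

Dividing each sample's counts by its size factor puts both samples on a common scale, so the twofold depth difference is not mistaken for differential expression.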
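The χ² goodness-of-fit enrichment can be sketched as below: the genome-wide proportion of genes in each KEGG category gives the expected count among the genes of interest (for example, differentially expressed genes), and each category contributes (O − E)²/E to the statistic, with k − 1 degrees of freedom for k categories. The sketch assumes mutually exclusive categories with at least one genome gene each (genes in several pathways would need a different treatment) and uses made-up counts. It returns only the statistic; in R, `chisq.test(x, p = ...)` computes the same test with a p-value.

```python
def chi_square_gof(observed: dict[str, int], genome_counts: dict[str, int]):
    """Chi-square goodness of fit of observed counts against genome-wide proportions.

    Returns (statistic, degrees of freedom, per-category contributions).
    Assumes mutually exclusive categories, each with a non-zero genome count.
    """
    n_obs = sum(observed.get(cat, 0) for cat in genome_counts)
    n_genome = sum(genome_counts.values())
    contributions = {}
    for cat, in_genome in genome_counts.items():
        expected = n_obs * in_genome / n_genome  # count expected from the genome proportion
        o = observed.get(cat, 0)
        contributions[cat] = (o - expected) ** 2 / expected
    return sum(contributions.values()), len(genome_counts) - 1, contributions


# Hypothetical counts: category A is under-, B over-represented, C as expected.
stat, df, parts = chi_square_gof({"A": 40, "B": 40, "C": 20},
                                 {"A": 500, "B": 300, "C": 200})
```

The per-category contributions, together with the sign of O − E, indicate which pathways drive a significant result.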

Statistical data analysis of dry weights and polyphosphate amounts

Scripts: dry_weights_and_nutrients_data_analysis.R and Polyp_experiments_analysis_and_plots.R

  • General Linear Mixed Model to assess the effects of sampling time point and P condition, with experiment iteration as a random factor
  • To meet the assumption of normality, polyphosphate concentrations were square-root transformed
  • Removal of outlier observations with a Cook's distance greater than 4/n (n = sample size)
  • Post-hoc tests of the General Linear Mixed Model using the emmeans package
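The 4/n outlier rule can be illustrated with Cook's distance for a simple least-squares line, D_i = e_i² / (p · MSE) · h_i / (1 − h_i)², where e_i is the residual, h_i the leverage and p the number of model parameters. This is a simplified stand-in: the scripts apply the rule to a mixed model with experiment iteration as a random factor, which this sketch ignores. The square-root helper only mirrors the transformation applied to the polyphosphate concentrations. All names and data are hypothetical.

```python
import math


def sqrt_transform(values: list[float]) -> list[float]:
    """Square-root transform, as applied to polyphosphate concentrations."""
    return [math.sqrt(v) for v in values]


def cooks_distance(x: list[float], y: list[float]) -> list[float]:
    """Cook's distance for each point of an ordinary least-squares fit y = a + b*x.

    Simplification: fixed effects only, no random effect for experiment iteration.
    """
    n, p = len(x), 2  # p = number of parameters (intercept and slope)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_mean) ** 2 / sxx  # leverage of this observation
        distances.append(e ** 2 / (p * mse) * h / (1 - h) ** 2)
    return distances


def drop_outliers(x: list[float], y: list[float]) -> list[tuple[float, float]]:
    """Keep observations whose Cook's distance does not exceed 4/n."""
    threshold = 4 / len(x)
    return [(xi, yi) for xi, yi, d in zip(x, y, cooks_distance(x, y)) if d <= threshold]


# Ten points on y = 2x, except the last one, which is doubled.
x = list(range(1, 11))
y = [2 * v for v in range(1, 10)] + [40]
kept = drop_outliers(x, y)
```

Only the last observation exceeds the threshold of 4/10 = 0.4 and is removed; the remaining nine are kept.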


The R code to generate the figures for the manuscript and the supplementary material is included in the analysis scripts.