This repository hosts the bash and R scripts used for the data analysis in "Acclimation of Nodularia spumigena CCY9414 to inorganic phosphate limitation - Identification of the P-limitation stimulon via RNA-seq" (authors: Santoro M., Hassenrueck C., Labrenz M., Hagemann M.). In that study, we evaluated the physiological response of Nodularia spumigena CCY9414 to different phosphate concentrations in two independent cultivation experiments.
Measured parameters:
- ammonium and phosphate concentrations in the growth medium (descriptive stats only)
- dry weight and polyphosphate amount
- gene expression (RNA-seq)
Data availability:
- Transcriptomic reads and processed feature counts are accessible from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) with the following accession number: GSE213384.
- Physiological data have been submitted to the PANGAEA database (DOI pending).
Bioinformatic sequence processing and RNA-seq analysis
Scripts: rnaseq_seqprep.sh, rnaseq_analysis.R, and reannotation_reference.sh
- Quality-trimming and adapter-clipping of the reads with BBDuk using a sliding window approach with a window size of 4 bp and an average base quality of 15
- Removal of poly-G repeats longer than 10 bp and discarding of reads shorter than 50 bp
- Mapping of quality-trimmed reads against the reference genome of Nodularia CCY9414 (NCBI RefSeq accession: GCF_000340565.2) using the program bwa-mem
- Exclusion of remaining hits to ribosomal RNA genes
- Further filtering of the mapping results to remove secondary and supplementary alignments, as well as alignments shorter than 50 bp or with less than 95% sequence identity to the reference across the whole read
- Read counts per gene were then calculated with featureCounts and converted to transcript percentages accounting for variable gene length
- Differential gene expression was assessed using DESeq2 between P-replete and P-deplete conditions at each sampling time point and between day 7 and day 14 in each P condition
- Re-annotation of the reference genome of Nodularia CCY9414 against KEGG using diamond blastp version 2.0.14.152 in sensitive mode, supplemented by kofamscan version 1.3.0
- Operons were predicted with OperonMapper (https://biocomputo.ibt.unam.mx/operon_mapper/)
- Functional enrichment analysis based on the KEGG pathway hierarchy, comparing against the proportion of genes per pathway relative to the total number of genes in the genome in a χ² goodness-of-fit test
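The conversion of read counts to length-corrected transcript percentages mentioned in the list above can be illustrated with a small sketch. It is written in Python for illustration only; the repository's own implementation is in rnaseq_analysis.R and may differ in detail. The function name `transcript_percentages`, the gene names, and the example numbers are hypothetical, and the sketch assumes a TPM-style normalisation (counts divided by gene length, then scaled so that all genes sum to 100).

```python
def transcript_percentages(counts, lengths):
    """Convert per-gene read counts to length-normalised percentages.

    counts  -- dict mapping gene ID to read count (e.g. from featureCounts)
    lengths -- dict mapping gene ID to gene length in bp
    Assumption: TPM-style normalisation scaled to 100 instead of 1e6.
    """
    if counts.keys() != lengths.keys():
        raise ValueError("counts and lengths must cover the same genes")

    # Reads per base: corrects for longer genes collecting more reads.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}

    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")

    # Scale so that the values of all genes in one sample sum to 100.
    return {gene: 100.0 * rate / total for gene, rate in rates.items()}


# Hypothetical example: two genes with equal counts but different lengths.
example_counts = {"geneA": 100, "geneB": 100, "geneC": 0}
example_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
pct = transcript_percentages(example_counts, example_lengths)
```

In this made-up example, geneA receives twice the percentage of geneB because it collects the same number of reads over half the length.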
Statistical data analysis of dry weights and polyphosphate amounts
Scripts: dry_weights_and_nutrients_data_analysis.R and Polyp_experiments_analysis_and_plots.R
- General linear mixed model for assessing the effect of sampling time point and P condition, with experiment iteration as a random factor
- To meet the assumption of normality, polyphosphate concentrations were square-root transformed
- Removal of outlier observations with a Cook's distance of more than 4 divided by sample size
- Post hoc tests of the general linear mixed model using the emmeans package
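The outlier rule above (Cook's distance greater than 4 divided by the sample size, applied after square-root transformation) can be sketched as follows. This is a deliberately simplified illustration in Python: it uses an ordinary least-squares fit with a single predictor rather than the mixed model fitted in the R scripts, and the function names and data are hypothetical.

```python
import math


def cooks_distances(x, y):
    """Cook's distance for each observation of a simple linear regression.

    Simplification: ordinary least squares with one predictor,
    NOT the mixed model used in the repository's R scripts.
    """
    n = len(x)
    p = 2  # number of fitted coefficients (intercept and slope)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each observation (diagonal of the hat matrix).
    leverages = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    return [
        (e * e / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


def flag_outliers(x, y):
    """Return indices of observations whose Cook's distance exceeds 4/n."""
    threshold = 4.0 / len(x)
    return [i for i, d in enumerate(cooks_distances(x, y)) if d > threshold]


# Hypothetical data: the fifth raw value is an obvious outlier.
time_points = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
raw_values = [4, 16, 36, 64, 1600, 144, 196, 256, 324, 400]
# Square-root transformation, as applied to the polyphosphate concentrations.
transformed = [math.sqrt(v) for v in raw_values]
flagged = flag_outliers(time_points, transformed)
```

With these made-up values only the fifth observation (index 4) exceeds the 4/n threshold and would be removed before refitting.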
Plotting
The R code to generate the figures for the manuscript and supplementary material is included in the analysis scripts.