Stochastic logistic models reproduce experimental time series of microbial communities
 Lana Descheemaeker
 Sophie de Buyl
 Research Article
 Computational and Systems Biology
 stochastic generalized Lotka-Volterra equations
 logistic model
 microbial communities dynamics
 noise analysis
 publisher-id: 55650
 doi: 10.7554/eLife.55650
 elocation-id: e55650
Abstract
We analyze properties of experimental microbial time series, from plankton and the human microbiome, and investigate whether stochastic generalized Lotka-Volterra models could reproduce those properties. We show that this is the case when the noise term is large and a linear function of the species abundance, while the strength of the self-interactions varies over multiple orders of magnitude. We stress that all the observed stochastic properties, even the niche character of the experimental time series, can be obtained from a logistic model, that is, without interactions. Linear noise is associated with growth rate stochasticity, which is related to changes in the environment. This suggests that fluctuations in the sparsely sampled experimental time series may be caused by extrinsic sources.
#general imports
# Data manipulation
import pandas as pd
import numpy as np
# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30
from IPython import get_ipython
ipython = get_ipython()
# autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline
import time
np.random.seed(int(time.time()))
#specific imports
import matplotlib as mpl
from noise_analysis import noise_color
from scipy import stats
from noise_properties_plotting import noise_cmap_ww, noise_lim, PiecewiseNormalize, \
PlotTimeseriesComparison, PlotNoiseColorComparison
from generate_timeseries import Timeseries
from noise_parameters import NOISE, MODEL
from matplotlib import font_manager
font_manager._rebuild()
from elife_settings import set_elife_settings, ELIFE
set_elife_settings()
Introduction
Microbial communities are found everywhere on earth, from oceans and soils to gastrointestinal tracts of animals, and play a key role in shaping ecological systems. Because of their importance for our health, human-associated microbial communities have recently received a lot of attention. According to the latest estimates, there is roughly one microbe for each human cell in our body (Sender et al., 2016). Dysbiosis in the gut microbiome is associated with many diseases, ranging from obesity, chronic inflammatory diseases, and some types of cancer to autism spectrum disorder (Gilbert et al., 2016). It is therefore crucial to recognize what a healthy composition is and, if it is unbalanced, to be able to shift the composition to a healthy state. This calls for an understanding of the ecological processes shaping the community and for dynamical modeling.
The dynamics of complex ecosystems can be studied by considering the number of individuals of each species, referred to as abundances, at subsequent time points. There are several ways to characterize experimental time series properties. Models typically focus on one specific aspect such as the stability of the community (May, 1972; Coyte et al., 2015; Levine et al., 2017; Grilli et al., 2017; Gavina et al., 2018; Gibbs et al., 2018), the neutrality (Fisher and Mehta, 2014; Washburne et al., 2016), or mechanisms leading to long-tailed rank abundance distributions (Solé et al., 2002; Brown et al., 2002; McGill et al., 2007; Matthews and Whittaker, 2015). Different types of dynamical models have been proposed. A first distinction can be made between neutral and non-neutral models. Neutral models assume that species are ecologically equivalent and that all variation between species is caused by randomness. In such models, no competitive or other interactions are included. A second distinction is made between population-level and individual-based models. Generalized Lotka-Volterra (gLV) models describe the system at the population level and assume that the interactions between species dictate the community's time evolution. Both deterministic and stochastic implementations exist for gLV models. Stochastic models include a noise term. There are multiple origins of the noise: intrinsic noise captures the fluctuations due to small numbers, whereas extrinsic noise models external factors such as changing immigration rates of species or changing growth rates mediated by a varying flux of nutrients. Individual-based or agent-based models include self-organized instability models (Solé et al., 2002) and the controversial neutral model of Hubbell (Hubbell, 2001; Rosindell et al., 2011). A classification scheme that assesses the relative importance of different ecological processes from time series has been proposed in Faust et al., 2018.
The scheme is based on a test for temporal structure in the time series via an analysis of the noise color and neutrality. Applied to the time series of human stool microbiota, it tells us that stochastic gLV or self-organized instability models are more realistic. Here, however, we will focus only on stochastic gLV models. The reason for this is twofold. First, one can encompass the whole spectrum of ecosystems from neutral to niche with gLV models (Fisher and Mehta, 2014). Second, we aim at describing dense ecosystems and, even though an individual-based model might be more accurate, in the large number limit it will be captured by a Langevin approximation, that is, by the stochastic gLV model.
Our goal is to compare time series generated by stochastic gLV models with experimental time series of microbial communities. We aim at capturing with one model all observed properties mentioned above (the rank abundance profile, the noise color, and the niche character) as well as the statistical properties of the differences between abundances at successive time points. As is shown in Properties of experimental time series, the abundance distribution is heavy-tailed, which means that few species are highly abundant and many species have low abundances. Despite the large differences in abundances, the ratios of abundances at successive time points and the noise color are independent of these abundances. Although the fluctuations are large, the results of the neutrality tests indicate that the experimental time series are in the niche regime. To sum up, we seek growth rates, interaction matrices, immigration rates, and an implementation of the noise in stochastic gLV models that reproduce the experimental characteristics.
We simulated time series using gLV equations. The interaction matrices are random, as introduced by May, 1972. The growth rates are determined by the choice of the steady state, which is set to either equal abundances for all species or abundances according to the rank abundance profiles found for experimental data. For the noise, we consider different implementations corresponding to different sources of intrinsic and extrinsic noise.
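For orientation, one common way to write a stochastic gLV model with the ingredients listed above is sketched here. The symbols are assumptions made for this illustration: x_i is the abundance of species i, λ_i its immigration rate, g_i its growth rate, ω_ij the interaction matrix (with the self-interactions on the diagonal), σ_lin and σ_sqrt the strengths of the linear and square-root noise terms, and dW_i, dW'_i independent Wiener increments. The exact form used for the simulations may differ.

```latex
\begin{equation}
  dx_i(t) = \Big( \lambda_i + g_i\, x_i(t) + \sum_{j} \omega_{ij}\, x_i(t)\, x_j(t) \Big)\, dt
          + \sigma_{\mathrm{lin}}\, x_i(t)\, dW_i(t)
          + \sigma_{\mathrm{sqrt}}\, \sqrt{x_i(t)}\, dW'_i(t)
\end{equation}
```

In this notation, a logistic model corresponds to a diagonal interaction matrix, so that each species is coupled only to itself.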
Our analysis constrains the type of stochastic gLV models able to grasp the properties of experimental time series. First, we show that there is a correlation between the noise color and the product of the mean abundance and the self-interaction of a species. The noise color profile for such models will, therefore, depend on the steady state. This implies that imposing equal self-interaction strengths for all species, which can be done to ensure stability (Fisher and Mehta, 2014; Gibson et al., 2016), is incompatible with the properties of experimental time series. Second, from the differences between abundances at successive time points, we conclude that a model with mostly extrinsic (linear) noise agrees best with the experimental time series. Third, neutrality tests often result in the niche regime for time series generated by non-interacting species with noise. We, therefore, conclude that all stochastic properties of experimental time series are captured by a logistic model with large linear noise. However, interactions are not incompatible with those properties. This suggests using stochastic logistic models as null models to test for interactions. Our results are in line with those obtained by Grilli, 2019, which state that the stochastic logistic model can be interpreted as an effective model capturing the statistics of individual species fluctuations.
All code is available online (see Additional files: Code).
Results
Properties of experimental time series
We study time series from different microbial systems: the human gut microbiome (David et al., 2014), marine plankton (Martin-Platero et al., 2018), and diverse body sites (hand palm, tongue, fecal) (Caporaso et al., 2011; Figure 1A). A study of the different characteristics for a selection of these data is represented in Figure 1. The complete study of all time series can be found in Supplementary file 1: Analysis of experimental data. We propose a detailed description of the properties of the experimental time series. They fall essentially into two categories. The stability and rank abundance are tightly connected to the deterministic part of the equations, while the differences between abundances at successive time points and the noise color explain the stochastic behavior. The neutrality is more subtle and depends on the complete system.
# Load dataframes with experimental data
# MartinPlatero plankton data
df_ts = {}
path = 'Data/MartinPlatero/'
files = ['41467_2017_2571_MOESM5_ESM_MartinPlatero_Plankton_Eukarya.csv']
#['41467_2017_2571_MOESM4_ESM_MartinPlatero_Plankton_Bacteria.csv']
keys = ['plankton_eukarya']
#['plankton_bacteria']
for i, f in enumerate(files):
    x = pd.read_csv(path+f, na_values='NAN', index_col=0)
    x = x.iloc[:, :-1]  # delete last column, which contains details on the OTUs
    # only keep 200 most abundant species
    sum_x = x.sum(axis='columns')
    x = x[sum_x >= np.sort(sum_x)[-200]]
    x = x.T  # species are in rows instead of columns
    x.insert(0, 'time', [int(j[4:7]) for j in x.index])  # day
    x = x.groupby('time').agg('mean').reset_index()
    x.columns = ['time'] + ['species_%d' % j for j in range(1, len(x.columns))]
    df_ts[keys[i]] = x
# David stool data
files = ['Data/Faust/25_timeseries/25_timeseries.txt']
keys = ['David_stool_A']
for i, f in enumerate(files):
    x = pd.read_csv(f, na_values='NAN', delimiter='\t', header=None)
    x = x.T
    x.insert(0, 'time', range(len(x)))
    x.columns = ['time'] + ['species_%d' % j for j in range(1, len(x.columns))]
    df_ts[keys[i]] = x
# Caporaso body sites data
sites = ['F4_L_palm_L6', 'F4_tongue_L6']
for site in sites:
    file = 'Data/Caporaso/' + site + '.txt'
    key = 'Caporaso_' + site
    x = pd.read_csv(file, delimiter='\t', skiprows=1, index_col=0, header=None)
    #x = x[x.sum(axis='rows') > 0]
    x.index = ['time'] + ['species_%d' % j for j in range(1, len(x.index))]
    x = x.T
    # only keep 200 most abundant species
    if len(x.columns) > 201:
        sum_x = x.sum(axis='rows')
        sum_x['time'] = np.inf
        sum_x.sort_values(ascending=False, inplace=True)
        x = x[sum_x.index.tolist()[:201]]
        x.columns = ['time'] + ['species_%d' % j for j in range(1, len(x.columns))]
    df_ts[key] = x
#calculate noise color for each exp timeseries
df_ns = {}
keys = ['plankton_eukarya', 'David_stool_A',
        'Caporaso_F4_L_palm_L6', 'Caporaso_F4_tongue_L6']
for i, key in enumerate(keys):
    ts = df_ts[key]
    df_ns[key] = noise_color(ts)['slope_linear']
#calculate width of distribution
df_disdx = {}
#keys = ['Caporaso_F4_L_palm_L6']
keys = ['plankton_eukarya', 'David_stool_A',
        'Caporaso_F4_L_palm_L6', 'Caporaso_F4_tongue_L6']

def fit_ratio(x):
    # ratios of successive time points (pairs containing a zero are skipped)
    x = [x1/x2 for x1, x2 in zip(x[:-1], x[1:]) if x1 != 0 and x2 != 0]
    if len(x) > 5:
        # parameters of the lognormal fit: shape, loc (fixed to 0), scale
        a, b, c = stats.lognorm.fit(x, floc=0)
        # p-value of the Kolmogorov-Smirnov test
        # (null hypothesis: ratios of successive time points follow lognorm distribution)
        stat, pval = stats.kstest(x, 'lognorm', args=(a, b, c))
        return a, b, c, stat, pval
    else:
        return (np.nan, np.nan, np.nan, np.nan, np.nan)

count = 0
for i, key in enumerate(keys):
    ts = df_ts[key]
    dx_ratio = pd.DataFrame(index=ts.columns, columns=['s', 'loc', 'scale', 'ksstat', 'kspval'])
    dx_ratio.drop('time', inplace=True)
    for idx in dx_ratio.index:
        fit_par = fit_ratio(ts[idx].values)
        dx_ratio.loc[idx] = fit_par
        # optional diagnostic plots of individual fits (disabled by `False`)
        if False and fit_par[1] > 0.5 and count < 10:
            print(key, idx, fit_par[1])
            x = ts[idx].values
            print(x[:5])
            x_transf = x[:-1] / x[1:]  # ratios of successive time points
            x_transf = x_transf[np.isfinite(x_transf)]  # remove infinities
            a, b, c, _, pval = fit_par
            x_fit = np.logspace(-1.5, 1.5, 100)
            pdf_fitted = stats.lognorm.pdf(x_fit, a, b, c)  # PDF of the fitted curve
            plt.figure()
            plt.hist(x_transf, alpha=0.4, density=True, bins=np.logspace(-1.5, 1.5, 30))
            plt.plot(x_fit, pdf_fitted, label='%.2f, %.2f, %.2f' % (a, b, c))
            plt.xscale('log')
            plt.legend()
            plt.show()
            count += 1
        if count == 10:
            break
    df_disdx[key] = dx_ratio
Box 1.
Definitions of the studied characteristics
We study multiple characteristics of the dynamics of microbial communities.
We here define these characteristics. The labels (A-F) denote the different panels of Figure 1 and Figure 4.
A. A time series represents the time evolution of the abundances of different species of the community.
B. The rank abundance distribution describes the commonness and rarity of all species. It can be represented by a rank abundance plot, in which the abundances of the species are given as a function of the rank of the species, where the rank is determined by sorting the species from high to low abundance. These curves can generally be fitted with power law, lognormal, or logarithmic series functions (Limpert et al., 2001; McGill et al., 2007; Brown et al., 2002).
C. The noise color describes the distribution of the frequencies of the fluctuations of a time series of a species. It is defined by the slope of a linear fit through the power spectral density. White, pink, brown and black noise correspond to slopes around 0, −1, −2 and −3, respectively. The more negative the slope is (this corresponds to darker noise), the more structure there is in the time series (Faust et al., 2018).
D. We study the mean absolute difference between abundances at successive time points, ⟨|x(t + δt) − x(t)|⟩, as a function of the mean abundance. These values represent the jumps of the abundances from one time point to the next.
E. We measure the ratios of the abundances at two successive time points, x(t + δt)/x(t). The advantage of this method is that it captures the direction of a jump between two time points: for ratios higher than one the jump is positive, for ratios lower than one the jump is negative. The distribution of these ratios fits a lognormal curve with a mean at one, as the fluctuations occur around steady state, and the width of the distribution tells how large the fluctuations of a time series are. The goodness of the fit is defined by the p-value of the Kolmogorov-Smirnov test. Higher p-values denote a better fit. We use the width as a characteristic and compare the widths of different species. Examples of the fitted lognormal curve can be found in Supplementary file 1: Supporting results.
F. The Kullback-Leibler divergence measures how different the multivariate distribution of species abundances is from a distribution constructed under the assumption of ecological neutrality. The idea of the neutral covariance test is to compare the time series with a Wright-Fisher process. A Wright-Fisher process is a continuous approximation of Hubbell's neutral model for a large and finite community. In particular, the test checks the invariance with respect to grouping. More about the validity of these neutrality measures can be found in Supplementary file 1: Supporting results.
The time series show fluctuations over time
The experimental time series show large fluctuations over time. We can ask whether the origin of this variation is biological or technical, and we assume that most of the variation can be attributed to biological processes. This hypothesis is supported by the results of Silverman et al., 2018 for microbial communities of an artificial gut. There, the biological variation becomes five to six times more important than the technical variation for a sampling interval of one day. Also, Grilli, 2019 shows that the time correlation of experimental time series is nonzero. In the case where the variation is mostly due to technical errors, we would expect to see no correlation. Because no experimental error bars are available for most of the data and because we assume most variation has a biological origin, we did not consider the errors on the species abundances.
The abundance distribution is heavy-tailed
The first aspect of community modeling that has been widely studied in recent years is the stability of the steady states. Large random networks tend to be unstable (May, 1972). This problem is often solved by considering only weak interactions or sparse interaction matrices (May, 2001), or by introducing higher-order interactions (Grilli et al., 2017; Gavina et al., 2018; Sidhom and Galla, 2019). Although the stability of gLV models decreases with an increasing number of participating species, the stability only depends on the interaction matrix and not on the abundances (Gibbs et al., 2018). The abundance distribution of the experimental data is heavy-tailed. This means that there are few common and many rare species. The distribution of the steady-state values can also be represented by a rank abundance curve (see Box 1B). Although the abundances show large fluctuations over time, the rank abundance remains stable (Figure 1B).
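The rank abundance curve described above can be computed in a few lines. The following is a minimal sketch on synthetic data; the function name `rank_abundance` and the lognormal community are assumptions for illustration and are not part of the analysis code of this article.

```python
import numpy as np

def rank_abundance(abundances):
    """Return ranks (1 = most abundant) and abundances sorted from high to low."""
    sorted_abundances = np.sort(np.asarray(abundances, dtype=float))[::-1]
    ranks = np.arange(1, len(sorted_abundances) + 1)
    return ranks, sorted_abundances

# Hypothetical community: 200 species with lognormally distributed mean abundances.
rng = np.random.default_rng(0)
mean_abundances = rng.lognormal(mean=0.0, sigma=2.0, size=200)

ranks, sorted_abundances = rank_abundance(mean_abundances)

# Share of the total abundance held by the 10 most abundant species;
# for a heavy-tailed distribution this is far above the equal share 10/200.
top10_share = sorted_abundances[:10].sum() / sorted_abundances.sum()
```

Plotting `sorted_abundances` against `ranks` on a logarithmic y-axis gives the usual rank abundance plot.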
The differences between abundances at successive time points are large and linear with respect to the species abundance
Time series can be described by the differences between abundances at successive time points. We propose to focus on two specific representations of the information contained in those differences. First, we consider the mean absolute difference between abundances at successive time points as a function of the mean abundance (see Box 1D). For the experimental data, the relation between these variables is a monomial, which means that it is linear on the log-log scale (Figure 1D). The fact that the slope of this line is almost one hints at a linear nature of the noise.
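As an illustration of this first representation, the slope on the log-log scale can be estimated with a straight-line fit. The sketch below uses synthetic data; `increment_slope` and the two toy data sets are hypothetical and only meant to show that fluctuations proportional to the abundance give a slope near one, whereas fluctuations of fixed size give a slope near zero.

```python
import numpy as np

def increment_slope(ts):
    """Slope of log10 <|x(t+dt) - x(t)|> versus log10 <x> across species.

    ts: array of shape (time points, species) with positive abundances.
    """
    ts = np.asarray(ts, dtype=float)
    mean_abs_diff = np.mean(np.abs(np.diff(ts, axis=0)), axis=0)
    mean_abundance = np.mean(ts, axis=0)
    keep = (mean_abs_diff > 0) & (mean_abundance > 0)
    slope, _intercept = np.polyfit(np.log10(mean_abundance[keep]),
                                   np.log10(mean_abs_diff[keep]), 1)
    return slope

rng = np.random.default_rng(1)
means = 10.0 ** rng.uniform(-3, 1, size=50)  # 50 species spanning four decades

# Fluctuations proportional to the abundance (multiplicative, "linear" case).
linear_like = means * rng.lognormal(0.0, 0.5, size=(500, 50))
# Fluctuations of fixed size, independent of the abundance (additive case).
additive_like = means + rng.normal(0.0, 1e-4, size=(500, 50))

slope_linear = increment_slope(linear_like)
slope_additive = increment_slope(additive_like)
```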
Second, we examine the distribution of the ratios of the abundances at two successive time points (see Box 1E). The width of this distribution tells how large the fluctuations are. To measure this width, we fit the distribution with a lognormal curve for which the mean is fixed to be one, as the fluctuations occur around steady state. For most of the species of experimental data (except for the stool data), the fit of the distribution to a lognormal curve is good (Figure 1E). Furthermore, we notice that the distribution is wide (in the order of 1) and that the width does not depend on the mean abundance of the species (Figure 1E).
The noise color is independent of the mean abundance of the species
The noise of a time series can be studied by considering the distribution of the frequencies of the fluctuations. This distribution can be defined by its slope, which is interpreted as the noise color (see Box 1C). We notice that there is no correlation between the noise color and the mean abundance of the species for experimental time series (Figure 1C).
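To make the definition of the noise color concrete, the sketch below estimates the slope of the power spectral density with a straight-line fit on the log-log scale. It is a simplified stand-in, not the `noise_color` function imported in this article; the name `noise_color_slope` and the use of `scipy.signal.periodogram` are assumptions for illustration.

```python
import numpy as np
from scipy import signal

def noise_color_slope(x, dt=1.0):
    """Slope of a straight-line fit to the log-log power spectral density."""
    x = np.asarray(x, dtype=float)
    freqs, psd = signal.periodogram(x - x.mean(), fs=1.0 / dt)
    keep = (freqs > 0) & (psd > 0)  # drop the zero frequency before taking logs
    slope, _intercept = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
    return slope

rng = np.random.default_rng(2)
white = rng.normal(size=4096)  # uncorrelated fluctuations
brown = np.cumsum(white)       # random walk: strongly correlated in time

slope_white = noise_color_slope(white)  # near 0 (white noise)
slope_brown = noise_color_slope(brown)  # strongly negative (brown range)
```

With equally spaced frequencies the fit is dominated by the high-frequency part of the spectrum, so the estimate for the random walk is somewhat flatter than the ideal value of −2.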
Experimental time series are in the niche regime
In neutral theory, it is assumed that all species or individuals are functionally equivalent. It is challenging to test whether a given time series was generated by neutral or niche dynamics. We use two definitions of neutrality measures: the Kullback-Leibler divergence as used in Fisher and Mehta, 2014 and the neutral covariance test as proposed by Washburne et al., 2016 (see Box 1F). Both neutrality measures indicate that most experimental time series are in the niche regime (Figure 1F).
Reproducing properties of experimental time series from stochastic generalized Lotka-Volterra models
We find that the aforementioned characteristics of experimental time series can be reproduced by stochastic logistic equations. We first explain how to choose the growth rate to obtain the heavy-tailed experimental abundance distribution. Next, we discuss how the noise color determines the self-interaction of a species given its abundance and how the implementation of the noise determines the slope of the mean absolute increment as a function of the mean abundance (such as in Figure 1D). In the end, by using the appropriate choice for the self-interactions, growth rates, and noise implementation, we conclude that a stochastic logistic model can reproduce all the stochastic properties, including the niche regime for the neutrality tests, although the model does not include any interactions.
The rank abundance distribution can be imposed by fixing the growth rate
Random matrix models typically do not give rise to heavy-tailed abundance distributions. Nor is it known which properties of the interaction matrix and growth rates are required to obtain a realistic rank abundance distribution. We can, however, enforce the desired rank abundance artificially by solving the steady state of the gLV equations. Given the steady-state abundance vector and interaction matrix ω, we impose the growth rate that makes this vector a fixed point of the equations, that is, minus the product of ω and the steady-state abundance vector. One model that results in heavy-tailed distributions is the self-organized instability model proposed by Solé et al., 2002.
For logistic models, the growth rate is equal to the product of the self-interaction and the mean abundance. The noise color and the width of the distribution of ratios depend on this product. To obtain given characteristics (a predefined noise color and width of the distribution of ratios), the choice of the growth rate will dictate the choice of the remaining free parameters, the sampling time step δt and the noise strength σ.
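The way the growth rate follows from a chosen steady state can be sketched as follows. The example assumes the deterministic gLV form without immigration, dx_i/dt = x_i (g_i + Σ_j ω_ij x_j), with negative self-interactions on the diagonal of ω; the function name and all numerical values are hypothetical.

```python
import numpy as np

def growth_rate_for_steady_state(omega, steady_state):
    """Growth rate g = -omega @ x*, so that x* is a fixed point of
    dx_i/dt = x_i * (g_i + sum_j omega_ij * x_j)."""
    return -omega @ steady_state

rng = np.random.default_rng(3)
n_species = 5

# Hypothetical heavy-tailed steady state and weak random interactions.
steady_state = rng.lognormal(0.0, 1.5, size=n_species)
omega = rng.normal(0.0, 0.05, size=(n_species, n_species))
np.fill_diagonal(omega, -1.0)  # negative self-interactions

growth_rate = growth_rate_for_steady_state(omega, steady_state)

# The deterministic time derivative vanishes at the imposed steady state.
derivative = steady_state * (growth_rate + omega @ steady_state)

# Logistic case (no interactions): the growth rate is the product of the
# self-interaction strength and the steady-state abundance.
selfint = np.array([-0.5, -2.0, -1.0])
x_star = np.array([4.0, 0.1, 1.0])
g_logistic = growth_rate_for_steady_state(np.diag(selfint), x_star)
```

Any rank abundance profile can be imposed this way; whether the resulting steady state is stable still depends on the interaction matrix.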
The noise color is determined by the mean abundance and the self-interaction of the species
To study the noise color, we first consider a model where the species are not interacting. The noise color is independent of the implementation of the noise but depends on the product of the mean abundance and the self-interaction of the species (Figure 2A). For non-interacting species, the growth rate equals the product of the self-interaction and the steady-state abundance. Because we consider fluctuations around steady state, the mean and the steady-state abundance are nearly equal and the x-axis of Figure 2A-C can be interpreted as the growth rate. Also, the strength of the noise does not change its color (Figure 2C). A parameter that is important for the noise color is the sampling rate: the higher the sampling frequency, the darker the noise becomes (Figure 2B). This is in agreement with the results of Faust et al., 2018. Darker noise corresponds to more structure in the time series. The more frequently the abundances are sampled, the more details are visible and the more visible the underlying interactions become. We conclude that the noise color is only dependent on the mean abundance, the self-interactions, and the sampling rate. Figures of the dependence on the mean abundance and self-interaction separately can be found in Supplementary file 1: Supporting results.
For interacting species, increasing the strength of the interactions makes the color of the noise darker in the high mean abundance range (Figure 2D; Figure 2E). Importantly, for interacting species with a lognormal rank abundance, the correlation between the noise color and mean abundance is preserved (Figure 2E). The data can be fit to obtain a bijective function between the product of the mean abundance and the self-interaction, and the noise color. Assuming this model is correct, we can obtain an estimate for the self-interaction coefficients given the mean abundance and noise color by fixing the sampling rate and the interaction strength. The uncertainty on the estimates is larger where the fitted curve is flatter (slopes of the power spectral density around −1.7 and 0), but many experimental values of the stool microbiome data lie in the pink region where the self-interaction can be estimated for this model.
The implementation of the noise determines the correlation between the mean absolute increment and the mean abundance
Next, we study the differences between abundances at successive time points (see Figure 1D). From the results of the noise color, we can estimate the self-interaction for the dynamics of the experimental data. We use the rank abundance and the self-interaction inferred from the noise color of the microbiome data of the human stool to perform simulations and calculate the characteristics of the distribution of differences between abundances at successive time points. We here assume that there are no interactions. More results for dynamics with interactions are in Supplementary file 1: Supporting results. We first study the correlation between the mean absolute difference between abundances at successive time points and the mean abundance. For linear multiplicative noise, the slope of the curve of the logarithm of the mean absolute difference between abundances at successive time points as a function of the logarithm of the mean abundance is one. For multiplicative noise that scales with the square root of the abundance, the slope is around 0.66 and for additive noise, the slope is zero. By combining both linear noise and noise that scales with the square root of the abundance, slopes with values between 0.6 and 1 can be obtained (Figure 3B). The slopes of experimental data range between 0.84 and 0.99. We therefore conclude that linear noise is a relatively good approximation to perform stochastic modeling of microbial communities.
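The two limiting cases, linear and additive noise, can be illustrated with a small Euler-Maruyama simulation of non-interacting logistic species. This is a simplified sketch and not the `Timeseries` class used in this article: the function names, the equal growth rate for all species, and all parameter values are assumptions, and the noise terms share a single Wiener increment per species.

```python
import numpy as np

def simulate_logistic(steady_state, growth_rate=1.0, sigma_lin=0.0, sigma_add=0.0,
                      dt=0.01, n_steps=20000, sample_every=50, seed=0):
    """Euler-Maruyama integration of dx = x (g + a x) dt + (sigma_lin x + sigma_add) dW
    for independent species, with self-interaction a = -g / x*."""
    rng = np.random.default_rng(seed)
    x_star = np.asarray(steady_state, dtype=float)
    selfint = -growth_rate / x_star
    x = x_star.copy()
    samples = []
    for step in range(n_steps):
        xi = rng.normal(size=x.shape)
        drift = x * (growth_rate + selfint * x)
        diffusion = sigma_lin * x + sigma_add
        x = x + drift * dt + diffusion * np.sqrt(dt) * xi
        x = np.maximum(x, 1e-12)  # keep abundances positive
        if step % sample_every == 0:
            samples.append(x.copy())
    return np.array(samples)  # shape (time points, species)

def increment_slope(ts):
    """Slope of log10 <|x(t+dt) - x(t)|> versus log10 <x> across species."""
    mean_abs_diff = np.mean(np.abs(np.diff(ts, axis=0)), axis=0)
    mean_abundance = np.mean(ts, axis=0)
    slope, _intercept = np.polyfit(np.log10(mean_abundance), np.log10(mean_abs_diff), 1)
    return slope

# Hypothetical steady states spanning three orders of magnitude.
x_star = 10.0 ** np.linspace(-2, 1, 30)

slope_linear = increment_slope(simulate_logistic(x_star, sigma_lin=0.1))
slope_additive = increment_slope(simulate_logistic(x_star, sigma_add=1e-3))
```

Under these assumptions the linear case gives a slope close to one and the additive case a slope close to zero. Intermediate slopes, such as those reported for square-root noise, depend on the parameter choices and are not reproduced by this sketch.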
path = 'results/width_ratios/'
df1 = pd.read_csv(path + 'width_lognormal_fit_1.csv')
df2 = pd.read_csv(path + 'width_lognormal_fit_1_interaction0.05.csv')
df3 = pd.read_csv(path + 'width_lognormal_fit_1_interaction0.1.csv')
df4 = pd.read_csv(path + 'width_lognormal_fit_1_interaction0.15.csv')
sigmas = [0.01, 0.1, 1.0]
cmap = mpl.cm.get_cmap('coolwarm') #viridis')
norm = mpl.colors.Normalize(vmin=0, vmax=0.21, clip=True)
mapper = mpl.cm.ScalarMappable(norm=norm, cmap='summer')
fig = plt.figure(figsize=(ELIFE.TEXTWIDTH,2.3))
gs_l = gridspec.GridSpec(2,1, height_ratios=[8,1], hspace= 0.8,
right=0.95, left=0.8, top=0.85, bottom=0.2)
gs_r = gridspec.GridSpec(2,2, height_ratios=[8,1], hspace= 0.8, wspace=0.05,
right=0.65, left=0.15, top=0.85, bottom=0.2)
ax_mat = fig.add_subplot(gs_l[0])
ax = fig.add_subplot(gs_r[0])
ax2 = fig.add_subplot(gs_r[1], sharey=ax)
ax_mat.text(0.2, 1.08, 'B', transform=ax_mat.transAxes,
fontsize=10, fontweight='bold', va='bottom', ha='right')
ax.text(0.2, 1.08, 'A', transform=ax.transAxes,
fontsize=10, fontweight='bold', va='bottom', ha='right')
ax_mat_cbar = fig.add_subplot(gs_l[1])
ax_legend = fig.add_subplot(gs_r[2])
ax_cbar = fig.add_subplot(gs_r[3])
df_slopes2 = pd.read_csv('results/slopes/slopes_equal_abundances.csv', index_col=0, na_values='NAN')
df_slopes2['slope'] = df_slopes2.iloc[:,2:12].mean(axis=1)
df_slopes2['slope_std'] = df_slopes2.iloc[:,2:12].std(axis=1)
df_slopes2.drop(['%d'%i for i in range(10)], axis=1, inplace=True)
slope = df_slopes2.drop(['implementation', 'interaction', 'slope_std'], axis=1)
std_slope = df_slopes2.drop(['implementation', 'interaction', 'slope'], axis=1)
slope = slope.groupby(['noise_lin', 'noise_sqrt']).agg('mean')
std_slope = std_slope.groupby(['noise_lin', 'noise_sqrt']).agg('mean')
slope = slope.unstack() #.iloc[:4, :4]
val = slope.values
mat = ax_mat.matshow(val, cmap='coolwarm', vmin=0.65, vmax=1.1)
ax_mat.set_xlabel(r'$\sigma_\mathregular{sqrt}$')
ax_mat.set_ylabel(r'$\sigma_\mathregular{lin}$')
ax_mat.set_xticks([0,1,2,3,4])
ax_mat.set_yticks([0,1,2,3,4])
ax_mat.set_xticklabels([0, 0.01, 0.1, 0.5, 1.0], rotation=90)
ax_mat.set_yticklabels([0, 0.01, 0.1, 0.5, 1.0])
cbar = plt.colorbar(mat, cax=ax_mat_cbar, orientation='horizontal')
cbar.set_label(r'Slope $\left< \mid x(t+\delta t) - x(t) \mid \right>$') #'Slope steps')
#for i, df, alpha in zip(range(4), [df1, df2, df3, df4], [0, 0.05, 0.1, 0.15]):
for i, df, alpha in zip(range(3), [df1, df3, df4], [0, 0.1, 0.15]):
    for j, sigma in enumerate(sigmas):
        w = df['sigma_%.2f_width_mean' % sigma]
        pval = df['sigma_%.2f_pval' % sigma]
        ss = df['ss']
        col = mapper.to_rgba(alpha)
        ax.plot(ss.values, w.values, c=col, alpha=0.3, marker='o', markersize=3, label=alpha if j==0 else "")
        ax2.plot(ss.values, w.values, c='lightgrey', alpha=0.3) #, label=alpha if j==0 else "")
        s_ax2 = ax2.scatter(ss.values, w.values,s=3, c = pval, cmap=cmap, vmin=0, vmax=1)
        #c=col, label=alpha if j==0 else "")
        x = 2e1 #ss.values[0]
        y = w.values[0]
        if i == 0:
            ax.annotate(r"$\sigma_\mathregular{lin} =$ %.2f" % sigma, xy=(x, y), xytext=(0.2*x, 1.5*y))
            ax2.annotate(r"$\sigma_\mathregular{lin} =$ %.2f" % sigma, xy=(x, y), xytext=(0.2*x, 1.5*y))
handles, labels = ax.get_legend_handles_labels()
ax_legend.legend(handles, labels, title='Interaction ' + r'strength $\alpha$',
loc=9, ncol=3, columnspacing=0.5)
ax_legend.axis('off')
cbar = plt.colorbar(s_ax2, cax=ax_cbar, orientation='horizontal')
cbar.set_label('p-value lognormal fit')
ax.set_xscale('log')
#ax.set_xlabel(r'Mean abundance $\times$ self-interaction', ha='right', x=1)
ax.set_ylabel('Width distribution \n of ratios \n' + r'$x(t + \delta t) / x(t)$') #'Width distribution \n ratios of time points')
ax.set_xlim([2e-2,2e2])
ax2.set_ylim([5e-4,1e0])
ax.set_yscale('log')
ax.grid()
ax2.set_xscale('log')
ax2.tick_params(axis="both", left=True, labelleft=False)
ax2.set_xlabel(r'Mean abundance $\times$ self-interaction', ha='right', x=1)
#ax2.set_ylabel('Scale lognormal fit')
ax2.set_xlim([2e-2,2e2])
ax2.set_yscale('log')
ax2.grid()
plt.show()
Differences between time points as a function of the noise.
(A) The width of the distribution of the ratios of abundances at successive time points increases for increasing strength of the noise. For sufficiently strong noise the distribution is well fitted by a lognormal function (high p-values for the Kolmogorov-Smirnov test). (B) Correlation between the mean absolute differences between abundances at successive time points and the mean abundance for different strengths of the linear noise (σ_{lin}) and multiplicative noise that scales with the square root of the abundances (σ_{sqrt}). More specifically, the parameter represents the slope of the logarithm of the mean absolute difference between abundances at successive time points as a function of the logarithm of the mean abundance. Examples of such slopes are given by Figure 1D. Here, the slope ranges from 0.66 for noise that scales with the square root to one for linear noise.
The strength of the noise determines the width of the distribution of ratios
Next, we examine the distribution of the ratios of abundances at successive time points (see Box 1E). As expected, for significant noise, this distribution can be approximated by a lognormal curve and the width of the distribution becomes larger for increasing noise strength (Figure 3A). In order to have widths that are of the same order of magnitude as the ones of the experimental data, the noise must be sufficiently strong. Another way of increasing the width is through interactions, but this effect is only moderate. These results are presented in Supplementary file 1: Supporting results.
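The dependence of the width on the noise strength can be illustrated with a single logistic species and linear noise. The sketch below is a simplified stand-in for the simulations of this article: the function names, the Euler-Maruyama scheme, and the parameter values (steady state 1, growth rate 1, sampling interval 1) are assumptions. The width is taken as the shape parameter of a lognormal fit with the location fixed at zero, as in `fit_ratio` above.

```python
import numpy as np
from scipy import stats

def ratio_width(x):
    """Shape parameter s of a lognormal fit (loc fixed at 0) to x(t + dt) / x(t)."""
    x = np.asarray(x, dtype=float)
    ratios = x[1:] / x[:-1]
    ratios = ratios[np.isfinite(ratios) & (ratios > 0)]
    s, _loc, _scale = stats.lognorm.fit(ratios, floc=0)
    return s

def simulate_logistic_linear_noise(sigmas, growth_rate=1.0, dt=0.01,
                                   n_steps=40000, sample_every=100, seed=0):
    """One independent logistic species (steady state 1) per noise strength."""
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(sigmas, dtype=float)
    x = np.ones_like(sigmas)
    samples = []
    for step in range(n_steps):
        xi = rng.normal(size=x.shape)
        x = x + growth_rate * x * (1.0 - x) * dt + sigmas * x * np.sqrt(dt) * xi
        x = np.maximum(x, 1e-12)  # keep abundances positive
        if step % sample_every == 0:
            samples.append(x.copy())
    return np.array(samples)  # shape (time points, len(sigmas))

sigmas = [0.1, 0.3, 1.0]  # hypothetical strengths of the linear noise
ts = simulate_logistic_linear_noise(sigmas)
widths = [ratio_width(ts[:, j]) for j in range(len(sigmas))]
```

In this setting the fitted width grows with the noise strength, and only the strongest noise yields a width of order one.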
Stochastic logistic models capture the properties of experimental time series
By using all previous results and imposing the steady state of experimental data, we find that it is possible to generate time series with characteristics that closely match the ones seen in the experimental time series (Figure 4). Furthermore, these time series can be generated without introducing any interaction between the different species, but their neutrality measures can still be in the niche regime (Figure 4F). Out of 100 simulations, 62 had a p-value smaller than 0.05 for the neutral covariance test, which means they are in the niche regime. The colors of the noise fix the self-interaction values (Figure 4C). Next, the rank abundance distribution is imposed by calculating the growth vector (Figure 4B). The slope of the curve of the mean absolute difference between abundances at successive time points as a function of the mean abundance is one by using linear multiplicative noise (Figure 4D), and the width of the fluctuations is tuned by choosing a large noise size σ (Figure 4E). In most experimental time series, only the fractional abundances of species can be measured per time point and not the absolute ones. Because the total abundance of all species remains nearly constant in time series generated by a stochastic logistic equation, our results still hold for time series with fractional abundances (see Supporting results). Similar results can be obtained for models with interactions (see Supporting results), but we want to stress that interactions are not needed to reproduce the properties of experimental time series.
def mimic_experimental(interaction=0, connectivity=1, N=80):
    """Generate a time series that mimics the 'David_stool_A' data set."""
    x = df_ts['David_stool_A'].values[150:, :]  # do not consider the traveling

    # rank abundance (descending) and noise color of the experimental data
    experimental_abundance = np.sort(x[0, :])[::-1]
    experimental_noise_color = noise_color(x.T)

    def find_ss_selfint(x):
        # inverse of a fitted sigmoid relating the noise color x to
        # log10(self-interaction * steady-state abundance)
        amplitude = 2.10E+00
        x0 = 2.87E+00
        k = 1.14E+00
        offset = -1.77E+00

        return 10**(-1/x0 * np.log(amplitude/(x - offset) - 1) + k)

    params = {}

    steadystate = (experimental_abundance[:N]).reshape([N, 1])

    # negative self-interactions inferred from the experimental noise color
    selfints = -find_ss_selfint(
        experimental_noise_color['slope_linear'].values[:N]) / steadystate.flatten()

    # interaction
    if interaction == 0:
        omega = np.zeros([N, N])
    else:
        omega = np.random.normal(0, interaction, [N, N])
        omega *= np.random.choice([0, 1], [N, N],
                                  p=[1 - connectivity, connectivity])
    np.fill_diagonal(omega, selfints)

    params['interaction_matrix'] = omega

    # no immigration
    params['immigration_rate'] = np.zeros([N, 1])

    # different growth rates determined by the steady state
    params['growth_rate'] = -(omega).dot(steadystate)

    params['initial_condition'] = np.copy(
        steadystate) * np.random.normal(1, 0.1, steadystate.shape)

    params['noise'] = 2.5
    params['noise_linear'] = 2.5
    params['noise_sqrt'] = 0  # 0.005*steadystate #*np.sqrt(steadystate)

    np.save('testparams2.npy', params)
    ts = Timeseries(params, noise_implementation=NOISE.LANGEVIN_LINEAR_SQRT,
                    dt=0.01, tskip=19, T=50.0, seed=int(time.time())).timeseries

    ts.time = np.arange(1, len(ts)+1)

    return ts
def figure_characteristics_timeseries(ts):
    fig = plt.figure(figsize=(ELIFE.TEXTWIDTH, 3))

    # top row: panels A-C; bottom row: panels D-F
    gs1 = gridspec.GridSpec(1, 3, width_ratios=[2.5, 2.5, 1], wspace=0.5, hspace=0.3,
                            left=0.1, right=0.95, top=0.95, bottom=0.62)
    gs2 = gridspec.GridSpec(1, 3, wspace=0.7, hspace=0.4,
                            left=0.1, right=0.95, top=0.45, bottom=0.12)

    # timeseries
    ax = fig.add_subplot(gs1[0])
    ax.text(0.2, 1.1, 'A', transform=ax.transAxes, fontsize=10,
            fontweight='bold', va='top', ha='right')
    ax.grid()
    PlotTimeseriesComparison([ts], composition=['ts'], vertical=False, fig=ax)

    # rank abundance
    ax = fig.add_subplot(gs1[1])
    ax.text(0.2, 1.1, 'B', transform=ax.transAxes, fontsize=10,
            fontweight='bold', va='top', ha='right')
    ax.grid()
    PlotTimeseriesComparison([ts], composition=['ra'], fig=ax)
    ax.set_ylim([1e2, 1e5])

    # neutrality measures, drawn in the last (narrow) column
    ax = fig.add_subplot(gs1[-1], frameon=False)
    ax.tick_params(left=False, labelleft=False,
                   bottom=False, labelbottom=False)
    ax.text(0.5, 1.1, 'C', transform=ax.transAxes, fontsize=10,
            fontweight='bold', va='top', ha='right')

    sub_gs = gs1[0, -1].subgridspec(4, 1,
                                    height_ratios=[1.5, 1, 1, 1.5], hspace=0.3)
    ax_KL = fig.add_subplot(sub_gs[1])
    ax_NCT = fig.add_subplot(sub_gs[2])
    PlotTimeseriesComparison([ts], composition=['nn'], fig=[ax_KL, ax_NCT])

    # characteristics
    for i, (char, letter) in enumerate(zip(['nc', 'dx', 'disdx'], ['D', 'E', 'F'])):
        ax = fig.add_subplot(gs2[i])
        ax.text(0.3, 1.1, letter, transform=ax.transAxes,
                fontsize=10, fontweight='bold', va='top', ha='right')
        ax.grid()
        PlotTimeseriesComparison([ts], composition=[char], fig=ax)

        if char == 'disdx':
            ax.set_ylim([1e-2, 1e2])
            ax.set_ylabel('Width distribution \n of ratios \n' +
                          r'$x(t + \delta t) / x(t)$')
        elif char == 'dx':
            ax.set_ylabel('Difference \n time points \n' +
                          r'$\left< \mid x(t+\delta t) - x(t) \mid \right>$')

    # fig.align_labels()
#KL = np.zeros(100)
#NCT = np.zeros(100)

if True:
    ts = mimic_experimental(interaction=0)

    figure_characteristics_timeseries(ts)

    plt.show()
A stochastic logistic model is able to reproduce the different characteristics of the noise.
(A) Time series. (B) A rank abundance that remains stable over time. (C) Results of the neutrality test in the niche regime. (D) Noise color in the white-pink region with no dependence on the mean abundance. (E) The slope of the mean absolute difference between abundances at successive time points is around 1. (F) The width of the distribution of the ratios of abundances at successive time points is in the order of 1 and independent of the mean abundance.
Discussion
Recent research has focused on different aspects of experimental time series of microbial dynamics, in particular the rank abundance distribution, the noise color, the stability, and neutrality. Within the framework of stochastic generalized Lotka-Volterra models, we studied the influence of growth rates, interactions between species, and the different sources of stochasticity on the observed characteristics of the noise and on neutrality. Our observations are:
 Even when we consider the case without interactions between species, the result of the neutrality test on the time series is often niche. We should, therefore, be careful in the interpretation of the results of neutrality tests.
 For a given sampling step δt, the noise color depends on the product of the self-interaction and the mean abundance, which for non-interacting species reduces to a dependence on the growth rate. Assuming the model can be used for microbial communities, the self-interaction coefficients can be estimated given the mean abundance, noise color, and sampling rate. Low sampling rates result in larger errors (Figure 2B), so for sparsely sampled experimental data the standard deviation of the self-interaction inferred from the noise color will be larger. For the experimental time series (plankton, gut, and human microbiome), the self-interaction strengths range over several orders of magnitude. The convention of setting all self-interactions equal to −1, used in several studies (Fisher and Mehta, 2014; Gibson et al., 2016), cannot be adopted for stochastic models of communities with a heavy-tailed abundance distribution.
 The exponent of the mean absolute differences between abundances at successive time points with respect to the mean abundances is slightly smaller than one for experimental time series. Linear multiplicative noise results in a value of one, whereas square root noise results in lower values (0.6). A mix of linear and square root noise can result in slopes with intermediate values.
 A large multiplicative linear noise is in agreement with both the distribution of the ratios of abundances at successive time points and the relation between the differences between abundances at successive time points and mean values.
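The exponent discussed in the observations above can be estimated from any time series by a linear fit on a log-log scale. The sketch below is only illustrative: the function name `difference_exponent` is an assumption, and the two test data sets are synthetic fluctuations around fixed means (proportional to the abundance and to its square root), not output of the stochastic models studied here.

```python
import numpy as np


def difference_exponent(ts):
    """Slope of log <|x(t + dt) - x(t)|> versus log <x> across species.

    ts has shape (time points, species).
    """
    ts = np.asarray(ts, dtype=float)
    mean_abundance = ts.mean(axis=0)
    mean_abs_diff = np.abs(np.diff(ts, axis=0)).mean(axis=0)
    slope, _ = np.polyfit(np.log10(mean_abundance), np.log10(mean_abs_diff), 1)
    return slope


# Hypothetical data: 30 species with means over four orders of magnitude.
rng = np.random.default_rng(1)
means = np.logspace(1, 5, 30)
noise = rng.standard_normal((2000, 30))
linear_like = means * (1 + 0.1 * noise)      # fluctuations proportional to x
sqrt_like = means + np.sqrt(means) * noise   # fluctuations proportional to sqrt(x)

slope_linear = difference_exponent(linear_like)
slope_sqrt = difference_exponent(sqrt_like)
```

In this idealized setting the first slope is close to one and the second close to one half; the value of 0.6 quoted above for square root noise refers to the full model, which this sketch does not simulate.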
To conclude, characteristics of experimental time series, from plankton to gut microbiota, can be reproduced by stochastic logistic models with a dominant linear noise. We expect, however, that for higher sampling rates, modeling the interactions between microbes would be necessary to explain the properties of the time series. For gut microbial time series, the system is sampled only once a day and therefore dominated by the noise in the growth terms corresponding to a linear noise.
Predictive models for the dynamics of microbial communities will certainly require a more in-depth description of the system. Nutrients and the spatial distribution of microbes are expected to play a role in dictating the evolution of the community, as is the interaction with the environment. Synthetic microbial communities are currently being developed and will hopefully provide a more comprehensive view on the complexity of microbial communities (Vrancken et al., 2019).
Materials and methods
Modeling generalized Lotka-Volterra equations
In a microbial community different species interact because they compete for the same resources. Moreover, they produce byproducts that can affect the growth of other species. Depending on the nature of the byproducts (harmful, beneficial, or even essential), the interaction strength will be either negative or positive. To describe the dynamics of interacting species, one can use the generalized Lotka-Volterra equations:
$$\dot{x}_i(t) = \lambda_i + g_i x_i(t) + \sum_j \omega_{ij} x_i(t) x_j(t),$$
where x_{i}, λ_{i} and g_{i} are the abundance, the immigration rate, and the growth rate of species i respectively, and ω_{ij} is the interaction coefficient that represents the effect of species j on species i. The diagonal elements of the interaction matrix ω_{ii}, the so-called self-interactions, are negative to ensure stable steady states. The off-diagonal elements of the interaction matrix are drawn from a normal distribution with standard deviation α ($\omega_{ij} \sim \mathcal{N}(0, \alpha^2)$). The gLV equations only consider pairwise effects and no saturation terms or other higher-order terms. Due to this drawback, these models sometimes fail to predict microbial dynamics (Momeni et al., 2017; Levine et al., 2017). However, they are among the simplest models for interacting species and are therefore widely studied and used. Non-interacting species can be described by the logistic model, which is a special case of the gLV model obtained by setting all off-diagonal elements of the interaction matrix to zero.
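The deterministic part of the model can be written in a few lines. The following is a minimal sketch under stated assumptions: the function name `glv_rhs`, the three-species community, and all numerical values are hypothetical. It also shows how choosing the growth vector from a desired steady state makes that state a fixed point when immigration is zero, and how the logistic model follows by zeroing the off-diagonal elements.

```python
import numpy as np


def glv_rhs(x, growth_rate, interaction_matrix, immigration_rate):
    """Deterministic gLV right-hand side: lambda + g * x + x * (omega @ x)."""
    return immigration_rate + growth_rate * x + x * (interaction_matrix @ x)


# Hypothetical three-species community.
rng = np.random.default_rng(2)
steady_state = np.array([100.0, 10.0, 1.0])
omega = rng.normal(0, 0.01, (3, 3))            # off-diagonal interactions
np.fill_diagonal(omega, -1.0 / steady_state)   # negative self-interactions
immigration = np.zeros(3)

# Choosing g = -omega @ x* makes x* a fixed point when immigration is zero.
growth_rate = -omega @ steady_state
residual = glv_rhs(steady_state, growth_rate, omega, immigration)

# Logistic special case: set every off-diagonal element to zero.
omega_logistic = np.diag(np.diag(omega))
growth_logistic = -omega_logistic @ steady_state
```

With self-interactions equal to minus the inverse of the steady-state abundance, the logistic growth rates all equal one, which illustrates that self-interaction and abundance enter through their product.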
Implementations of the noise
There exist two principal types of noise: intrinsic and extrinsic noise. Extrinsic noise arises due to external sources that can alter the values of the different variables: the immigration rate and growth rate fluctuate in time through colonization of species or a changing flux of nutrients. These processes give rise to additive and linear multiplicative noise respectively. The remaining parameters, the inter- and intraspecific interactions, can also change depending on the environment. The formulation of this noise is more subtle (used in Zhu and Yin, 2009). Intrinsic noise is due to the discrete nature of individual microbial cells. Thermal fluctuations at the molecular level determine the fitness of the individual cells. Therefore, cell growth, cell division, and cell death can be considered as stochastic Poisson processes. For large numbers of microbes, these fluctuations will be averaged out.
We first consider the extrinsic noise. If the time series is calculated by $x_i(t + dt) = x_i(t) + dx_i(t)$, the implementation of the linear multiplicative noise is as follows,
$$dx_i(t) = \lambda_i\,dt + g_i x_i(t)\,dt + \sum_j \omega_{ij} x_i(t) x_j(t)\,dt + \sigma_{\mathrm{lin}}\,x_i(t)\,dW,$$
where dW is an infinitesimal element of a Brownian motion defined by a variance of dt ($dW \sim \sqrt{dt}\,\mathcal{N}(0, 1)$). Changes in immigration rates of microbial species can be modeled with additive noise,
$$dx_i(t) = \lambda_i\,dt + g_i x_i(t)\,dt + \sum_j \omega_{ij} x_i(t) x_j(t)\,dt + \sigma_{\mathrm{const}}\,dW,$$
with $dW \sim \sqrt{dt}\,\mathcal{N}(0, 1)$. Our main motivation is to model the gut microbiome in the colon. Here, we ignore the immigration of species for two reasons. First, the number of microbes in the colon is orders of magnitude larger than the number of microbes in the other parts of the gut (Marteau et al., 2001; Gorbach et al., 1967); therefore, the flux of incoming microbes in the colon is small. Second, we only consider systems around steady state, for which we assume immigration does not play an important role. For perturbed systems, which are far from equilibrium, immigration rates cannot be ignored. Ignoring immigration may be too restrictive for some microbial systems such as the skin microbiome or plankton.
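For non-interacting species without immigration, linear multiplicative noise can be integrated with a simple Euler-Maruyama scheme. The sketch below is an assumption-laden illustration and not the `Timeseries` class used in this article: the function name, the parameter name `sigma_lin`, the clipping at zero, and all numerical values are choices made for this example.

```python
import numpy as np


def simulate_logistic_linear_noise(steady_state, self_interaction, sigma_lin,
                                   dt=0.01, n_steps=20000, seed=0):
    """Euler-Maruyama integration of non-interacting logistic species.

    dx_i = (g_i x_i + omega_ii x_i^2) dt + sigma_lin x_i dW,
    with dW ~ sqrt(dt) N(0, 1) and g_i = -omega_ii x_i^* (no immigration).
    """
    rng = np.random.default_rng(seed)
    steady_state = np.asarray(steady_state, dtype=float)
    self_interaction = np.asarray(self_interaction, dtype=float)
    growth_rate = -self_interaction * steady_state   # fixes the steady state
    x = steady_state.copy()
    ts = np.empty((n_steps, x.size))
    for t in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(x.size)
        drift = growth_rate * x + self_interaction * x**2
        x = x + drift * dt + sigma_lin * x * dW
        x = np.maximum(x, 0.0)   # assumption: clip to keep abundances non-negative
        ts[t] = x
    return ts


# Hypothetical example: two species with very different abundances.
target = np.array([1000.0, 10.0])
ts = simulate_logistic_linear_noise(target, -1.0 / target, sigma_lin=0.2)
```

Because the noise term is proportional to the abundance, both species fluctuate around their steady state with a similar relative amplitude, regardless of their absolute abundance.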
To derive the form of intrinsic noise in generalized Lotka-Volterra equations, we can consider every species abundance making a random walk in one dimension. The average displacement is zero and the variance of displacement is the sum of the rate of growth (jumping to the right) and the rate of death (jumping to the left). For the generalized Lotka-Volterra equations, this results in a noise term
with ω the interaction matrix and where functions f and h each decouple the growth and death terms. In the generalized Lotka-Volterra model no difference is made between negative interactions as a result of slowing down the growth rate or increasing the death rate; only the resulting net rates are used. This distinction must, however, be made to implement the intrinsic noise for gLV. In our analysis, we use the simpler logistic models, where the resulting amplitude of the noise is proportional to the square root of the abundance. One must be careful not to use this noise for abundances smaller than one because this derivation relies on Poisson statistics, which are defined for integer numbers.
We implement the intrinsic noise by a term that scales with the square root of the species abundance (Walczak et al., 2012; Fisher and Mehta, 2014),
$$dx_i(t) = \lambda_i\,dt + g_i x_i(t)\,dt + \sum_j \omega_{ij} x_i(t) x_j(t)\,dt + \sigma_{\mathrm{sqrt}} \sqrt{x_i(t)}\,dW,$$
with dW again an infinitesimal element of a Brownian motion defined by a variance of dt ($dW \sim \sqrt{dt}\,\mathcal{N}(0, 1)$). The size of this noise is determined by the cell division (g^{+}) and death rates (g^{−}) separately, which are in our model combined to one growth vector (g = g^{+} − g^{−}); for large division and death rates the intrinsic noise will be larger.
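The square root scaling can be checked numerically by treating divisions and deaths as independent Poisson processes. This is a small illustrative sketch, not part of the analysis in this article: the rates, the time step, and the abundances are hypothetical values, and interactions are ignored.

```python
import numpy as np

rng = np.random.default_rng(3)
division_rate, death_rate, dt = 1.5, 1.0, 0.01   # g+, g- and time step (hypothetical)
n_samples = 200_000

variances = {}
for abundance in (100, 10_000):
    births = rng.poisson(division_rate * abundance * dt, n_samples)
    deaths = rng.poisson(death_rate * abundance * dt, n_samples)
    change = births - deaths
    # The mean change follows the net rate, the variance follows the summed rate.
    variances[abundance] = change.var()

# Expected variance: (g+ + g-) * x * dt, so the standard deviation grows as sqrt(x).
expected = {n: (division_rate + death_rate) * n * dt for n in variances}
```

The variance depends on the sum of the division and death rates, whereas the mean change depends only on their difference, which is why the two rates must be known separately to set the size of the intrinsic noise.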
To sum up, we focus on linear multiplicative noise because: (a) extrinsic noise is dominant as microbial communities contain a very large number of individuals and (b) we ignore the immigration of individuals in our analysis.
We verified that our analysis is robust with respect to the multiple possibilities for the discretization of these models. We also compare our population-level approach with individual-based modeling approaches. Details can be found in the Supplementary file 1: Supporting results.
Neutrality measures
There is no consensus on the definition of neutrality. In general, ecosystems are considered neutral if the dominating cause of fluctuations is random birth and death processes and not fitness advantages of species.
Different neutrality measures focus on different aspects of neutrality. The Kullback-Leibler divergence verifies whether all species are equal (equal abundances and equal covariances). The neutral covariance test studies the grouping invariance of species in time series.
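Given two distributions P and Q, the Kullback-Leibler divergence is defined as
$$D_{KL}(P \,\|\, Q) = \mathbb{E}_P\left[\log \frac{P}{Q}\right],$$
where $\mathbb{E}_P$ is the expectation value using the probabilities of distribution P. The density function of a multivariate Gaussian distribution is
$$p(x) = \frac{1}{\sqrt{(2\pi)^n \det K}} \exp\left(-\frac{1}{2}(x - \mu)^T K^{-1} (x - \mu)\right),$$
where μ and K are the mean and covariance matrix of the distribution respectively. The Kullback-Leibler divergence for two multivariate Gaussian distributions $P_1 = \mathcal{N}(\mu_1, K_1)$ and $P_2 = \mathcal{N}(\mu_2, K_2)$ in $\mathbb{R}^n$ is (Duchi, 2007)
$$D_{KL}(P_1 \,\|\, P_2) = \frac{1}{2}\left(\log\frac{\det K_2}{\det K_1} - n + \mathrm{tr}\left(K_2^{-1} K_1\right) + (\mu_2 - \mu_1)^T K_2^{-1} (\mu_2 - \mu_1)\right).$$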
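The closed-form Kullback-Leibler divergence between two multivariate Gaussian distributions can be evaluated numerically as sketched below. The function name `kl_divergence_gaussians` is an assumption made for this example, and the sketch does not construct the neutral reference distribution; it only evaluates the divergence for given means and covariance matrices.

```python
import numpy as np


def kl_divergence_gaussians(mu_p, cov_p, mu_q, cov_q):
    """D_KL(P || Q) for multivariate normals P = N(mu_p, cov_p), Q = N(mu_q, cov_q)."""
    mu_p, mu_q = np.asarray(mu_p, float), np.asarray(mu_q, float)
    cov_p, cov_q = np.atleast_2d(cov_p), np.atleast_2d(cov_q)
    n = mu_p.size
    diff = mu_q - mu_p
    # slogdet is numerically safer than log(det(...)) for large matrices.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    mahalanobis = diff @ np.linalg.solve(cov_q, diff)
    return 0.5 * (logdet_q - logdet_p - n + trace_term + mahalanobis)


# One-dimensional check: P = N(0, 1), Q = N(1, 4).
kl_1d = kl_divergence_gaussians([0.0], [[1.0]], [1.0], [[4.0]])
```

For a time series `ts` of shape (time points, species), the inputs for P would typically be `ts.mean(axis=0)` and `np.cov(ts.T)`.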
For every time series, we can calculate the mean μ and covariance matrix K, and define the corresponding values for a neutral time series in which all species are equal (Fisher and Mehta, 2014). The distance to neutrality can thus be calculated as the Kullback-Leibler divergence between the probability distribution of the original time series P and the associated neutral distribution of the S species.
The neutral covariance test was designed by Washburne et al., 2016. We used a Python translation of the code developed by this author.
Noise color
The color of the noise in a time series is determined by the slope of the power spectral density in a log-log scale. This slope can be determined by a linear fit through the spectrum. A different technique to estimate this slope has been put forward by Faust et al., 2018. There, it is argued that the power spectral density does not have a constant slope and that, therefore, a nonlinear curve must be fitted. They choose a spline fit and consider the minimal value of its derivative to be the value of the noise color. Because the minimal value of the slope of the fit is taken, the noise color tends to be darker when using this technique. For our time series, however, we see that the spline fit only deviates from the linear fit for low frequencies (Figure 5). We ignore the low frequencies for fitting because of the windowing effect. Therefore, we opt for a linear fit after omitting the values for low frequencies (one order of magnitude of the lowest frequencies).
ts = mimic_experimental(interaction=0.02, connectivity=0.1, N=50)
figure_characteristics_timeseries(ts)
plt.show()
The noise color of time series (A) is determined by the slope of the power spectral density (B).
This slope can be measured through a linear fit of all values (dashed), a linear fit through the higher frequency range (solid line) or by performing a spline fit (dotted). A linear fit through all frequencies can be influenced by the windowing effect for low frequencies and the spline fit can make the slope steeper at the low frequencies and result in a darker noise as can be seen for the purple curves. The values of the noise color determined by the different techniques are given in the legend. Therefore, in our work, we opt for the linear fit with a cutoff for low frequencies.
The correspondence between the colors and slopes is here:
| Slope | Color |
|-------|-------|
| 0 | white |
| -1 | pink |
| -2 | brown |
| -3 | black |
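The linear fit with a low-frequency cutoff described in this section can be sketched as follows. This is not the `noise_color` function imported from `noise_analysis`; the function name `noise_color_slope`, the use of `scipy.signal.periodogram`, and the `skip_decades` parameter are assumptions made for this illustration.

```python
import numpy as np
from scipy import signal


def noise_color_slope(x, dt=1.0, skip_decades=1.0):
    """Slope of the power spectral density on a log-log scale.

    The lowest `skip_decades` decades of frequencies are omitted before the
    linear fit.
    """
    freqs, psd = signal.periodogram(np.asarray(x, dtype=float), fs=1.0 / dt)
    freqs, psd = freqs[1:], psd[1:]               # drop the zero frequency
    keep = freqs >= freqs[0] * 10**skip_decades   # drop the lowest decade(s)
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
    return slope


# Hypothetical test signals: white noise and its cumulative sum (a random walk).
rng = np.random.default_rng(4)
white = rng.standard_normal(4096)
brown = np.cumsum(white)
slope_white = noise_color_slope(white)
slope_brown = noise_color_slope(brown)
```

White noise yields a slope near zero, while the random walk yields a clearly negative slope toward the brown end of the table; for a discretely sampled random walk the fitted value is somewhat less steep than the ideal value because the spectrum flattens close to the Nyquist frequency.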
References

1. JH Brown, VK Gupta, BL Li, BT Milne, C Restrepo, GB West
2. JG Caporaso, CL Lauber, EK Costello, D Berg-Lyons, A Gonzalez, J Stombaugh, D Knights, P Gajer, J Ravel, N Fierer, JI Gordon, R Knight
3. KZ Coyte, J Schluter, KR Foster
4. LA David, AC Materna, J Friedman, MI Campos-Baptista, MC Blackburn, A Perrotta, SE Erdman, EJ Alm
5. L Descheemaeker, S de Buyl
6. J Duchi
7. K Faust, F Bauchinger, B Laroche, S de Buyl, L Lahti, AD Washburne, D Gonze, S Widder
8. CK Fisher, P Mehta
9. MKA Gavina, T Tahara, KI Tainaka, H Ito, S Morita, G Ichinose, T Okabe, T Togashi, T Nagatani, J Yoshimura
10. T Gibbs, J Grilli, T Rogers, S Allesina
11. TE Gibson, A Bashan, HT Cao, ST Weiss, YY Liu
12. JA Gilbert, RA Quinn, J Debelius, ZZ Xu, J Morton, N Garg, JK Jansson, PC Dorrestein, R Knight
13. SL Gorbach, L Nahas, L Weinstein, R Levitan, JF Patterson
14. J Grilli, G Barabás, MJ Michalska-Smith, S Allesina
15. J Grilli
16. SP Hubbell
17. JM Levine, J Bascompte, PB Adler, S Allesina
18. E Limpert, WA Stahel, M Abbt
19. P Marteau, P Pochart, J Doré, C Béra-Maillet, A Bernalier, G Corthier
20. AM Martin-Platero, B Cleary, K Kauffman, SP Preheim, DJ McGillicuddy, EJ Alm, MF Polz
21. TJ Matthews, RJ Whittaker
22. RM May
23. RM May
24. BJ McGill, RS Etienne, JS Gray, D Alonso, MJ Anderson, HK Benecha, M Dornelas, BJ Enquist, JL Green, F He, AH Hurlbert, AE Magurran, PA Marquet, BA Maurer, A Ostling, CU Soykan, KI Ugland, EP White
25. B Momeni, L Xie, W Shou
26. J Rosindell, SP Hubbell, RS Etienne
27. R Sender, S Fuchs, R Milo
28. L Sidhom, T Galla
29. JD Silverman, HK Durand, RJ Bloom, S Mukherjee, LA David
30. RV Solé, D Alonso, A McKane
31. G Vrancken, AC Gregory, GRB Huys, K Faust, J Raes
32. AM Walczak, A Mugler, CH Wiggins, X Liu, MD Betterton
33. AD Washburne, JW Burby, D Lacker
34. C Zhu, G Yin