Ecological Forecasting

· ☕ 9 min read · ✍️ Hoontaek Lee
🏷️
  • #Book Review
  • #Research
  • #2020
  • Intro

    Near-term Ecological Forecasting Initiative (NEFI) Summer Course 2020.

    Professor Dietze, the author of the book and a faculty member at Boston University, runs a summer course every year that teaches Bayesian applications to ecological forecasting. I was not selected for NEFI 2019, but this year they kindly invited me.

    The course uses this book as its textbook, and I figured I should go through it at least once beforehand to get the most out of the course, so I bought it.

    The topic is one I can add to my toolbox, so studying it well should pay off. Each chapter comes with a good summary, the exercise code is organized on GitHub, and there are quite a few expressions worth remembering along the way.

    Will the day come when I write a book like this?

    Erratum

    • p. 30, 8th line from the bottom: “we be able to”

    1. Introduction

    1.1. Why forecast?

    Ecosystems are changing rapidly along with the climate. Ecological forecasting (EF) is needed as a tool to support policy decisions and to better understand what lies behind ecosystem change.

    Decision making & Understanding ecological processes

    • We live in an era of rapid and interacting changes in the natural world
    • Ecological questions about the future status are being asked by policymakers
    • How do we, as ecologists, provide the best available scientific predictions?
    • Quantitative tests are at the heart of the scientific method for advancing our knowledge

    Capture uncertainties

    • Coarse understanding, noisy data, and site-specific dynamics defy our understanding
    • Ecological forecasts need to capture these uncertainties if ecology is to become a more predictive science

    1.2. The informatics challenge in forecasting

    EF ultimately comes down to handling observational data, which is where most ecologists are weak (a gap in their training), so the book sets out to cover it.

    • Much of the effort in an ecological forecast may end up being devoted to data management and processing
    • The informatics will be leveraged when performing syntheses and making forecasts

    1.3. The model-data loop

    Fit a model to data,

    analyze the fitted model's uncertainty to find out which data are needed,

    add the newly obtained data and fit a new model, and so on.

    • Forecasting the future status of real systems requires data about those systems
    • The model-data loop refers to the idea of iteratively using data to constrain models and then using models to determine what new data is most needed
    • The model-data loop applies regardless of the complexity of the model

    1.4. Why Bayes?

    Bayes works by setting a prior -> adding data -> adjusting the prior (= computing the posterior),

    which closely resembles the model-data loop idea.

    The Bayesian approach

    • allows us to treat all the terms in a forecast as probability distributions
    • is able to update predictions as new data becomes available
    • tends to be flexible and robust, allowing us to deal with the complexity of real-world data and to forecast with relatively complex models.
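    As a minimal sketch of the prior -> data -> posterior cycle, the snippet below uses a conjugate Beta-Binomial model for a survival probability. The survey counts and the helper names `update_beta` and `beta_mean` are illustrative assumptions, not taken from the book.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical surveys: (survivors, deaths) observed in each year.
surveys = [(7, 3), (12, 8), (30, 10)]

alpha, beta = 1.0, 1.0  # flat prior: every survival probability equally plausible
history = [beta_mean(alpha, beta)]
for survivors, deaths in surveys:
    # The previous posterior becomes the prior for the next batch of data.
    alpha, beta = update_beta(alpha, beta, survivors, deaths)
    history.append(beta_mean(alpha, beta))

print(alpha, beta, [round(m, 3) for m in history])
```

    Each pass through the loop is one turn of the model-data loop: the estimate is revised as soon as new data arrive, without refitting from scratch.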

    1.5. Models as scaffolds

    Data are pouring in from everywhere, yet even within a single field they differ in far too many ways: measurement method, measurement scale, assumptions, target process (evapotranspiration vs. evaporation), and so on.

    A model can act as a bridge (scaffold) for integrating these data.

    Integrating multiple data sets through a model while putting the model-data loop idea into practice –> the models-as-scaffolds approach

    Three modeling approaches:

    1. Forward modeling: use observed inputs to generate output sets
    2. Inverse modeling: use observations of model outputs to infer the most likely inputs
    3. Model as scaffold: use models and data together to constrain estimates of different ecosystem variables
      • Different data sets may not be directly comparable to one another (scales, assumptions, target processes, …), but may all be comparable to the model
      • Combining data and models is a fundamentally data-driven exercise
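    The difference between forward and inverse modeling can be sketched with the logistic growth curve. This is only an illustration under assumed values (n0 = 5, r = 0.4, K = 100) with synthetic, noise-free "observations"; the brute-force search stands in for whatever inversion method one would really use.

```python
import math


def logistic(t, n0, r, k):
    """Analytical solution of dN/dt = rN(1 - N/K) with N(0) = n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


# Forward modeling: known inputs (n0, r, K) -> predicted outputs.
times = list(range(0, 21, 2))
observed = [logistic(t, n0=5.0, r=0.4, k=100.0) for t in times]  # synthetic "data"


def sse(r):
    """Sum of squared errors between the model run with r and the observations."""
    return sum((logistic(t, 5.0, r, 100.0) - y) ** 2 for t, y in zip(times, observed))


# Inverse modeling: observed outputs -> most likely input, here by brute-force search.
candidates = [i / 100 for i in range(1, 101)]  # r from 0.01 to 1.00
r_hat = min(candidates, key=sse)
print(r_hat)
```

    Because the synthetic data were generated with r = 0.4 and contain no noise, the search recovers that value; with real data the answer would be a distribution over r rather than a single number.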

    1.6. Case studies and decision support

    EF for research and decision support are different problems.

    Decision support

    requires an understanding of the ecosystem first, and

    adds value judgments (values, trade-offs, and so on).

    For that reason, it is covered in the latter part of the book.

    Prediction vs. Projection

    • Prediction: probabilistic forecasts based on current trends and conditions
    • Projection: probabilistic forecasts driven by explicit scenarios

    2. From models to forecasts

    2.1. The traditional modeller’s toolbox

    Until now, we have been trained from a viewpoint grounded in Newtonian, deterministic thinking.

    From here on, however, we need to treat everything (data, models) as random variables and think in terms of probability distributions.

    This is because both our knowledge and the changes in nature carry uncertainties.

    • The core difference between forecasting and theory isn’t a specific set of quantitative skills, but rather the way we see the world (traditionally, in a deterministic and Newtonian way).
    • The traditional approach emphasizes equilibria, the lack of change in a system over time, and stability, the tendency for a system to return to equilibrium when perturbed.
    • Traditional ecological theory does a bad job of making connections between models and measurements.
    • Modern ecosystems are rarely in equilibrium, and knowing the current trajectory of a system is often more important to policy and management than an asymptotic equilibrium.
    • Although stability also has a large impact on how well we can forecast, we need to shift the focus of our modeling to be less deterministic and to think probabilistically about both data and models.

    2.2. Example: The logistic growth model

    Most readers will be familiar with the logistic growth model, so it is used here as an example to introduce the types of uncertainty.

    • Logistic growth model: dN/dt = rN(1-N/K)

      where N is the population size, K is the carrying capacity, and r is the intrinsic growth rate (a constant).
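    A minimal numerical sketch of this model, assuming forward Euler integration and illustrative values (n0 = 5, r = 0.4, K = 100); the function name `simulate_logistic` is hypothetical.

```python
def simulate_logistic(n0, r, k, dt=0.1, steps=500):
    """Integrate dN/dt = rN(1 - N/K) with forward Euler steps."""
    n = n0
    trajectory = [n]
    for _ in range(steps):
        n += dt * r * n * (1.0 - n / k)
        trajectory.append(n)
    return trajectory


traj = simulate_logistic(n0=5.0, r=0.4, k=100.0)
print(round(traj[0], 1), round(traj[-1], 1))
```

    With a fixed r and K the trajectory is fully deterministic: it rises monotonically and levels off at the carrying capacity. The sections that follow add the uncertainties that this version ignores.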

    2.3. Adding sources of uncertainty

    observation error

    parameter uncertainty

    initial condition uncertainty

    process variability

    others: model selection uncertainty, driver and scenario uncertainty, numerical approximation errors

    • uncertainty: our ignorance about a process; it should decrease asymptotically with sample size
    • variability: variation in the process itself that is not captured by a model

    2.3.1. Observation error

    It is always present, at least to some degree.

    It does not affect the system being modeled (it is error we introduce while measuring, regardless of how photosynthesis actually proceeds).

    Deterministic view: residual error = observation error (the state of nature is fully determined, so nature itself contributes no error).

    • A mean of zero implies that the observations are unbiased
    • The key to understanding observation error is to realize that it does not affect the underlying process itself.
    • Most statistical models deem residual error as being 100% observation error
    • Increasing sample size does not reduce it; it does not disappear asymptotically
    • If observation error were the only source of uncertainty, then the forecast would have zero uncertainty because observation error doesn’t affect the system itself.
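    The points above can be sketched with simulated measurements. The true state (50) and the measurement standard deviation (5) are assumed values chosen only for illustration.

```python
import random
import statistics

random.seed(42)

true_state = 50.0   # latent population size; untouched by how we measure it
obs_sd = 5.0        # assumed standard deviation of the measurement error


def observe(n_obs):
    """Draw n_obs noisy measurements of the same true state."""
    return [true_state + random.gauss(0.0, obs_sd) for _ in range(n_obs)]


for n_obs in (100, 10_000):
    y = observe(n_obs)
    # The spread of the observations estimates obs_sd; it does not shrink with n_obs.
    print(n_obs, round(statistics.mean(y), 2), round(statistics.stdev(y), 2))
```

    More observations pin down the mean more tightly, but the scatter of individual measurements stays near `obs_sd`, and `true_state` is never modified by the act of measuring.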

    2.3.2. Parameter uncertainty

    This is the error that arises when determining the values of the model's parameters.

    It does not affect the system being modeled.

    • is the uncertainty about the true values of all the coefficients in a model
    • treats the underlying model as deterministic; only the model calibration introduces error
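    One common way to propagate parameter uncertainty is Monte Carlo sampling: draw parameter sets, run the deterministic model for each, and look at the spread of the results. The sketch below assumes independent normal distributions for r and K with made-up means and standard errors.

```python
import math
import random
import statistics

random.seed(1)


def logistic(t, n0, r, k):
    """Analytical solution of dN/dt = rN(1 - N/K); deterministic given r and K."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


# Assumed (illustrative) estimates: mean and standard error of each parameter.
draws = [(random.gauss(0.4, 0.05), random.gauss(100.0, 10.0)) for _ in range(2000)]

for t in (5, 10, 30):
    forecasts = [logistic(t, 5.0, r, k) for r, k in draws]
    print(t, round(statistics.mean(forecasts), 1), round(statistics.stdev(forecasts), 1))
```

    Every individual run is deterministic; the forecast spread comes entirely from not knowing r and K exactly. At long lead times the spread approaches the assumed uncertainty in K, since the population has settled at carrying capacity.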

    Box 2.1. Covariance

    Variance is a special case of covariance (covariance between a variable and itself).

    Any covariance matrix can be decomposed into a diagonal matrix D of the standard deviations of each X and a matrix R of correlation coefficients: DRD

    If two variables are independent, the off-diagonal elements of their covariance matrix are zero (zero covariance alone, however, does not guarantee independence).

    Covariances are critical to the “model as scaffold” idea where we use correlations among the variables to constrain one variable based on observations of another.
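    The DRD decomposition can be checked by hand on a small case. The 2x2 covariance matrix below is hypothetical, and the matrices are plain nested lists so that no external library is needed.

```python
import math

# Hypothetical 2x2 covariance matrix for two variables X1 and X2.
cov = [[4.0, 3.0],
       [3.0, 9.0]]

# D: diagonal matrix of standard deviations; R: correlation matrix.
sd = [math.sqrt(cov[i][i]) for i in range(2)]
D = [[sd[0], 0.0], [0.0, sd[1]]]
R = [[cov[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


rebuilt = matmul(matmul(D, R), D)  # D R D should reproduce cov
print(sd, R[0][1], rebuilt)
```

    Here the standard deviations are 2 and 3 and the correlation is 0.5, so the off-diagonal covariance is 2 x 0.5 x 3 = 3, matching the input matrix.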

    2.3.3. Initial conditions

    Initial conditions, or state variables, are the values that describe the current state of the system (not its direction of change).

    They have little effect on equilibria, but a large effect before the steady state is reached.

    Unlike theoretical ecology, which is concerned with equilibria, forecasting deals with the trajectory (how the system will change), so it cares a great deal about state variables.

    Initial condition uncertainty does not affect the system being modeled.

    • The initial conditions specify the starting values of the system’s state variables
      • state variables are those that give a snapshot of the system’s properties at any single point in time.
      • variables describing the changes of those states are typically not treated as state variables
      • (pool sizes, age structure, species composition) vs (recruitment, mortality, NPP)
    • In forecasting we are often much more interested in the current trajectory of a system than its theoretical asymptote.
    • More complex models often have multiple state variables, requiring the estimation of maps of multiple quantities.

    Box 2.2. Chaos

    What defines chaotic systems is their sensitivity to initial conditions: a population forecast with a slightly perturbed initial condition will follow a different trajectory from the original.

    The Lyapunov exponent is a parameter that describes the rate at which the difference between trajectories changes, and therefore whether the trajectories diverge or converge.

    The book says this is related to predictability, but I don't fully understand it yet.

    An important question is how chaotic dynamics impact our ability to make forecasts.

    • Many of the tools ecologists are using for forecasting are borrowed from atmospheric science, where initial condition uncertainty dominates the problem.
    • However, for ecological forecasting, the more important question is how chaotic dynamics impact our ability to make forecasts.
    • Determining what the fundamental nature of the ecological forecasting problem is will in turn affect the tools we use and how successful we are.
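    To make the Lyapunov exponent concrete, here is a rough sketch that estimates it for the discrete logistic map x[t+1] = r x[t] (1 - x[t]). The choice of this map, the two r values, and the iteration counts are assumptions for illustration and are not the book's worked example.

```python
import math


def lyapunov_logistic_map(r, x0=0.2, burn_in=1000, steps=20000):
    """Estimate the Lyapunov exponent of x[t+1] = r * x[t] * (1 - x[t])."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        # |f'(x)| = |r (1 - 2x)| is the local stretching rate of small perturbations.
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / steps


for r in (2.8, 3.9):
    print(r, round(lyapunov_logistic_map(r), 3))
```

    A negative exponent (r = 2.8) means nearby trajectories converge, so initial condition errors fade. A positive exponent (r = 3.9) means they diverge exponentially, which is what limits the forecast horizon of a chaotic system.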

    2.3.4. Process error

    It refers to all the uncertainties that arise in the simulated system itself (what the model computes incorrectly, what comes from processes left out of the model, and so on).

    It does affect the simulated system itself.

    Models built on the deterministic view treat process error as observation error as well.

    • Process error encompasses all of the sources of true variability in the underlying process that are not captured by the model.

      • Demographic and environmental stochasticity are examples.

        (cf. Stochastic models draw and add random numbers to the dynamics of a model to represent the discrete nature of births and deaths and the unpredictable year-to-year variability of the climate)

    • Process errors affect the underlying model dynamics.

    • Process errors can propagate forward in time, outward in space, and across any other unit being investigated and can be partitioned in various ways, including additive, independent, and autocorrelated ways.
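    The way process error feeds back into the dynamics can be sketched with a discrete logistic model plus additive noise. The parameter values and the additive normal error are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(7)


def step(n, r=0.3, k=100.0):
    """One deterministic time step of discrete logistic growth."""
    return n + r * n * (1.0 - n / k)


def simulate(n0, steps, process_sd):
    """Simulate growth with additive process error fed back into the state."""
    n = n0
    for _ in range(steps):
        # The perturbed state is what the next step starts from, so errors propagate.
        n = max(step(n) + random.gauss(0.0, process_sd), 0.0)
    return n


for sd in (0.0, 2.0):
    finals = [simulate(10.0, 20, sd) for _ in range(500)]
    print(sd, round(statistics.mean(finals), 1), round(statistics.pstdev(finals), 2))
```

    With `process_sd = 0` every run ends at the same value. With process error switched on, the runs fan out, because each perturbation changes the state that all later steps build on. Observation error, by contrast, would only blur what we see of a single unchanged trajectory.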

    2.3.5. Other sources of uncertainty

    Model selection uncertainty: reduce it by using an ensemble.

    Error in the input data: it is observation error, but because the data are used by the model it can carry over into the model as well.

    Numerical approximation errors: usually not large, but substantial in some cases.

    • Model choice:
      • increasing the complexity of a model increases parameter error but reduces process error.
      • There is no guarantee that any of the models being considered is correct.
      • It is generally beneficial to work with a set of models (ensemble), essentially treating model structure like a discrete random variable.
    • Driver uncertainty or boundary condition uncertainty: Uncertainty in model inputs
    • Scenario uncertainty
    • Numerical approximation
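    Treating model structure as a discrete random variable can be sketched as a weighted ensemble. The two candidate models and their weights below are assumed purely for illustration; in practice the weights would have to be estimated from data.

```python
import math


def exponential(t, n0=5.0, r=0.4):
    """Unbounded exponential growth."""
    return n0 * math.exp(r * t)


def logistic(t, n0=5.0, r=0.4, k=100.0):
    """Logistic growth toward carrying capacity k."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


# Model structure as a discrete random variable: each model gets a probability.
# The weights are assumed for illustration, not estimated.
models = {"exponential": (exponential, 0.2), "logistic": (logistic, 0.8)}

t = 8
predictions = {name: f(t) for name, (f, _) in models.items()}
ens_mean = sum(w * predictions[name] for name, (_, w) in models.items())
# Between-model variance: the spread that a single model would hide.
between_var = sum(w * (predictions[name] - ens_mean) ** 2 for name, (_, w) in models.items())
print(predictions, round(ens_mean, 1), round(between_var, 1))
```

    The between-model variance is the part of the forecast uncertainty that disappears from view when only one model structure is used.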

    2.4. Thinking probabilistically

    Treat every parameter as a random variable and give each one a suitable probability distribution.

    Even if the mechanism were fully established, observation error and the like can never be removed entirely, so this is also a fitting way to express that imperfection.

    • Everything in the model is described in terms of probability distributions.
    • Philosophically, treating parameters as random variables doesn’t necessarily mean that the process itself is stochastic. It is a statement about our imperfect understanding.

    2.5. Predictability

    Prediction is inherently hard.

    Rather than clinging to a single hypothesis or to existing data, we should compare and test multiple hypotheses and make active use of new data.

    • We know there are known unknowns, but there are also unknown unknowns.
      • known unknowns: we represent them with probability distributions.
      • unknown unknowns: aka “black swans.” Failing to anticipate the probability of such a disturbance does not actually have a huge impact on the mean but results in enormously overconfident forecasts.

    2.5.1. Barriers to forecasting

    1. Inherent limits (prediction is impossible to begin with)
    2. Targets that cannot be simplified
    3. Lack of knowledge about the target
    • inherent barriers: such as local disturbances
    • computational irreducibility: we cannot use an explicit representation of detailed processes such as species niche spaces. It limits the predictability of important global change processes.
    • failures to understand: unlike weather forecasting models, ecological models cannot represent the physics of the target processes. This may limit the usability of the pending big ecological data.

    2.5.2. Weather forecasting

    Models and techniques for weather prediction are highly developed.

    Let's work out how the weather and ecological forecasting problems differ (sensitivity to initial conditions, etc.) and make good use of their toolbox.

    Start right now. There is no need to understand everything perfectly first. Understanding will grow as we use the models.

    • Ecologists can no longer wait to start making forecasts, and it is unrealistic to postpone forecasting until we have perfect knowledge.
    • The atmosphere is highly chaotic, which makes these models very sensitive to initial conditions.
    • It is an open question how many of these tools can be used as-is by ecologists.

    2.5.3. The ecological forecasting problem

    This part is hard.

    The book takes a general form of the logistic growth model, commonly used in ecological forecasting, as an example

    and analyzes in detail what each term means in the expression for the uncertainty of the forecast quantity.

    For any application, the uncertainty can be decomposed so that each term can be quantified and compared in importance!

    Only with a good grasp of uncertainty can we avoid being overconfident.

    • There are important differences between the nature of the forecasting problem in meteorology versus ecology.

    • Y_{t+1} = f(Y_t, X_t | theta_bar + alpha) + epsilon

      Var[Y_{t+1}] ≈ (stability x IC uncert) + (driver sens x driver uncert) + {param sens x (param uncert + param variability)} + (process error)

      • IC terms:
        • Internal (= endogenous) stability.
        • Because it is recursive, it will dominate all other terms if |df/dY| > 1.
        • Although there are definitely ecological systems that appear to be chaotic (too sensitive to the IC), Dietze's personal working hypothesis is that for most ecological forecasting problems we need to focus on the other terms.
        • It is the only term with the recursive property (i.e., it grows or decays exponentially), so when it grows its share of the uncertainty keeps increasing into the future.
      • Driver terms:
        • External (= exogenous) sensitivity
        • The magnitude of this term will be problem specific.
        • We might use different covariates for forecasting than we use for explaining the same process.
        • Experimental design for forecasting is often different from that for hypothesis testing.
      • parameter terms:
        • parameter uncertainty (how well do we know the mean of theta?)
        • parameter variability: the variability of the process itself; another manifestation of the process error
      • process error
        • errors from processes not explained by the model
        • stochasticity: disturbances, dispersal, mortality, reproduction, …
        • structural uncertainty: arising from empirical equations rather than equations based on physical laws
        • heterogeneity in space, time, and phylogeny
      • The power of the equation is that for any particular application it gives a quantitative way of breaking down the overall forecast into its components and comparing their relative size and importance.
      • It is more pressing to quantify variability than it is to explain it.
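    As a worked sketch of how such a variance budget can be tallied, the snippet below applies a first-order (delta method) approximation to a toy discrete logistic model, Y[t+1] = Y + r Y (1 - Y/K) + eps. The model, the numbers, and the function name `variance_terms` are assumptions for illustration; the toy model has no driver X, so the driver term is omitted, and the terms are treated as independent.

```python
def variance_terms(y, r, k, var_y, var_r, var_eps):
    """First-order variance budget for Y[t+1] = Y + r*Y*(1 - Y/K) + eps."""
    df_dy = 1.0 + r * (1.0 - 2.0 * y / k)   # stability: sensitivity to the current state
    df_dr = y * (1.0 - y / k)               # sensitivity to the growth-rate parameter
    return {
        "initial_condition": df_dy ** 2 * var_y,
        "parameter": df_dr ** 2 * var_r,
        "process": var_eps,
    }


# Illustrative numbers only: a population at half of carrying capacity.
terms = variance_terms(y=50.0, r=0.3, k=100.0, var_y=4.0, var_r=0.0025, var_eps=1.0)
total = sum(terms.values())
for name, value in terms.items():
    print(f"{name}: {value:.2f} ({value / total:.0%} of total)")
```

    Each entry is a squared sensitivity multiplied by an uncertainty, so the budget shows at a glance which source dominates and where additional data would help most.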

    3. Data, Large and Small

    3.1. The data cycle and best practices

    For the sake of my future self + in keeping with an era of growing collaboration,

    data must be managed well: a consistent format + metadata


    • Data Life Cycle: two scales
      1. data generation: plan - collect - assure - describe - preserve
      2. synthesis and forecasting: discover, integrate, and analyze
    • Best practices
      1. automated
      2. documented
      3. recoverable

    3.1.1. Plan

    A well-made plan means less trouble later.

    You have to write one for the grant proposal anyway.

    3.1.2. Collect

    Set up a template (column headers, etc.) in advance.

    Check what you collected on the same day you collected it.

    3.1.3. Assure

    Do QA/QC properly.

    Rather than changing the original values, it is better to add QA/QC flags.
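    A small sketch of flagging instead of overwriting. The readings, the valid range, and the flag names are hypothetical.

```python
# Hypothetical raw air-temperature readings in degrees Celsius; None marks a gap.
raw = [12.1, 12.4, -999.0, 13.0, None, 58.2]

VALID_RANGE = (-40.0, 50.0)  # assumed plausible range for this sensor


def qc_flag(value):
    """Return a QA/QC flag for one reading without modifying the reading."""
    if value is None:
        return "missing"
    if not VALID_RANGE[0] <= value <= VALID_RANGE[1]:
        return "out_of_range"
    return "ok"


# Store the flag next to the untouched original value.
records = [{"value": v, "flag": qc_flag(v)} for v in raw]
clean = [rec["value"] for rec in records if rec["flag"] == "ok"]
print(clean)
```

    Because the raw values are preserved, the QC rules can be revised later and rerun without any loss of information.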

    3.1.4. Describe

    Create metadata.

    3.1.5. Preserve

    Keep a careful record of changes.

    Use a free, public repository.

    3.1.6. Discover, integrate, analyze

    Build up data-handling skills.

    3.2. Data standards and metadata

    3.2.1. Metadata

    Data’s provenance: write good metadata.

    3.2.2. Data standards

    There is no single right answer, but we should still agree on a consistent format for writing data.

    3.3. Handling big data

    Big-data: size, RAM size, handling tool …

    Box 3.1. Self-documenting Data

    The key to the self-documenting data is that the data and the metadata are stored together.

    The two most common self-documenting file formats:

    • the network common data format (netCDF)
    • the hierarchical data format (HDF)
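    Reading and writing real netCDF or HDF files requires dedicated libraries, so the sketch below uses JSON from the Python standard library only as a stand-in to show the idea of keeping data and metadata in one file. The variable names, site name, and values are hypothetical.

```python
import json

# Hypothetical data set: the values and the metadata needed to interpret them.
dataset = {
    "metadata": {
        "variable": "air_temperature",
        "units": "degC",
        "site": "example_site",
        "missing_value": -999.0,
    },
    "data": {
        "time": ["2020-06-01", "2020-06-02", "2020-06-03"],
        "values": [18.2, -999.0, 19.5],
    },
}

text = json.dumps(dataset, indent=2)      # one file would hold both parts
restored = json.loads(text)

# A reader needs nothing but the file itself to decode the values.
missing = restored["metadata"]["missing_value"]
valid = [v for v in restored["data"]["values"] if v != missing]
print(restored["metadata"]["units"], valid)
```

    The point is the layout, not the format: units and the missing-value code travel with the numbers, so they cannot get separated from the data they describe.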

    4. Scientific Workflows and the Informatics of Model-Data fusion

    4.1. Transparency, accountability, and repeatability

    It must be reproducible –> share data, methods, and software openly

    Let's make active use of open-source software.

    4.2. Workflows and automation

    It must be reproducible –> every step must be recorded accurately –> version control (e.g. Git)

    • Workflow: A sequence of steps performed in an analysis

    Box 4.1. Software Development Tools

    An example of a software development pattern

    1. Setup: pull the latest code from the mainline repository to your local repository
    2. Check the to-do list: review the requests in the issue tracking system and the work needed for the next milestone
    3. Ctrl+S: create a working copy (branch). Commit every time a task is finished
    4. Test: push once it is done.
    5. Request to merge (pull request)
      • Collaborators review the work
      • If nothing is wrong, it is merged (pulled into the mainline)
      • The work log and comments are stored along with it
    6. Closing up: close the completed issue.
    7. Release: once all bugs, requests, etc. have been addressed, release
    • pull: fetching the code from another branch (= version)
    • commit: saving the changes on the branch you are working on to the local repository
    • push: saving the current branch to the remote repository
    • mainline: main branch (= trunk or master). Sometimes split into a stable branch and a development branch.

    4.3. Best practices for scientific computing

    It must be reproducible –> build good habits!

    My top recommendations

    • Make a plan: pseudo-coding

    • Keep it easy to maintain: modular

      • one task, one function
    • Use a version control system

    • Verify each piece with a test as soon as it is built

    • Put comments right where they apply

    • Choose variable names with care

    • Don't over-polish the code: the first goal is to get it running without trouble!
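    A tiny sketch of the "one task, one function" and "test as you go" habits. The functions and values are made up for illustration.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def mean_temperature_kelvin(temps_c):
    """Compose the two single-purpose functions above."""
    return mean([celsius_to_kelvin(t) for t in temps_c])


# Test each piece as soon as it is written.
assert celsius_to_kelvin(0.0) == 273.15
assert mean([1.0, 2.0, 3.0]) == 2.0
assert abs(mean_temperature_kelvin([10.0, 20.0]) - 288.15) < 1e-9
```

    Small functions with descriptive names are easy to test one at a time, and a failing assertion points directly at the piece that broke.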

    5. Introduction to Bayes

    5.1. Confronting models with data


    5.2. Probability 101

    5.3. The likelihood

    5.4. Bayes’ theorem

    5.5. Prior information

    5.6. Numerical methods for Bayes

    5.7. Evaluating MCMC output
