VOSA. Help and Documentation

Version 7.5, July 2022

Statistics

Definitions

We have obtained a set of N different values for the quantity X: $\{X_i\}$.

  • Mean value (average) $$ \mu \equiv \frac{\sum_i X_i}{N}$$
  • Standard deviation $$ \sigma \equiv \sqrt{\frac{\sum_i (X_i - \mu)^2}{N-1}}$$
  • Centered moments $$\mu_n \equiv \frac{\sum_i (X_i - \mu)^n}{N-1}$$
  • Skewness $${\rm Skew} \equiv \frac{\mu_3}{\sigma^3}$$
  • Kurtosis $${\rm Kur} \equiv \frac{\mu_4}{\sigma^4}$$
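
The definitions above can be sketched in a few lines of Python. This is only an illustration that follows the formulas as written here (including the $N-1$ normalization of the centered moments); the function name `sample_statistics` and the example values are hypothetical and are not taken from the VOSA code.

```python
import math


def sample_statistics(values):
    """Mean, standard deviation, skewness and kurtosis of a list of values.

    Follows the definitions in this section: the centered moments are
    normalized by N-1 (an assumption taken from the formulas above).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")

    mu = sum(values) / n

    def centered_moment(order):
        # mu_n = sum_i (X_i - mu)^n / (N - 1)
        return sum((x - mu) ** order for x in values) / (n - 1)

    sigma = math.sqrt(centered_moment(2))
    if sigma == 0:
        raise ValueError("all values are equal; skewness and kurtosis are undefined")

    return {
        "mean": mu,
        "std": sigma,
        "skewness": centered_moment(3) / sigma ** 3,
        "kurtosis": centered_moment(4) / sigma ** 4,
    }


# Hypothetical example values, chosen only for illustration.
stats = sample_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

With these definitions a symmetric set of values gives zero skewness, and a positive skewness indicates a longer tail towards large values.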

Definitions for a grouped distribution

The values can be grouped in different bins, so that we have a set of ordered pairs {value,frequency}. $$ \{X_i,Freq(X_i)\}$$ $${\rm with } \ X_i > X_{i-1}$$

  • Percentiles.

    A percentile is the value below which a given percentage of the observations in a group falls.
    In other words, the percentile $P_k$ is defined as the value such that a fraction k/100 of the values in the distribution are smaller than it.

    Let's define some notations for the case of grouped values:

    $N = \sum Freq(X_i)$ (total number of values)

    $ S_n = \sum_{i \le n} Freq(X_i) $ (cumulated sum of frequencies up to the n-th bin)

    $ S_k = k \cdot N/100$ is the cumulated frequency corresponding to the k-th percentile (for instance, if we are looking for $P_{73}$ in a distribution with 1000 values, $S_k=730$)

    When we are looking for the k-th percentile, and $S_n = S_k$, then $P_k = X_n$.

    But it often happens that no bin matches exactly, that is, $S_{n-1} < S_k < S_n$. In this case, the k-th percentile can be calculated using a linear interpolation: $$P_k = X_{n-1} + (X_n - X_{n-1}) \frac{S_k - S_{n-1}}{S_n - S_{n-1}} $$

  • Quartiles

    The quartiles of a distribution are defined as the 25, 50 and 75 percentiles. That is: $$Q_1 = P_{25}$$ $$Q_2 = P_{50}$$ $$Q_3 = P_{75}$$

  • Median

    The median is defined as the X value such that half the values in the distribution are smaller and the other half are larger. It can be thought of as the "midpoint of the distribution".

    In practice, it is defined as $P_{50}$. $${\rm Median} = P_{50}$$

  • Mode

    The mode is the value that appears most often in a set of data.
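
The grouped definitions above (percentiles by linear interpolation, quartiles, median and mode) can be sketched as follows. This is a minimal illustration and not the VOSA implementation: the function names, the example bins, and the choice to return the first value when the requested percentile falls inside the first bin (where there is no $X_{n-1}$ to interpolate from) are assumptions made here.

```python
import math


def grouped_percentile(bins, k):
    """k-th percentile of a grouped distribution.

    bins: list of (value, frequency) pairs, sorted by increasing value.
    """
    if not 0 < k <= 100:
        raise ValueError("k must be in the range (0, 100]")

    total = sum(freq for _, freq in bins)   # N
    target = k * total / 100.0              # S_k
    cumulated = 0.0                         # running S_n

    for n, (value, freq) in enumerate(bins):
        previous = cumulated                # S_{n-1}
        cumulated += freq                   # S_n
        if math.isclose(cumulated, target):
            return value                    # S_n = S_k, so P_k = X_n
        if cumulated > target:
            if n == 0:
                # Assumption: nothing to interpolate from in the first bin.
                return value
            prev_value = bins[n - 1][0]
            fraction = (target - previous) / (cumulated - previous)
            return prev_value + (value - prev_value) * fraction
    return bins[-1][0]


def grouped_quartiles(bins):
    """Q1, Q2 (the median) and Q3, defined as P25, P50 and P75."""
    return tuple(grouped_percentile(bins, k) for k in (25, 50, 75))


def grouped_mode(bins):
    """Value with the highest frequency (the first one in case of a tie)."""
    return max(bins, key=lambda pair: pair[1])[0]


# Hypothetical example: four bins holding 10 values in total.
example_bins = [(1.0, 2), (2.0, 3), (3.0, 4), (4.0, 1)]
q1, median, q3 = grouped_quartiles(example_bins)
mode = grouped_mode(example_bins)
```

In this example $S_k = 5$ for the median, which coincides with the cumulated frequency of the second bin, so no interpolation is needed for $P_{50}$, while $P_{25}$ and $P_{75}$ are interpolated between neighbouring bins.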

    Normality tests

    There are several tests that can be used to estimate whether a given set of values corresponds to an underlying Normal distribution. In VOSA we have implemented Pearson's chi-squared goodness-of-fit test, which is applied both in the Bayes analysis and in the Chi2 model fit (when parameter uncertainties are estimated using a Monte Carlo method).

    Pearson's chi-squared test

    Pearson's chi-squared test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation: $$ \chi ^{2} = \sum _{i=1}^{n} \frac{ (O_{i}-E_{i})^{2} }{E_{i}} $$

    where:

    • $O_i$ = observed frequency for bin i.
    • $E_i$ = expected frequency for bin i.

    The expected frequency is calculated by: $$ E_{i} = N \cdot [ F(Y_{u}) - F(Y_{l}) ] $$

    where:

    • F = the cumulative distribution function for the normal distribution.
    • $Y_u$ = the upper limit for class i,
    • $Y_l$ = the lower limit for class i, and
    • N = the sample size

    Once the value of $\chi^2$ has been obtained, we compare it to the chi-square distribution for the corresponding degrees of freedom and obtain a range of values for the probability that our values, $ \{X_i,Freq(X_i)\}$, correspond to an underlying normal distribution.

    See, for instance, the Wikipedia article on goodness of fit for more details.
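
As an illustration of the test statistic described above, the following sketch computes $\chi^2$ for binned data against a normal distribution with given mean and standard deviation. It is not the VOSA code: the function names, the class limits and the observed counts are hypothetical. The sketch stops at the statistic itself. The number of degrees of freedom to use in the comparison is commonly taken as the number of classes minus one, minus the number of parameters estimated from the data, but that convention is an assumption here and is not stated in this document.

```python
import math


def normal_cdf(y, mu, sigma):
    """Cumulative distribution function F of the normal distribution."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def pearson_chi2_normal(edges, observed, mu, sigma):
    """Pearson's chi-squared statistic for binned data versus a normal law.

    edges:    class limits in increasing order (one more than the classes);
              the first and last may be -inf and +inf.
    observed: observed frequency O_i of each class.
    """
    if len(edges) != len(observed) + 1:
        raise ValueError("edges must contain one more element than observed")

    n_total = sum(observed)  # N, the sample size
    chi2 = 0.0
    for i, o_i in enumerate(observed):
        # E_i = N * [F(Y_u) - F(Y_l)]
        e_i = n_total * (normal_cdf(edges[i + 1], mu, sigma)
                         - normal_cdf(edges[i], mu, sigma))
        if e_i <= 0.0:
            raise ValueError("expected frequency is zero for class %d" % i)
        chi2 += (o_i - e_i) ** 2 / e_i
    return chi2


# Hypothetical example: 100 values in four classes, compared with N(0, 1).
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
chi2_close = pearson_chi2_normal(edges, [16, 34, 34, 16], 0.0, 1.0)
chi2_far = pearson_chi2_normal(edges, [40, 10, 10, 40], 0.0, 1.0)
```

Counts that follow the normal expectation closely give a small $\chi^2$, while the strongly bimodal counts in the second call give a large one, which would correspond to a low probability of an underlying normal distribution.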