  ## VOSA. Help and Documentation

### Version 7.0, July 2021

## Statistics

### Definitions

We have obtained a set of N different values for the quantity X: $\{X_i\}$.

• Mean value (average) $$\mu \equiv \frac{\sum_i X_i}{N}$$
• Standard deviation $$\sigma \equiv \sqrt{\frac{\sum_i (X_i - \mu)^2}{N-1}}$$
• Centered moments $$\mu_n \equiv \frac{\sum_i (X_i - \mu)^n}{N-1}$$
• Skewness $${\rm Skew} \equiv \frac{\mu_3}{\sigma^3}$$
• Kurtosis $${\rm Kur} \equiv \frac{\mu_4}{\sigma^4}$$
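As an illustration only, the definitions above can be evaluated with a few lines of Python. This is a minimal sketch, not VOSA's actual code: the function name `describe` and the sample values are hypothetical, and the $N-1$ denominator is used for every centered moment, as in the formulas above.

```python
import math


def describe(values):
    """Mean, standard deviation, skewness and kurtosis with N-1 denominators."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mu = sum(values) / n

    def centered_moment(order):
        # mu_n = sum_i (X_i - mu)^n / (N - 1)
        return sum((x - mu) ** order for x in values) / (n - 1)

    sigma = math.sqrt(centered_moment(2))
    if sigma == 0.0:
        raise ValueError("all values are equal; skewness and kurtosis are undefined")
    return {
        "mean": mu,
        "sigma": sigma,
        "skewness": centered_moment(3) / sigma ** 3,
        "kurtosis": centered_moment(4) / sigma ** 4,
    }


# Hypothetical sample, used only to exercise the function.
stats = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(stats)
```

For this sample the mean is 5 and the skewness is positive, because the largest values lie further from the mean than the smallest ones.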

### Definitions for a grouped distribution

The values can be grouped in different bins, so that we have a set of ordered pairs {value,frequency}. $$\{X_i,Freq(X_i)\}$$ $${\rm with } \ X_i > X_{i-1}$$

• Percentiles.

A percentile is the value below which a given percentage of the observations in a group falls.
In other words, the percentile $P_k$ is defined as the value such that k% of the values in the distribution are smaller than it.

Let us define some notation for the case of grouped values:

$N = \sum Freq(X_i)$ (total number of values)

$S_n = \sum_{i \le n} Freq(X_i)$ (cumulative sum of frequencies up to the n-th bin)

$S_k = k \cdot N/100$ is the cumulative sum of frequencies corresponding to the k-th percentile (for instance, if we are looking for $P_{73}$ in a distribution with 1000 values, $S_k=730$)

When we are looking for the k-th percentile, and $S_n = S_k$, then $P_k = X_n$.

But it often happens that $S_k$ falls between two consecutive cumulative sums, that is, $S_{n-1} < S_k < S_n$. In this case, the k-th percentile can be calculated using a linear interpolation: $$P_k = X_{n-1} + (X_n - X_{n-1}) \frac{S_k - S_{n-1}}{S_n - S_{n-1}}$$
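The percentile rule for grouped values can be sketched in Python as follows. This is an illustrative example under stated assumptions, not VOSA's implementation: the function name `grouped_percentile` and the sample distribution are hypothetical, and returning the first value when $S_k$ falls inside the first bin (where there is no lower neighbour to interpolate from) is a choice made here, not something the text specifies.

```python
def grouped_percentile(values, freqs, k):
    """k-th percentile of a grouped distribution {X_i, Freq(X_i)}, X_i increasing."""
    if not 0 < k <= 100:
        raise ValueError("k must be in the interval (0, 100]")
    total = sum(freqs)                 # N
    target = k * total / 100.0         # S_k
    cumulative = 0.0                   # running S_n
    for n, (x, f) in enumerate(zip(values, freqs)):
        previous = cumulative          # S_{n-1}
        cumulative += f                # S_n
        if cumulative == target:
            return x                   # exact match: P_k = X_n
        if cumulative > target:
            if n == 0:
                return x               # assumption: no lower bin to interpolate from
            x_prev = values[n - 1]
            # Linear interpolation between X_{n-1} and X_n.
            return x_prev + (x - x_prev) * (target - previous) / (cumulative - previous)
    return values[-1]


# Hypothetical grouped distribution with N = 100.
values = [1.0, 2.0, 3.0, 4.0]
freqs = [10, 20, 30, 40]
p30 = grouped_percentile(values, freqs, 30)   # S_k = 30 equals S_2, so P_30 = X_2
p50 = grouped_percentile(values, freqs, 50)   # S_k = 50 lies between S_2 = 30 and S_3 = 60
print(p30, p50)
```

In this example $P_{30} = 2$ exactly, while $P_{50} = 2 + (3-2)\,\frac{50-30}{60-30} \approx 2.67$.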

• Quartiles

The quartiles of a distribution are defined as the 25, 50 and 75 percentiles. That is: $$Q_1 = P_{25}$$ $$Q_2 = P_{50}$$ $$Q_3 = P_{75}$$

• Median

The median is defined as the X value such that half the values in the distribution are smaller and the other half are larger. It can be said that it is the "midpoint of the distribution".

In practice, it is defined as $P_{50}$. $${\rm Median} = P_{50}$$

• Mode

The mode is the value that appears most often in a set of data.
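For a grouped distribution the mode is simply the value with the highest frequency. A minimal sketch follows; the function name `grouped_mode` and the sample data are hypothetical, and returning all tied values is an assumption made here, since the text does not say how ties are handled.

```python
def grouped_mode(values, freqs):
    """Return every value whose frequency equals the maximum frequency."""
    top = max(freqs)
    return [x for x, f in zip(values, freqs) if f == top]


# Hypothetical grouped distribution with two equally frequent values.
modes = grouped_mode([1.0, 2.0, 3.0, 4.0], [10, 40, 25, 40])
print(modes)
```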

### Normality tests

There are several tests that can be used to estimate whether a given set of values corresponds to an underlying Normal distribution. In VOSA we have implemented Pearson's chi-squared goodness-of-fit test, both in the Bayes analysis and in the chi-square model fit (when parameter uncertainties are estimated using a Monte Carlo method).

Pearson's chi-squared test

Pearson's chi-squared test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation: $$\chi ^{2} = \sum _{i=1}^{n} \frac{ (O_{i}-E_{i})^{2} }{E_{i}}$$

where:

• $O_i$ = observed frequency for bin i.
• $E_i$ = expected frequency for bin i.

The expected frequency is calculated by: $$E_{i} = N \cdot [ F(Y_{u}) - F(Y_{l}) ]$$

where:

• F = the cumulative distribution function for the normal distribution.
• $Y_u$ = the upper limit for class i,
• $Y_l$ = the lower limit for class i, and
• N = the sample size

Once we have obtained the value of $\chi^2$, we compare it to the chi-square distribution for the corresponding number of degrees of freedom and obtain a range of values for the probability that our values, $\{X_i,Freq(X_i)\}$, correspond to an underlying normal distribution.
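The calculation of $\chi^2$ described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not VOSA's code: the names `normal_cdf` and `pearson_chi2_normal`, the bin limits and the observed counts are all hypothetical. The number of degrees of freedom is taken as the number of bins minus 1 minus the 2 parameters of the normal distribution, a common convention when the mean and standard deviation are estimated from the same sample; the text does not state which convention VOSA uses.

```python
import math


def normal_cdf(y, mu, sigma):
    """Cumulative distribution function F of the normal distribution."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def pearson_chi2_normal(edges, observed, mu, sigma):
    """Pearson chi-squared statistic of binned counts against a normal distribution.

    edges holds the class limits, so len(edges) == len(observed) + 1.
    """
    if len(edges) != len(observed) + 1:
        raise ValueError("edges must have one more element than observed")
    n_total = sum(observed)  # N, the sample size
    # E_i = N * [F(Y_u) - F(Y_l)] for each class
    expected = [
        n_total * (normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma))
        for lower, upper in zip(edges[:-1], edges[1:])
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Hypothetical binned sample of 100 values; open-ended outer classes.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
observed = [16, 34, 34, 16]
chi2, expected = pearson_chi2_normal(edges, observed, mu=0.0, sigma=1.0)

# Assumed convention: bins - 1 - (two fitted parameters, mu and sigma).
dof = len(observed) - 1 - 2
print(chi2, expected, dof)
```

The resulting `chi2` and `dof` would then be compared with the chi-square distribution, for example with `scipy.stats.chi2.sf(chi2, dof)` if SciPy is available. A small $\chi^2$, as in this example, means that the observed counts are close to those expected for a normal distribution.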

See, for instance, Goodness of fit (at the Wikipedia) for more details.