
# PLNmodels: Poisson lognormal models

> The Poisson lognormal model and its variants can be used to analyze multivariate count data.
> This package implements
> efficient algorithms that extract meaningful information from complex,
> difficult-to-interpret multivariate count data. It is built to scale to large datasets,
> although it has memory limitations. Possible fields of application include
>
> - Genomics (number of times a gene is expressed in a cell)
> - Ecology (species abundances)
>
> One main functionality is to normalize the count data to obtain more valuable
> data. The package also analyzes the significance of each variable and their correlation, as well as the weight of
> covariates (if available).
<!-- accompanied with a set of -->
<!-- > functions for visualization and diagnostic. See [this deck of -->
<!-- > slides](https://pln-team.github.io/slideshow/) for a -->
<!-- > comprehensive introduction. -->
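
For readers new to the model, the following NumPy sketch simulates counts from a Poisson lognormal model in its standard form: a Gaussian latent layer `Z` with mean `X @ B` and covariance `Sigma`, and counts `Y` drawn as Poisson with rate `exp(offsets + Z)`. It does not use pyPLNmodels; the function name `simulate_pln` and all dimensions and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_pln(exog, coef, covariance, log_offsets, seed=0):
    """Draw Y_ij | Z_ij ~ Poisson(exp(o_ij + Z_ij)) with Z_i ~ N(x_i B, Sigma)."""
    rng = np.random.default_rng(seed)
    n, p = exog.shape[0], coef.shape[1]
    # Gaussian latent layer: regression mean plus correlated noise.
    noise = rng.multivariate_normal(np.zeros(p), covariance, size=n)
    latent = exog @ coef + noise
    # Observed counts: Poisson with a log-linear rate.
    counts = rng.poisson(np.exp(log_offsets + latent))
    return counts, latent


# Illustrative sizes: 50 samples, 2 covariates (intercept included), 4 count variables.
n, d, p = 50, 2, 4
rng = np.random.default_rng(1)
exog = np.column_stack([np.ones(n), rng.normal(size=n)])
coef = rng.normal(scale=0.5, size=(d, p))
covariance = 0.3 * np.eye(p) + 0.1  # positive definite, mildly correlated
log_offsets = np.zeros((n, p))

counts, latent = simulate_pln(exog, coef, covariance, log_offsets)
print(counts.shape)  # (50, 4)
```

Fitting a model then amounts to recovering `coef` and `covariance` from `counts` and `exog`, since the latent layer is not observed.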

A getting-started guide is available [here](Getting_started.ipynb). If you just need a quick overview of the package, see the quickstart below.
## 🛠 Installation

**pyPLNmodels** is available on
[PyPI](https://pypi.org/project/pyPLNmodels/). The development
version is available on [GitHub](https://github.com/PLN-team/pyPLNmodels).

### Package installation

```
pip install pyPLNmodels
```
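
One way to confirm that the installation succeeded is to query the installed distribution's version from Python. The sketch below relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is a hypothetical convenience, not part of pyPLNmodels.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("pyPLNmodels")
print(found if found is not None else "pyPLNmodels is not installed")
```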

The package comes with an ecological data set to present the functionality:

```
import pyPLNmodels
from pyPLNmodels.models import PlnPCAcollection, Pln, ZIPln
from pyPLNmodels.oaks import load_oaks

# Load the ecological data set shipped with the package.
oaks = load_oaks()
```
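
To see what the terms in the model formulas refer to, it can help to list the entries of the loaded data. The sketch assumes `load_oaks()` returns a dict-like mapping from names to array-like values; this is an assumption about the return type, and the helper `describe_entries` is hypothetical, not part of pyPLNmodels. A small stand-in dictionary with made-up values is used so the snippet runs without the package.

```python
import numpy as np


def describe_entries(data):
    """Return {name: shape} for every entry of a dict-like data set."""
    return {name: np.asarray(value).shape for name, value in data.items()}


# Stand-in with made-up values; replace with `oaks` after calling load_oaks().
example = {
    "endog": np.zeros((5, 3), dtype=int),
    "tree": np.array(["a", "b", "a", "c", "b"]),
}
for name, shape in describe_entries(example).items():
    print(name, shape)
```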
### Unpenalized Poisson lognormal model (aka PLN)
```
pln = Pln.from_formula("endog ~ 1 + tree + dist2ground + orientation", data=oaks, take_log_offsets=True)
pln.fit()
print(pln)
transformed_data = pln.transform()
```
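
As its name suggests, the `take_log_offsets = True` argument asks for the offsets to be passed through a logarithm before they enter the model. To illustrate what a log offset represents, the sketch below computes one common choice, the log of each sample's total count, with NumPy. Whether this particular choice matches the offsets of the oaks data is not stated here, so treat the total-count offset and the helper name `log_total_count_offsets` as assumptions.

```python
import numpy as np


def log_total_count_offsets(counts):
    """Log of each row's total count, repeated across the columns of `counts`."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every sample needs at least one count")
    return np.broadcast_to(np.log(totals), counts.shape).copy()


counts = np.array([[10, 0, 90], [1, 1, 2]])
print(log_total_count_offsets(counts))
```

On the log scale an offset shifts the latent mean, so a sample with a larger total is expected to have proportionally larger counts.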
### Rank Constrained Poisson lognormal for Poisson Principal Component Analysis (aka PLNPCA)
```
pca = PlnPCAcollection.from_formula("endog ~ 1 + tree + dist2ground + orientation", data=oaks, take_log_offsets=True, ranks=[3, 4, 5])
pca.fit()
print(pca)
transformed_data = pca.best_model().transform()
```
### Zero-inflated Poisson lognormal model (aka ZIPln)
```
zi = ZIPln.from_formula("endog ~ 1 + tree + dist2ground + orientation", data=oaks, take_log_offsets=True)
zi.fit()  # fit before printing or transforming, as for the PLNPCA model
print(zi)
transformed_data = zi.transform()
```
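
Zero inflation means that some counts are forced to zero regardless of the Poisson rate. The sketch below illustrates the idea in NumPy with a single inflation probability shared by all entries. This is a simplification chosen for illustration, not a description of how `ZIPln` parametrizes or fits the model, and the name `zero_inflate` is hypothetical.

```python
import numpy as np


def zero_inflate(counts, zero_prob, seed=0):
    """Set each entry to zero independently with probability `zero_prob`."""
    rng = np.random.default_rng(seed)
    keep = rng.random(counts.shape) >= zero_prob  # True where the count survives
    return counts * keep


# Poisson lognormal counts with a made-up latent layer.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 5))
counts = rng.poisson(np.exp(1.0 + latent))

inflated = zero_inflate(counts, zero_prob=0.4)
print("zero fraction before:", np.mean(counts == 0))
print("zero fraction after: ", np.mean(inflated == 0))
```

A plain Poisson lognormal model would have to explain the extra zeros through very low rates, which is the situation a zero-inflated variant is meant to handle.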

## 👐 Contributing

Feel free to contribute, but read the [CONTRIBUTING.md](https://forgemia.inra.fr/bbatardiere/pyplnmodels/-/blob/main/CONTRIBUTING.md) first. A public roadmap will be available soon.


## ⚡️ Citations
Please cite our work using the following references:
-   J. Chiquet, M. Mariadassou and S. Robin: Variational inference for
    probabilistic Poisson PCA, The Annals of Applied Statistics, 12:
    2674–2698, 2018. [link](http://dx.doi.org/10.1214/18%2DAOAS1177)