R packages by jdmde

scellpam - Applying Partitioning Around Medoids to Single Cell Data with High Number of Cells

Applies the PAM (Partitioning Around Medoids) algorithm to single-cell sequencing samples with a high number of cells (as many as the computer's memory allows). The package stores matrices (full, sparse or symmetric) on disk in a binary format that can hold any data type, not just double, so they can be manipulated when memory is sufficient to load them as int or float but not as double. The PAM implementation runs in parallel, using several or all of the machine's cores when available. This package shares much of its code with the 'jmatrix' and 'parallelpam' packages, but their functionality is included here, so there is no need to install them.
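To illustrate why a symmetric layout and a smaller element type save memory, here is a minimal sketch in Python. It is not the package's file format or API: the names `packed_index` and `packed_size_bytes` are hypothetical, and the row-wise packed lower triangle is only one common way to store a symmetric matrix.

```python
from array import array


def packed_index(i, j):
    """Position of element (i, j) in a row-wise packed lower triangle.

    Hypothetical helper: only entries with i >= j are stored, so (i, j)
    and (j, i) map to the same slot.
    """
    if i < j:
        i, j = j, i
    return i * (i + 1) // 2 + j


def packed_size_bytes(n, typecode):
    """Bytes needed for an n x n symmetric matrix stored as a packed triangle.

    typecode follows Python's array module: "f" is a C float, "d" a C double.
    """
    return (n * (n + 1) // 2) * array(typecode).itemsize


# A 4 x 4 symmetric matrix held as 10 floats instead of 16 doubles.
n = 4
tri = array("f", [0.0] * (n * (n + 1) // 2))
tri[packed_index(3, 1)] = 2.5

# Reading the mirrored position returns the same stored value.
mirrored = tri[packed_index(1, 3)]
```

With this layout, `packed_size_bytes(n, "f")` is half of `packed_size_bytes(n, "d")` on platforms where float is 4 bytes and double is 8, which is the saving the description refers to.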

Last updated 9 months ago

cpp

2.78 score 9 scripts 237 downloads

parallelpam - Parallel Partitioning-Around-Medoids (PAM) for Big Sets of Data

Application of the Partitioning-Around-Medoids (PAM) clustering algorithm described in Schubert, E. and Rousseeuw, P.J.: "Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms." Information Systems, vol. 101, p. 101804, (2021). <doi:10.1016/j.is.2021.101804>. It stores and retrieves matrices in the binary format developed for the 'jmatrix' package, but the functionality of 'jmatrix' is included here, so you do not need to install it. Its own functionality is in turn included in the 'scellpam' package, so if you have installed 'scellpam', you do not need to install this package. PAM can be applied to data sets whose dissimilarity matrix is very large; it has been tested with up to 100,000 points. This is possible thanks to the code developed for 'jmatrix', which avoids loading the matrix into 'R' memory (which would force it to be of double type) and instead reads it from disk, allowing float or even smaller data types. Moreover, the dissimilarity matrix is calculated in parallel, using multiple threads if the computer has several cores. The initial part of the PAM algorithm can be done with the BUILD or LAB algorithms; the BUILD algorithm has been implemented in parallel. The optimization phase implements the FastPAM1 algorithm, also in parallel. Finally, silhouette calculation is available and is also implemented in parallel.
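The phases named above (initialization, optimization, silhouette) can be sketched in a few lines of Python. This is a serial, textbook-style illustration under stated assumptions, not the package's code: the swap loop is a naive one that accepts any improving swap, not FastPAM1, and every function name here is hypothetical.

```python
def total_cost(D, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def build(D, k):
    """Greedy BUILD-style initialization: add the medoid that lowers cost most."""
    n = len(D)
    medoids = [min(range(n), key=lambda c: sum(D[c]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(D, medoids + [c])))
    return medoids


def swap(D, medoids):
    """Naive swap phase (not FastPAM1): accept any swap that lowers the cost."""
    medoids = list(medoids)
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for c in range(len(D)):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best


def assign(D, medoids):
    """Label each point with the index (into medoids) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: D[i][medoids[j]])
            for i in range(len(D))]


def silhouette(D, labels):
    """Per-point silhouette; points in singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i in range(len(D)):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)
            continue
        a = sum(D[i][j] for j in own) / len(own)
        b = min(sum(D[i][j] for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Toy data: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(p - q) for q in points] for p in points]

medoids, cost = swap(D, build(D, 2))
labels = assign(D, medoids)
scores = silhouette(D, labels)
```

On this toy input the swap phase moves the first medoid from the point at 2 to the point at 1, which lowers the total cost from 5 to 4. Each call to `total_cost` scans the whole matrix, which is the kind of repeated work the cited paper's faster variants are designed to reduce.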

Last updated 9 months ago

cpp

2.60 score 6 scripts 273 downloads

jmatrix - Read from/Write to Disk Matrices with any Data Type in a Binary Format

A mainly instrumental package meant to allow other packages whose core is written in 'C++' to read, write and manipulate matrices in a binary format so that they use no more memory than strictly needed. Its functionality is already included in 'parallelpam' and 'scellpam', so if you have installed either of these, you do not need to install 'jmatrix'. 'R' matrices and vectors do not always use just the memory they need, since by default they are of double type. Attempts such as the 'float' package address this, but to use them you have to coerce a matrix already loaded in 'R' memory to a float matrix, and only then can you delete the original. The problem comes when your computer does not have enough memory to hold the matrix in the first place, so you are forced to load it in chunks. This is the problem this package tries to address (with partial success, since this is a difficult problem: 'R' is not a strictly typed language, and strict typing is in any case quite hard to achieve in an interpreted language). This package allows the creation and manipulation of full, sparse and symmetric matrices of any standard data type.
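The idea of keeping a matrix on disk as float and loading it in chunks can be sketched as follows. This is an illustration only: the file layout (a two-integer header followed by row-major floats in native byte order) is an assumption made up for this example and is not the 'jmatrix' format, and `write_matrix`, `iter_row_chunks` and `column_sums` are hypothetical names.

```python
import os
import struct
import tempfile
from array import array

# Hypothetical header: number of rows and columns as two unsigned 32-bit ints.
HEADER = struct.Struct("=II")


def write_matrix(path, rows):
    """Write a list of equal-length rows as 4-byte floats after the header."""
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(len(rows), len(rows[0])))
        for row in rows:
            array("f", row).tofile(fh)


def iter_row_chunks(path, chunk_rows):
    """Yield the matrix a few rows at a time, never holding it all in memory."""
    with open(path, "rb") as fh:
        nrows, ncols = HEADER.unpack(fh.read(HEADER.size))
        done = 0
        while done < nrows:
            take = min(chunk_rows, nrows - done)
            buf = array("f")
            buf.fromfile(fh, take * ncols)
            yield [buf[r * ncols:(r + 1) * ncols].tolist() for r in range(take)]
            done += take


def column_sums(path, chunk_rows=2):
    """Example of a computation that only ever sees one chunk at a time."""
    sums = None
    for chunk in iter_row_chunks(path, chunk_rows):
        for row in chunk:
            sums = row if sums is None else [s + v for s, v in zip(sums, row)]
    return sums


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    write_matrix(demo_path, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    demo_sums = column_sums(demo_path, chunk_rows=2)
    demo_chunk_sizes = [len(chunk) for chunk in iter_row_chunks(demo_path, 2)]
```

Each element takes 4 bytes on disk rather than the 8 of a double, and only `chunk_rows` rows are in memory at any moment.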

Last updated 9 months ago

cpp

2.00 score 2 scripts 227 downloads