GenomicRanges - Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments plays a central role in analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
genetics, infrastructure, datarepresentation, sequencing, annotation, genomeannotation, coverage, bioconductor-package, core-package, genomicranges
18.30 score, 46 stars, 1.4k dependents, 26k scripts, 97k downloads

Biostrings - Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
sequencematching, alignment, sequencing, genetics, dataimport, datarepresentation, infrastructure, bioconductor-package, core-package
17.94 score, 67 stars, 1.2k dependents, 14k scripts, 102k downloads

SummarizedExperiment - A container (S4 class) for matrix-like assays
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
genetics, infrastructure, sequencing, annotation, coverage, genomeannotation, bioconductor-package, core-package
17.00 score, 37 stars, 1.3k dependents, 13k scripts, 93k downloads

S4Vectors - Foundation of vector-like and list-like containers in Bioconductor
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
infrastructure, datarepresentation, bioconductor-package, core-package
16.45 score, 18 stars, 2.0k dependents, 1.8k scripts, 118k downloads

IRanges - Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
infrastructure, datarepresentation, bioconductor-package, core-package, openmp
16.41 score, 23 stars, 1.9k dependents, 3.1k scripts, 117k downloads

GenomeInfoDb - Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
Contains data and functions that define and allow translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1"), including a function that attempts to place sequence names in their natural, rather than lexicographic, order.
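To make the distinction between natural and lexicographic ordering concrete, here is a minimal language-neutral sketch in Python. It is not the GenomeInfoDb API: the helper names `strip_chr` and `natural_key` are hypothetical, and the rule used (digit runs compare as integers and sort ahead of letters) is an assumption chosen for illustration, not the package's actual ordering rule.

```python
import re

def strip_chr(name):
    """Translate a 'chr'-prefixed name to its bare form, e.g. 'chr1' -> '1'."""
    return name[3:] if name.startswith("chr") else name

def natural_key(name):
    """Sort key that compares digit runs as integers and other chunks as text."""
    chunks = [c for c in re.split(r"(\d+)", strip_chr(name)) if c]
    # Tag each chunk so integers are only compared with integers and text
    # with text; numeric chunks sort ahead of text chunks.
    return [(0, int(c), "") if c.isdecimal() else (1, 0, c) for c in chunks]

names = ["chr10", "chr2", "chrX", "chr1", "chrM"]

lexicographic = sorted(names)             # plain string comparison
natural = sorted(names, key=natural_key)  # numeric-aware comparison

print(lexicographic)  # ['chr1', 'chr10', 'chr2', 'chrM', 'chrX']
print(natural)        # ['chr1', 'chr2', 'chr10', 'chrM', 'chrX']
```

Under plain string comparison "chr10" sorts before "chr2" because "1" precedes "2" character by character. The numeric-aware key avoids that by comparing 10 and 2 as integers.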
genetics, datarepresentation, annotation, genomeannotation, bioconductor-package, core-package
16.08 score, 33 stars, 329 dependents, 2.4k scripts, 102k downloads

GenomicAlignments - Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
infrastructure, dataimport, genetics, sequencing, rnaseq, snp, coverage, alignment, immunooncology, bioconductor-package, core-package
15.78 score, 11 stars, 555 dependents, 4.4k scripts, 48k downloads

DelayedArray - A unified framework for working transparently with on-disk and in-memory array-like datasets
Wrapping an array-like object (typically an on-disk object) in a DelayedArray object allows one to perform common array operations on it without loading the object in memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. Note that this also works on in-memory array-like objects like DataFrame objects (typically with Rle columns), Matrix objects, ordinary arrays, and data frames.
infrastructure, datarepresentation, annotation, genomeannotation, bioconductor-package, core-package, u24ca289073
15.63 score, 29 stars, 1.4k dependents, 762 scripts, 94k downloads

GenomicFeatures - Query the gene models of a given organism/assembly
Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.
genetics, infrastructure, annotation, sequencing, genomeannotation, bioconductor-package, core-package
15.59 score, 27 stars, 353 dependents, 8.2k scripts, 35k downloads

BiocGenerics - S4 generic functions used in Bioconductor
The package defines many S4 generic functions used in Bioconductor.
infrastructure, bioconductor-package, core-package
14.56 score, 13 stars, 2.4k dependents, 860 scripts, 125k downloads

BSgenome - Software infrastructure for efficient representation of full genomes and their SNPs
Infrastructure shared by all the Biostrings-based genome data packages.
genetics, infrastructure, datarepresentation, sequencematching, annotation, snp, bioconductor-package, core-package
14.28 score, 9 stars, 275 dependents, 1.7k scripts, 25k downloads

HDF5Array - HDF5 datasets as array-like objects in R
The HDF5Array package is an HDF5 backend for DelayedArray objects. It implements the HDF5Array, H5SparseMatrix, H5ADMatrix, and TENxMatrix classes, 4 convenient and memory-efficient array-like containers for representing and manipulating either: (1) a conventional (a.k.a. dense) HDF5 dataset, (2) an HDF5 sparse matrix (stored in CSR/CSC/Yale format), (3) the central matrix of an h5ad file (or any matrix in the /layers group), or (4) a 10x Genomics sparse matrix. All these containers are DelayedArray extensions and thus support all operations (delayed or block-processed) supported by DelayedArray objects.
infrastructure, datarepresentation, dataimport, sequencing, rnaseq, coverage, annotation, genomeannotation, singlecell, immunooncology, bioconductor-package, core-package, u24ca289073
12.90 score, 12 stars, 157 dependents, 1.4k scripts, 28k downloads

SparseArray - High-performance sparse data representation and manipulation in R
The SparseArray package provides array-like containers for efficient in-memory representation of multidimensional sparse data in R (arrays and matrices). The package defines the SparseArray virtual class and two concrete subclasses: COO_SparseArray and SVT_SparseArray. Each subclass uses its own internal representation of the nonzero multidimensional data: the "COO layout" and the "SVT layout", respectively. SVT_SparseArray objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats package from CRAN.
infrastructure, datarepresentation, bioconductor-package, core-package, u24ca289073, openmp
12.80 score, 11 stars, 1.4k dependents, 103 scripts, 91k downloads

XVector - Foundation of external vector representation and manipulation in Bioconductor
Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).
infrastructure, datarepresentation, bioconductor-package, core-package, zlib
11.62 score, 3 stars, 1.8k dependents, 85 scripts, 112k downloads

Rhtslib - HTSlib high-throughput sequencing library as an R package
This package provides version 1.18 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").
dataimport, sequencing, bioconductor-package, core-package, curl, bzip2, xz-utils, zlib
11.06 score, 11 stars, 612 dependents, 3 scripts, 48k downloads

S4Arrays - Foundation of array-like containers in Bioconductor
The S4Arrays package defines the Array virtual class to be extended by other S4 classes that wish to implement a container with array-like semantics. It also provides: (1) low-level functionality meant to help developers of such containers implement basic operations like display, subsetting, or coercion of their array-like objects to an ordinary matrix or array, and (2) a framework that facilitates block processing of array-like objects (typically on-disk objects).
infrastructure, datarepresentation, bioconductor-package, core-package, u24ca289073
10.97 score, 7 stars, 1.4k dependents, 13 scripts, 82k downloads

txdbmaker - Tools for making TxDb objects from genomic annotations
A set of tools for making TxDb objects from genomic annotations from various sources (e.g. UCSC, Ensembl, and GFF files). These tools allow the user to download the genomic locations of transcripts, exons, and CDS, for a given assembly, and to import them in a TxDb object. TxDb objects are implemented in the GenomicFeatures package, together with flexible methods for extracting the desired features in convenient formats.
infrastructure, dataimport, annotation, genomeannotation, genomeassembly, genetics, sequencing, bioconductor-package, core-package
10.49 score, 5 stars, 63 dependents, 305 scripts, 7.7k downloads

Seqinfo - A simple S4 class for storing basic information about a collection of genomic sequences
The Seqinfo class stores the names, lengths, circularity flags, and genomes for a particular collection of sequences. These sequences are typically the chromosomes and/or scaffolds of a specific genome assembly of a given organism. Seqinfo objects are rarely used as standalone objects. Instead, they are used as part of higher-level objects to represent their seqinfo() component. Examples of such higher-level objects are GRanges, RangedSummarizedExperiment, VCF, GAlignments, etc., defined in other Bioconductor infrastructure packages.
infrastructure, datarepresentation, genomeassembly, annotation, genomeannotation, bioconductor-package, core-package
10.41 score, 1 star, 1.8k dependents, 26 scripts, 61k downloads

UCSC.utils - Low-level utilities to retrieve data from the UCSC Genome Browser
A set of low-level utilities to retrieve data from the UCSC Genome Browser. Most functions in the package access the data via the UCSC REST API but some of them query the UCSC MySQL server directly. Note that the primary purpose of the package is to support higher-level functionalities implemented in downstream packages like GenomeInfoDb or txdbmaker.
infrastructure, genomeassembly, annotation, genomeannotation, dataimport, bioconductor-package, core-package
9.40 score, 1 star, 330 dependents, 10 scripts, 64k downloads

h5mread - A fast HDF5 reader
The main function in the h5mread package is h5mread(), which allows reading arbitrary data from an HDF5 dataset into R, similarly to what the h5read() function from the rhdf5 package does. In the case of h5mread(), the implementation has been optimized to make it as fast and memory-efficient as possible.
infrastructure, datarepresentation, dataimport, u24ca289073, curl, openssl
9.15 score, 3 stars, 157 dependents, 4 scripts, 12k downloads

cigarillo - Efficient manipulation of CIGAR strings
CIGAR stands for Concise Idiosyncratic Gapped Alignment Report. CIGAR strings are found in the BAM files produced by most aligners and in the AIRR-formatted output produced by IgBLAST. The cigarillo package provides functions to parse and inspect CIGAR strings, trim them, turn them into ranges of positions relative to the "query space" or "reference space", and project positions or sequences from one space to the other. Note that these are low-level operations that the user rarely needs to perform directly. More typically, they are performed behind the scenes by higher-level functionality implemented in other packages, such as the Bioconductor packages GenomicAlignments and igblastr.
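As a rough illustration of what turning a CIGAR string into ranges in "query space" or "reference space" involves, here is a minimal language-neutral sketch in Python. It is not the cigarillo API: the function names `parse_cigar`, `widths`, and `reference_ranges` are hypothetical. The only outside fact it relies on is the SAM specification's rule for which operations consume query or reference positions.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")      # operations that advance along the read
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance along the reference

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def widths(cigar):
    """Return (query width, reference width) spanned by the alignment."""
    ops = parse_cigar(cigar)
    query = sum(n for n, op in ops if op in CONSUMES_QUERY)
    reference = sum(n for n, op in ops if op in CONSUMES_REFERENCE)
    return query, reference

def reference_ranges(cigar, pos):
    """1-based inclusive reference ranges, one per M/=/X operation.

    `pos` is the leftmost reference position of the alignment.
    """
    ranges = []
    for n, op in parse_cigar(cigar):
        if op in "M=X":
            ranges.append((pos, pos + n - 1))
        if op in CONSUMES_REFERENCE:
            pos += n
    return ranges

print(widths("3S10M2I5M100N20M"))
# (40, 135)
print(reference_ranges("3S10M2I5M100N20M", 101))
# [(101, 110), (111, 115), (216, 235)]
```

In this example the 100N operation (a skipped region such as an intron) advances the reference position without producing a range, which is why the last block starts at 216.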
infrastructure, alignment, sequencematching, sequencing, bioconductor-package, core-package
8.96 score, 557 dependents, 5 scripts, 22k downloads

pwalign - Perform pairwise sequence alignments
The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.
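For readers unfamiliar with the Levenshtein edit distance, the following minimal Python sketch shows the textbook dynamic-programming recurrence and a pairwise distance matrix for a set of strings. It is not the pwalign implementation or its stringDist() interface: `levenshtein` and `distance_matrix` are hypothetical names, and unit costs for insertion, deletion, and substitution are assumed.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def distance_matrix(strings):
    """Symmetric matrix of pairwise edit distances."""
    return [[levenshtein(s, t) for t in strings] for s in strings]

print(levenshtein("kitten", "sitting"))
# 3
print(distance_matrix(["ACGT", "AGT", "ACGT"]))
# [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Keeping only the previous row bounds memory by the length of the second string. The global and local alignment problems named above use the same table-filling idea with a scoring scheme in place of unit edit costs.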
alignment, sequencematching, sequencing, genetics, bioconductor-package
8.66 score, 1 star, 111 dependents, 116 scripts, 12k downloads

igblastr - User-friendly R Wrapper to IgBLAST
The igblastr package provides functions to conveniently install and use a local IgBLAST installation from within R. The package also includes a set of built-in IgBLAST-compatible germline databases from OGRDB, the AIRR Community’s Open Germline Receptor Database, for various organisms. It provides functions to create additional IgBLAST-compatible germline databases using reference sequences retrieved from IMGT/V-QUEST or local FASTA files supplied by the user. When possible, annotations for the V and J alleles in a new germline database are automatically computed and added to the database, so they can be used as replacements for the internal and auxiliary data shipped with IgBLAST. IgBLAST is described at <https://pubmed.ncbi.nlm.nih.gov/23671333/>. IgBLAST web interface: <https://www.ncbi.nlm.nih.gov/igblast/>. OGRDB: <https://ogrdb.airr-community.org/>. IMGT/V-QUEST download site: <https://www.imgt.org/download/V-QUEST/>.
immunology, immunogenetics, immunooncology, cellbiology, bioconductor-package
6.64 score, 4 stars, 20 scripts, 214 downloads

ZarrArray - Bring Zarr datasets in R as DelayedArray objects
The ZarrArray package leverages the Rarr package to bring Zarr datasets into R as DelayedArray objects. The main class in the package is the ZarrArray class. A ZarrArray object is an array-like object that represents a Zarr dataset in R. ZarrArray objects are DelayedArray derivatives and therefore support all operations (delayed or block-processed) supported by DelayedArray objects.
infrastructure, datarepresentation, dataimport, bioconductor-package, core-package, u24ca289073
6.62 score, 5 stars, 4 dependents, 3 scripts, 628 downloads

BSgenomeForge - Forge your own BSgenome data package
A set of tools to forge BSgenome data packages. Supersedes the old seed-based tools from the BSgenome software package. This package allows the user to create a BSgenome data package in one function call, simplifying the old seed-based process.
infrastructure, datarepresentation, genomeassembly, annotation, genomeannotation, sequencing, alignment, dataimport, sequencematching, bioconductor-package, core-package
6.54 score, 5 stars, 20 scripts, 486 downloads

SplicingGraphs - Create, manipulate, visualize splicing graphs, and assign RNA-seq reads to them
This package allows the user to create, manipulate, and visualize splicing graphs and their bubbles based on a gene model for a given organism. Additionally it allows the user to assign RNA-seq reads to the edges of a set of splicing graphs, and to summarize them in different ways.
genetics, annotation, datarepresentation, visualization, sequencing, rnaseq, geneexpression, alternativesplicing, transcription, immunooncology, bioconductor-package
5.58 score, 2 stars, 21 scripts, 578 downloads

updateObject - Find/fix old serialized S4 instances
A set of tools built around updateObject() to work with old serialized S4 instances. The package is primarily useful to package maintainers who want to update the serialized S4 instances included in their package. This is still a work in progress.
infrastructure, datarepresentation, bioconductor-package, core-package
4.30 score, 1 star, 4 scripts, 308 downloads