Back from the dead – anRichment is now on GitHub

As many users have noticed, the web site of the Horvath lab that hosted, among other useful information, the anRichment package and related software, was taken down by UCLA towards the end of 2023. With it disappeared all tutorials and packages, at least those not available from CRAN. It took longer than it should have, but I …

Signed networks from signed topological overlap

I have previously described the differences between signed and unsigned networks and the two closely related ways of constructing a signed network in WGCNA. WGCNA now contains a third option for constructing a signed network, implementing a signed version of the standard Topological Overlap described by Katia Nowick and collaborators already in 2009 and also …

Why WGCNA modules don’t always agree with the dendrogram?

This post is about Dynamic Tree Cut, the method used, together with hierarchical clustering, to identify modules (clusters) in WGCNA. To put this post in context, in WGCNA, through several steps, one constructs a variable-variable similarity matrix which is then used for clustering. (The clustering similarity is usually the Topological Overlap Matrix, TOM, but it …

Removal of unwanted variation based on a subset of samples, with R code

Batch effects, technical variation and other sources of unwanted or spurious variation are ever-present in big data, especially so in high-throughput molecular (gene expression, proteomic or methylation) profiling. Fortunately, multiple methods exist for removing such variation that are suitable for various situations one may encounter. When a source of unwanted variation is known and can …

Working with categorical variables

Most anyone working with any kind of data will have no trouble with binary outcomes (for example, case vs. control) and with relating them to continuous variables such as gene expression profiles. Indeed, the Student t-test or simple linear regression are some of the first topics encountered in data analysis. Categorical outcomes that encode more …

Functional enrichment analysis via R package anRichment

At some point in most any analysis of high-throughput data one wants to study enrichment of a resulting set (or sets) of genes in predefined reference gene sets. Although there are many tools out there that let the user evaluate enrichment in standard reference sets such as GO and KEGG, there are relatively few that …

Filtering and collapsing data

I wrote recently about the "blockwise" approach that allows the WGCNA package to analyze large data with modest computational resources. This is all nice and well, but it often makes sense to reduce the number of variables in the data set before even starting the analysis. The simplest reduction is to filter out the variables …

When can correlation network analysis be useful and when you are better off not using it

Weighted correlation analysis in general and WGCNA in particular can be applied to many problems and data sets, but certainly not to all. To set the terminology straight, recall that, in a correlation network, each node represents a variable (feature), and links represent correlations, possibly transformed, among the variables. Although one could construct the …

WGCNA resources on the web

This post collects a few links to WGCNA-related material posted elsewhere on the web. First and foremost, the WGCNA page maintained by me (PL) is the place to go for WGCNA downloads, the original set of tutorials and an FAQ. Steve Horvath wrote a comprehensive book on weighted network analysis called, appropriately, Weighted Network Analysis: …

“Blockwise” network analysis of large data

A straightforward weighted correlation network analysis of large data (tens of thousands of nodes or more) is quite memory hungry. Because the analysis uses a correlation or similarity matrix of all nodes, for n network nodes the memory requirement scales as n². In R, one has to multiply that by 8 bytes for each (double …