Working with categorical variables

Most anyone working with any kind of data will have no trouble with binary outcomes (for example, case vs. control) or with relating them to continuous variables such as gene expression profiles. Indeed, the Student t-test and simple linear regression are among the first topics encountered in data analysis. Categorical outcomes that encode more …

Functional enrichment analysis via R package anRichment

At some point in most any analysis of high-throughput data one wants to study enrichment of a resulting set (or sets) of genes in predefined reference gene sets. Although there are many tools out there that let the user evaluate enrichment in standard reference sets such as GO and KEGG, there are relatively few that …

When can correlation network analysis be useful and when you are better off not using it

Weighted correlation analysis in general and WGCNA in particular can be applied to many problems and data sets, but certainly not to all. To set the terminology straight, recall that, in a correlation network, each node represents a variable (feature), and links represent correlations, possibly transformed, among the variables. Although one could construct the …

“Blockwise” network analysis of large data

A straightforward weighted correlation network analysis of large data (tens of thousands of nodes or more) is quite memory hungry. Because the analysis uses a correlation or similarity matrix of all nodes, for n network nodes the memory requirement scales as n². In R, one has to multiply that by 8 bytes for each (double …
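
As a rough sketch of the arithmetic above (written in Python purely for illustration; the function name `dense_matrix_bytes` is hypothetical and not part of WGCNA), the following estimates the memory needed to hold a single dense n × n matrix of 8-byte doubles. An actual analysis may keep more than one such matrix in memory at a time, so the figure is best read as a lower bound.

```python
def dense_matrix_bytes(n_nodes: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for one dense n x n matrix (hypothetical helper).

    bytes_per_value defaults to 8, the size of a double-precision number.
    """
    return n_nodes * n_nodes * bytes_per_value


if __name__ == "__main__":
    # Memory grows with the square of the number of nodes.
    for n in (5_000, 20_000, 50_000):
        gib = dense_matrix_bytes(n) / 1024**3
        print(f"{n:>6} nodes: {gib:8.2f} GiB per matrix")
```

Doubling the number of nodes quadruples the requirement, which is why data sets with tens of thousands of nodes quickly outgrow the memory of a typical workstation.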

Signed and signed hybrid: what’s the difference?

In a previous post I gave my recommendation to use signed rather than unsigned networks. This post will describe the two slightly different formulas that WGCNA offers for building signed networks from a correlation matrix. As a quick reminder, constructing a network really means calculating its adjacency matrix a_ij. Elements of this matrix encode the connection …
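
The excerpt above stops before the formulas themselves, so the following is only a minimal sketch, assuming the commonly documented WGCNA definitions: signed adjacency as ((1 + cor) / 2)^β and signed hybrid adjacency as cor^β for positive correlations and 0 otherwise. The function names and the choice of power (β = 6) are illustrative assumptions, and Python is used only for illustration.

```python
def signed_adjacency(cor: float, power: float = 6) -> float:
    """Signed network (assumed formula): map cor from [-1, 1] to [0, 1], then raise to a power."""
    return ((1 + cor) / 2) ** power


def signed_hybrid_adjacency(cor: float, power: float = 6) -> float:
    """Signed hybrid network (assumed formula): keep positive correlations, zero out the rest."""
    return cor ** power if cor > 0 else 0.0


if __name__ == "__main__":
    # Compare the two variants over a few correlation values.
    for cor in (-0.8, 0.0, 0.5, 0.9):
        print(cor, signed_adjacency(cor), signed_hybrid_adjacency(cor))
```

Under these assumptions, both variants give negatively correlated pairs little or no connection strength. The main practical difference is that the signed formula usually needs a higher power than the signed hybrid formula to suppress weak correlations to a similar degree.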

Signed or unsigned: which network type is preferable?

How should pairs of nodes with strong negative correlations be treated in a correlation network analysis? One option is to consider them connected, just as if the correlation were positive. A network constructed in this way is an unsigned network, because the sign of the correlation does not matter. On the other hand, strongly negatively …
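
To make the unsigned case concrete, here is a minimal sketch, assuming the usual weighted-network definition in which the unsigned adjacency is the absolute correlation raised to a soft-thresholding power. The function name `unsigned_adjacency` and the power of 6 are illustrative assumptions, not taken from the post.

```python
def unsigned_adjacency(cor: float, power: float = 6) -> float:
    """Unsigned network (assumed formula): the sign of the correlation is discarded."""
    return abs(cor) ** power


if __name__ == "__main__":
    # A strong negative and a strong positive correlation get the same connection strength.
    print(unsigned_adjacency(-0.9) == unsigned_adjacency(0.9))
```

Because the absolute value is taken before anything else, a pair of nodes with correlation -0.9 ends up exactly as strongly connected as a pair with correlation 0.9.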