Product Network Analysis – The Next Big Thing in Retail Data Mining

One of the biggest challenges retailers face is the limited depth of data available for decision making, especially if they don’t have a loyalty program. But are retailers making full use of even the limited data they already have? The answer is no. Product Network Analysis opens up a new range of insights that can maximize the return on category investments.

According to a 2011 study by Kantar Retail, 72% of retailers consider category management very or extremely important, and 88% believe that the use of category insights will differentiate companies in the future, putting it at the top of the list. Yet for years, category management analytics at most retailers has not gone beyond standard market research, voice of the customer, and market basket analysis tools, which barely scratch the surface. Today, thanks to advances in data mining techniques, retailers can do more.

Social Network Analysis (SNA) is a relatively recently commercialized analysis method that automatically identifies social relations between individuals based on their telecommunications, financial, or social media interactions. Today, most of the world’s leading telecom operators use some form of SNA solution.

Product Network Analysis (PNA) is the application of SNA algorithms in the category management domain, in order to automatically identify:

  • Which products naturally belong to the same micro category
  • Which products are most important in terms of creating category loyalty
  • Which products are most likely to trigger cross-category sales
  • Where category rationalization opportunities exist


Product Network Analysis offers significant advantages over the most common insight and analysis methods used in category management, and complements them to give a 360-degree view of the product portfolio. Three general methods are used in retail to conduct product-related analysis, each with its own shortcomings.

Categorization Based on Product Features

The simplest and most common form of product segmentation is based on the product features entered by purchasing, category management, or supplier teams. While this is a necessary evil, it has a number of shortcomings when compared to PNA. To name a few:

  • It Is Subjective: As a simple example, whether a product is luxury or not depends solely on the perception of the person labeling it as such during the segmentation effort.
  • It Is Open To Human Errors: Missing and low quality product data is very common (especially in retailers with tens of thousands of products), as keeping product data accurate and up to date becomes a secondary objective against the fast processing of items.
  • It Is Constrained By The Human Mind: One of the most important shortcomings of manual categorization is its lack of ability to reveal hidden relations between products (an example of which is the famous business intelligence correlation between beer and diapers, whereby young fathers were found to be buying beer when buying diapers on weekends, a highly unpredictable bundle to say the least).

Categorization Based on Customer / Shopper Insights

A relatively more sophisticated form of product segmentation uses the profiles of customers who are buying these products. For example, products more frequently purchased by health-conscious young families are identified and managed as a lifestyle / life stage category. Though this approach allows for a deeper understanding of products, it has its shortcomings:

  • It Is Limited By The Depth Of Customer Data: There are generally significant issues around data availability and quality of customer-related data in retailers. In order to do a proper product categorization driven by customer insights, detailed demographics, socio-economics and psychographics data is required, information which is a serious challenge for most retailers to obtain.
  • It Is Not Micro Enough: Although product portfolio analyses driven by customer insights are highly useful for macro-level decisions (e.g. whether to enter a category or not), they cannot provide enough detail on their own to support more tactical decisions.

Correlation Based on Market Basket Analysis

Last but not least, most leading retailers use market basket analysis to understand correlations between products, which then become a critical input to bundling and promotion decisions. As with the prior two methods, although this is a necessary activity, it cannot replace PNA due to its shortcomings:

  • It Is Not Complete: Market basket analysis identifies the pairs and groups of products sold together most frequently, but it does not categorize products or provide a complete view of the product portfolio. Knowing that Snickers is most frequently sold with Mars is not enough to optimize the whole confectionery section.
  • It Is Biased Against Substitutes: Since competitor products which can substitute each other are rarely seen in the same basket, market basket analysis fails to identify the correlation between them, which is, again, a major drawback in organizing or managing a complete category.
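The substitute blind spot described above can be made concrete with a small Python sketch. The baskets and product names are hypothetical. Comparing the sets of products each item is bought with is only one possible way to surface substitutes, and it is shown here as an assumption, not as the method any particular PNA tool uses.

```python
# Hypothetical baskets: the two cola brands never share a receipt.
baskets = [
    {"cola brand A", "chips", "salsa"},
    {"cola brand B", "chips", "salsa"},
    {"cola brand A", "chips"},
    {"cola brand B", "salsa", "napkins"},
]

def co_occurrence(baskets, a, b):
    """Number of baskets containing both products (the market basket view)."""
    return sum(1 for basket in baskets if a in basket and b in basket)

def neighbours(baskets, product):
    """All products that appear in at least one basket with `product`."""
    found = set()
    for basket in baskets:
        if product in basket:
            found |= basket - {product}
    return found

def neighbour_overlap(baskets, a, b):
    """Jaccard similarity of the two products' neighbour sets."""
    na = neighbours(baskets, a) - {b}
    nb = neighbours(baskets, b) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

print(co_occurrence(baskets, "cola brand A", "cola brand B"))
print(neighbour_overlap(baskets, "cola brand A", "cola brand B"))
```

In this toy data the two brands are never bought together, so a pure co-occurrence count reports no relation, yet they share most of the products they are bought with.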

Although there exist other analyses which can support category management decisions, none provide insights as deep and complete as PNA. Without needing to conduct external research or invest heavily in technology or human resources, retailers can easily perform PNA and start benefiting from the insights gained in a matter of days.


A simple five step approach translates raw POS data into category optimization actions:

1. Compile POS Data for Analysis: The only requirement for a basic PNA is simple POS transaction data, with barcodes listed for each sales transaction. Depending on the level of analysis required (SKU, product, brand, or product group, based on the business objective), this data can also be summarized. Next, the transaction data should be transformed into a format representing the pairwise frequencies of products, i.e. product A, product B, and the number or percentage of transactions containing both.
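As a rough illustration of this transformation, the Python sketch below turns a list of baskets into pairwise frequency rows. The function name, the sample baskets, and the output layout are assumptions made for the example. A real POS extract would first need to be grouped by transaction ID.

```python
from collections import Counter
from itertools import combinations

def pairwise_frequencies(baskets):
    """Count, for every unordered product pair, the baskets containing both."""
    pair_counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so (A, B) and (B, A) map to the same key.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    total = len(baskets)
    if total == 0:
        return []
    # Rows of (product_a, product_b, count, share of all transactions).
    return [(a, b, n, n / total) for (a, b), n in sorted(pair_counts.items())]

# Hypothetical POS transactions: one list of product identifiers per receipt.
baskets = [
    ["organic baby food", "organic baby oil"],
    ["organic baby food", "organic baby oil", "organic baby book"],
    ["organic baby book", "motherhood guide"],
    ["milk", "organic baby oil"],
]

for row in pairwise_frequencies(baskets):
    print(row)
```

The same function works at brand or product group level if the identifiers in each basket are mapped to that level before counting.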

2. Run a Network Clustering Algorithm: Once the data is ready, an existing Social Network Analysis solution (such as SNA Forte, our own open-source solution) or a graph clustering solution can be used to perform the analysis (for a demonstration or a recommendation on alternative tools, please contact us). The output of these tools presents the product networks, i.e. micro categories, as well as the products playing key roles in these networks. For example, for a hypermarket, the partial sample below shows two identified micro categories:

  • Healthy Baby Products: With organic baby food and skincare products linked together
  • Parenting Books: With baby and motherhood related books linked together


[Figure: Product Network Analysis]
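To give a feel for what a clustering step does with the pairwise data, here is a deliberately simple Python stand-in: it keeps only links at or above a threshold and treats each connected group of products as a micro category. This is not the algorithm used by SNA Forte or any other specific tool, which typically rely on more sophisticated community detection. The pair counts and the `min_count` parameter are hypothetical.

```python
def micro_categories(pair_counts, min_count=2):
    """Group products via connected components of sufficiently strong links."""
    graph = {}
    for (a, b), n in pair_counts.items():
        # Register both products so weakly linked ones still appear as nodes.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters

# Hypothetical pairwise basket counts.
pair_counts = {
    ("organic baby food", "organic baby oil"): 40,
    ("organic baby oil", "organic baby lotion"): 35,
    ("organic baby oil", "organic baby book"): 12,
    ("organic baby book", "motherhood guide"): 30,
    ("motherhood guide", "baby sleep book"): 25,
    ("garden hose", "organic baby oil"): 1,
}

for cluster in micro_categories(pair_counts, min_count=20):
    print(cluster)
```

Lowering `min_count` merges the groups into one large network, while raising it splits them further, so in this sketch the threshold plays the role of a tuning parameter.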


3. Review Networks and Key Products: All network clustering solutions require a certain level of fine-tuning of parameters to identify ideal networks – meaning that this is a cyclical process, going back and forth a couple of times before selecting the ideal results. After each cycle, identified micro categories, as well as products playing key roles in them, should be reviewed and evaluated. The typical product roles we define in PNA are as follows:

  • The Core: These are the most commonly purchased products of each micro category. They appear very frequently in baskets containing products within a micro category, meaning customers buying from a certain micro category are highly likely to be interested in or purchase them. In the example above, ‘organic baby oil’ is the core product for healthy baby products. These products are ideal as aisle centers.
  • The Hook: Also referred to as category crossers or category connectors, these are the products most likely to be bought first when a customer who traditionally purchases in one category begins purchasing from another. In the example above, ‘Organic Baby Book’ is the link product between the healthy baby products and parenting books categories, meaning that promoting this product to customers already buying healthy baby products can trigger sales in the parenting books micro category. These products are ideal to place as danglers inside the display areas of their correlated micro categories (e.g. placing several of the above-mentioned books in the healthy baby products area).
  • The Expendable: These products do not relate well to any of the micro categories in the product portfolio. They create very few cross-sales opportunities and no synergies within the existing portfolio. These products are ideal candidates for category rationalization activities (e.g. discontinuation), unless new micro categories are built around them.
  • The Staples: These products exist very frequently in baskets, independent of the micro categories. Their purchase is mostly driven by basic needs – examples of such products include bread, milk and water.
  • The Add-on: These products are almost always sold together with a set of other products, as they address a specific need. Their purchase is mostly driven by the purchase of the main product. Examples occur within categories (ice cream cones purchased with ice cream) or across categories (a lighter purchased with cigarettes).
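Some of the roles above can be approximated directly from the pairwise counts once micro categories are known. The Python sketch below is one possible heuristic, not a definitive rule set. It labels the product with the strongest links inside its micro category as the core, the non-core product with the strongest links to other micro categories as the hook, and any product left in a group of its own as a rationalization candidate. Staples and add-ons need basket-level frequencies and are not covered. All data and names are hypothetical.

```python
def product_roles(pair_counts, clusters):
    """Assign rough roles from link weights inside and across clusters."""
    cluster_of = {p: i for i, members in enumerate(clusters) for p in members}
    inside = {p: 0 for p in cluster_of}   # link weight to same-cluster products
    outside = {p: 0 for p in cluster_of}  # link weight to other clusters
    for (a, b), n in pair_counts.items():
        target = inside if cluster_of[a] == cluster_of[b] else outside
        target[a] += n
        target[b] += n
    roles = {}
    for members in clusters:
        if len(members) == 1:
            # A product with no micro category of its own to belong to.
            roles[members[0]] = "rationalization candidate"
            continue
        core = max(members, key=lambda p: inside[p])
        roles[core] = "core"
        # The hook is chosen among the remaining products of the cluster.
        others = [p for p in members if p != core]
        hook = max(others, key=lambda p: outside[p])
        if outside[hook] > 0:
            roles[hook] = "hook"
    return roles

# Hypothetical pairwise basket counts and micro categories.
pair_counts = {
    ("organic baby food", "organic baby oil"): 40,
    ("organic baby oil", "organic baby lotion"): 35,
    ("organic baby oil", "organic baby book"): 12,
    ("organic baby book", "motherhood guide"): 30,
    ("motherhood guide", "baby sleep book"): 25,
}
clusters = [
    ["organic baby food", "organic baby lotion", "organic baby oil"],
    ["baby sleep book", "motherhood guide", "organic baby book"],
    ["garden hose"],
]

for product, role in product_roles(pair_counts, clusters).items():
    print(product, "->", role)
```

Products that receive no label here are ordinary members of their micro category.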

4. Develop Micro Category & Key Product Strategies: Once all the micro categories and product roles are identified, these should be used in category management and marketing activities, including but not limited to:

  • Sourcing & Product Assortment: By focusing on more lucrative micro categories and critical products, retailers can increase their product portfolio effectiveness while keeping variety at a manageable and rational level.
  • Layout & Shelf Optimization: As PNA provides deeper insights into sub-categories and product group relations, it enables micro management of shelf space and store layout. Using PNA results, retailers can decide on which 20 products should appear in a section, which one should become the center of attention, which section should follow them, and which products should be connecting them.
  • Product Pricing: By pricing products in specific roles and micro categories more attractively, retailers can selectively increase customer loyalty and price perception while not decreasing overall profitability significantly.
  • Targeted Promotions: By identifying micro categories, as well as core and hook products, PNA enables retailers to identify the right list of products to promote to the right list of customers.  For example, while loyalty and category share of wallet of customers can be increased by promoting core products, cross-category sales can be triggered by the hooks.

5. Pilot Strategies and Deploy: As in all successful commercial initiatives, the defined strategies should first be tested through pilots, then deployed in stages after fine-tuning based on the findings.

What Next?

Once overall category management is optimized using PNA, the next step for retailers is to take a top-down approach and localize the product portfolio by store. Especially for retailers with stores in districts with dissimilar demographic and socio-economic profiles, aligning the product portfolio with local needs is a must.
