CBDA Source Data Domain: A Practical Guide

Once a research question is framed, the next temptation is to grab whatever data is lying around and start analysing. The IIBA Guide to Business Data Analytics calls this out directly: Source Data is a top-down exercise. You start from the context of the problem and determine what type of data must be used — not from whatever dataset happens to be available. This guide covers why the domain matters for the CBDA exam, its four tasks, a deep dive into planning data collection with a worked example, a sourcing checklist, and the traps examiners love.

Why Source Data carries real weight

Source Data is roughly 15% of the CBDA exam — and it is the domain where the Guide most sharply distinguishes the business analysis professional from the data scientist. Data scientists see datasets as a set of variables; the business analysis professional brings the insight to determine whether a dataset is useful within a business context, because they understand the meaning behind the data variables. That is why a well-structured analytics team deliberately combines both business and data science skills when sourcing data. If you arrive here without a validated research question, go back and read From Business Problem to Research Question first — Source Data exists to serve the questions framed in Identify the Research Questions, not the other way around.

The domain's tasks demand the most effort of any in the competency model, but the starting point is always the same: understand the problem context, then determine the data.

The four tasks of Source Data

2.2.1 Plan Data Collection. Before any data is sourced, analysis determines what data is most relevant to the analytics problem. The plan covers what data is needed, its availability, the need for historical data, when and how it will be collected, and how it will be validated once collected. The deep dive below unpacks this central task.
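
One way to make those planning questions concrete is to record them as a small structure and list whatever is still unanswered. The following is only a minimal sketch, not anything the Guide prescribes: the class name, field names, and the churn example are all hypothetical, chosen to mirror the plan items listed above (what data, availability, history, when, how, validation).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CollectionPlanItem:
    """One planned data element. All field names are illustrative assumptions."""
    data_needed: str                      # what data the question requires
    source: Optional[str] = None          # where it is available, if known
    history_needed: Optional[str] = None  # e.g. "24 months" or "none"
    collect_when: Optional[str] = None    # timing or frequency of collection
    collect_how: Optional[str] = None     # collection method
    validate_how: Optional[str] = None    # how it is checked once collected


def open_questions(item: CollectionPlanItem) -> List[str]:
    """Return the names of plan fields that are still unanswered (None)."""
    return [f.name for f in fields(item) if getattr(item, f.name) is None]


# Hypothetical example: a partly completed plan entry.
item = CollectionPlanItem(
    data_needed="monthly customer churn flag",
    source="CRM export",
    collect_how="scheduled extract",
)
print(open_questions(item))
```

Any field still listed by open_questions marks a part of the plan that has not been decided, which is a signal that sourcing should not start yet.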

2.2.2 Determine the Data Sets. Review the data expected from each source and pin down specifics: data types, dimensions, sample size, and relationships between data elements. Decide which whole and which partial datasets to collect, for example an entire spreadsheet or only specific rows within it. Identify data gaps, where data doesn't exist or is missing due to errors such as a failure in the collection process. The Guide's signature tool here is the five Vs: Volume (amount and size of data), Velocity (speed of generation and frequency of collection), Variety (sources, formats, types), Veracity (trustworthiness, including uncertainties and inconsistencies), and Value (the analytics must be driven by real, valuable business goals). Analysts also weigh cost versus benefit per dataset: ideally the team collects its own data from scratch to reduce external biases, but resources rarely allow it. A hard truth the exam tests: a research question may have to be revisited when the data required to answer it is too expensive to obtain. Data profiling and data sampling are the workhorse techniques.
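
To illustrate what data profiling and data sampling look like in practice, here is a rough sketch using only the Python standard library. It assumes the dataset is already loaded as a list of dictionaries. The function names, the rule that None and empty strings count as missing, and the sample rows are assumptions made for illustration, not part of the Guide.

```python
import random
from collections import Counter


def profile(rows):
    """Summarise row count and, per column, observed types and missing values."""
    columns = sorted({key for row in rows for key in row})
    summary = {"rows": len(rows), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and empty strings both count as a data gap.
        present = [v for v in values if v not in (None, "")]
        summary["columns"][col] = {
            "types": dict(Counter(type(v).__name__ for v in present)),
            "missing": len(values) - len(present),
        }
    return summary


def sample(rows, n, seed=0):
    """Draw a reproducible simple random sample of at most n rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


# Hypothetical rows with deliberate gaps.
rows = [
    {"id": 1, "amount": 10.5, "region": "N"},
    {"id": 2, "amount": None, "region": ""},
    {"id": 3, "region": "S"},
]
report = profile(rows)
subset = sample(rows, 2)
print(report["rows"], report["columns"]["amount"]["missing"])
```

The per-column missing counts point to data gaps, and the observed types show whether a column is consistent enough to use. A seeded sample keeps any manual review repeatable.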
