Step-by-Step Guide to Collecting the Right Data
Once you know your goal, it’s time to gather the data you need to answer your questions. But before you jump in, here are some key things to think about:
What type of data do I need?
Where can I find it, and how should I collect it?
How much data is enough?
What format will be easiest to work with?
A little planning up front will save you time and confusion later. To give you an idea of what kinds of data exist, let's start with where your data comes from.
Where does your data come from?
There are different sources of data. Choose your source based on your situation:
First-party data
Data you collect yourself (e.g., from surveys, website data, or business records).
✅ Usually the most trustworthy and accurate, because you control how it is collected.
Second-party data
Data collected by a partner and shared with you (e.g., a supplier or retailer shares their data with you).
✅ Still reliable, but make sure to ask questions about how it was collected.
Third-party data
Data from outside sources (most commonly public datasets or data brokers).
⚠️ Be cautious: it might be outdated, biased, or not fully trustworthy. Make sure to double-check its accuracy, its credibility, and whether it is biased.
How much data is enough?
Think about how many data points you need. For example, if you’re surveying customers, how many responses do you want?
A larger sample size usually gives more accurate results, but can take more time and effort.
If you’re not sure, start with a manageable amount and see if you get useful insights. If you need more, you can collect it later.
For research projects, you might need to calculate the sample size based on a confidence level and margin of error, but for most business needs, a practical approach is fine.
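If you do want to calculate a sample size, here is a minimal sketch in Python. It assumes a simple random sample and uses the standard formula for estimating a proportion (often called Cochran's formula), with an optional correction for small populations. The function name and its default values are our own choices for illustration, not a fixed standard.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.95, margin_of_error=0.05, proportion=0.5, population=None):
    """Estimate how many responses are needed for a survey.

    Assumptions (ours, for illustration): simple random sampling and a
    yes/no style question. proportion=0.5 is the most cautious guess
    because it gives the largest sample size.
    """
    # z-score for a two-sided interval, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Sample size for a very large (effectively unlimited) population
    n = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2

    # Optional finite population correction when the customer base is small
    if population is not None:
        n = n / (1 + (n - 1) / population)

    # Round up, since you cannot survey a fraction of a person
    return math.ceil(n)


print(sample_size())                      # 95% confidence, 5% margin of error
print(sample_size(population=1000))       # same, but only 1,000 customers in total
print(sample_size(margin_of_error=0.03))  # a tighter margin needs more responses
```

With these defaults the three calls print 385, 278, and 1068. Notice how shrinking the margin of error from 5% to 3% almost triples the number of responses you need, which is why a practical approach is often good enough for everyday business questions.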
Select the Right Format
Next, choose a format that works with your data processing software and is easy to use. Here are some common formats:
CSV (Comma-Separated Values): Best for beginners. Simple and works with almost all programs (like Excel, Google Sheets, and most databases).
JSON or XML: Useful if you’re sharing data between web applications or need to store complex data.
Parquet, ORC, or Avro: Used for very large datasets or advanced analytics, but not necessary for most small projects.
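To see the difference between the two simplest formats, here is a small sketch that saves the same records as CSV and as JSON using only Python's standard library. The survey responses and field names are made up for illustration, and the data is kept in memory rather than written to real files.

```python
import csv
import io
import json

# Hypothetical survey responses, made up for this example
responses = [
    {"customer_id": 1, "rating": 4, "comment": "Fast delivery"},
    {"customer_id": 2, "rating": 5, "comment": "Great, would buy again"},
]


def to_csv(rows):
    """Turn a list of dictionaries into CSV text (one line per record)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Turn the same list into JSON text."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(responses)
json_text = to_json(responses)

# Read both back in to compare what each format preserves
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_rows = json.loads(json_text)

print(csv_text)
print(json_text)
print(type(csv_rows[0]["rating"]))   # CSV values come back as text
print(type(json_rows[0]["rating"]))  # JSON keeps numbers as numbers
```

CSV is the easier one to open in a spreadsheet, but every value comes back as text, so you have to convert numbers yourself. JSON remembers which values are numbers and can also hold nested data, which is why it suits web applications better.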
Determine the Time Frame
Finally, decide when you will collect the data:
Use existing data: If you already have historical data (like past sales records or customer feedback), you can analyze it right away. This is quick and often very useful.
Track data over time: If you want to see changes or trends, collect data regularly (daily, weekly, monthly) for a set period. This gives you a more complete picture of what’s happening.
If you are looking for free data, here are a few reliable websites to explore: