What I Learned at the ACENET Spreadsheet Workshop

Keerthana Rajeev
RTPP Lead Full-stack Developer

 

I recently attended an ACENET workshop on spreadsheet data management, and it honestly made me rethink how I use spreadsheets.

I used to treat spreadsheets like a flexible space where I could mix notes, labels, formatting, and data all together. The workshop pushed a much cleaner mindset:

If your spreadsheet is going to be used for analysis, sharing, automation, or long-term storage, it needs to be structured like a dataset, not like a document.

Here are the takeaways that stuck with me the most.

1) A spreadsheet should behave like a table

 

The workshop emphasized a simple structure that prevents most downstream headaches:

  • One header row at the top
  • One variable per column
  • One record per row
  • No merged cells
  • No extra title blocks inside the data area

When you do this, everything becomes easier: sorting, filtering, validating, exporting, and loading into tools beyond Excel.

Example:
Instead of having a fancy title block like “Project: Coastal Sensors” inside the sheet, keep the actual data as a clean rectangle that starts with headers and continues with rows.
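
To make the "clean rectangle" idea concrete, here is a minimal Python sketch that checks whether every row of an exported CSV has the same number of fields as its first row. The function name check_rectangular and the sample values are my own illustrations, not something from the workshop.

```python
import csv
import io


def check_rectangular(csv_text):
    """Return (row_number, field_count) for rows whose width differs from the first row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])  # the first row is assumed to be the header
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]


# Hypothetical sample data: a clean table, and one with a title block on top.
clean = "site,sample_date,temp_c\nA1,2026-02-26,4.2\nA2,2026-02-26,4.5\n"
messy = "Project: Coastal Sensors\nsite,sample_date,temp_c\nA1,2026-02-26,4.2\n"

print(check_rectangular(clean))  # []
print(check_rectangular(messy))  # [(2, 3), (3, 3)]
```

With the title block in place, the title is mistaken for the header, so every real row is reported as having the wrong width.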

2) Column headers matter more than we think

 

A big point was that column names should be consistent and machine-friendly, not just readable.

Good headers feel boring, and that’s the point.

  • Short and clear
  • Consistent naming
  • Avoid special characters and messy spacing
  • Do not hide units and explanations inside headers

Example:

  • Good: temp_c, salinity_psu, sample_date
  • Risky: Temp (°C), Salinity (PSU), Sample Date (dd/mm/yyyy)

When you export to CSV or load the data into Python, R, SQL, or OpenRefine, clean headers reduce breakage and manual fixes.
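
If you inherit a sheet with the "risky" style of headers, a small helper can get you most of the way to machine-friendly names. This is only a sketch under my own assumptions: normalize_header is a hypothetical name, and the rule (lowercase, with every run of other characters turned into one underscore) is one reasonable convention rather than the workshop's official recipe.

```python
import re


def normalize_header(name):
    """Turn a human-style header into a lowercase snake_case name."""
    name = name.strip().lower()
    # Any run of characters other than a-z and 0-9 becomes a single underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


headers = ["Temp (°C)", "Salinity (PSU)", "Sample Date (dd/mm/yyyy)"]
print([normalize_header(h) for h in headers])
# ['temp_c', 'salinity_psu', 'sample_date_dd_mm_yyyy']
```

The last result shows the limit of automation: the date-format note still ends up in the name, so I would rename that column to sample_date by hand and record the format in the README.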

3) Invisible formatting can break your data

 

One surprising lesson was how often spreadsheets fail because of things you cannot easily see:

  • Line breaks inside cells
  • Tabs and extra spaces
  • Copied rich text from emails or Word

These can cause weird parsing issues later, especially when converting formats.

Example:
A cell that looks like High Priority might actually be High\nPriority (two lines). Or a number might be “ 12.5 ” with extra spaces that break matching and grouping later.
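
A quick way to neutralize this kind of invisible whitespace after export is to collapse it in code. The helper below is a minimal sketch; clean_cell is a name I made up for illustration.

```python
def clean_cell(value):
    """Collapse runs of whitespace (spaces, tabs, line breaks) into one space and trim the ends."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(value.split())


print(repr(clean_cell("High\nPriority")))  # 'High Priority'
print(repr(clean_cell(" 12.5 ")))          # '12.5'
print(float(clean_cell("\t12.5 ")))        # 12.5
```

Printing with repr() is a handy habit here, because it shows the quotes and escape characters that a normal print would hide.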

4) Keep metadata separate from the dataset

 

We often add legends, notes, and explanations directly inside the sheet.

The workshop warned that this can backfire, because those rows can be misread as actual data during imports.

Better approach:

  • Keep your dataset as a clean table in one sheet (often named DATA)
  • Put metadata in a separate README sheet or a separate README file

Example metadata that belongs outside the main table:

  • Who collected the data
  • How measurements were taken
  • Units
  • Known issues or missing values
  • License or sharing permissions

 

5) Dates are tricky and need standard handling

 

Dates are one of the most error-prone spreadsheet fields because software may store them as numbers and display them differently depending on settings.

The workshop recommended using unambiguous representations like:

  • A consistent standard date string format (ISO 8601 is great): 2026-02-26
  • Or separate year, month, and day columns

Example:
03/04/05 is chaos. Different people and systems will interpret it differently.
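
To see how real the ambiguity is, here is a small Python sketch that parses the same string under three common conventions. The function name interpretations and the labels are my own; the format codes are the standard ones from Python's datetime module.

```python
from datetime import date, datetime

# Three plausible readings of the same slash-separated date.
FORMATS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def interpretations(raw):
    """Parse one ambiguous date string under each convention and return ISO strings."""
    return {
        label: datetime.strptime(raw, fmt).date().isoformat()
        for label, fmt in FORMATS.items()
    }


for label, iso in interpretations("03/04/05").items():
    print(label, "->", iso)
# month/day/year -> 2005-03-04
# day/month/year -> 2005-04-03
# year/month/day -> 2003-04-05

# An ISO date string has exactly one reading.
print(date.fromisoformat("2026-02-26"))  # 2026-02-26
```

One string, three different dates, and none of them raises an error. That silence is exactly why the problem is so easy to miss.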

6) Data quality is prevention plus detection

 

This framing was super practical:

  • Quality assurance prevents bad entries using validation rules and dropdowns
  • Quality control catches mistakes using sorting, filters, and quick scans for outliers

It is much easier to maintain a clean dataset than to fix a messy one later.

Example:
If a column should only contain Yes or No, use a dropdown. Then do quick QC by sorting that column and scanning for weird values like Y, N/A, or “yes ” (with a trailing space).
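
The same QC scan can be done in a few lines of Python once the column has been exported. This is a sketch with made-up sample values; ALLOWED and column are hypothetical names.

```python
from collections import Counter

ALLOWED = {"Yes", "No"}

# Hypothetical column values, including the kinds of strays a manual scan would catch.
column = ["Yes", "No", "Y", "N/A", "yes ", "Yes", "No"]

counts = Counter(column)
unexpected = {value: n for value, n in counts.items() if value not in ALLOWED}

for value, n in sorted(unexpected.items()):
    # repr() makes the trailing space in 'yes ' visible.
    print(repr(value), n)
```

Counting distinct values is the code equivalent of sorting the column and eyeballing it, except that trailing spaces cannot hide.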

7) Exporting to CSV helps with portability

 

Once your data is structured well, exporting to CSV makes it easier to use across tools and less tied to a specific spreadsheet program.

CSV is plain text, which makes it:

  • Easier to load into other software
  • Easier to share without formatting weirdness
  • Easier to store long-term
  • Easier to track in version control (like Git)
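
To show how little there is to a CSV file, here is a minimal sketch that writes a small table with Python's built-in csv module. The records are invented sample values, and the column names reuse the header examples from earlier in this post.

```python
import csv
import io

# Hypothetical records: one dict per row, one key per column.
records = [
    {"sample_date": "2026-02-26", "temp_c": 4.2, "salinity_psu": 31.5},
    {"sample_date": "2026-02-27", "temp_c": 4.5, "salinity_psu": 31.7},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(
    buffer,
    fieldnames=["sample_date", "temp_c", "salinity_psu"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
# sample_date,temp_c,salinity_psu
# 2026-02-26,4.2,31.5
# 2026-02-27,4.5,31.7
```

The result is just a header line followed by one line per record, which is why it diffs cleanly in Git and opens in almost any tool.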

Closing thought

The ACENET workshop reminded me that spreadsheets are powerful, but only when we treat them with the same discipline we would apply to any dataset.

My biggest takeaway is simple:

Structure first, analysis later.

 
