Things we think about when we import data

By: Michael Diedrick on Jan 23, 2025

Tags: Process (9) Museums (7) Data, Search and Discovery Interfaces (2) Data Dashboards & Visualizations (1) Web & Mobile Applications (1)

When we get a nice stack of data that we need to import, like WordPress blog posts from an old site, we need to find out the scope of the issues and the work needed to make that data usable. (WordPress is famously bad at data, so when we get an import file, we're pretty much assuming the worst, and usually we're correct.)

Our purpose is to find the following answers:

does the result have the same amount of data as the file we imported?
can we compare the output of any records that have unique or potentially problematic data?
are any anomalies detectable?
are there other checksums?
are there any overwrites where a later record is overwriting something earlier?
is it quick and easy to try again after finessing the import rules?
are assets failing to load at random, or are specific ones missing?
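
A few of these questions can be answered mechanically. What follows is a minimal sketch in Python, not our actual tooling: it assumes both the source file and the imported result can be loaded as lists of dictionaries, and that each record has a unique key field (here a hypothetical `slug`). The function names and the shape of the report are made up for illustration.

```python
import hashlib
import json
from collections import Counter


def record_checksum(record: dict) -> str:
    """Hash a record's content in a way that ignores key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_import(source: list, imported: list, key: str = "slug") -> dict:
    """Compare the records we submitted against what actually landed."""
    source_keys = Counter(r[key] for r in source)
    imported_by_key = {r[key]: r for r in imported}

    # A key that appears more than once in the source is an overwrite risk:
    # a later record may silently replace an earlier one.
    duplicate_keys = sorted(k for k, n in source_keys.items() if n > 1)

    # Keys that were submitted but never arrived.
    missing_keys = sorted(k for k in source_keys if k not in imported_by_key)

    # For keys that are unique in the source, check the content survived intact.
    changed_keys = sorted(
        r[key]
        for r in source
        if source_keys[r[key]] == 1
        and r[key] in imported_by_key
        and record_checksum(r) != record_checksum(imported_by_key[r[key]])
    )

    return {
        "source_count": len(source),
        "imported_count": len(imported),
        "counts_match": len(source) == len(imported),
        "duplicate_keys": duplicate_keys,
        "missing_keys": missing_keys,
        "changed_keys": changed_keys,
    }


if __name__ == "__main__":
    source = [
        {"slug": "a", "title": "A"},
        {"slug": "b", "title": "B"},
        {"slug": "a", "title": "A, again"},
        {"slug": "c", "title": "C"},
    ]
    imported = [
        {"slug": "a", "title": "A, again"},
        {"slug": "b", "title": "B!"},
    ]
    print(json.dumps(compare_import(source, imported), indent=2))
```

A strict checksum like this only makes sense when the import is not supposed to transform the fields being compared; if the import rules rewrite content on purpose, the comparison would need to run on the fields that should stay the same.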

We start by building an import tool, especially if the data is XML or un-whitespaced JSON. It needs a few features and abilities, including:

ability to wipe and rewipe currently imported data
choice of potential import files or rules, often in a select or input box
total number of records submitted
total number of images requested for import
total number of anomalies, with types and frequencies of anomalies
data-creator-specific things (for WordPress, shortcodes)
warnings if there are posts with the same unique key (for WordPress, the slug)
post statuses (like draft or archive)
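
The totals in that list can come from a single pass over the import file. Here is a rough sketch in Python, assuming the posts have already been parsed into dictionaries with hypothetical `slug`, `status`, and `content` fields. The two regular expressions are deliberately loose approximations: the shortcode pattern will also match any other bracketed word, so its counts are a starting point for review, not a verdict.

```python
import re
from collections import Counter

# Loose patterns, good enough for counting; not a real HTML or shortcode parser.
IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc=["']([^"']+)["']""", re.IGNORECASE)
SHORTCODE = re.compile(r"\[([A-Za-z_][\w-]*)(?=[\s\]/])")


def summarize_import(posts: list) -> dict:
    """Build the totals an import tool would show before anything is written."""
    slugs = Counter(p.get("slug") or "" for p in posts)
    statuses = Counter(p.get("status") or "unknown" for p in posts)
    images = []
    shortcodes = Counter()
    anomalies = Counter()

    for post in posts:
        content = post.get("content") or ""
        images.extend(IMG_SRC.findall(content))
        shortcodes.update(SHORTCODE.findall(content))
        if not post.get("slug"):
            anomalies["missing_slug"] += 1
        if not content.strip():
            anomalies["empty_content"] += 1

    # Posts sharing a slug would overwrite one another on import.
    duplicate_slugs = sorted(s for s, n in slugs.items() if s and n > 1)
    if duplicate_slugs:
        anomalies["duplicate_slug"] = len(duplicate_slugs)

    return {
        "total_records": len(posts),
        "total_images": len(images),
        "statuses": dict(statuses),
        "shortcodes": dict(shortcodes),
        "duplicate_slugs": duplicate_slugs,
        "anomalies": dict(anomalies),
    }


if __name__ == "__main__":
    posts = [
        {"slug": "hello", "status": "publish",
         "content": '<p>Hi</p><img src="a.jpg"> [gallery ids="1,2"]'},
        {"slug": "hello", "status": "draft",
         "content": '[caption id="x"]<img class="c" src="b.png" />[/caption]'},
        {"slug": "", "status": "publish", "content": "  "},
    ]
    for name, value in summarize_import(posts).items():
        print(f"{name}: {value}")
```

Because the summary is computed from the file alone, it can be rerun after every change to the import rules, which pairs well with being able to wipe and re-import.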

Most importantly we need a viewer. This gives us the ability to see posts in situ on a page, or raw in a standardized data format.

Only upon building this can we even know the breadth and depth of the issues, and even when we think we know, we'll likely find something that makes us start all over again. (Thank you Fusion Tables + WordPress!)
