By: Michael Diedrick on Jan 23, 2025
When we get a nice stack of data that we need to import, like WordPress blog posts from an old site, we need to find out the scope of the issues and how much work it will take to make that data usable. (WordPress is famously bad at data, so when we get an import file, we're pretty much assuming the worst, and usually we're correct.)
Our goal is to answer the following questions:
- does the imported result contain the same number of records as the source file?
- can we compare the output of any records that have unique or potentially problematic data?
- are any anomalies detectable?
- are there other checksums we can compare, beyond record counts?
- are there any overwrites, where a later record replaces an earlier one?
- is it quick and easy to try again after finessing the import rules?
- are assets failing to load at random, or are specific ones missing?
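A couple of these questions (matching counts, missing records, overwrites) can be checked mechanically. Here's a minimal sketch in Python, assuming both the source file and the imported result have already been parsed into lists of dicts. The `reconcile` function and the `slug` field are hypothetical names for illustration, not part of any particular import tool.

```python
from collections import Counter

def reconcile(source_records, imported_records, key="slug"):
    """Compare what was submitted with what actually landed.

    Assumes each record is a dict carrying a unique key (hypothetically "slug").
    """
    source_keys = [r[key] for r in source_records]
    imported_keys = {r[key] for r in imported_records}
    # A key that appears more than once in the source means a later
    # record could overwrite an earlier one during import.
    duplicates = {k: n for k, n in Counter(source_keys).items() if n > 1}
    return {
        "source_count": len(source_records),
        "imported_count": len(imported_records),
        "missing": sorted(set(source_keys) - imported_keys),
        "overwrite_risks": duplicates,
    }

source = [{"slug": "hello"}, {"slug": "about"}, {"slug": "hello"}]
imported = [{"slug": "hello"}]
report = reconcile(source, imported)
print(report)
```

If the counts differ, the `missing` and `overwrite_risks` entries are usually the first place to look for the reason.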
We start with an import tool, especially if the data is XML or un-whitespaced JSON. The tool needs some features and abilities, including:
- ability to wipe and rewipe currently imported data
- choice of potential import files or rules, often in a select or input box
- total number of records submitted
- total number of images requested for import
- total number of anomalies, with types and frequencies of anomalies
- things specific to the data's creator, such as shortcodes for WordPress
- warnings if there are posts with the same unique key (for WordPress, the slug)
- post statuses (like draft or archive)
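Most of the totals in that list can come from a single pass over the parsed posts. The sketch below is one rough way to do it in Python, not a real tool: the `summarize` function and the `slug`, `status`, and `body` fields are assumed names, and the two regular expressions are loose heuristics (the shortcode pattern will also catch any other bracketed word, for instance).

```python
import re
from collections import Counter

# Rough heuristics only: an opening [shortcode ...] tag and an <img src="...">.
SHORTCODE = re.compile(r"\[([A-Za-z_][\w-]*)\b[^\]]*\]")
IMG_SRC = re.compile(r"<img[^>]+src=[\"']([^\"']+)[\"']", re.IGNORECASE)

def summarize(posts):
    """Build the headline numbers for an import report.

    Assumes each post is a dict with hypothetical "slug", "status", "body" fields.
    """
    slugs = Counter(p["slug"] for p in posts)
    statuses = Counter(p.get("status", "unknown") for p in posts)
    shortcodes = Counter()
    images = set()
    for p in posts:
        body = p.get("body", "")
        shortcodes.update(SHORTCODE.findall(body))  # count by shortcode name
        images.update(IMG_SRC.findall(body))        # unique image URLs
    return {
        "records": len(posts),
        "images_requested": len(images),
        "statuses": dict(statuses),
        "shortcodes": dict(shortcodes),
        "duplicate_slugs": sorted(s for s, n in slugs.items() if n > 1),
    }

posts = [
    {"slug": "a", "status": "publish",
     "body": '<p>Hi</p><img src="/x.jpg"> [gallery ids="1,2"]'},
    {"slug": "a", "status": "draft",
     "body": '[caption id="c1"]<img class="y" src="/x.jpg">[/caption]'},
]
summary = summarize(posts)
print(summary)
```

Because the summary is cheap to rebuild, it can be rerun after every wipe and reimport to see whether a rule change actually moved the numbers.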
Most importantly, we need a viewer. This lets us see posts in situ on a page, or raw in a standardized data format.
Only upon building this can we even know the breadth and depth of the issues, and even when we think we know, we'll likely find something that makes us start all over again. (Thank you Fusion Tables + WordPress!)