Exactly. To compound the issue, older content may not fit into our current schema, so there are other data issues that need to be solved. Adding hand-coded HTML to pages was quite common 10 years ago, so parsing that into the structures we work with today isn't always straightforward and is difficult to automate.