CSV is one of the most commonly used formats for transferring tabular data between systems. Modern data warehouse solutions, even if data is natively stored in another format, often use CSV as an importing method for legacy data, or as a bridge format to translate the data to the format of the new system. In our recent blog, we covered some of the many difficulties of data migration, focusing primarily on ETL (extract, transform, load) processes.
The bulk of problems lie in the transformation and loading of data, and it is this wheelhouse in which Flatirons Fuse resides, focusing efforts on one of the most common formats requiring transformation for a target system, CSV. Let’s look at some of the challenges 1 2 3 CSV imports and, specifically, loading and transforming data.
While CSV is a simple format commonly used by many systems and a common output for many systems, it is far from a well-defined and standardized format. In a previous blog, we covered some of the issues arising from this lack of standardization, including the numerous forms of delimited value formats that often get lumped in with CSV, such as tab-delimited and semi-colon delimited. However, different CSV (or CSV-like) formats are far from the only challenge.
Some issues arise with regards to different forms of data within the fields themselves, including how long string values might be handled, or the use of quotation marks to allow things like URLs or file locations to be imported. Not to mention characters from other languages the target system may not recognize. There are also situations where the old system and the new system might have completely incompatible (without a significant rewrite) forms of the same data, for example, if the old system contains simpler location information such as address, city, state, and the new system uses a string format that includes this information as well as latitude and longitude coordinates.
Validation becomes a key factor in determining the success of the import process in such situations. The target system may need additional coding to recognize the older formats and a means of conversion. The more complex the data, the more important it is that there are no errors. Even something as simple as a missing character can gum up the works if a long string value is involved.
The target system is not always equipped to handle the data from a particular source, or can only accept a subset of the data, particularly if the file is extremely large. When importing CSV files, the question of what to do with the data that cannot be handled can be a key issue – is it discarded? Is it required elsewhere? Do you need to split the CSV into multiple pieces to transfer it successfully? Do you need multiple templates to separate the data for different target systems? All of this needs to be discussed before migration, or properly defined by the target system and organization.
There are also hierarchical issues that can arise. Let’s say that the source system utilizes the concept of an organization, with multiple users underneath said organization. If there was a merger or some other event, there might be users under two or more organizations. Is it the same user under each organization, or do they have separate ‘accounts’? If the target system has no concept of an organization, it is not as easy as simply transferring the data underlying user accounts in a CSV format.
The potential incompatibilities mentioned above need to be discussed at length, particularly if the transfer of data is between separate entities with differing data strategies. Without cooperation and strong consideration of each piece of data to be transferred, and how it fits into the target system, the transformation process can be lengthy and problematic.
If there is not a great deal of opportunity for communication and planning, then the data to be imported needs to be rigidly defined and limited by the target system, preventing any confusion as to what is required. Our Flatirons Fuse CSV importer excels as a tool to help with this latter approach, defining templates for import, and strong validation tools to ensure that the data records to be migrated conform to system requirements. Flatirons Fuse will help you avoid the common mistakes made when building a CSV importer.