Why Building a CSV Importer is Difficult

author-avatar

Onboarding clients often means uploading their data into your system. This can be a client or contact list, financial data, or any number of other potential options. For tabular data, the vehicle of choice is usually a CSV or spreadsheet format. Unfortunately, there are numerous formats of CSV, and numerous ways to organize the data. When moving data between systems or organizations, the chances that the data from another organization will match your system schema and rules exactly range from slim to none. A CSV importer is the answer, but there are numerous issues your customers can run into when attempting to upload their data to your system. Here are just a few of the issues we had to consider when building our plug-and-play data importer, Flatirons Fuse:

1) Data Format

Despite relatively recent efforts at standardization, many different formats of CSV files exist out in the wild, not to mention other delimited-value variants, such as TSV, and semi-colon delimited CSV. If building a CSV and/or spreadsheet importer. It is important to clearly communicate the data formats your CSV importer will accept, and ensure that your customers are aware of any data format changes they will need to perform in order to make their data compatible with your system, and your importer. This includes any spreadsheet formats you might support, such as XLS or XLSX. 

2) File Size

CSV and spreadsheet files can include alot of information, sometimes thousands of rows worth of data. The capacity of most importers have generous limits, but can still run into problems with very large files. For example, the guidance for Mailchimp states a limit of 200MB (approximately one million contacts). They then recommend splitting up files into smaller pieces if this becomes an issue. To prevent problems with imports, and frustrated customers, ensure limits are clearly posted, and a fair bit under the maximum capacity to be safe. This will prevent instances where the import hangs, rather than fails outright, and provides no feedback to the customer attempting upload.

3) Headers

One of the most common issues when importing data via CSV or Spreadsheet is the problem of matching data headers. Your customer’s data is organized for their own system, not yours, and requirements are often different – for instance, let’s look at something as simple as a customer name. In your system, it might be one field including first and last name, and in another it might be two separate fields, one for first name and one for last name. Many importers split this information into first name and last name fields. If this is not clearly communicated to the customer, it can cause strong difficulties. If the logic behind your importer is flawed, it might assume that both names belong in the first name column or the last. It might not accept the file because one of the two required columns are blank. Clear guidance must be provided on required headers, as well as the changes the customer must make to make the import work.

4) Validations and Field Level Errors

If there is one thing that is universally true, it is that people are fallible. In a file of over 10 thousand contacts, the chances that each field will be correct is slim when moving data from one system to another. So it’s important that either a great deal of diligence goes into validating the data ahead of import, or that the importer is able to handle a degree of validation of its own, ideally both during the upload process, and on the back-end, should an error be detected after upload. Information about how your importer handles field-level errors (such as misformatted email addresses) and what issues will prevent upload can ensure a better user experience.

5) Transforming Data

When you are moving data from one system to another, it is extremely likely that you will have to transform some data. For example, one system might have “First Name” and “Last Name” as separate fields, whereas another system might have a “Full Name” field. It’s imperative to provide a way for data transformations to take place. In this example, a data transformation would combine First Name and Last Name to form a Full Name field, making the data set compatible with the system you are importing data to.

6) Character Limits

This issue can be twofold – the total number of characters allowed within a given field, or limitations on the characters supported, such as limits imposed on letters from foreign alphabets, or other special characters. Sometimes these limits can be related, as some alphabets and some characters, such as emojis, are more than a single byte of information, so it may not be a one-to-one relation, leading to much smaller character limits. Any such limitations should be brought to the attention of the user, lest they cause errors in the import process.

7) Missing Fields/Data

How the importer handles missing data can determine the user’s experience as much as invalid data. This is a validation issue as well, and can be mitigated by clear expectations laid out for the user, and by informing them how the data will be handled in the case of missing data – will the import fail? Or will the correct data be uploaded? If the latter, there must be at least a warning that some data was not entered. Such an approach would require validation on the importer itself. If you don’t want the onus to be on the user, validation is key for your solution. The less responsibility placed on the user, the better the validation for any issues, not just missing fields, must be. 

8) Spreadsheet Specific Limitations

CSVs can be read by spreadsheet programs without issue, but going in the other direction is a bit more challenging. One key issue you must be aware of is that while spreadsheets can contain multiple sheets, a CSV file cannot. Each separate sheet must be split into a separate CSV file. And chances are, if an importer can accept both, this CSV limitation will apply even to the XLS and XLSX files. 

Customer Experience is Key

Whatever your onboarding/data import solution is, customer experience is key. Any one of the above issues can cause a great deal of frustration for your clients, clients who are likely experiencing your product or service for the first time. This is a time when a negative user experience can lose a customer – if the customer can’t get their data into your system with ease, the expectation is that the product or service will be equally difficult. 

If your customers are experiencing difficulties in importing their data, or if you do not feel confident that your organization has the time, expertise or the means to develop a solution for the intricacies above, then Flatirons Fuse may be the answer you are looking for. Flatirons Fuse offers front and back-end validation, AI-supported matching for headers, full flexibility to match your organization’s unique data requirements, as well as the ability to blend seamlessly with your branding requirements. Consider using a solution built by experts, and designed to be simple and seamless.