What is a Flat File, or Flat File Database?

author-avatar

Chances are, you’ve worked on or with a flat file in your day to day life or work without even realizing it. The simplest definition of a flat file is a plain text file, with no relational information to other files or database information. This can be a plain text file, a tabular format such as CSV, TSV, or Excel spreadsheet, or a binary file. A flat file database is a database consisting of a single such table, which can be as large or small as necessary. Technically speaking, a simple list of names, addresses and phone numbers on a sheet of paper can be considered a flat file database. 

Flat file databases are likely some of the first computer files ever created, as it can be assumed that the punch card files created by Herman Hollerith for the US Census Bureau in the 1890s did not have cards indexing other cards, or otherwise relating individual records to one another. CSV files and other flat file formats have been in steady use in the decades since, seeing fairly little change over the years, because the simple nature of the files ensured both continued usefulness, as well as usability. The simple structure means they take up little space, and can be easier for programmers to reference.

Flat File vs Relational Databases

The key difference between flat file and relational databases is that a flat file does not contain any relational data. The records within the flat file database do not point to, depend on, or reference any other files within the database. This makes a flat file simpler to create, maintain, and in some cases to call upon programmatically. However, it also means there can be a great deal of redundancy, and the data more prone to error. On the other hand, a relational database consists of entities, attributes,and relationships. If an attribute or relationship changes, it is changed across the database. This ensures data integrity and consistency, but is much more difficult and time consuming to set up and maintain. 

Why Flat Files endure

Flat files are used for a number of reasons and in a number of ways in the modern world. It can be as simple as a large customer list used to interact with a company website. It could be a simple company directory. One of the key reasons for their longevity and popularity is their portability – because of the simplicity, they can often be read/translated directly by other systems with minimal effort. 

Flat files are simple to create as well. They consist of files, columns, fields, records, characters, and as a result are often easier to maintain and create than more complicated relational file systems. Many are human readable, and easy to create even without extensive knowledge of programming or computers (binary files can be flat as well, this would be a possible exception).

For example, let’s look at a simple table that could come from any spreadsheet.

 

First Name  Last Name Street Address City Zip Code Phone number
Brian Wilson 1234 Steven St. San Diego 91911 555-444-1132
Jimmy Page 3311 Zeppelin Drive Los Angeles 90210 321-123-1234
Lita Ford 3232 Robinson Ave Detroit 48206 234-234-2324
Nancy  Wilson 4545 Heart St. Seattle 98111 345-678-9090

 

If you were asked to fill in the last line, you would have no issues extrapolating how and in what order to fill in the information, provided you knew the information to be added.

The simpler the file, the simpler it is to run a query against it. Columns in a flat file are restricted to specific data types. Each line forms a record. Delimiters ensure the data format is limited to fixed maximum width, or specific number of characters. They also help to make it easier to find different fields within a record. This makes it very easy for an application to call on specific data within the file. 

Limitations of Flat Files

One of the key limitations of flat files is the potential for duplication of data. There can be numerous entries for a single individual in a list of transactions for example. In a flat file, these entries are not correlated with one another, and cannot be, except by queries run against the file by another application. Along with this duplication is the potential for error. While it is easy to enter data into a spreadsheet or CSV file in a predictable manner, it is also equally easy for human error to creep in. In a spreadsheet with thousands of records, a misspelled name or improperly configured email address can be difficult to spot. This is one of the reasons that Flatirons Fuse covered the importance of validation for CSV data imports in previous blogs.

In addition, there is no relational integrity. For example if there are multiple entries for a particular person in a flat file, for example numerous transactions for a Charlie Brown, and his address or phone number changes, the data must be changed manually for each entry in the file. 

Flat Files in Use Today

Linear stores of NoSQL data, JSON formatted data, basic spreadsheets can all be considered modern implementations of flat file databases seen in common use. Indeed, our most commonly used operating systems (Windows, Apple, Linux, etc) make use of a series of flat file databases.

One of the most common flat file formats in use today is CSV, a topic we’ve covered well in the past. These files, along with similar formats such as semi-colon delimited and .TSV (tab separated value), are used to transfer data between disparate systems via import/export. Many of the most common databases today offer an option to import or export to CSV (Oracle, SQL, Cassandra, and many others). 

Summary

The simplicity and transferable nature of flat files ensure that they will not be going away any time soon. Spreadsheets are a staple of the business world, and CSVs used to import data to other databases, websites and applications. While there are challenges with the reliability and integrity of flat file formats, they can be handled efficiently and cleanly with the right tools. For importing data between systems, consider a white-label, custom configured CSV/Spreadsheet importer such as Flatirons Fuse, a Flatfile Alternative.