File System Based Approach
This document discusses in detail the reasoning behind Teflon's file-system based approach. Read this if you are interested in why we've chosen this architecture.
Teflon is file-system oriented: it processes files, transfers files, and manages metadata attached to files. The fact that no database server holds the system together is more than an implementation detail. There are several reasons why we chose this design.
Digital post-production is about large binary files and the best practically achievable performance, especially regarding throughput. While databases are fast at searching indexed metadata, they are slower than traditional file-systems at serving huge numbers of large binary files to processes. If we want to be as fast as we can, we simply cannot afford to store media files as blobs in a database.
We could still store metadata in a database, but then we would have to maintain the relations between the database and the file-system. The problem with this approach is that we would need to keep two sources in sync, and whenever the two sources differ, the system is in an inconsistent state, which is a possible source of faulty operations.
One solution is to forbid direct access to files, so that every change comes from the database end, which makes it trivial to keep track of changes. Unfortunately, it would also mean that our system would be useless as long as any tool of a pipeline is missing, since every tool that touches the files would have to go through Teflon. In fact, we want exactly the opposite: we want Teflon to be useful as soon as possible. This is a strategic business decision that lets us develop the software more gradually, and also lets post-production houses integrate our software more gracefully. So this direction was unfortunately not acceptable for us.
The other solution would be to monitor the file-system for changes and update the database accordingly. While this is not unimaginable, with this approach we would lose one of the biggest benefits of a database server: its consistency. In such a system there are inevitably moments when the database doesn't reflect the state of the file-system exactly. Operations based on that outdated information would yield erroneous results. It's easy to see that these inconsistent states would become more frequent and last longer as the number of files under management and the number of operations on them grow. It is worth mentioning that Teflon is aimed at feature-length films with heavy post-production needs, which in practice translates to millions of big binary files. Unfortunately, this path also proved to be unacceptable for us.
Databases also have some other disadvantages for our purposes:
- To be usable, the system needs two services running: the database and the file-system.
- Both deployment and maintenance are harder, since we need to manage two services instead of one.
- We need two kinds of information transfer between devices, one for each service.
- Archiving would be harder since information would be stored and managed by two different services.
In one sentence, the gist of our approach is to never store information in multiple places. Since performance constraints force us to store media files in the file-system, we must store metadata there too in order to keep everything in the same place. So that's what we do, and we are trying to make the best of it. For example, we not only refuse to store the same information twice, but we also keep metadata files close to the files they describe, which results in some really nice features.
How we store metadata about files in Teflon is fairly simple. Metadata for files is stored in hidden directories named .teflon. Every file with metadata has a counterpart in the .teflon directory next to it, which has the same name as the file with the ._ extension added to it. Metadata about the directory that contains the .teflon directory is stored in the file named _.
files_with_metadata
├── .teflon
│ ├── _ # metadata about 'files_with_metadata'
│ ├── video_1.mov._ # metadata about 'video_1.mov'
│ └── video_2.mov._ # metadata about 'video_2.mov'
├── video_1.mov
└── video_2.mov
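The naming rule above can be expressed as a pair of small path helpers. This is only a sketch in Python, not Teflon's actual code: the function and constant names are made up for illustration, and only the .teflon directory name, the ._ extension and the _ file name come from the description above.

```python
from pathlib import Path

METADATA_DIR = ".teflon"    # hidden metadata directory, as described above
METADATA_SUFFIX = "._"      # extension appended to the file's full name
DIR_METADATA_NAME = "_"     # file holding metadata about the parent directory


def file_metadata_path(target: Path) -> Path:
    """Return where the metadata of a regular file would be stored."""
    return target.parent / METADATA_DIR / (target.name + METADATA_SUFFIX)


def dir_metadata_path(directory: Path) -> Path:
    """Return where the metadata of a directory would be stored."""
    return directory / METADATA_DIR / DIR_METADATA_NAME
```

For the layout above, file_metadata_path(Path("files_with_metadata/video_1.mov")) points at files_with_metadata/.teflon/video_1.mov._, and dir_metadata_path(Path("files_with_metadata")) points at files_with_metadata/.teflon/_.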
With this simple approach we not only solve the original problem, but we also get some really nice side effects for free. For example, moving or copying a directory with mv or cp -r carries along the metadata about both the directory and the files in it, without any extra effort. This also holds for folders synchronized between devices.
The ability to execute automated file transformations is one of the three cornerstone services of Teflon. Because we store metadata in files, the same automation mechanisms can be used to manipulate metadata automatically.
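To see why directory-level operations need no special handling, the following Python sketch rebuilds the example layout shown earlier in a temporary directory, copies it with the standard library (roughly what cp -r does), and compares the two listings. The helper names and the placeholder file contents are assumptions made for this illustration; nothing here is part of Teflon.

```python
import shutil
import tempfile
from pathlib import Path


def build_example(root: Path) -> Path:
    """Create the example layout with empty media files and placeholder metadata."""
    folder = root / "files_with_metadata"
    (folder / ".teflon").mkdir(parents=True)
    (folder / ".teflon" / "_").write_text("placeholder")
    for name in ("video_1.mov", "video_2.mov"):
        (folder / name).write_bytes(b"")  # stand-in for a media file
        (folder / ".teflon" / f"{name}._").write_text("placeholder")
    return folder


def relative_listing(folder: Path) -> list:
    """List everything below folder, hidden entries included, as relative paths."""
    return sorted(p.relative_to(folder).as_posix() for p in folder.rglob("*"))


def copy_keeps_metadata() -> bool:
    """Copy the example directory and check that the metadata came along."""
    with tempfile.TemporaryDirectory() as tmp:
        original = build_example(Path(tmp))
        duplicate = Path(tmp) / "duplicate"
        shutil.copytree(original, duplicate)  # plain recursive copy, no Teflon involved
        return relative_listing(original) == relative_listing(duplicate)
```

Because the .teflon directory is just another entry inside the folder, a generic recursive copy picks it up without knowing anything about metadata.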
Archiving is also very easy with this organization. Just archive a folder with your favorite tool, and you will be able to restore the files with metadata with the same ease. Again for free.
Of course, metadata can still be corrupted, but it is (hopefully) never the software's fault.
Like every design decision, this one also has some downsides.
In contrast to folders, basic operations on individual files, such as moving, copying and linking, need special attention. If we want to keep the attached metadata, we need to apply the operation to the metadata file as well. While we can achieve this by executing two standard commands, it is much easier and safer to let Teflon do this with the teflon mv, teflon cp and teflon ln commands.
One can argue that metadata is more vulnerable to corruption in this setup, since users of the system will sooner or later make mistakes that result in corrupted metadata. While this might be true in some theoretical sense, in our view it is mitigated by the fact that you can use all your existing file-system level tools on your files.
Another drawback is that we need to implement our own search facility instead of using the one a database provides.
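The "two standard commands" for a file move can be sketched as follows in Python. This is a hypothetical illustration of the idea, not the implementation of teflon mv: the function names are invented, and the sketch assumes dst is the full destination file path inside an existing directory.

```python
import shutil
from pathlib import Path


def metadata_path(target: Path) -> Path:
    """Metadata location under the .teflon/<name>._ convention."""
    return target.parent / ".teflon" / (target.name + "._")


def move_with_metadata(src: Path, dst: Path) -> None:
    """Move a file and, if it has one, its metadata file (hypothetical helper)."""
    src_meta = metadata_path(src)
    dst_meta = metadata_path(dst)
    # Step 1: move the file itself.
    shutil.move(str(src), str(dst))
    # Step 2: move the metadata, creating the destination .teflon directory if needed.
    if src_meta.exists():
        dst_meta.parent.mkdir(exist_ok=True)
        shutil.move(str(src_meta), str(dst_meta))
```

The two steps are not atomic: if the second one fails or is forgotten, the file and its metadata end up in different places, which is the kind of mistake a dedicated command helps to avoid.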
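To give a feel for what a search without a database involves, here is a minimal Python sketch that walks a directory tree and checks every metadata file against a caller-supplied predicate. It is not Teflon's search facility. The function name is invented, and since the metadata file format is not described here, the sketch simply assumes the files can be read as text.

```python
import os
from pathlib import Path
from typing import Callable, Iterator, Tuple


def find_by_metadata(
    root: Path, matches: Callable[[str], bool]
) -> Iterator[Tuple[Path, Path]]:
    """Yield (file, metadata_file) pairs whose metadata text satisfies `matches`."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".teflon" not in dirnames:
            continue
        dirnames.remove(".teflon")  # do not descend into the metadata directory
        meta_dir = Path(dirpath) / ".teflon"
        for meta_file in sorted(meta_dir.iterdir()):
            # Only per-file metadata is considered; '_' (directory metadata) is skipped.
            if not meta_file.is_file() or not meta_file.name.endswith("._"):
                continue
            # Assumption: metadata is readable as text.
            if matches(meta_file.read_text(errors="replace")):
                # Strip the trailing '._' to get the name of the described file.
                yield Path(dirpath) / meta_file.name[:-2], meta_file
```

A call such as list(find_by_metadata(Path("project"), lambda text: "approved" in text)) reads every metadata file under the hypothetical project directory. That linear scan is exactly the cost a database index would avoid.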