When I say that one of the most important things I did in graduate school was set up a directory structure and workflow for my files, I am not kidding. Reading theory, learning statistical methods, and writing literature reviews were all important. However, just as important (though not nearly as sexy) was setting up a file structure and a working directory.
Trivial as it sounds, a well-designed directory structure not only provides a framework for files but also structures productive work.
Given how important it was for me, I will attempt to explain the directory structure that I developed. Let me begin by saying that I am not an expert at developing directory structures; there are experts in these matters. Though I was interested in becoming an expert at file management, I was too busy trying to become an expert in what I was actually studying to find the time. In an ongoing series of posts, I will lay out the basic intuition behind my system, what has (and has not) worked with it, and the improvements I would like to make. I would, of course, be interested in feedback and/or comparisons to what others do.
It would be helpful to describe my setup, since it dictated, to some degree, the decisions that I made. At work, I use a Windows XP desktop that is set up for remote desktop access (I haven't yet made the switch to Windows 7, though I will when I purchase my next computer). The primary software packages that I use are Stata 11 SE for statistics, a combination of MS Word and LaTeX (using the MiKTeX distribution with TeXnicCenter) for writing, and ArcGIS 11 (ArcINFO level) for geographic information system analysis. I am proficient at using Python for basic programming needs, though I have much to learn in order to use it more effectively. When at home or traveling, I use a Dell netbook essentially as a word-processing machine and as a dumb terminal to my desktop through the remote desktop connection. Finally, I use Mercurial for version control, but since I am a complete newbie, I'm not yet sure I understand how to use it well.
I view my directory structure as a way both to reflect and to improve my workflow. This means two things. First, the system reflects the way that I work best and forces me to stay organized. That means it might not be the best system for you (for example, Gabriel Rossman has an excellent description of his directory structure that differs from mine). Second, the most important thing is to find a workflow that works for you: one that is sustainable and helps you get your work done (see Kieran Healy's excellent guide on this).
Since the directory structure imposes discipline on my workflow, today I will lay out the elements that have become most important in that workflow. If they resemble yours, this might be helpful to you.
The first goal of the directory structure is to make the different parts of my work quick to find and, therefore, easy to keep organized. There are several large areas of work that I do regularly, and each needs its own place in this structure:
- Data development -- these are tasks that relate to the creation and cleaning of datasets. If I am fortunate enough to work on a project with a data manager who does many of these tasks for me, then this directory becomes a simple repository of the data. If, on the other hand, I am responsible for managing the data, I break the work down into three discrete tasks: source data acquisition, data construction, and data delivery/documentation.
- Projects -- these are tasks that will hopefully lead to conference presentations and/or published papers. More often than not they have a data management component, analysis component, and a writing/presentation component. This is the piece where I (should) spend the most time and energy.
- Reviews and comments -- Like any academic, I referee for the peer-review process and provide feedback on colleagues' papers.
- Proposals and grants -- My work depends on acquiring funding from a variety of sources. This directory holds documents related to proposals for funding and to the management of grants once they are awarded.
- Other writing -- Besides the academic writing in my projects and proposals, I also write for other venues (like the blog you are reading!). I save those pieces in this directory.
- Other -- Several other categories need their own space and get their own directories (it is never good to have a directory called "Other"). These are either created case by case as I need them (e.g., applications for jobs) or serve administrative tasks (e.g., expense reporting and a repository for my professional files).
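The areas listed above map naturally onto a set of top-level folders. As a minimal sketch, assuming Python 3 and using folder names made up for illustration (`data`, `projects`, `reviews`, `proposals`, `writing`, `admin`; the post does not prescribe names), the following creates that skeleton, including the three data-development stages. The `create_skeleton` helper is hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names for the work areas listed above; rename to taste.
# "admin" stands in for the catch-all category, since a folder named "Other" is a bad idea.
TOP_LEVEL = {
    "data": ["source", "construction", "delivery"],  # the three data-development stages
    "projects": [],
    "reviews": [],
    "proposals": [],
    "writing": [],
    "admin": [],
}


def create_skeleton(root):
    """Create each work area (and its sub-stages) under root.

    Returns the sorted names of the folders now directly under root.
    mkdir(exist_ok=True) leaves existing folders untouched, so re-running is harmless.
    """
    root = Path(root)
    for area, stages in TOP_LEVEL.items():
        area_dir = root / area
        area_dir.mkdir(parents=True, exist_ok=True)
        for stage in stages:
            (area_dir / stage).mkdir(exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real home folder.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_skeleton(tmp))
```

Because the function only ever adds missing folders, it can be run again whenever a new area is added to the mapping without disturbing existing work.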
The second goal is to create a regular workflow that reduces repetition across projects by imposing discipline on my work. This means two things. First, it produces a common structure across projects of the same type, which makes writing code or drafts easier because the system supplies a formula for doing things. Second, it makes it easier to pick up where I left off on a project after putting it down for a while (e.g., after getting a manuscript back from a journal's review).
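One way to get that common structure is to stamp out every new project from the same template. The sketch below assumes Python 3 and a hypothetical four-folder layout (`data`, `analysis`, `writing`, `presentation`) echoing the data management, analysis, and writing/presentation components of a project; the folder names and the `new_project` helper are illustrative, not a fixed convention.

```python
import tempfile
from pathlib import Path

# Hypothetical per-project layout; every project gets the same subfolders,
# so code and drafts always live in predictable places.
PROJECT_TEMPLATE = ["data", "analysis", "writing", "presentation"]


def new_project(projects_dir, name):
    """Create projects_dir/name with the standard subfolders and return its path.

    exist_ok=False makes mkdir raise FileExistsError instead of silently
    reusing an existing project folder.
    """
    project = Path(projects_dir) / name
    project.mkdir(parents=True, exist_ok=False)
    for sub in PROJECT_TEMPLATE:
        (project / sub).mkdir()
    return project


if __name__ == "__main__":
    # Demonstrate in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = new_project(tmp, "example_paper")
        print(sorted(p.name for p in path.iterdir()))
```

Refusing to overwrite is a deliberate choice in this sketch: when a manuscript comes back from review months later, the original project folder should still be exactly as it was left.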
The third goal is a corollary to the first two, and probably the most important: make my work reproducible. This means ensuring that I can reproduce results on projects, identify decision points on grants and drafts, and go back to old versions in a systematic way if I need to.
Finally, I want a system that makes it possible to collaborate with others, so it should be simple enough to describe to colleagues (and blog readers!).
In future posts, I will walk through the tasks outlined above to describe what I do -- and what I would like to do -- for each.