Blog
All entries categorized “advice”
Structuring Work: Data Cleaning and Construction, Laying the Foundation
Saturday, April 16th, 2011 11:37a.m.
In the last step, we downloaded all of our data, deposited it into directories that store this source data, backed it up, and write-protected the files. Now that we have done all of that, it is time to start working with the data! There is only one problem: almost inevitably, the data do not come neat, tidy, and ready to use. Often, the data contain major problems and need to be constructed in order to be usable. In this installment, I will write about managing files for cleaning, constructing, and storing datasets.
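As a rough illustration of the backup and write-protection steps this post refers to, here is a minimal Python sketch. It is not the author's own tooling (the blog's tags suggest Stata), and the function name `backup_and_protect` and the directory arguments are hypothetical. It assumes the backup directory does not exist yet.

```python
import shutil
import stat
from pathlib import Path


def backup_and_protect(source_dir, backup_dir):
    """Copy source data to a backup location, then make the originals read-only.

    Hypothetical helper for illustration; the directory names are assumptions.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)

    # Copy the whole tree first; copytree raises an error if backup_dir exists.
    shutil.copytree(source_dir, backup_dir)

    # Read permission only, for owner, group, and others.
    read_only = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH

    protected = []
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            path.chmod(read_only)  # remove write permission from the source file
            protected.append(path)
    return protected
```

Copying before changing permissions keeps the backup itself writable, which makes it easier to manage; the working copy of the source data is the one that is protected against accidental edits.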
Structuring Work: Data, The Foundation of Work
Monday, March 14th, 2011 3:50p.m.
After establishing where my root directory resides, it is time to actually get to work. As with any endeavor, success begins by laying a solid foundation, and in academic work that foundation is our data.
The most fundamental skill to academic success is asking good questions and acquiring data to answer those questions. Yet, in quantitative research, that skill is useless without the ability to manipulate data into useful formats that are capable of answering the good questions. Data cleaning, construction, and manipulation constitute well over half of my work on major quantitative projects.
Structuring Work: The Root, Where it all Begins
Friday, Feb. 11th, 2011 1:02p.m.
In my last post, I explained the value of a directory structure: consistent file management structures a disciplined workflow that increases productivity. The magnitude of its importance was a revelation that occurred largely after graduate school as the result of starting a new job.
When I moved to start my new job, I needed to move my files to my new computer. In transferring my files, I realized that the work that followed my well-defined workflow transferred easily, while the work that didn't follow the workflow did not.
The contrast between the ease with which I started the well-structured work and the difficulty of getting up to speed on the disorganized pieces threw into sharp relief the importance of maintaining a workflow structured by a consistent file management system. For those well-organized projects, the only difference on my new computer was that I began work from a different "root directory".
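The "root directory" idea can be sketched in a few lines of Python. This is only an illustration under assumed names: the function `project_paths`, the subdirectory layout, and the example root paths are all hypothetical and are not taken from the author's actual structure.

```python
from pathlib import Path


def project_paths(root):
    """Build every project path relative to a single root directory.

    The subdirectory names below are hypothetical placeholders.
    """
    root = Path(root)
    return {
        "source": root / "data" / "source",
        "clean": root / "data" / "clean",
        "code": root / "code",
    }


# Moving to a new computer means changing only the root that is passed in.
old = project_paths("/home/old_machine/research")
new = project_paths("/Users/new_machine/research")
```

Because every other path is derived from the root, scripts that refer to `old["source"]` or `new["source"]` need no other changes when the project moves.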
Structuring Work
Friday, Feb. 4th, 2011 10:04a.m.
When I say that one of the most important things that I did in graduate school was set up a directory structure and workflow for my files, I am not kidding. Reading theory, learning statistical methods, and writing literature reviews were all important. However, just as important -- though not nearly as sexy -- is setting up a file structure and working directory.
Despite how trivial it sounds, maintaining a well-designed directory structure not only provides a framework for files, it structures productive work.
Given how important it was for me, I will attempt to explain the directory structure that I developed. Let me begin by saying that I am not an expert at developing directory structures. There are experts in these matters. Though I had an interest in becoming an expert at file management, I was too busy trying to become an expert in what I was actually studying to have the time. I will lay out in an ongoing series of posts the basic intuition behind my posts, what has seemed to work (and not) with this system, and improvements I would like to make. I would, of course, be interested in feedback and/or comparisons to what others do.
Learning from the Great One
Saturday, May 8th, 2010 1:48p.m.
"You miss 100% of the shots you never take." -- Attributed to Wayne Gretzky
I was reminded of this quote this week after I had a grant submission rejected. Although it stung, the criticisms were legitimate and, as one of my advisors told me, "rejection is part of the process." It was this comment that reminded me of Gretzky's quote and made me realize that, although it doesn't feel good to be rejected, it does mean that I made an effort -- I can't make a shot that I don't take, after all.
This was a lesson that was hard to learn in grad school, and I was fortunate that I had people around me -- advisors and more advanced grad students -- to advise me that it is important to send things out. In fact, since I became an advanced grad student myself and subsequently took my post-doc, it has become something that I try to advise others about. As academics, we are perfectionists. It is good to be perfectionists; it is what gives us credibility, and without that instinct we would not have gotten to where we are. At the same time, it is important to remember that things will be more perfect if we seek advice and help from others; this, too, is the essence of scientific inquiry.
Miscellany
- The views presented here are solely and entirely my own; they do not represent those of my colleagues, employer, or any funding agencies which may support me.
- The writing on this blog is covered by a Creative Commons License (described here). Feel free to distribute or re-post with a link to the original content provided that it is freely available to others.
