Nightmare Naming Conventions

March 22nd, 2012

One of the tasks I seem to be spending a lot of time thinking about these days is how to name files and arrange them in appropriate directories so that they follow a consistent logic. This is because my current research involves developing analysis pipelines for Next Generation Sequencing data, where the output files of one program are the input to the next. These processing steps allow raw data straight out of the machine to help answer the biological questions for which the experiments were run in the first place.

File and directory naming conventions may sound trivial, but I have found that their complexity grows rapidly as more components are run. To illustrate my current approach to this problem, here is a simple example. Suppose a project (‘project_name’) runs two programs, ‘program_1’ and ‘program_2’. Each time the pipeline is run the input files may vary, so I create a new ‘job_name’ for each run. I have come up with this directory architecture:

/project_name
/project_name/data
/project_name/data/job_name_1
/project_name/data/job_name_1/input_data_type_1
/project_name/data/job_name_1/input_data_type_2
/project_name/data/job_name_1/input_data_type_3
/project_name/results
/project_name/results/job_name_1/program_1
/project_name/results/job_name_1/program_1/output_1
/project_name/results/job_name_1/program_1/output_2
...
/project_name/results/job_name_1/program_2/output_1
/project_name/results/job_name_1/program_2/output_2
...

What would happen if, instead of running 2 programs as I did above, I ran 5 or 6? And what if I had replicates for each input data file? What about maximising the number of steps run in parallel? You can start to see that things get complicated quickly.
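One way to keep a layout like the one above consistent as programs are added is to build every path from a single pair of functions, so the convention lives in one place. The following is only a minimal sketch in Python, not the pipeline I actually run: the function names (`data_dir`, `result_dir`, `create_layout`) and the `root` argument are hypothetical, introduced here for illustration.

```python
from pathlib import Path


def data_dir(root, project, job, data_type):
    """Input directory for one data type of one job (hypothetical helper)."""
    return Path(root) / project / "data" / job / data_type


def result_dir(root, project, job, program):
    """Results directory for one program of one job (hypothetical helper)."""
    return Path(root) / project / "results" / job / program


def create_layout(root, project, job, data_types, programs):
    """Create all data and results directories for one job and return them."""
    dirs = [data_dir(root, project, job, d) for d in data_types]
    dirs += [result_dir(root, project, job, p) for p in programs]
    for d in dirs:
        # parents=True builds intermediate directories;
        # exist_ok=True makes re-running the same job name harmless.
        d.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    import tempfile

    # Use a throwaway root so the example does not touch real project data.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_layout(
            tmp,
            "project_name",
            "job_name_1",
            data_types=["input_data_type_1", "input_data_type_2"],
            programs=["program_1", "program_2", "program_3"],
        )
        for path in created:
            print(path.relative_to(tmp))
```

With this arrangement, going from 2 programs to 5 or 6 only means a longer `programs` list. Replicates could be handled the same way, for example as one more path component, although where that component should sit is exactly the kind of decision the convention has to settle.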

File and directory naming conventions are something I am teaching myself, but any guidelines or systematic methods taught during my years as a computer science student would have come in handy now. In the future bioinformatics lectures I teach, I will definitely challenge my students to think about this issue very carefully.

Experiences with Personal Genetics: A Family Journey

April 17th, 2011

The above is the title of a talk I will be delivering at this year’s OpenTech (21 May 2011), a conference whose objective is to provide a forum for discussion for “people who work on things that matter”. Here is an outline of what I’ll be presenting:

Direct-to-consumer genetic testing is a new field of commercial activity that makes genome screening available to the general public. Test results are delivered online via a password-protected account and contextualized with state-of-the-art inferences about the individual’s clinical features, disease risks and ancestry. Interpretation of results is limited to the information supplied by the provider and is usually not accompanied by genetic counseling. Custodians of genetic information may not have the necessary skills to interpret their own results, let alone interpret results for others. This talk presents the personal journey of a genome bioinformatician acting as genetic counselor for his whole family, yet with no formal training to do so. Becoming custodian of genetic information for a whole family resulted in unanticipated situations and reactions, which are presented here. As the use of these tests becomes ever more widespread, it is hoped that these experiences will provide useful insights to new customers of genomic technology who are trying to understand their own genes.


Where Am I?

You are currently browsing the Lectures category at Manuel Corpas' Blog.
