Thursday, January 30, 2014

upon reflection. entrez e-utilities, genbank, python, and xml parsing.

A few days ago I started coding my own access to NCBI.  Generating the URLs isn't that big of a deal.  Learning when to use the history server is more of a problem.  I haven't heard back from them after requesting that my "tool" and email be added for authorized access, so I haven't done much work.
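To show what I mean by generating the URLs, here is a minimal sketch using only the standard library. The parameter names (db, term, usehistory, WebEnv, query_key, rettype, retmode, tool, email) are the ones the E-utilities accept; the function names, the tool name, and the email address are made up for illustration and are not from my actual code.

```python
from urllib.parse import urlencode

# Base address of the E-utilities; individual utilities hang off of it.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(db, term, tool, email, use_history=False):
    """Build an ESearch URL (hypothetical helper, not part of any library)."""
    params = {"db": db, "term": term, "tool": tool, "email": email}
    if use_history:
        # Ask NCBI to keep the result set on its history server
        # instead of only sending back a list of IDs.
        params["usehistory"] = "y"
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def build_efetch_url(db, web_env, query_key, tool, email,
                     rettype="gb", retmode="xml"):
    """Build an EFetch URL that pulls records from a stored history result."""
    params = {
        "db": db,
        "WebEnv": web_env,        # value returned by an earlier ESearch
        "query_key": query_key,   # value returned by an earlier ESearch
        "rettype": rettype,
        "retmode": retmode,
        "tool": tool,
        "email": email,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Placeholder tool name and email, purely for illustration.
search_url = build_esearch_url(
    "nucleotide", "BRCA1[Gene]", "my_tool", "me@example.org", use_history=True
)
```

As far as I can tell, history mostly pays off when a search returns more IDs than one wants to pass around by hand: the search stores the result set, and later fetches refer to it by WebEnv and query_key.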

While waiting I looked closer at the BioPython code.  It uses Python's xml.etree.ElementTree to build an element tree out of the XML the Entrez URL returns.  Since NCBI is a trusted source that isn't a problem, but the Python documentation makes me wary of unknown XML sources.  The defusedxml documentation goes into some pretty gruesome detail about how some simple XML can cause a lot of pain.  I learned the term "monkey patch".  Still, this "fix" doesn't fix everything people can do.
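Here is a rough sketch of the element tree approach, assuming an ESearch-style reply. The sample XML is hand-written to resemble what ESearch returns, the WebEnv value is a placeholder, and parse_esearch is my own hypothetical helper, not BioPython's code.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like an ESearch reply; the IDs and the
# WebEnv value are placeholders, not real NCBI data.
SAMPLE_ESEARCH_XML = """
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>WEBENV_PLACEHOLDER</WebEnv>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>
"""


def parse_esearch(xml_text):
    """Pull the count, IDs, and history keys out of an ESearch-style reply."""
    # For XML from a source I did not trust, the defusedxml package offers
    # defusedxml.ElementTree.fromstring as a drop-in replacement here.
    root = ET.fromstring(xml_text.strip())
    return {
        "count": int(root.findtext("Count")),
        "ids": [id_elem.text for id_elem in root.findall("IdList/Id")],
        "query_key": root.findtext("QueryKey"),
        "web_env": root.findtext("WebEnv"),
    }


result = parse_esearch(SAMPLE_ESEARCH_XML)
```

The query_key and web_env values are what a later fetch against the history server would need.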

I think everyone expects people to make errors.  Programmers can unintentionally cause computer problems.  I doubt many of us haven't accidentally created a never-ending loop.  Sometimes the problems one looks at are NP-complete, so one has little choice when traversing the search space but to let the program run for extended periods of time or just give up.  That being said, one can lament the intentional creation of pain for others.

I'm going to have to tread lightly in this area because I don't like thinking about protecting myself from others' intentions, even though I do it when I have to.  I like simple code that gets a lot of good done.  I fix issues when I happen upon them.  I don't like a bunker mentality; it wastes too much energy for little gain.
