BayPIGgies: Developing A Product In Python

Good evening, welcome to BayPIGgies, and happy Free Software Day. Tonight's speaker is Mike Pittaro, and we're going to get an overview of what it takes to develop a product in Python. We can go late; I hope that will work for you. Ask questions, and at the end we'll have our normal random access and mapping, but it might be a little tight. If anybody has ideas for future talks, please come and see me; my name is Tim. You're on. Yeah, ice cream: they've got free ice cream in the back, Ben and Jerry's, don't miss out. Okay, save one for me. So, good evening everybody. The volume sounds good to me, so let me know if I start droning off during the talk; I have a tendency to let the volume drop. Thanks for having me here, and thanks to Jim for helping coordinate this. He went through the painful review of what I think was a hundred slides before we pared it down to what I thought would be a reasonable talk. Tonight, what I'm going to talk about is something a little less technical than some of the other talks we've had here. I think Jim billed it right when he said it's a soup-to-nuts discussion of an open source project I founded back at the end of 2005 to build a fairly substantial product based on Python. We're going to be covering some of the things we found, both in terms of the project and Python, and some of the infrastructure we built out. In terms of timing, I have a feeling it's going to take me almost an hour to get through all the slides; it's pretty dense, and I'll try not to rush. In terms of Q&A, I think a good approach is that I'll try to do things interactively, but if we start going down rat holes, we'll have to stop and keep moving, unless we really want to pursue that path. The downside to that approach is that almost every slide in this presentation will eventually lead us down a rat hole, because a lot of the things we touch on are very much our decisions, some of which may or may not be right. I'm also interested in
feedback on some of that stuff. So I think a quick place to start would be to discuss what the product is and what the project was. I started this project at the tail end of 2005, and it's a data integration framework implemented under the open source model. The goal of the project was to simplify the problems people have with data access and data transformation in the data warehousing world. I'd like to call this ETL; if you go more toward the day-to-day practitioner world, I think Greg Wilson called it data crunching (he wrote a book on it). It's people who are just trying to get data from A to B, and a lot of us write these point-to-point programs, but it would be nice if there were a better framework to help orchestrate this. One of the things we did in this system, for scalability and some other reasons, is we built it on a REST architecture, which is very similar to the way the web works, and that both raises some challenges and gives us some very nice capabilities. The goals of the system were to make it scalable in terms of performance and also in terms of connectivity, the ability to build extra components and extra transformations. So: scalable and extensible, and I put in parentheses here "by an ordinary developer". We wanted somebody who is a reasonable Python programmer to be able to write a driver for an application or a database without having to jump through hoops. The other point is that it's meant to be easier than writing custom code for every interface. The first time you use it, it's not going to be a lot easier than just going back to your old tools and writing a custom piece of code, but over time, as you start to build up components and reuse things, you can get a lot of leverage from the system. An interesting decision: traditionally, particularly in the data warehousing industry, things are targeted at business users, because they pay a lot of money for the software. I felt that it made sense to try and target the
product more at a developer, because in reality developers are the people who end up doing this data integration work at the back end of the systems. Often a big transaction is done with an executive, but it's a developer who has to use the product. Something else that's really been coming up a lot more since we started this project is a trend where people are trying to access both web data and their local relational database or file data, so the general notion of having data at an

HTTP endpoint to pull into either a data mashup or a web-style mashup is becoming increasingly important. I'm tossing up a block diagram here; I apologize, some of the fonts are kind of small, and I'm not going to try to beat this diagram to death. What I wanted to do was touch on the nature of the product and give you an idea of the scope of what we're building, more so than get into the internals of it. The core of the system is a data server that has a series of components, all written in Python, which implement both connectivity to the outside world (database read, database write, things like RSS read and write) and also transformations; there are aggregators, sorters, and so on, and these are built as components inside the server. There's a special component in there I call the pipeline component, and its job is to let us assemble instances of these other components at runtime to build up streaming transformations, so that you can fire off a database read and then, as the data becomes available, pipe it into some sort of computation, then pipe it into a sort, maybe pipe it into an aggregator, and on down the line. That's pretty much the job of the server. At the lowest levels of the server there's a data store based on RDF, the Semantic Web technology, and we use that to store the definitions a user has created of these various transformations. Then there's a front end, an HTTP listener and dispatcher, that takes requests into the server and decides, based on a resource definition, how to dispatch them to the right component. Outside the server there are a whole bunch of supporting tools that help us deal with that server. There's a command-line interface that's used to go into the server and initialize the repository, set up some of the security and permissions, and things like that; it's not too sophisticated in terms of defining those resources. There's a client tool that runs in a browser; this is
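The streaming pipeline idea described above (fire off a read, then pipe records through transformations as they become available) maps naturally onto Python generators. This is just an illustrative sketch, not SnapLogic's actual component API; all the names here are made up:

```python
# Illustrative sketch of a streaming pipeline built from chained generators.
# None of these names come from SnapLogic; they just mirror the idea of
# records flowing incrementally from a reader through transforms.

def db_read(rows):
    """Stand-in for a database-read component: yields records one at a time."""
    for row in rows:
        yield row

def compute(records, fn):
    """A computation stage: apply fn to each record as it streams past."""
    for rec in records:
        yield fn(rec)

def aggregate(records, key):
    """An aggregation stage: this one has to consume the whole stream."""
    totals = {}
    for rec in records:
        totals[rec[key]] = totals.get(rec[key], 0) + rec["amount"]
    return totals

# Assemble the pipeline at runtime, the way the pipeline component does
# with real component instances.
rows = [{"region": "west", "amount": 10},
        {"region": "east", "amount": 5},
        {"region": "west", "amount": 7}]
stream = compute(db_read(rows), lambda r: dict(r, amount=r["amount"] * 2))
print(aggregate(stream, "region"))  # {'west': 34, 'east': 10}
```

Because every stage except the final aggregate is a generator, records move through the whole chain as the "database" produces them, which is the property the talk keeps coming back to.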
actually a Flex, Ajax-style client UI, and it connects to an intermediate server we call the management server, using XML-RPC to talk to it; the management server in turn talks to the back-end data server to load and save the definitions of those resources. In addition to that whiz-bang UI that lets you draw things and create your pipelines, there's an alternative interface using Python directly, where you can write programs using a module called SnapScript, and you can create those definitions and save them on the server, or fetch the definitions from the server, modify them, and save them back. This was something I wanted in the product from the very beginning. I've worked with a lot of GUI tools, and there are some really nice design environments, but I've always felt that it's also nice to have that other path: rather than navigating through many screens in a user interface, sometimes it's quicker to just go in, fetch the object you want to modify, make the changes, and save it back. Another aspect of the system is that we can have data servers that connect to other SnapLogic servers, to deal with both data parallelism and wide-area data transformation problems. So, to summarize, the reason I tossed up this picture is to give you an idea of the scope. It was a pretty big project when we started. I have to admit, when we started it wasn't this big, so it got bigger as we went along, but it's pretty audacious to try to do this from scratch as an open source project, which raises the first topic I thought would be worthy of some discussion. I cooked up this crazy idea that I wanted to build a system, and the first thing is, okay, I need to get some programmers, so you lean out and try to find some Python developers. I think what I found is that hiring Python developers is difficult. We're still hiring, if anybody's
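The management-server hop described above is plain XML-RPC, which the standard library covers end to end. Here's a minimal sketch of that kind of client/server pair; the save_resource/load_resource methods and the definition payloads are invented for illustration, not SnapLogic's real interface:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# A toy "management server" that stores resource definitions by name.
# Method names and payload shapes are made up for this sketch.
definitions = {}

def save_resource(name, definition):
    definitions[name] = definition
    return True

def load_resource(name):
    return definitions[name]

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(save_resource)
server.register_function(load_resource)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The browser UI would be the client here; XML-RPC keeps that boundary simple.
client = ServerProxy("http://127.0.0.1:%d" % port)
client.save_resource("db_read_orders", {"uri": "/feeds/orders", "sql": "select 1"})
print(client.load_resource("db_read_orders"))  # the saved definition round-trips
```

XML-RPC structs map onto Python dicts in both directions, which is why a definition can be fetched, modified, and saved back as ordinary Python data.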
interested. I found it was very hard to find a Python skill set if I looked for it specifically. I did try the BayPIGgies list earlier this year, I think it was January 2007. I kicked out an email, and I think it met the list guidelines. In any case, it triggered a discussion about whether or not job postings should be on the list, and it triggered a discussion about a name change; I don't know what it is about BayPIGgies, it's something to do with the name, I think. There was a mailing list discussion, and there might have been a list poll; I don't know if that actually happened or not. But at the end of the day I never got a reply back, so that didn't work. I think another aspect is that we're a start-up, a small company: we don't have an established business model, we don't have an established customer base, there is a risk factor. What we've learned since then is that it just seems to be better to hire good programmers. Don't even get hung up on the Python thing; look for good programmers, look for good software engineers. From experience, good developers can use any programming language. There's this religious aspect where I think we can convert them to Python; most people get Python, and if they're smart they seem to get it pretty

quickly. One thing to watch for: I think it takes a couple of months for the real Pythonic mindset to sink in. The people I've hired are very experienced, and some of the guys are experienced in parallel programming (GIL issues aside); they're familiar with C, with assembler, and with machine architecture, and they pick up Python pretty quickly. It hasn't really been a limitation. Also, Python seems to have that secret-weapon thing going on, where it's used a lot but people don't actively advertise that they're Python developers; I've never figured out why. So I think the lesson we learned there was: don't go specifically after Python people, just look for good developers. And of course, classic friend-of-a-friend referrals are the best way to get people on board. Once we had a couple of people on board, we started working on the system, and we really started with a prototype. There were lots of ideas and we needed to prove some concepts, and some of those ideas definitely challenged established norms in the way the data integration process is done. We were modeling data as a REST resource, we were streaming data through pipelines, and one of the things we had to figure out was how to store these definitions and make them fairly accessible. We started the prototype in, I think, March 2006; we'd been playing with some things before then. We started with Python. We needed an HTTP server, so we used CherryPy; it was the one that most directly suited our needs in terms of making HTTP disappear so we could work on the real issues. We wanted to leverage as much as possible from available code, basically building on existing libraries, and we didn't have a strong bias towards the final decisions. This really was intended to be a prototype, and I think it was successful: we proved the concepts, we came up with answers for some of these tough problems, and in the process we learned quite a bit about Python and the larger Python community. I think since then the
Python community has become more publicly visible; the advocacy efforts are paying off and people are more aware of it. We did make a decision I think is worth discussing, which is: why did we use Python for the prototype? There are a couple of reasons. The main reason was rapid development; we wanted to prove the concepts quickly. With Python, when you start working on something from scratch, you get working code sooner. You play with it, and after you've played with it for a couple of days with running code, you turn around and say "I painted myself into a corner", throw it away, go back, and revisit the design based on what you learned. So you really get to the point where you start identifying the essential complexity without getting hung up in the tool chain. Before I was using Python I did most of my work in C (I'll admit to a little Fortran), and I think the difference with C was that you spent a lot more time thinking and designing than writing code. Then you finally got to the point where you started writing code, you debugged it, you got it up and running, and you realized: I did this wrong. The trouble is, you're out of time. You did it wrong, you screwed up, maybe you'll do it better next time, but you're out of time and you literally have to deal with what you have and make it work. We didn't have that problem with Python. Another aspect was readable code. Given that it was a start-up and we were trying to recruit a team, I felt that having some readable code that people could look at and understand was essential to building the team and getting that community ethic together. We also wanted the ability to study what we'd done and what we'd thrown away; we didn't want to reinvent the wheel. And it seems that readable code really helps when you bring a new developer on board: they pick it up pretty quickly, they get in there and start looking at it, and things make sense to
them. I also had a hidden agenda here. When we were starting the project: I used to work with a guy named Tim, and back in 1992 he was one of the early Python users, and he said, "this is a really good programming language." I ignored his advice for 15 years; I really wasn't using Python. I've since seen the error of my ways, and one of the things I wanted to see was whether I could use Python to build a substantial system. I'd used a lot of Python for gluing A to B, web scraping, back-end stuff, but I'd never really built anything substantial. Coming out of the prototype, what I've done on this page is try to put together a list of the core technologies we were using and what we found as we went through the process. One of the things we needed was just basic libraries, and with the "batteries included" concept, the Python standard library has almost everything we need; there isn't really much fundamental capability that you won't find either in the language or in the standard library. For an HTTP server, it's the other extreme: there are a bunch of HTTP servers. We saw CherryPy as a kind of HTTP server and application framework, there's BaseHTTPServer in the library, there's the Twisted framework, and a lot of other choices, so we actually had to do extra work there to decide what to do. On the database connectivity side

there's the Python database API (DB-API). It's a pretty good API; it doesn't have modules for every database yet, but it is getting better. We had a requirement for a compact data encoding between the stages in our pipelines; we needed to compress data down to a fairly small format, and we decided on pyasn1. ASN.1 is an old telco wire-level formatting protocol; it takes numbers and characters and packs them down into the minimum number of bytes. It's very efficient without getting into compression, and this works because we move a record-oriented format in our pipelines; we don't really move XML documents around. For RSS and Atom, which were clearly staring us in the face as something we'd have to work with, there's the feedparser library. It's just a good library; it parses (it doesn't write), but it does a really good job of parsing. For the resource definitions we needed, and the resource database in the back end, we had done some homework and eventually settled on RDF, and there's a really good RDF implementation in Python called RDFLib. That gave us a way of having the resource definitions and converting them to and from XML, it gave us some query functionality, and it gave us a data store at the back end; RDFLib supports SQLite, MySQL, and Berkeley DB as back ends, so between those three we got some pretty good choices. For error and activity logging, which you need to worry about in any substantial project, there's the logging package in Python. You can debate whether or not you could write a better logger, but the reality is it's a pretty full-featured logging package, so you spend a little time tweaking and customizing it and you have error logging. We needed a command-line tool (I mentioned that snapadmin utility), and there's a cmd package in the standard library that will help you write a command-line utility. For plugins, we wanted these components to be plugins into the server; they're currently not dynamic, they kind of get loaded
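To make the compact-encoding requirement above concrete: the idea is just to pack each record's fields into as few bytes as possible. We used pyasn1 for this, but the flavor of it can be shown with the standard library's struct module; this is an analogy, not the encoding SnapLogic actually uses, and the record layout is invented:

```python
import struct

# Pack a record-oriented row into a fixed binary layout:
# unsigned int id, float amount, and a length-prefixed name.
def pack_record(rec_id, amount, name):
    encoded = name.encode("utf-8")
    # "<IdB" = little-endian uint32, float64, uint8 length prefix
    return struct.pack("<IdB", rec_id, amount, len(encoded)) + encoded

def unpack_record(data):
    rec_id, amount, name_len = struct.unpack_from("<IdB", data)
    offset = struct.calcsize("<IdB")
    name = data[offset:offset + name_len].decode("utf-8")
    return rec_id, amount, name

wire = pack_record(7, 19.5, "widget")
print(len(wire))            # 19 bytes, far smaller than an XML equivalent
print(unpack_record(wire))  # (7, 19.5, 'widget')
```

ASN.1's BER encoding goes further than this fixed layout (it adapts the byte count to each value), but the payoff is the same: numbers and strings travel as a handful of bytes rather than as markup.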
at startup time, but we needed a way of having these plug-in modules. It was basically a little bit of wrapper code around modules, packages, and the import statement; we did not need to reinvent the wheel in terms of coming up with a plug-in mechanism. It's already there, and it's very accessible if you want to customize it. For a configuration file, the first cut was literally a Python class that we would import, with the various values in it. I actually kind of liked that, but I was shot down, and we later switched to a more formal configuration file using ConfigParser, and that's worked out; again, it wasn't a lot of work. The downside of ConfigParser versus import is that with import you don't have to parse anything; the downside to import is that if somebody screws up the config file, you start throwing exceptions. So there are trade-offs, but both of those served us well. We found a lot of things we could leverage. We also found some issues in the prototype that were the essential complexity we had to address. One of the decisions we made was to write yet another HTTP server, and there was a reason for it. In the first cut, we built our own HTTP server, and here's why. We looked at CherryPy. We looked at Twisted; very interesting, but the learning curve looked pretty daunting from where we were standing, so we weren't ready to go down that path. We looked at BaseHTTPServer, we looked at Paste, we looked at mod_python and mod_wsgi. The big issue we kept bumping into is that all the existing HTTP servers implement HTTP as a request and a response, and each of those, whether it's the request or the response, is typically treated as a header plus a message body; on the application side, if it's an HTTP GET, the server gets the headers and the message body passed into the request, and the entire request is made available to the app. We wanted to stream data, so we wanted to have a model where we
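The plugin wrapper described above really is just a thin layer over the import machinery. A minimal sketch of that kind of loader follows; the registry convention is invented, and stdlib modules stand in for plugins so the sketch runs anywhere:

```python
import importlib

# Load plugin components at startup from dotted module paths, the way a
# config file might list them. "json" and "csv" are stand-in plugins here.
def load_plugins(dotted_names):
    registry = {}
    for name in dotted_names:
        module = importlib.import_module(name)
        # A real system would check for an expected entry point here,
        # e.g. a component class or a register() function in the module.
        registry[name] = module
    return registry

plugins = load_plugins(["json", "csv"])
print(sorted(plugins))             # ['csv', 'json']
print(plugins["json"].dumps([1]))  # '[1]'
```

That is essentially all a startup-time plugin mechanism needs: a naming convention, importlib (or __import__ in the Python of that era), and a dictionary.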
could get the HTTP headers in the HTTP request, quickly scan them to decide what to do with the data, and then take the data stream and hand it off to the component that was actually going to consume it, assuming there were no critical errors in the payload. That requirement was kind of unique, and it more or less drove us to write our own HTTP server, because it was a showstopper issue. We ended up with (I did a count here, I went back into Subversion and counted) about 440 lines of code and 520 lines of comments and docstrings, because the guy who wrote it really liked to comment his code, and he really wanted to explain the issues he was dealing with in there. But it literally took days, not weeks, to get an HTTP server up and running. And I want to make it clear: we did not implement Apache in Python. This is not a full-featured HTTP server; it was a very limited-purpose thing to address this issue of being able to take a request, read the headers, and then literally pass the data stream to another component in the code. So that was an interesting experience. The code's there, and I think at some stage we'll replace it with something else, probably WSGI. Something else I think is
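The header-scan-then-stream model is easy to mimic with the standard library. A rough sketch, using http.server purely for brevity; this is not SnapLogic's server, just the shape of the idea (look at the headers, then hand the raw stream to a consumer in chunks, never buffering the whole body):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.client import HTTPConnection

chunks_seen = []

def consume(stream, length, chunk_size=4):
    """The downstream component: reads the body incrementally."""
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        chunks_seen.append(chunk)
        remaining -= len(chunk)

class StreamingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Scan only the headers, then hand the raw stream off; the body
        # is never assembled into one in-memory request object here.
        length = int(self.headers["Content-Length"])
        consume(self.rfile, length)
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StreamingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("POST", "/pipe", body=b"0123456789ab")
assert conn.getresponse().status == 200
print(chunks_seen)  # [b'0123', b'4567', b'89ab']
server.shutdown()
```

The point of the exercise is visible in chunks_seen: the consumer saw the payload in pieces as it arrived, which is what the stock request/response servers of the time didn't offer.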

worth mentioning, at least in the start-up environment, is the tools of the trade: what the work environment is like in terms of basic tools. We were building up some start-up infrastructure, so we've got a Unix box we brought up, serving Samba shares for people running Windows; an IMAP server, POP, SMTP; we set up Mailman for an internal mailing list, so we archive our discussions; Subversion for the code; we started off with Trac for a basic bug system; and we were using MoinMoin as our early wiki. Most of this infrastructure got built up pretty quickly; it wasn't real hard, and the ongoing care and feeding isn't heavy administrative overhead. On the desktop we're still a mixture of Windows and Unix. Ubuntu seems to be gaining ground with everybody; I'm still a Mandriva user, and there are still a couple of guys who use Windows. We have no rule saying you will use Windows or you will use the Mac; we don't care. You're writing code; use the tools that suit you. In terms of day-to-day activities, there are basically two camps. There are people like me who work at the command line with a text editor, vi in my case, or vim, and Emacs; we've got a group of people like that. And then the other half of the team really likes the Eclipse PyDev environment. I don't do well with IDEs, so I don't really use Eclipse or PyDev, but I can see there are some cool features with source code integration and so on. On the Mac (I recently cut over to a MacBook) I've been using Komodo, and, I don't know how to put this: it's not that I don't like it, I just find that whenever I fire up one of these IDEs it doesn't really make me more productive. I do find Komodo pretty useful for showing people code and giving demos where they want to get into the code, because it gives you a nice layout, and people can't deal with me just flipping windows and command prompts quickly, so Komodo is good for that. It also
does a pretty good job of debugging threaded Python code. I think the alternative is Wing, and Wing really can't handle threading, so I still have it installed, but it's not my main development tool. In terms of development methodology and engineering practices: on the process side, if you put us on the Capability Maturity Model, we're probably level 1 or level 2. We're getting better; we just hired a really great software engineering director who is going to get us into shape on the process side, but we're not real strong at the process level yet. In terms of day-to-day practicalities, we operate on the Joel Test. How many people have seen the Joel Test? Okay, so everybody knows what I'm talking about. I'd put us somewhere between 11 and 12. The half point is actually two different topics. One half point is that Joel says you should have the best tools money can buy; we have the best tools we can afford, so it's probably a little bit of that. The other aspect is that when it comes to usability testing, we do hallway usability testing, but it's with the engineering team, so that doesn't always mean you get really good usability testing yet. But to me the important thing about the Joel Test is that it hits the high points: do you have a spec, do you have a bug tracking system, do you have source code control. We found very early on to just get that stuff in place, even in the very earliest days of the prototype. I think the first time a zip file went through email saying "here's the code I was working on over the weekend," Subversion was up the following day. I just didn't want to do that anymore, because even if you're checking into a private branch or whatever, you don't want to have to coordinate by hand. So we did that, and I think we did it right; it's paid off, because we're not going back and rebuilding infrastructure now. Now we play a little Jeopardy. One of the topics in
terms of Jeopardy is: you look at an answer that says "this is toxic waste," and you have to ask, what is the question? We discussed this thing we call toxic waste, which is open source bug fixes; not open source bugs, bug fixes. Really, this is the crux of what I think drives a lot of the open source community. We're working with a lot of third-party packages: database drivers, Trac, pysqlite, and so on. There are a lot of moving parts here, and we found bugs in some of those packages. So what are you going to do? You find a bug; well, since it's all in Python, you've got the source code. We went off and fixed the bugs, and then you're in this kind of dilemma. Okay, we did a bug fix, and the bug fix looks good; by the time you've debugged it, you pretty much know what the fix is, so you're pretty sure you've got a fix. It may not be the final fix, it may not be the one the code owner would have written, but at least it's probably a fix that demonstrates what would work. So now we've got this custom patch, and the last thing we want to do is keep it in the facility. So we treat it as toxic waste: we take whatever we have and file a bug report immediately, and in most cases when we file a bug report, we file it with a test case. I think in the vast majority of cases we filed it with a test case, and no surprise here, but on any open

source project, if you provide a fix, the probability of that fix being checked into the core code is much, much higher; a test case and a fix helps. The other options: if we had a custom patched version of something, we wouldn't want to redistribute it; there are licensing issues that come up, and the lawyers have to get involved, and you don't want to deal with that. More importantly, you'd now have some sort of fork of another open source project, and you don't want to maintain it. We actually check the patches into Subversion and keep them there just in case, but we don't want to maintain them. We could monkey patch, but we don't want to monkey patch either; we'll save that for an extreme case, but we really don't want to do it. (I think that's about ten slides ahead, so we'll get there. I don't think it's quite one click, but it's real close; I think it's one click and then a whole bunch of other things happen.) Another thing to mention: in two-plus years on this project, plus five years of pretty steady Python use, we really haven't found a core Python bug, either in the language or in the standard library. The closest I've come is where I have code that doesn't work, and I find out it's some problem in the Python 2.3 libraries that's been fixed in 2.4 and fixed in 2.5, so it's really not an issue. So yes, in the older versions we found issues, but usually they'd been long fixed by the time we stumbled across them. I mentioned this a couple of times, but I think there's a good story around snapadmin. Working with our server, we needed a command-line utility that was just there to get into the server and do some basic operations before the whole framework was up. There was some server management stuff, a couple of magic commands you need to initialize the resource repository and put credentials in it, some user and security management stuff to bring up the server, and we also wanted command-line
import/export functionality for these resource definitions. We ended up using the standard cmd package, which is in the Python library; it's fairly easy to use. In terms of implementation, I'd say cmd prefers a noun-verb syntax, so your commands tend to read "resource import" versus "import resource". There's probably a case to be made that I'd prefer to implement it the other way, but the way the cmd package works is that it parses a command line, takes the first token, and dispatches to a method based on that token, and then that method in turn can process the rest of the command line. So naturally it tends to fit a model where you have an object that does all the resource commands, be it import or export, as opposed to an object that implements the import command for many different things. Also, this isn't a major user interface, so the kind of awkward backward syntax was reasonable in this case; we can always revisit it. We wrote this thing in days, not weeks; the first cut was up in about a weekend, and during the following week it got fleshed out with a whole bunch of commands. It had built-in help, and it's got readline functionality, so it has command-line completion and command-line history; all that stuff came for free, so it was worth using. The interesting part of the story is that when we first started talking about this, somebody dusted off an old command-line utility they had in C and started porting it over to Python. So one of the guys was writing a command-line parser that did a readline, broke the line into tokens, and dispatched commands based on that, basically the way you'd do it in C. The moral of the story is that it helps to know what's in the library. You really need to print out a copy of the Python Library Reference and keep it lying around; every time I look at it I find something I didn't know was in there
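A stripped-down sketch of that noun-verb style with the cmd module; the resource commands here are invented, and snapadmin's real commands are certainly more involved:

```python
import cmd

class SnapAdminSketch(cmd.Cmd):
    """Toy admin shell: cmd dispatches 'resource ...' to do_resource,
    which then handles the verb (import/export) itself."""
    prompt = "(snapadmin) "

    def __init__(self):
        super().__init__()
        self.resources = {}

    def do_resource(self, line):
        """resource import <name> | resource export <name>"""
        verb, _, name = line.partition(" ")
        if verb == "import":
            self.resources[name] = {"definition": "..."}
            print("imported %s" % name)
        elif verb == "export":
            print(self.resources.get(name, "no such resource"))
        else:
            print("usage: resource import|export <name>")

    def do_quit(self, line):
        return True  # returning True ends the command loop

shell = SnapAdminSketch()
shell.onecmd("resource import db_read_orders")  # prints: imported db_read_orders
```

Run interactively with shell.cmdloop() and you get the built-in help, tab completion, and command history mentioned above without writing any of it yourself.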
and I've been through it quite a few times, and I still keep going back. I don't know if they keep adding more or if it's just me (actually, they do keep adding more), but more importantly, there's always something that's been in there for a while that I didn't know about. There's a comment at the beginning of the reference manual that says to keep it under your pillow; that's very good advice, it really helps. There was another aspect of the project that comes up here, which is getting into more advanced Python. I think, Alex, you called this the level where you go from the basics to "detach" to "transcend" last night. One of the things we did was create this library called SnapScript, and I think the motivations are a little tricky unless you really understand how SnapLogic works, so I'll try to explain this as clearly as I can. Within the system there's one object, the resource definition, which we call ResDef, and it's a pretty complicated object. It's the basis for all the resource definitions; there's a lot of RDF and graph code in there, and the interface of this class really does require some knowledge of the system. It's pretty much an internal

object. The trouble is, we were going to have people use this object as the programmer interface for creating their own resource definitions. Not a good idea: the thing was really complex, so we really needed a simpler interface. This is for programmers defining resources and using predefined things in the system. So we came up with a Python package we call SnapScript, a set of generic classes that encapsulate the ResDef object and give you a very simple interface to the resources. The interface at this level is very much get/set property, and effectively the classes hide the internal details of the ResDef. We did this at a deeper level by using __setattr__ and __getattr__ to do some interesting hooks customizing attribute access. There's a chapter in the language reference manual on customizing attribute access, and a similar chapter in Python in a Nutshell; between those two, the Cookbook, and a couple of late nights, I figured out how to do this. It took some serious homework, but at the end of the day it wasn't that hard. I think the best way to describe this (and also, this is the baby, so I do have to show some code) is to compare the two classes of code. This is a ResDef code sample, and it's going to create a database read resource. At the top we have a bunch of imports, and then the first thing I'm doing here (I should have put line numbers on this slide; next time I will) is calling the ResDef get_resdef factory function. I pass it the dotted path name of the module that implements the thing I want, in this case a DBRead, and it gives me back a reference, r, which is a specific instance of a DBRead ResDef. I then go through and set some properties. The thing has a URI, which
describes where it fits in our web space. It has a description property, so I call set_property on description; I call set_property on title; I call set_property to give it a database connection, which is actually a URL reference to a connection. It has a SQL statement, so I type in the SQL — I cut it here, it was much longer — and assign that with set_property again, naming the SQL statement. I then create something we call a view in our terminology, which is basically a record definition, and I call the special list-property setter. In this case we use tuples for views — it’s one of the things I love about Python for record definitions like these, it’s just so easy to type them into code. It’s even better if you have to scrape them off a web page: you can write a piece of code to scrape these things, convert the definition, and paste it into your code, or create it dynamically. But again, we go in and set this list property: this is the name of the list property, this is the type of the list property, which is a reference to the list types from the module, and this is the actual view definition. So it’s not a real clean interface; in retrospect we could have done this better, but it is what it is right now — we’ll come back and revisit it. In the alternative universe, with the SnapScript module, the interface is much friendlier. You start off at the top by importing the snapscript module, and then to create a resource you call the constructor. You do still have to pass in the dotted path name of the component — I couldn’t find a way around that elegantly — but once you’ve done that, you get a reference to a very generic resource object in SnapScript. Once you have that object, you manipulate it by assigning to members, so within the
resource I set props.uri to the URL, I set props.description, I set props.title, I set props.db_connect, I set props.sql_statement — direct assignments to members on the object. Behind the scenes we’re just using __setattr__ and __getattr__ to go off and do the right thing, and that right thing really involves taking these strings and making those set_property calls we saw in the previous example — but a user at this level never knows that. Some of the cooler things show up when we assign to the view: we’ve set things up so that props already has a member called output_view, which is one of those predefined view types, so I can take my list of tuples and assign them directly to props.output_view.output. This is where things got tricky, because this is the view definition in Python, and the attribute name actually becomes the name of the view — when I type output in the Python code, very naturally, that becomes the name. If I went back to the previous slide — actually, that’s a typo; first time through the presentation — the name that’s in there where it says input view: in the alternative code, that’s figured out just by using it as the name of the attribute in Python. You avoid a lot of the quoting, and the code becomes much cleaner. The way we did this is that the props member on this object is hooked by __setattr__ and __getattr__, and as the assignments happen we go into some custom code that does the magic behind the scenes. Everything under props is dynamic. We put in that intermediate member called props to hold the properties, and one reason is that within SnapScript, depending on the code you’re writing, sometimes it’s nice to be able to add other attributes to the SnapScript object, and we didn’t want to restrict that, so I hung the properties off a sub-member to keep them a level down. The property names here — title, db_connect, sql_statement — are actually defined by the DBRead component, and when you’re doing these assignments we do check the names, make sure you’ve typed them right, and raise proper AttributeError exceptions. The intent was to let you just type the code without having to put quotes around things and make it natural, but everything is still validated behind the scenes — for example, if you try to assign to sql_statement on something that’s not a DBRead, it won’t work. Hmm — oh, I see what you mean. They used to appear; I haven’t done it in a while. I think if you do dir() they don’t appear, but if you call str() to get a string representation, str() actually makes it look like they’re there. I don’t think dir() will see them — I’d actually have to go check the code; you can go check the code, anybody — but I don’t think we properly expose them there. I think that’s the trade-off: if I were able to use the
property function to get and set the properties, that would have been a little cleaner. So this is where we got a little deeper into some of the Python aspects — I mean, this was hard research; the code was not hard. Another aspect of an open source project: people are going to see your code. As we were moving through the project, playing with Python and doing all this research, I think one day everybody woke up and said, oh, you know, this code is going to be public. We released it under a GPL version 2 open source license, and that means the code is out there on the web — people can download it, people can see it — so it really drove a cleanup phase. Yeah, right now it’s version 2 only. We were on the line between version 2 and version 3; version 3 wasn’t final, and the trouble with version 3 not being final is you don’t know what you’re signing up for, so we just decided to go with v2 — we’ll come back and revisit it later. But the real driving issue here wasn’t the open source license per se — although that was a hassle, because we had to go put license tags in all the code — it was really just that it was cleanup time. You’ve been playing with all this stuff for a while and suddenly you realize, small team, we need to do some code cleanup. So we needed to go back and revisit a lot of the module and package structure — just our tree. We ended up with top-level directories that hold the data server, the components, a utils directory, which is common code shared across the system, and then a separate directory for the SnapScript. This let us break up some of the dependencies and really force a discipline on it. It also simplifies development. There are only half a dozen of us working on this, so it’s not like we have huge issues with people stepping on each other, but the system is getting bigger and you have to worry about that.
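Going back to the props mechanism for a moment — the attribute-hook approach described above can be sketched roughly like this. The class and property names here are hypothetical stand-ins, not the actual SnapScript code, but the __setattr__/__getattr__ pattern is the one being described:

```python
class Props(object):
    """Validating property bag: unknown names raise AttributeError."""

    def __init__(self, allowed):
        # bypass our own __setattr__ for internal bookkeeping state
        object.__setattr__(self, '_allowed', set(allowed))
        object.__setattr__(self, '_values', {})

    def __setattr__(self, name, value):
        # every assignment funnels through here
        if name not in self._allowed:
            raise AttributeError('unknown property: %s' % name)
        # in the real system this is where the set_property call
        # on the underlying ResDef would happen
        self._values[name] = value

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError('unknown property: %s' % name)


class Resource(object):
    def __init__(self, component_path):
        self.component = component_path
        # in the real system the allowed names would come from the
        # component definition (e.g. DBRead); hard-coded here
        self.props = Props(['uri', 'description', 'title',
                            'db_connect', 'sql_statement'])


r = Resource('snaplogic.components.DBRead')
r.props.title = 'Customer reader'      # validated assignment, no quoting
assert r.props.title == 'Customer reader'
try:
    r.props.no_such_prop = 1           # typo: raises AttributeError
except AttributeError:
    pass
```

The point of the intermediate props member is exactly what the talk describes: assignments read naturally, but every name is still checked behind the scenes.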
Something else: we did put a lot of documentation into the code as we went along, but we just weren’t consistent about a discipline for how to do it. So there was good documentation, but there were comments, there were docstrings, there were huge, long triple-quoted strings in the middle of functions that arguably should have been comments from a style perspective. We went back and tidied a lot of that up. What helped is that we standardized on epydoc for generating the docs: as part of the build process we run epydoc and generate the API docs for all the class interfaces, and suddenly you start seeing problems — modules that don’t have a docstring, modules whose docstring is equal to the Subversion ID, and so on. The code owner did the documentation — whoever owned the code. It’s not like we were able to get people to go document other people’s code; we each just tidied up our own. It was time-consuming, but worth it. In terms of coding style, we did establish guidelines early, saying these are going to be our coding style guidelines; we did not follow them consistently. We’ve now gone back and finally declared PEP 8 to be the standard for code and documentation, with one controversial aspect, which was the fight over 80 columns. I think we’ve bent the rules, and there’s a loose agreement that you can use up to 120 columns — I mean, everybody has a 24-inch Dell widescreen these days; Dell was selling them for $600 at the normal price and discounting 20 or 30 percent, so everybody has a wide monitor. The bulk of our code still fits in 80 columns, but there are some cases where people wanted to go wider. In terms of naming conventions — never mind the column width — naming conventions were a huge battle. In the earliest versions of the code, everything was prefixed with Qbf, which was a code name for the project, so everything had QbfThis and QbfThat in front of it. Then we decided we weren’t calling it Qbf, so everything was changed to Snap. There’s no reason to do this: Python has namespaces; most modern languages have some sort of namespace concept. So we ended up basically stripping that stuff. There are some places where we keep a prefix to differentiate from something with the same name in the standard library, just to make it clear, but almost all the prefixes are gone now. For example, the code was littered with this thing called QbfResDef; it’s now just ResDef, which makes a lot more sense — there was no need for the Qbf on it. I also think we overdid it on the double-underscore private in the early days. Remember, I’m taking some C++ and Java guys and they’re coming up to speed with Python, so there was a tendency to say: oh, I have
a private in Python, I’ll use it — so everything became double-underscore private. I don’t know of major negative impacts from that, but it just started looking ugly; if nothing else, it didn’t look good. So now we’re using the advisory single-underscore private. I do remember we ran into some things with the mangled names because of the double underscores, but I don’t remember exactly what the scenarios were. We also have guidelines now for when we use lower camel case, upper camel case, and so on. What it boils down to is: classes are CamelCase names, attributes are lowercase names, and we seem to be leaning toward lower camel case for function names. It’s working, and it’s consistent, which is probably the big thing. Also, we have a German working on the project who often corrects me that this isn’t really the way a German would assemble a name under the noun-concatenation rule, but the reality is some of the names in the code were really, really long, and we’ve gone back and tried to shorten them. You can always get into debates over abbreviations, but some of this stuff had gotten silly — and I can’t exclusively blame the German guy; some other people do it too. We found imports get out of hand quickly. Again, this is more discipline: various files in the system would start doing imports in kind of a random order, and we had at least one case of circular imports that were a pain to get rid of. So we started following a discipline. The current rule is that we import the standard Python library modules first, then third-party packages — things like pysqlite, MySQLdb, RDFLib, and so on — and then we pull in our own packages and modules. We typically put those in three blocks with a blank line between each block, and that seems to help. We haven’t gotten fanatical about ordering the individual lines — that’s really up to the individual programmer — but we did find imports got out of hand very quickly. A last point on coding and coding style generally: there is a tendency to try to write Java or C++ in Python. There’s a blog post out at dirtsimple.org called Python Is Not Java, and it does a good job of summarizing the things you see a lot in Python code written by Java programmers. Some of the things we bumped into: one is this business of being able to add strings — the holy grail for any C programmer is, oh, I can take strings and just glue them together. These are extreme cases, but there were instances where we had s = s + new_value, continually concatenating onto a string. That’s a very expensive operation, because strings are immutable. If you want a string of ten x’s, 'x' * 10 — very simple way to do it in Python. More importantly, though, it takes a while for people to get the hang of the ''.join idiom to take a list of strings and
join them together in one shot. In a lot of cases, if you’re building up something like an XML or XHTML document that has to be assembled in a custom way, it’s more effective to take the strings you’re generating dynamically, append them to a list, and join the whole thing at the end. That’s a good point — you’re not on the mic, so I’ll repeat it: the comment is, don’t forget about cStringIO, where you can get a file-like object, write to it, and assemble these buffers that way as well. But the important thing to me was: don’t do the repeated concatenation — even in smaller cases it’s just so wasteful, and it makes a lot of sense to treat things as a list naturally. We didn’t have too much of this in the code, but these are idioms that started creeping in, and we managed to get rid of them very early. Something that did creep into our code a lot was the getter/setter model, where people write little functions to get or set a variable, and the function does nothing other than assign to a private member. I mentioned the double underscores — so first came the double underscores, and then, well, this is private but I need to control access, and now I have a getter and a setter. I don’t know where getters and setters started. I came from the assembly-language, C, high-performance world, and nobody ever wrote getters and setters; I think this originated in the days of VB, got very popular, and caught on. I’ve never liked it, and in Python you have property, so if you really do want to hook and control access you can use property — and if you need to get fancier, you’ve got __getattr__ and __setattr__. I think almost all our getters and setters are gone now; there’s a handful in the ResDef, but they’re not really getters and setters around single elements. Also, everybody on the team now has a Python Cookbook; they’ve all
read them — they’re getting pretty dog-eared — but it’s a good way to start off and give people a solid foundation in some of the idioms. Another one I should have put on this list: David Goodger has a talk called Code Like a Pythonista, which really focuses on common Python idioms, and that’s been really good for people. He gave it at OSCON as a tutorial — I think it’s on the OSCON site. David Goodger, G-o-o-d-g-e-r. It was interesting meeting him; I’ve known of him for years as the reStructuredText guy, Docutils. Now, back to the main topic — a little bit of a segue here. I’ve been talking about code and day-to-day activities, but I wanted to shift over and talk about testing; when I was talking to Jim, he said testing is always a hot topic. So we’re building all this code — it wasn’t quite as much wandering in the desert as it sounds — but we did need to start worrying about testing and a build process. We currently have an automated build and test process driven by buildbot, which is another Python utility — you notice a theme here; we keep coming across these Python tools and we keep using them. With buildbot, I think there’s a cron job that fires up every so often, looks for check-ins, and if there was a check-in within a certain amount of time, it starts the build process. You can also trigger it manually, so I guess that meets the Joel Test criterion of the one-button build. The basic flow is: first we check the code out of Subversion. Then we go through a real build step — Python isn’t compiled, but the Flex stuff is, so we compile the Flex code. Then we build an installer image; we use BitRock for our installers, so we essentially pick up all this Python code, plus some third-party stuff we have to pick up, plus the Flex stuff, and we build an install image. We then pump the install image off to a virtual machine — there are actually two here, a Linux one and a Windows one, but for the sake of this flow, we push it out to a virtual machine, actually do the install, and then start running the unit tests against the code that made it to the target machine. So we run module-level unit tests, then we run integration tests, and somewhere in that loop — I’m not sure if it’s before or after those two — we generate the epydoc API documentation for all the code. It’s interesting how much stuff epydoc will trip over on formatting; it kind of forces you to keep your docstrings in shape. While we’re going through that test process we also check code coverage — we started off with coverage.py, and we’re now using figleaf, with some customizations, to collect code coverage statistics while the tests run.
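A minimal version of the kind of module-level test that runs in this flow, using the standard library’s unittest — the function under test here is a toy stand-in, not real SnapLogic code:

```python
import unittest


def parse_uri(uri):
    # toy stand-in for a resource-URI helper in the system under test
    if not uri.startswith('/'):
        raise ValueError('URI must be absolute: %r' % uri)
    return uri.strip('/').split('/')


class ParseUriTest(unittest.TestCase):

    def test_basic(self):
        # positive case: a well-formed URI splits into path segments
        self.assertEqual(parse_uri('/feeds/customers'),
                         ['feeds', 'customers'])

    def test_negative(self):
        # negative test: garbage in, proper exception out
        self.assertRaises(ValueError, parse_uri, 'feeds/customers')
```

A test file like this goes in the module’s test subdirectory and runs under the standard unittest runner (unittest.main() or a harness that collects the suites), which is what the buildbot flow invokes.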
Yes — unit testing. We’ve learned. When we started off, a lot of our early pieces of code were really just little prototypes saying, here’s how I think we could do this. They had a little piece at the bottom that said if __name__ == '__main__', go off and run the tests — and the comment “run tests” was actually in the body, but there were no tests there. There were some things that checked syntax and some things that demonstrated how you would use the API, but they weren’t really tests; they were usage examples. That caused two problems. First, those tests weren’t consistently run — nobody was running them except the guy who came up with the example; other people were copying them and saying, oh, that’s how I use it, without writing tests. And then the classic case of software rot: either the code changed and the test didn’t, or the test changed and the code didn’t, but it didn’t work. That didn’t last too long — we’ve now cut over and we’re just using the unittest framework in the standard library, so every module has a unit test that goes into a test subdirectory, and it runs the tests. We’re starting to use pymock a little to set up scenarios for cases where we depend on something — you’re always on that fine line of how much you mock versus how much you try to test in a real environment, but we are using pymock for some things. I’ve found with unit tests that by forcing exceptions, simulating exceptions, and pushing in bad values, you can get pretty good coverage. So we were doing more negative testing, where we try to break things with garbage values and make sure the right exceptions get raised, things like that. Moving forward — when I wrote these slides a couple of months ago I said “going forward,” but I think as of today — we develop the unit tests right up front. We’re not test-driven development, where you do them at the very beginning, but we are developing them much earlier, and we’re always checking code coverage during both unit testing and system testing. Yes? The question from the back of the room was: could I get into a little more detail on unittest? unittest is a framework that makes it fairly easy to write tests, assemble them into groups of tests, and run them all through a consistent test harness. It does a really good job of setting up a model where you write your tests within a class, you have a setUp and a tearDown, and unittest takes care of handling the errors. One of the functions I use a lot is assertRaises, where you can call a function in your code and assert that it should raise a certain exception. So unittest will take care of a lot of that for you; it will not figure out your test vectors or what you need to test. What works in conjunction with it very nicely, though, is either coverage.py or figleaf, which collects statistics while the code is running — it slows things down, but it lets you know which statements you executed and which you didn’t. I was hoping I could dodge that one — the question from the back is, is there a level, is there a red flag? The ideal philosophy is one hundred percent test coverage on unit tests. I think we’ve said we’re aiming for a hundred percent; anything that’s not eighty to ninety percent, we’re paying attention to. It depends on how important we think the class is — there’s always a judgment call — but a lot of the stuff at the unit test level is 80-plus percent, and a lot of that is getting up to 90-plus. It depends on the module, and it
does go up and down, but we’re aiming for a hundred percent. I don’t think we’ll ever get there for unit tests. The trouble when I look at the coverage is that as you get closer to a hundred, if you miss a few lines you’re way off. The beauty of the figleaf stuff is that it gives you a nicely formatted report showing the code with the lines that weren’t executed in red, so you can look at it and say, yeah, I really should fix that coverage. On the other hand, it won’t bite you until the next time you hit a bug — and then it’s going to be that piece of code. Yeah, Jim — oh, repeat the question. The question was: we’re using all these great modules like unittest and figleaf and buildbot and so on, but why aren’t we using nose, or the other one — py.test? I’m not at all familiar with nose, so I don’t know what benefits it gives us. Part of the answer is that at our unit test level, most of the tests tend to be pretty straightforward in terms of exercising code; the bulk of our testing kicks into gear at a higher level, which is more integration testing, and I don’t know if nose would help with that. I’ve heard of nose, but I really don’t know what benefits it has. Oh — okay. We could potentially be using it, because this stuff kind of magically appeared one day. I’ll repeat the comment from the back of the room: nose will auto-detect the tests and help you set a lot of this up and deal with running them. I think it’s n-o-s-e, nose. So, integration testing is our next level, and this is really system-level testing for us. Way back at the beginning of this talk I described how we were doing this data integration framework: we connect to databases, we connect to files, we connect to other servers, so we require a lot of infrastructure to exist before we can get a real test environment going. After we do the unit tests, we go through a series of integration tests where we try to deal with the connectivity matrix and execute different scenarios, and while we run these tests we also verify code coverage. One of the reasons we do the coverage checking is that with Python it’s possible to have a typo in code that’s never been executed, and you trip over it at runtime. That was actually one of the open source bugs we found: there was a line that was munged in the code, and it’s like, okay, we hit it at runtime, you get a syntax error — what do we do with this? Also — I don’t know if this is a proven fact, but it’s at least my opinion — I know unit tests will never cover all our code, and even if they did cover the code, they won’t cover all the permutations. That’s why we keep running the coverage stuff. We modified figleaf to do cumulative coverage, so
that we collect coverage stats during unit tests and during the integration tests, but we also have the aggregate coverage across the two. That makes a difference, because we can see that if we didn’t hit some lines in unit test, we probably hit them in the system test. I don’t know what the status on those fixes is; I think we’re handing them back as soon as we’re comfortable with them. On the client side — I mentioned a client GUI; we have this Flex browser client — that’s not automated. It’s basically manual, and it is a pain. One of the difficulties is that it’s not really a forms-based client, so you really have to come up with scenarios and run through them in the client, and it’s a lot of extra work. That’s definitely one of the areas that needs more work. Interesting — so, Drew, up here. I’m going to try to paraphrase the question because it was long. He’s saying we had the suggestion that you’re more likely to get an open source bug fixed if you submit a fix and a test case, and he was commenting on the statement that in one organization there was a rule that if you were fixing a bug and there was no test case, you should write the test case that demonstrated the failure and do the fix at the same time. I don’t know if our engineers want to go back and write test cases for code they didn’t originally write, but it is a good rule, and we’re trying to get to that point. I think Python makes this easier. In my opinion — I’ve done an awful lot of programming in C where I didn’t write a lot of test code; somebody else worried about test code — I find with Python we’re more likely to put the test code in with the core code. So, switching gears here a little — how are we for time, Jim? Are we running late? I think we’ve got a couple of slides left. Okay — again, switching gears, getting away from the test infrastructure: one of the things we wanted to do was have a pretty good installer, so we needed to deal with the issue of packaging and installs. We use BitRock as our installer. It’s actually not open source, but it’s a pretty popular product among open source companies, and the BitRock installer gives you an executable that runs on either Windows or Unix and is supposed to do the rest of the install, just like any other installer. There are config files that let you program the install. The
BitRock installer, when you run it, will go off and download eggs for other packages we depend on directly from the Cheese Shop — actually, it’s not the Cheese Shop anymore, right, it’s now the Python Package Index — but anyhow, it downloads the stuff and pulls it all in. That way we don’t have to worry about distributing older versions of those modules, and we can build in the version dependencies. We do have dependencies on a lot of other packages. Our core stuff, for example, sits on top of RDFLib, which in turn sits on top of either pysqlite or MySQLdb, which in turn sit on either SQLite or MySQL — and that’s just one piece of the code. Then you have connectors to various databases and various files that depend on a lot of other modules, so things get complicated. We’re not creating RPMs or .debs for the distribution right now; we’re just doing this install image. We might in the future — the trouble is that RPMs and .debs tend to be very system-specific, so I’m crossing my fingers and hoping somebody creates generic RPMs for us and we don’t have to do it. As for the biggest install problems we’ve run into: I think the number one issue has simply been that we want to see a Python interpreter installed on the system before we begin the install. We could fire off our BitRock installer, but it’s going to go out and ask what version of Python you have and start running from there; we didn’t want to redistribute and install Python ourselves. So that’s an issue: we need a Python installation to get started. It’s not an issue for a developer, but it will be when we get a little away from the developers and start introducing people to Python. The number two issue — or number one, depending on how often you bump into it — is vendors that distribute really old versions of Python. My biggest offender here was Red Hat with Enterprise Linux 4: they ship a fairly old Python 2.3 — a 2.3.1, I think. The problem is — I don’t think we even run on Python 2.3; there are some issues in the libraries that we can’t get around — you can’t easily upgrade that version of Python. If you go upgrade that Python, there’s a bunch of Red Hat stuff that depends on it; then you find that Apache depends on it because it’s using mod_python, so you upgrade mod_python, which means you have to upgrade Apache, and upgrading Apache means you have to upgrade PHP, and suddenly you open this big can of worms of dependencies. So our solution — my recommendation to anybody in cases like that — is just leave the old Python there and put another version of Python in /usr/local or someplace like that, or run your own private version of Python. That’s been a good solution. The downside is that in some cases somebody needs extra permissions to do that install, and again, for the developer who owns their desktop machine it’s fine, but as we get away from the core Python users there’s that little extra hump — you’ve got to get Python installed — and that scares people away. The current Python installers from python.org are pretty good. I did an install on this MacBook about two months ago, just before OSCON; it was the first time I’ve done an install without downloading and building from source, and it just installed — no issues — and I think that’s also true of the Windows installer now. We have another issue that’s much stickier. I listed it here as dependence on database libraries, but I think it’s really dependence on modules with C in them. For example, SnapLogic wants to connect to MySQL, and there’s a Python MySQL package that’s easy to install, but that package in turn has a couple of pieces written in C that have to be compiled and linked against the MySQL library. So that opens a can of worms: if you’re
a Linux developer, you’ve got a Linux box, you have all this stuff, it just goes right through — but you’ll hit a couple of curveballs. The first is that you need a C compiler; everybody on Linux has one, people on Windows don’t. The second is that for cases like MySQL, the C code in the Python MySQL library has to be linked against the MySQL development libraries, which may not be on the machine that has the MySQL server. So there’s a little bit of complexity there. We do whatever we can to check the prerequisites, and we found that as long as you get the right development packages and development headers installed, everything works. Another aspect of this: sometimes we bump into it even on Unix, where some of the RPM distributions split out a special package called python-devel, which includes the include files — a lot of these C extensions depend on having, mainly, Python.h, the header files. So you do hit that problem, and I think that’s really a matter of going back to the vendors and saying, hey, stop doing this, just put all the Python stuff there and stop splitting it; that would really simplify life. On Windows we don’t have a solution for this. What we do on Windows now is ship binaries for some of those things, like the MySQL libraries, because there’s no way we can look at somebody and say, you need a C compiler, go install Cygwin and give us a call back. That’s not going to work. It does work, but — the question from the back here is, why doesn’t installing Cygwin work? It works, but I find that in the Linux world, if you tell somebody, go install the python-devel package and call me back, no problem; when you’re dealing with the Windows desktop, a lot of people trying to install stuff on Windows don’t want to install Cygwin, can’t install Cygwin — it tends to be a different environment. Yeah — there’s a deeper issue there. The comment from the back is that it’s not true anymore anyhow — good point — that you can get MinGW, the plain distribution, not included in Cygwin. Okay, so it was on python-announce; I don’t know how to repeat the question exactly, but I think the question that came up really has to do with a C compiler on Windows — how do you get somebody to install one — and I think the options are either Cygwin or the one Alex mentioned here, which is a good port of MinGW; that’s a good compiler. I think both of those work. The problems we’ve bumped into in the area of C compilers: number one, asking somebody to install a compiler just so they can install your product usually shakes them up a little bit — whoa, we don’t want to do that. If it’s a developer, it’s not an issue; if it’s not really a heavy-duty C developer, they usually go, why do I need to install C? The other issue we’ve bumped into — and I don’t know the current status on it — is that when you’re dealing with Windows, the different compilers create binaries slightly differently, and you run into subtleties where if you have MySQL runtime libraries that were compiled with compiler X, you can’t compile the C extension with compiler Y and make them work together very nicely, and that’s
been the bigger limitation on Windows. The comment was that this sounds like a candidate for a virtual appliance. It is a candidate for a virtual appliance; we just haven't been able to get the download size down small enough to make it manageable, because the SnapLogic stuff isn't that big, but when you bring in the MySQL footprint and some of the other pieces, it can get fairly big. We've actually played with that; we have images lying around, but they're too big right now even compressed. I'm going to follow up on that in the future. You had another question? It depends on the audience. The comment is that it's more of a social engineering problem, and I think that's true. For people that have downloaded SnapLogic, most of the downloads have actually been on a Windows desktop, but it's been more people playing with it than people looking to hack at it and build serious integrations, people saying "oh, this looks cool, let me try it on Windows," and they tend not to have a compiler. There's another aspect that comes in here: what we do right now in the installer is provide all the precompiled stuff, so it's not an issue for us in terms of packaging and distribution; it's more something we had to solve as part of building the install. It's basically resolved: if you take the download image, it does everything you need it to do, it just sticks all these binaries on; it's just more work at the back end for us on the install. So my statements here about the biggest install problems are things we had to deal with as we put the install together; they're not issues we keep bumping into. I probably didn't make that clear. On the Linux install, we basically just go out to the Package Index, do the dereferencing, and download the packages. The only risk factor there is that a site might be down and we might not be able to install, but that's not a big issue these days. So the Linux installs actually pull most of the other packages down and do the local compiles and links; on Windows we've bundled up all the binary pieces we need and included them in one big install image. Yeah, so Alex is making the point that, as a matter of fact, it's running on the Mac; I have it running on mine, we just haven't done the install image. There are two secrets: one is to remember the size of Xcode, and the second is that Xcode is actually on the Mac CDs, but they put it in a place where you have to remember to scroll down in the installer to go find it and install it. You'd figure they'd put Xcode at the top so everybody installed it. Another issue, while we're talking about installs: with Windows, we tried the Windows Vista install. We've done a test on it and were able to get through the install and it all worked; it's just that every time we did something, you had to click through the entire sequence of "this program wants to do this, this program wants to do that." It was a very difficult install, because we're going out and downloading this stuff from the network and doing all these pieces, and the Vista install just didn't like that. It worked, but you had to click all the right yeses and nos. So I think we're running tight on time here, Jim; I have two slides left that I wanted to touch on. In between all this product stuff, the other trouble with open source is that you have to get a site up, so there was a whole separate process here, which was building the snaplogic.org site, and again, no surprise, there's an awful lot of Python and open-source stuff out there. The main snaplogic.org
site is running on Red Hat; it's just a machine leased in a data center. It's Apache and mod_python, and I think it's now mod_wsgi; I think we cut it over to mod_wsgi. Python 2.4.4 is running on that machine. It's running Trac 0.10.4, with the MySQL database. If you're running Trac, stay away from MySQL unless you're ready to do some debugging; it's not quite there yet. The SQLite stuff works, but we've hit a lot of issues with the MySQL backend, so again, it's fix it and ship it back to the Trac team. Mailman runs the mailing lists, another Python package, no problems with that. The blog is WordPress, which, yes, is PHP. We initially started with some Python blogging software called NewsBruiser, which is a pretty nice package, and it has this wonderful feature that it supports reStructuredText. Unfortunately nobody but me liked it, so that was a downer, and we settled on WordPress. That's fine, it works; the PHP stuff's in Apache, and I think every time you install WordPress you have to touch the code anyway, so it's not a big deal. We've started using Django: we have a piece of our site called packages.snaplogic.org where we're beginning to collect content, things like resource definitions and extra plugins for the system, and we built that using Django. That was pretty painless, so we'll probably keep using more Django on the site for future tasks. We're currently using Trac as the bug tracker and the wiki together. I'm not sure if the wiki is going to stand up to the loads and some of the security issues, so it's pretty likely we'll cut over to something else. No, we don't have SnapLogic running on that server; SnapLogic is more a data services ETL product, so we've got it running on another server for some sample cases where people can consume data services, but we're not using SnapLogic to maintain this machine right now. Not yet, not yet, but as we come up with those things: for example, one of the things we're working on is some reports out of Trac, and for things like that we're using SnapLogic to do some of the extracts and some of the statistics out of Trac, but it's not mission critical for us right now. It will be eventually. Keep in mind that there's one machine in the data center, there are machines on developers' desktops, and there's a mail server in the machine room, so we don't have a lot of data integration going on yet, but we will in the future. You made a comment from the back that version 0.11 of Trac will have better security. OK, yes, so MoinMoin has a lot of functionality today that's not there in Trac. We're using Trac right now, and I think it's kind of lightweight, but it's starting to grow up pretty quickly; I think there are enough people using it that it's really getting hammered into shape very quickly. But again, MoinMoin is an option. On the website side, there's a static website that just has some pages up there, and going forward we have to decide whether we're going to something more dynamic. The question is: do we look at Django, do we look at something like Zope or Plone, or do we look at something else like Joomla, which is PHP, and just get a better content management system out there? I haven't done any real research on this, except I know it's somewhere out in the future. We may just go to full wiki mode and forget about the concept of a static website; that may work better for the project itself. Coming up in the future, there's a lot of stuff to do. One of the things we're paying attention to already is Python 3000 support. We were late with Python 2.5; we just didn't prioritize it, and we didn't try against Python 2.5 until late. Mostly we found that the issues we bumped into were just some dependencies and needing newer versions of packages, and when we did go to Python 2.5 we actually had to wait for some of the third-party packages to support 2.5, little things like that. So Python 3000 is definitely a bigger upgrade, but hopefully, if we start paying attention to it up front, it shouldn't be an issue. On our server side, I talked about the HTTP and the streaming stuff, and on that side we're probably going to cut over to WSGI; we're currently testing it and seeing if we can make it do what we want. In the early days we just wrote our own HTTP server because we needed to get rolling. On the performance side, we are definitely compute-intensive when we start firing off these
components, so performance is an issue for us. We're still mainly single-processor, which has prompted some people to kick out blog posts about it. We do use threads; in fact, I think we use too many threads. Given the way the Python interpreter manages things, I think we fire off more threads than we need to, and we're not getting parallel execution from them, so we're probably hurting ourselves a little. But we are looking at parallelism in the broad sense, whether it's process level, probably more process level, in terms of taking HTTP requests and trying to dispatch them to independent processes, potentially with message passing over a pipe or something like that. Something's going to happen in this area; we're doing the homework, but I don't think we've got all the answers yet. Yeah, so Alex is mentioning the processing library, where instead of importing from threading you import from processing, which he mentioned in his talk last night, and that'll help. No, it's not part of the testing process yet. The question, and I keep forgetting to repeat them with this microphone, sorry guys, is whether performance profiling is part of the testing. Only to the extent that we're looking at the timings on the tests. We are cutting a release at the end of this month, and as soon as we finish that release we're gearing up to do a lot more benchmarking and reference testing on performance, not even at the profiler level but just core benchmarking on the product, to make sure we get the performance up there. We haven't gone in and done low-level profiling yet; at this point we pretty much know where the issues are, and I'm being honest when I say that. Once we get the first cut of issues out of there, we'll have to start doing some profiling. So parallelism is definitely on our horizon, and I have a feeling there will be multiple solutions here, one for the local machine and one for distribution across machines; there will be different strategies. Also kind of an ongoing thing for us, and the reason we really built this open source, is the notion of connectivity. We have some drivers for core databases, but over time we want to address a lot of different data sources, so there will be more connectors and more components in the pipeline. I thought it was worth mentioning here that I always get beat up by the Perl folks talking about Perl DBI, because Perl DBI has a driver for everything, including some things you wouldn't even think there would be a driver for. The Python database API isn't quite there yet; it covers the major databases, but I think over time it's going to get fleshed out, and we'll probably end up doing some work in that area as well. So there's a lot to do for the project, and again, before I forget, we are hiring; we're looking for real Python wizards, if anybody's interested. It's an ongoing project, so by all means check it out, download it, and see what you think of the SnapLogic stuff. There was a topic here, and I think this is the last slide, that I didn't bring up before we get into questions. It's not a Python thing, and I really wanted to concentrate on Python-centric stuff, but documentation is just a whole different project. Getting documentation on what we have, and how we do it, is an ongoing thing. It reminded me because the Python team is redoing the documentation; I don't know if everybody here knows it, but there's a plan in place to redo at least the tool chain around the documentation, and they're open to suggestions; it's something people might want to get involved in. So, questions? Yes, OK, so the other question is, what's the business model? Right now the focus for this year has very much been building out the product, building out a community, and really working on the install base. If we can get enough users of this and really prove that it's generally useful, the business model will most likely be a services and support model, very similar, I think, to MySQL, where the core product is free, and if you want support and services and so on, you pay for it. For the individual developer that just needs to do what I like to call data munging, and I'm stealing Greg Wilson's term there, the product's free; I'm never going to get any money from him anyhow. For companies, it's when you get into the enterprise that there's a huge amount of point-to-point data interfaces. I've built a lot of them myself in my former life, and they get out of hand very
quickly. So there are companies that are happy to use a framework and pay you for some level of support and services to try and manage that, and that'll be the basic business model. There are other opportunities; we could do things like host data services, but I'm more interested in finding somebody else to use our stuff to host data services. I don't want to run the infrastructure; it's a commodity now. You've got S3, you've got EC2, you've got Google Base; there are so many places where you can host data, and I don't want to get into that business. I don't know if the mic picked up Shannon, but he's recommending Founders at Work, which is a summary of some successful companies and how they started up with similar business models. This is getting away from Python, but I have this belief that, repeatedly, every place I look, the price of software is slowly but steadily going to zero on the license side; not on the manufacturing cost, but on the license side it's going to zero. Software as a service is one way to try and, I almost think of it as, add value and justify the cost of that software, but it is coming down. Appliances are another way: if you give somebody something tangible, it's easier for them to cough up a license fee. But in reality, if you look carefully at the financial records from a lot of the software companies, you'll find that they sit 60/40 between services and license, or 50/50, but if you look at the margins, the margins are on the support and services. Nobody makes money on software licenses, especially if they're in the enterprise market and they're actually trying to pay the sales guys their commissions; look at the cost of sales. So the cost of software is being driven down quite a bit; open source has had an impact on that, among other trends, but services and support is really the basic business model. That's what you see among a lot of the open-source companies; most of them have really moved to that model. It finally justifies having bugs? Yes, that's an interesting comment; in the past people felt guilty. The other aspect is that if the code's out there, the bugs get fixed. I've dealt with enterprise software where I hit nasty bugs that, you know, I can understand how they could slip through QA and a vendor could release; that just happens. The trouble was, once you hit this bug, the vendor didn't want to fix it, even though I was paying for support; there was always a haggle over when to fix it. So it's a different model in the open-source space, a lot more fun in some ways; it's just harder to wrap things up. So, Jim, I think we have two minutes to nine here; were there any other questions, or do we need to start wrapping up and doing the mapping, random access, and things like that? Sure, OK. This goes way back to like slide two. I think the reason was more history. It was in the first release; I'll try and put the picture up here to make sense out of it. I should have also mentioned that these slides are on blog.snaplogic.org; I kicked up a blog post like yesterday with the slides attached. Historically, the reason is that we were trying to be very, very pure about this being a REST interface to the server; we're fanatical about that. On the Flex side, we found that we had to create these resource definitions in Flex, and what we ended up with was that the actual Flex code does a lot of property-set calls over to this management server; the management server actually maintains that object, and then when you save it, it pushes it back. So really it became an issue of this thing doing a lot of function-call-style get and set of properties over HTTP, which we didn't want to put in the server. That was one motivation. It wasn't an issue with Flex in terms of capabilities; it was probably an issue with us learning Flex. This was the first shot at Flex for the guy who developed it, and he was definitely coming up the learning curve on that part of the system. We actually didn't plan on a full user interface in the first release; we ended up doing one anyway, and we cut some corners to get that management server in there. Thanks, guys.
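To make the WSGI point from the talk concrete: a WSGI application is just a callable with a fixed signature, and the same callable can be hosted by mod_wsgi, wsgiref, or a homegrown server. This is a minimal sketch, not SnapLogic's actual server code; the handler body is made up for illustration.

```python
def application(environ, start_response):
    """A minimal WSGI application; any WSGI server (mod_wsgi, wsgiref) can host it."""
    body = b"hello from a WSGI app\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    # WSGI apps return an iterable of byte chunks.
    return [body]

# Calling the app directly, the way a WSGI server would:
def _start_response(status, headers):
    print(status)  # a real server writes the status line back to the client

chunks = application({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, _start_response)
print(b"".join(chunks).decode(), end="")
```

Under mod_wsgi, only the `application` callable is needed; the direct call at the bottom just shows the contract between server and app.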
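Alex's suggestion about the processing package, which later became the standard library's multiprocessing module, can be sketched roughly like this: swap GIL-bound threads for worker processes when the work is CPU-bound. The `transform` function here is a made-up stand-in for a pipeline component, not SnapLogic code.

```python
import multiprocessing

def transform(row):
    """Stand-in for a CPU-bound pipeline component; real work would be heavier."""
    return row * row

def run_parallel(rows, workers=2):
    # Unlike threads under the GIL, worker processes execute truly in parallel.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(transform, rows)

if __name__ == "__main__":
    print(run_parallel(range(5)))  # [0, 1, 4, 9, 16]
```

The appeal of the processing API was exactly this: the interface mirrors threading closely enough that moving from one to the other is mostly an import change.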
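The Python database API mentioned above is specified in PEP 249, and its shape is the same across drivers: connect, get a cursor, execute with bound parameters, fetch, commit. Here's a sketch using the standard library's sqlite3 module with a throwaway in-memory table; the table and column names are invented for illustration.

```python
import sqlite3

# DB-API pattern: the same calls work against MySQLdb, with a different connect().
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pipelines (name TEXT, components INTEGER)")
cur.executemany("INSERT INTO pipelines VALUES (?, ?)",
                [("extract", 3), ("transform", 5)])
conn.commit()

# Bound parameters keep user input out of the SQL string.
cur.execute("SELECT name FROM pipelines WHERE components > ?", (4,))
rows = cur.fetchall()
print(rows)  # [('transform',)]
conn.close()
```

One portability wrinkle: drivers differ in paramstyle (sqlite3 uses `?`, MySQLdb uses `%s`), which is part of why DB-API code isn't always drop-in portable between databases.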
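The prerequisite checking described in the install discussion, making sure the development headers are present before trying to build a C extension, can be sketched like this. This is a rough illustration of the kind of check an installer might do, not SnapLogic's actual installer; the function name is made up.

```python
import os
import sysconfig

def have_python_headers():
    """Return True if Python.h is installed, roughly what an installer
    would verify before attempting to compile a C extension; on RPM
    systems a missing Python.h usually means python-devel isn't installed."""
    include_dir = sysconfig.get_path("include")
    return os.path.exists(os.path.join(include_dir, "Python.h"))

print(have_python_headers())
```

A real installer would check the database client development libraries the same way before building something like the MySQL bindings.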