
A Socio-Technical Analysis of Socialization in Open Source Projects

Open Source Software (OSS) development is often characterized as a fundamentally new way to develop software. Past analyses and discussions, however, have treated OSS projects and their organization mostly as a static phenomenon. Consequently, we do not know how these communities of software developers are sustained and reproduced over time through the progressive integration of new members.

To shed light on this issue I observed and analyzed socialization in an OSS community. In particular, I tried to document the relationships OSS newcomers develop over time with both the social and material aspects of a project. To do so, I combined two mutually informing activities: ethnography and the use of software specially designed to visualize and explore the interacting networks of human and material resources incorporated in the email and code databases of OSS. My findings are summarized in the following paper:

"Socialization in an Open Source software community: A socio-technical analysis." Computer Supported Cooperative Work, 14(4), pp. 323-368. [PDF from Springer.com]

I looked at socialization from two perspectives: as an individual learning process and as a political process. My observations indicate that successful participants progressively construct identities as software craftsmen, and that this process is punctuated by specific rites of passage. Successful participants also understand the political nature of software development and progressively enroll a network of human and material allies to support their efforts. I also discuss how these results could inform the design of software to support socialization in OSS projects, as well as practical implications for the future of these projects.


The paper presents these results in more detail. This page has a different purpose: to describe the software I designed for this research project and, more importantly, to let readers directly interact with a demo version.

 

The OSS Browser

The Open Source Browser was built as a standalone extension of Warren Sack's Conversation Map system. The Conversation Map is much more than a simple social network browser: its interface provides a hybrid representation of activities in online social spaces by integrating social networks, semantic networks, and socio-linguistic networks. The Open Source Browser was designed to highlight and explore a different kind of hybridism: one that encompasses social networks, software networks and, most importantly, socio-technical networks.

When pointed at an Open Source project, the OSS Browser extracts the email messages exchanged between the project participants as well as the record of their software programming activities (usually available through CVS - Concurrent Versions System - databases; CVS records when a participant downloads parts of a project to work on, when he or she submits changes, etc.). It then simultaneously maps how the connections between people (represented by their email exchanges) and the connections between people and artifacts (represented by their CVS activity) evolve over time. Users of this "technographic" representation can then focus on patterns of relationships they judge to be interesting, and directly access the raw data (email messages and software code) for a richer, deeper qualitative analysis.
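As a rough illustration of the idea (not the OSS Browser's actual code), the two kinds of connections can be counted from simplified records. The record layouts, the names, and the function `build_hybrid_network` are all assumptions made for this sketch; real mailing-list archives and CVS logs need parsing first.

```python
from collections import Counter

# Hypothetical, simplified records (not real project data).
emails = [  # (sender, person being replied to)
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
]
cvs_activity = [  # (developer, path of the file touched)
    ("alice", "src/Doc/tutorial.tex"),
    ("alice", "src/Lib/os.py"),
    ("dave", "src/Doc/tutorial.tex"),
]

def build_hybrid_network(emails, cvs_activity):
    """Count person-person ties (email) and person-artifact ties (CVS)."""
    social = Counter()
    for sender, recipient in emails:
        if sender != recipient:
            # frozenset makes the tie undirected: alice-bob equals bob-alice
            social[frozenset((sender, recipient))] += 1
    technical = Counter()
    for developer, path in cvs_activity:
        technical[(developer, path)] += 1
    return social, technical

social, technical = build_hybrid_network(emails, cvs_activity)
```

Each counter value is the strength of a tie: the number of messages exchanged by a pair, or the number of times a developer touched a file.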

The examples below are based on 6 months of data obtained from the Python project. The software is directly accessible here. Note that you will need the latest version of the JRE for this application to work properly. You can get it at Sun's Java site.

 

The Browser's Interface

The application is composed of two main visualization panels, one control panel, and two control sliders (click on the thumbnail to the left for a larger view). (1) shows the hybrid network itself. Black dots are individuals (hovering with the mouse over them displays their nickname) while blue rectangles are artifacts (that is, more or less granular pieces of software code - more on this later). When two individuals have exchanged email during the visualized time period, they are connected by a black line. The shorter the line, the more messages they have exchanged. The same logic applies to artifacts: when an individual accesses a piece of software code in the project, he or she is connected to it by a blue line. The more the artifact is accessed, the shorter the line.

(2) shows the conversations or threads of messages that have been exchanged among all participants during the time period. Each little "glyph" is a conversation, with the initiating message at the center and trees of replies radiating from this center. The more dynamic a conversation is, the denser the glyph looks. This way episodes of intense conversational activity can quickly be identified.

(3) offers various options to fine-tune the display of the hybrid network - each of them is described further down on this page.

(4) is a time slider, allowing the reader to analyze activity in the project as it progressively unfolded rather than as a simple static screenshot. Moving the slider to the right moves time forward. Each notch on the slider represents a month of activity.

(5) is another slider controlling the level of granularity of the artifacts displayed. Indeed, Open Source projects often contain thousands of files, which, if they were all displayed, could make the visualization unreadable. These files, however, are also arranged hierarchically in the CVS database. With the granularity slider one can compact subunits of a project into larger units, or conversely explode one unit into all of its components. In the screenshot used here as an example the grain is the largest possible: only one artifact (the blue rectangle) is displayed, representing the project as a whole. Individuals connected to it are therefore those who have contributed to any part of the project, without distinction.
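The granularity slider can be thought of as truncating file paths to a chosen depth and merging the activity counts of everything that falls under the same truncated path. The sketch below makes that assumption explicit; the helper names, the sample paths, and the convention that level 0 stands for the whole project are hypothetical and not taken from the tool.

```python
from collections import Counter

def collapse(path, level):
    """Truncate a file path to `level` components; level 0 is the whole project."""
    if level <= 0:
        return "<project>"
    return "/".join(path.split("/")[:level])

def aggregate(technical, level):
    """Merge person-artifact counts after collapsing each path to `level`."""
    merged = Counter()
    for (developer, path), count in technical.items():
        merged[(developer, collapse(path, level))] += count
    return merged

# Hypothetical activity counts keyed by (developer, file path).
technical = {
    ("alice", "src/Doc/tutorial.tex"): 2,
    ("alice", "src/Doc/api.tex"): 1,
    ("bob", "src/Lib/os.py"): 3,
}

whole_project = aggregate(technical, 0)
by_folder = aggregate(technical, 2)
```

At level 0 both developers are tied to a single artifact, mirroring the coarsest view; at level 2 the documentation and library folders appear as separate artifacts.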

 

In this next screenshot, however, the granularity slider has been moved to the fourth level, and some of the project's subunits are now visible. The highlighted artifact, for example, is the "documentation" folder in the "source" directory of the project. This way different types of relationships and roles in the project can be identified. Note, for instance, how some individuals contribute to the documentation and not to any other parts of the project, while others are connected to a diversified set of artifacts, and yet another group of individuals is not connected to any artifact at all - but can be quite active as far as email exchanges are concerned.

 

To fine-tune the analysis, it is possible to isolate either the social or the material parts of the network by selecting the appropriate option in the control panel. In the example on the left, clicking on "social network" limits the display exclusively to the relationships created over email between individuals. Note the unconnected black dots: these individuals are social actors, but they did not exchange emails with anyone - they have relationships only with artifacts, which can be easily verified by clicking on "artifacts network" as the example on the right illustrates.
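The same check can be expressed on the underlying data: individuals who appear in the artifact network but in no email tie are exactly the unconnected dots of the social view. This is a minimal sketch with hypothetical edge sets and a hypothetical helper name, not the tool's implementation.

```python
# Hypothetical edges: pairs of people (email) and (person, artifact) pairs (CVS).
social_edges = {("alice", "bob"), ("bob", "carol")}
artifact_edges = {("alice", "src/Doc"), ("dave", "src/Lib")}

def people_without_email_ties(social_edges, artifact_edges):
    """Return people tied to artifacts who exchanged email with no one."""
    emailers = {person for edge in social_edges for person in edge}
    contributors = {developer for developer, _artifact in artifact_edges}
    return contributors - emailers

isolated_in_social_view = people_without_email_ties(social_edges, artifact_edges)
```

Here only "dave" is returned: he is connected to an artifact but to no other person.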

 

As was mentioned earlier, the lower panel displays an overview of the conversations that took place during the time period. Clicking on a conversation highlights its participants in the hybrid network (see example on the left). Double-clicking on the conversation opens a separate window showing the tree of replies in greater detail (first screenshot on the right). Each dot in the tree is an email message; clicking on a dot highlights the message sender in the participants' list, while double-clicking on the same dot gives the reader access to the original message (second screenshot on the right).
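A tree of replies like the one described above can be rebuilt from messages that each record which message they answer. The sketch below assumes that every message carries an identifier and an optional parent identifier (as reply headers usually allow); the message data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical messages: (message id, id of the message replied to, or None).
messages = [
    ("m1", None),
    ("m2", "m1"),
    ("m3", "m1"),
    ("m4", "m2"),
    ("m5", None),
]

def build_threads(messages):
    """Return the initiating messages and a parent -> replies mapping."""
    replies = defaultdict(list)
    roots = []
    for message_id, parent_id in messages:
        if parent_id is None:
            roots.append(message_id)  # starts a new conversation
        else:
            replies[parent_id].append(message_id)
    return roots, replies

def thread_size(message_id, replies):
    """Count a message plus all replies below it in the tree."""
    return 1 + sum(thread_size(r, replies) for r in replies.get(message_id, []))

roots, replies = build_threads(messages)
sizes = {root: thread_size(root, replies) for root in roots}
```

The size of each tree is one simple proxy for how dense the corresponding glyph would look; the tool may well use a different measure.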

 

By moving the time slider, one can see how the relationships in the hybrid network evolve, as well as how existing conversations progress while others are created. In the example on the right, time has been scrolled two months into the future. Note how centrally located "guido" is in the social network, and how a tight group of individuals is now clustered around the artifacts. In the lower pane, it is easy to spot a few very active and dense conversations that have been added since the first month.
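One way to picture the time slider is as a filter over dated events, with the network rebuilt from whatever passes the filter. The sketch assumes the view is cumulative (everything up to the slider position is shown), which the description suggests but does not state outright; the dates, names, and function are hypothetical.

```python
from datetime import date

# Hypothetical dated events: (day, person, other endpoint of the tie).
events = [
    (date(2002, 1, 10), "alice", "bob"),
    (date(2002, 2, 3), "carol", "alice"),
    (date(2002, 3, 21), "carol", "src/Lib"),
]

def visible_events(events, slider_date):
    """Keep every event dated on or before the slider position."""
    return [event for event in events if event[0] <= slider_date]

first_month = visible_events(events, date(2002, 1, 31))
third_month = visible_events(events, date(2002, 3, 31))
```

Moving the slider from the first to the third month grows the visible set from one event to three, so ties and conversations accumulate as time advances.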

 

As time moves forward, the picture can become quite complex, as the example on the left illustrates. At this point the reader can use other options in the control panel to simplify the display and extract potentially meaningful patterns. One possibility is to limit the display to high-strength relationships. In the first example to the right, a minimum strength of 10 has been selected, pruning those individuals who had only rare contacts with others or with artifacts. In the second example, a core decomposition (the k-core concept from social network theory) is applied to the display, effectively limiting the graph to tightly interconnected nodes. In our case this isolates a group of social actors who are each connected to at least 10 other actors of the same group.
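Both simplifications are standard graph operations and can be sketched in a few lines. The code below is an illustration under assumptions, not the browser's implementation: edges are stored as a hypothetical dictionary from node pairs to strengths, and the k-core is computed by the usual peeling procedure of repeatedly removing nodes with fewer than k neighbours.

```python
def prune_weak(edges, min_strength):
    """Keep only the ties whose strength is at least `min_strength`."""
    return {pair: w for pair, w in edges.items() if w >= min_strength}

def k_core(edges, k):
    """Return the nodes that each keep at least k neighbours inside the group."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    changed = True
    while changed:
        changed = False
        # Snapshot the under-connected nodes, then peel them off one by one.
        for node in [n for n, nb in neighbours.items() if len(nb) < k]:
            for other in neighbours.pop(node):
                neighbours[other].discard(node)
            changed = True
    return set(neighbours)

# Hypothetical weighted ties: a triangle plus one weakly attached actor.
edges = {("a", "b"): 12, ("b", "c"): 10, ("a", "c"): 15, ("a", "d"): 2}

strong = prune_weak(edges, 10)
core = k_core(edges, 2)
```

With these sample ties, a minimum strength of 10 drops the weak a-d tie, and the 2-core keeps only the triangle a, b, c. The example on this page corresponds to k = 10.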