Software Engineering PhDs in Sweden: Statistics and Trends

The rules and regulations for attaining PhD degrees vary substantially across countries and topics. For instance, differences in funding, tasks and duties, expectations towards publishing, and the overall duration of the PhD studies (and whether it is really “studying” or more of an employment) are commonplace.

In this blog post, I will briefly summarise how PhD studies in Software Engineering (SE) are organised in Sweden, followed by a summary of work we (Robert Feldt & myself) did at the end of 2018 to obtain an overview of publication statistics for Swedish SE PhD students. This work was inspired by an earlier study with similar findings. You can find a summary of that work here.

One of the main reasons for writing this blog post (and for doing the work summarised within) is that, oftentimes, the academic model used in the United States is assumed to be the norm, despite huge differences from other countries. This post will clarify some of the differences in Sweden (which is, in turn, similar to the situation in other Nordic countries).

SE PhD in Sweden

Obtaining a PhD in SE in Sweden requires three years of full-time research work. This period is extended by another one to two years to obtain mandatory course credits and to do teaching duties, resulting in four to five years of PhD studies. Additionally, it is common to publish and defend a so-called “Licentiate” thesis after approximately half the time, sort of a dry run for the final PhD degree. Positions are fully funded, resulting in a salary that is somewhat lower than in industry. However, it is an actual salary, not a stipend or an “eat cheap pasta or stay hungry for five years” compensation. PhD salaries are standardised across Sweden. Differences might exist in the number of course credits that need to be obtained, and in teaching duties. For instance, many PhD positions are funded by industry, which often requires a certain amount of time to be spent at the industry partner in exchange for teaching duties.

As an example, my PhD studies (starting 2013) took five years, including one year reserved for teaching and one year for obtaining course credits. I was typically a TA in two courses per year (responsibilities ranging from more or less only grading to handling the entire course almost by myself, depending on my maturity and the responsible instructor). Regarding courses, I had to obtain 60 course credits. Included in those were a number of mandatory courses, such as in teaching and pedagogy.

In the Swedish academic system, PhD students are more often than not treated as employees rather than students, having substantial responsibilities (e.g., in teaching, student supervision) and independence (e.g., to plan their coursework, plan research studies). Studies are followed up regularly, and it is possible to change the supervisor/advisor in case of conflict or changes in staff.

Publication Statistics for SE PhDs

As SE is a field with a high frequency of publication, and as most SE PhD and Licentiate theses in Sweden are compilation-style theses (published papers, plus an introduction chapter explaining how they tie together), publications play an important role during the studies. There are no official figures on what constitutes a “good” thesis, or how many publications there should be in one. This is arguably a sensible approach, as it largely depends on the studied topic, on the different venues, and on the coherence of the different publications in a thesis (many excellent papers don’t automatically make a thesis). However, PhD students often express the desire to get some kind of reference on these different factors.

To address this issue and to get an overview of the trends, we decided in the fall of 2018 to revisit existing SE PhD and Licentiate theses in Sweden with the purpose of obtaining such a reference picture. To do so, we searched for all compilation-style PhD and Licentiate theses produced by SE research groups in Sweden until the end of 2018 for which we could obtain either a PDF or a printed copy. From the resulting set of 71 PhD and 51 Licentiate theses, we extracted the following data:

  • Publication year;
  • Total number of published and submitted papers included in the thesis;
  • Total number of published and submitted papers not included in the thesis, but additionally listed (“other papers, not included in the thesis”);
  • For included and listed papers, an additional breakdown into the number of papers submitted to/published in ISI-listed journals, non-ISI-listed journals, conferences (everything listed in the CORE ranking with a ranking of at least C), workshops, and other venues.

To obtain the list of theses, we relied on our knowledge of Swedish SE research, on contacts in the different research groups, and on searches in the publication databases of the different universities. We included theses according to the following criteria:

  • The thesis is a compilation-style PhD or Licentiate thesis from a known SE research group, and the thesis itself is listed as contributing to a PhD degree in SE, or
  • the thesis explicitly states (in the abstract or introduction) that it targets SE, or
  • the thesis is based on / includes at least two papers that have been published in an SE venue (an ISI-listed SE journal or an SE conference ranked at least C according to the CORE ranking).
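The inclusion criteria above can be read as a simple predicate. The following is only a minimal sketch of that reading, not the procedure we actually scripted: the `Thesis` record and all of its field names are hypothetical, and it assumes that “compilation-style” is a precondition for every thesis (since that is what we searched for), with the three bullets as alternatives.

```python
from dataclasses import dataclass


@dataclass
class Thesis:
    # All field names are hypothetical, chosen only for this illustration.
    compilation_style: bool
    from_known_se_group: bool
    degree_in_se: bool
    states_se_target: bool      # stated in the abstract or introduction
    papers_in_se_venues: int    # ISI-listed SE journal or SE conference, CORE rank >= C


def is_included(t: Thesis) -> bool:
    """Assumed reading: compilation-style AND (criterion 1 OR 2 OR 3)."""
    if not t.compilation_style:
        return False
    return (
        (t.from_known_se_group and t.degree_in_se)
        or t.states_se_target
        or t.papers_in_se_venues >= 2
    )
```

Under this reading, a thesis from an unknown group that includes two papers in SE venues is still included, while one with a single such paper and no explicit SE statement is not.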

We classified papers not clearly marked as published or accepted as submitted/in submission. Papers in the “Other” category were not classified as published or submitted, as it is rarely clear for this category, e.g., for technical reports. We counted tool demos, posters, papers in doctoral symposia, and in “work-in-progress” tracks as “Other”. Finally, we excluded tutorials and keynotes.
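As a rough illustration of these classification rules, here is a small sketch. The labels (`kind`, `venue`, `marking`) and the string values are assumptions made for this example only; they are not the column names of the published raw data.

```python
from typing import Optional, Tuple

# Hypothetical labels for illustration only.
EXCLUDED_KINDS = {"tutorial", "keynote"}
OTHER_KINDS = {
    "tool demo",
    "poster",
    "doctoral symposium",
    "work-in-progress",
    "technical report",
}


def classify(kind: str, venue: str, marking: Optional[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (venue category, status), or None if the paper is excluded.

    Papers in the "other" category get no published/submitted status.
    Anything not clearly marked as published or accepted counts as submitted.
    """
    if kind in EXCLUDED_KINDS:
        return None
    if kind in OTHER_KINDS or venue == "other":
        return ("other", None)
    status = "published" if marking in {"published", "accepted"} else "submitted"
    return (venue, status)
```

For example, a poster at a conference ends up in “other” regardless of how it is marked, and a journal paper without any marking is counted as submitted.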

In the following, we summarise the thesis data in terms of descriptive statistics and bar charts/box plots. We focus on PhD theses, since these are most relevant for international reference. However, similar figures for Licentiate theses can easily be obtained from the raw data, which is published on Zenodo. Note that we excluded names of PhD candidates and universities/research groups in the raw data, as we do not want to encourage unnecessary comparisons between groups. All metrics are flawed in some way, and differences in publications are insufficient to indicate the quality of the research groups in any way.

Fig. 1: Theses per year

As can be seen from Fig. 1, the number of PhD theses published per year differs quite a lot, with 1-2 in most of the early 2000s, and up to 8 theses in 2011, 2013, and 2017. This is important to keep in mind when looking at the statistics of papers published/included in the theses per year.

Fig. 2: Average included papers per thesis

Fig. 2 depicts the number of papers included on average in a PhD thesis, sorted by the year of the thesis. Interestingly, there are no clear trends over the years, e.g., no visible increase in the number of publications included in a PhD thesis. Typical numbers range from 5.5 to 8 papers per thesis.

Fig. 3: Published papers per PhD thesis per year

Among the papers included in a PhD thesis, typically not all are published, as some might be included as “under submission”. Fig. 3 shows the average number of published/accepted papers that are included in a PhD thesis per year. Again, there are no clear trends, with a typical number being 5, while 2001 and 2014 show stronger outliers (7 and 8).

Fig. 4: Submitted papers per PhD thesis per year

Analogous to Fig. 3, Fig. 4 shows the average number of submitted papers that are included in a PhD thesis per year. Averages of over 1.5 and under 1 are uncommon, indicating that PhD students typically have between one and two papers under submission.

Fig. 5: Overall published and submitted papers included in PhD theses.

Taking all years into account, Fig. 5 shows the published/accepted and submitted papers included in PhD theses. The averages are depicted by the ‘x’ symbol, while the bold horizontal bars depict the median. Most theses are in the range of 4-6 published and 1-2 submitted papers. The highest number of published papers is 10, the lowest 3. Similarly, the highest number of submitted papers is 4, the lowest number is 0.
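If you want to compute the same kind of summary (mean, median, minimum, maximum) for your own set of theses, a minimal sketch could look as follows. The numbers in `published_counts` are made up for illustration and are not taken from our data set.

```python
from statistics import mean, median


def summarise(counts):
    """Descriptive statistics of the kind shown in a box plot summary."""
    return {
        "mean": mean(counts),      # the 'x' symbol in the box plot
        "median": median(counts),  # the bold horizontal bar
        "min": min(counts),
        "max": max(counts),
    }


# Made-up example values: published papers included per thesis.
published_counts = [3, 4, 5, 5, 6, 10]
summary = summarise(published_counts)
```

With a few high outliers, the mean sits above the median, which is why the box plot shows both.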

Fig. 6: Included published papers by venue

Breaking the papers further down by venue, we see that most published papers are in conferences, followed by ISI-listed journal papers and workshops. A “typical” PhD thesis would have 2 published conference papers, 1 ISI journal paper, and 1 workshop paper. Clearly, these figures vary greatly between theses. For instance, there are several theses that include 3 published ISI journal papers, and only 1 conference paper. Similarly, there are theses with no (ISI) journal papers, 1 conference paper, and several workshop/other papers.

Fig. 7: Included submitted papers by venue

Finally, Fig. 7 depicts the included papers that are submitted, sorted by venue/type. It can clearly be seen that it is common to include submitted journal papers, with a mean of 1 submitted journal paper per thesis. This is sensible, since journal papers have a much longer turnaround time with unclear timelines for reviews. It is much less common to include conference or workshop papers under submission. For the “other” category, we cannot clearly tell what is being submitted, since it is not stated in the thesis. However, these might just as well be journal papers.

Summary/Discussion

Summing up, we see that an archetypal PhD thesis in Sweden has about 5 published papers and 1 under submission (typically a journal paper). This figure is fairly stable over the years, with no discernible sign of meaningful increases (“publication inflation”). However, it is also important to note that there are large deviations between theses. We see this as an indication that there is no overly formal requirement for publication numbers, but flexibility with respect to factors such as the strength of the individual publications, their coherence, and the novelty of the topic.

PhD position in Model-Based Software Engineering

People - Designed by Freepik

Interested in joining me to do empirical research in the area of Model-Based Engineering?

Information about the research topic
I am looking for a dedicated PhD student to study how software/system models are created in the context of Model-Based System/Software Engineering. In Model-Based Engineering, models serve as a way to improve efficiency and effectiveness of software/systems development. While the range of proposed modelling languages and techniques is substantial, adoption has in many areas been limited. One of the reasons for limited adoption is that engineers lack guidance in how to use models. As a first step, the aim of this PhD position is to study in depth how models are created, using both qualitative and quantitative methods.

Information about the department and the university
You will join me in the Center for Research on Engineering Software Systems (CRESS, https://en.ru.is/cress) at the School of Computer Science, Reykjavík University. The school covers a broad range of research topics in Computer Science and is engaged in several national and international research projects, often interdisciplinary in nature.

Reykjavík University is a private university located in Reykjavík, Iceland. The university currently hosts approximately 3500 students, divided into the Schools of Computer Science, Business, Law, and Science and Engineering. All of Reykjavík University is located in a single building in one of the most beautiful areas of Reykjavík.

Qualification requirements
To be eligible for this position, you should have, or be about to obtain, an MSc degree. With a degree in Software Engineering or Computer Science, your expertise would be most closely related to the topic. However, if you are a strong candidate with a background in other empirical sciences, you are very much encouraged to apply as well. In fact, having a solid empirical background could be an advantage over a background in Software Engineering/Computer Science. Specifically, I consider knowledge of any of the following areas an advantage: Model-Based Engineering/Software Modelling, qualitative and/or quantitative empirical methods.

Position summary
Full-time temporary employment (currently paid 365,000 ISK per month before taxes, roughly 3000 USD at the current exchange rate). You are expected to finish your PhD within four years. The position includes teaching duties of roughly 20%.

Application procedure
Interested applicants should send the following documents (in PDF format) directly to me via email (grischal@ru.is):

  • CV
  • Two letters of reference (e.g., from academic supervisors, former bosses)
  • A 1 to 3-page personal letter where you introduce yourself and present your qualifications/experience/interests
  • Attested copies of education certificates, including grade reports
  • Bachelor’s and/or Master’s thesis
  • Research publications, if any
  • Links to software repositories with relevant projects, if any

We will start reviewing applications as soon as they arrive and will continue to accept applications until the position is filled. We strongly encourage interested applicants to send their applications as soon as possible.

Questions?
If you have any questions, want more details on the position, or just want to check before sending in your application, please contact me directly!

Grischa Liebel
Assistant Professor | School of Computer Science, Reykjavík University
grischal@ru.is, http://academia.grischaliebel.de

Lorentz Center: In-Vivo Analytics for Big Software Quality

At the end of September, I attended the “In-Vivo Analytics for Big Software Quality” Lorentz Center Workshop in beautiful Leiden, Netherlands. Organised by Andy Zaidman, Jürgen Cito, Arie van Deursen, and Mozhan Soltani, the workshop had us spend five days discussing everything somehow covered by this umbrella title. The idea was to “bring junior and senior researchers together to address how Big Data analysis techniques on runtime data could be used to address software quality in DevOps”. The schedule was packed with presentations from a broad range of research areas, many of them outside of Software Engineering.

Initially, I was a bit unsure how my research actually fits into this scope (I do requirements engineering and model-based engineering), but I decided to give it a go. Indeed, the organisers managed to get hold of a wide variety of participants, especially reflected in the keynotes.

If you don’t want the whole story, feel free to scroll all the way down to my summary/take-aways.

Monday

Monday started off with Benoit Baudry discussing “Automated Test Generation in DevOps”. Benoit began by introducing DevOps and the different elements involved, and then went on to the testing part. A large part of this keynote was really targeted at conveying the basics of DevOps. This broad scope was kept for the remainder of the keynotes. Given the broad nature of the workshop topic, this was an excellent way to get an overview of different areas and interdisciplinary topics.

Monday afternoon, Mariëlle Stoelinga covered the basics of her research about risk assessment of computer systems. In essence, she introduced the basics of fault trees and fault tree analysis. While one or another workshop participant probably knew these things, it was a great way to give the majority of the participants an out-of-the-box experience (and, hopefully, some inspiration).

We finished the work part of Monday with a 1-hour speed-dating session. In this session, we were paired with a (pseudo) random other participant for 2 minutes at a time, introducing our research and the reason we were attending the workshop. While this left all participants pretty exhausted, it helped me to get familiar with some faces and research areas (I took the participant list with me and crossed off everyone I had met – this actually helped a great deal later to remember people).

For dinner, we headed out to Café Oliver in downtown Leiden (all paid for by the Lorentz Center), continuing the discussions. I ended up at the table next to Benoit Baudry, Magnus Ågren, and Gilles Perrouin, discussing everything from Swedish habits to some actual research.

Tuesday

On Tuesday, Asterios Katsifodimos, from the area of data processing, introduced Stream Processing of Runtime Log Data. This presentation was focused on getting the message across that we should be doing stream instead of batch processing and that batch processing is actually just a special case of stream processing.
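To make the “batch is a special case of stream” point concrete, here is a tiny sketch of my own (not taken from the talk). It assumes a hypothetical log format in which error lines contain the string "ERROR". The streaming function emits an updated result after every event; the batch function simply runs the same logic over a finite input and keeps only the last result.

```python
from typing import Iterable, Iterator, List


def running_error_count(events: Iterable[str]) -> Iterator[int]:
    """Stream processing: emit the current error count after each log line."""
    count = 0
    for line in events:
        if "ERROR" in line:
            count += 1
        yield count


def batch_error_count(log: List[str]) -> int:
    """Batch processing: a bounded stream where only the final result is kept."""
    result = 0
    for result in running_error_count(log):
        pass
    return result


log = ["INFO start", "ERROR disk full", "ERROR timeout"]
stream_results = list(running_error_count(log))
batch_result = batch_error_count(log)
```

The streaming variant works just as well on an unbounded source (e.g., a generator reading from a socket), whereas the batch variant only terminates on finite input.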

The remainder of the day contained four smaller (20-minute) presentations and the so-called Writer’s and Collaborator’s Workshop. Arie discusses the idea in more depth here. In essence, we were asked to submit a paper or a paper draft to the workshop beforehand (similar to a regular conference/WS submission). Then, other workshop participants were assigned to review this paper (again, as in a conf/WS). Finally, at the workshop, we sat down and discussed the papers with the authors present. Naturally, this encouraged “nicer” discussions than you would find at a PC meeting or in conference reviews. However, it was appreciated a lot by (I think) all participants due to the direct feedback on the work, and – for junior researchers – as a way to get insights into what a PC meeting might look like. Strongly recommended – adopt and repeat at your Workshop/Meeting/Research retreat!

Directly after the Workshop program, we had a Wine & Cheese party at the Lorentz Center, continuing the discussions (and side-tracking into communism and other topics) and eventually heading off to the hotel bar for more drinks.

Wednesday

Wednesday came with a headache – among other things. Then, there were of course more points on the agenda.

Claire Le Goues discussed Automated Fault Localization and Program Repair, also touching on the news of the week: Facebook adopting automated program repair with SapFix. Sadly, I had a hard time concentrating, due to my short night and the pressure to have some slides ready for a talk right before lunch.

Before lunch, we had four shorter presentations on a variety of topics, including my talk on LoCo CoCo (using existing requirements and traces to create communication networks).

The after-lunch keynote was given by our very own Eric Knauss, talking about RE at scale, in the context of agile and DevOps. He discussed how agile is difficult to scale in large organisations and how RE integrates with agile practices. While I was involved in a large part of that research, it was yet again interesting to get it presented from a different angle.

As a last program point, we formed breakout groups on specific topics we brainstormed right before. As a goal, we had to come up with a 4-minute elevator pitch on a topic we would like to propose to an imaginary funding agency (Andy tricked us and did in the end not provide any funding…). I ended up in the privacy/security group, spending an hour discussing topics around privacy in the analysis of runtime data. Essentially, we discussed how runtime data could be shared across organisations/actors, while still preserving a certain level of privacy. This is a trade-off between the amount of privacy you ensure, and the utility you will get out of the analysis. While the technical aspects of this topic are well-researched (e.g., how to encrypt data or how to do secure aggregation), the mix of people in the group made this a very interdisciplinary and interesting discussion!

After the workshop program, we went by bus to a place outside of Leiden, getting on a boat for a dinner cruise. We cruised through different canals and lakes around Leiden and had a fantastic evening. I ended up chatting away with José Miguel Rojas on tenure tracks, South America, and other topics only remotely connected to the workshop 🙂

Thursday

Thursday morning, Alexandru Iosup from the distributed systems community came in to share his vision of Massivizing Computer Systems. The talk was very well delivered, but at times hard to understand (in my opinion). While very ambitious, I felt that many of the things Alex was asking for have been discussed for many years (“We need to care more about human aspects”, “We need to be able to assure a better level of quality”, etc.). We surely need those things, but I wasn’t quite sure whether the talk did a good job explaining how to get there. In any case, I liked that Alex was actively trying to get people on board (connecting the different keynote topics and encouraging cooperation).

Before lunch, we had yet another session with short presentations.

Andy Zaidman delivered the afternoon keynote on his empirical studies on software quality. He basically encouraged us to question beliefs in Software Engineering (especially if we don’t know where they come from).

As a final point, we had a 1.5-hour session in which participants did live demos of tools they had developed. Jürgen Cito demonstrated PerformanceHat, a tool that augments source code in the IDE with (runtime) performance information and predictions. José showed us CodeDefenders, a game for learning mutation testing. This got all the participants to happily hack away, trying to defeat each other in coming up with mutants and tests killing them. Eric then demoed T-Reqs, a tool that integrates requirements into revision control. I believe there were other demos, but at this point I was so exhausted that it got difficult to follow (sorry if I skipped anyone here).

Thursday did not have an evening program, which should have given me some time to visit Leiden. Instead, I was stuck in the hotel room doing lecture slides – a side effect of squeezing the workshop into a busy teaching schedule.

Friday

Friday had only one keynote and a wrap-up session, allowing participants to head home at lunchtime.

The keynote was again rather interdisciplinary, as Fabian Knirsch and Andreas Unterweger talked about the smart grid, and security/privacy in this context. The talk again explained the basics in a very comprehensive way, sketching an interesting application field that (most likely) requires much more rigorous Software Engineering practices.

We then finished the workshop with a quick summary and said our goodbyes!

Summary/Take Aways

It’s now almost a month since the workshop, and maybe time to reflect on what I took away from the workshop, what worked, and what might be improved in similar workshops in the future.

First, I thoroughly enjoyed the week in Leiden. Despite (or maybe due to) being seemingly a bit disconnected from the workshop topic, I learned a lot during the week, getting insights into a variety of new topics (lots of different updates on software testing, some basics of the problems encountered in Smart Grids, to name a few). Since we, as researchers, are (or should be) aiming to solve problems on a societal scale, I think broadening your own horizon, seeing some new challenges, and getting inspired by existing solutions in other areas can only be beneficial. This is a clear plus of having such a broad workshop.

At the same time, the broad scope made it hard to come up with clearer goals for the workshop. I do believe that the discussion sessions could have benefitted from more structure or a clearer direction. However, I understand that this is hard given the different backgrounds and interests of the participants.

Thirdly, only knowing a handful of attendees actually proved to be a good thing, since I ended up meeting lots of people, some of whom I might end up collaborating with. This was again somewhat unexpected, as I am used to the typical conference setup, with sub-communities already knowing each other. In such a setup it is in my experience much harder to talk about new ideas and directions.

Finally, as a direct take-away, I really liked both the breakout sessions in which we had to discuss a research topic/agenda (even though it was short) and the Writer’s/Collaborator’s workshop. Both setups are definitely something I will adopt in future meetings/workshops, and urge you to do the same!

Team Teaching: Make it two!

In this small research summary of my 2016 article in Teaching in Higher Education (here), I make the case for team teaching, including PhD students in your lecturing! In our specific case, we teach software modelling. However, the concept might apply to many different lecture setups.

Team Teaching and Pair Lecturing

In team teaching, multiple teachers are involved in teaching a single course. Often, the format is to include different experts to teach different parts of a multi-disciplinary course. For example, I am currently taking a course on medicine for engineers. Here, lectures on different parts of the body, e.g., the cardiovascular system and the central nervous system, are given by different specialists in the respective areas.

Another case of team teaching is pair lecturing, giving a single lecture in a pair (or a team) of teachers. In this case, the teachers take turns explaining things and jump in when needed. The rationale is that multiple lecturers can give multiple explanations of a complex topic, thus facilitating understanding. Also, while one teacher is talking, the other teacher can observe. As most of us know from our student times (and from talks we have listened to at conferences), it is rather easy to spot as a bystander when the audience is not following. In those cases, the bystander jumps in and gives another explanation that might help the audience to understand.

Pair Lecturing to Verbalise Cognitive Processes

When lecturing about software modelling, we often face the challenge that modelling is a cognitive process. Apart from design patterns, there are very few established principles on how software models are built, what they should contain, and how you construct them in a step-wise fashion. Essentially, the process of building an abstract model of something is mainly going on inside our heads. Making this process explicit to students is challenging.

One way to explain this process is, according to our experience, pair lecturing. In class, we exemplify the modelling process by constructing software models together at the whiteboard. While we build the model, we have to explain to each other what we are drawing. Additionally, we might have to clarify, if our explanation is insufficient, argue, if our approach is questioned by the other teacher, or correct, if we together reach the conclusion that a different solution is better. This discussion at the whiteboard is, essentially, a verbalisation of the cognitive process that would be hidden from students if we simply drew the solution on the board. Similarly, by just explaining our own solution, much of the discussion would be omitted.

Apart from verbalising the modelling process to students, a positive side-effect is that many of the questions students might have during class are already brought up by the other teacher. This encourages further questions and participation from the students’ side (in our experience), and makes sure that questions are actually asked.

Teacher Synchronisation & Costs

Traditionally, pair lecturing has been criticised for its high costs. Clearly, including two teachers instead of one raises the teaching costs.

Our approach to lower these costs dramatically is to include PhD students instead of members of the faculty. As the main teacher is still present, the PhD student does not need to be as knowledgeable in the topic as the main teacher. In fact, not knowing too much might be helpful, as the PhD student will in this case ask similar questions as the audience would – he/she builds a bridge between the main teacher and the students.

Now you might ask yourself: “OK, so those guys just use their PhD students because they’re cheaper. That’s it?”. No. The use of PhD students in pair lecturing is far from accidental in our case. In the Swedish university system, PhD students have a certain percentage of teaching duties. Typically, this duty is spent on group supervision in labs or project work. This, however, means that the PhD students need sufficient knowledge to supervise. Furthermore, they constantly need to synchronise with the main teacher on course contents, progress, and similar issues.

Including PhD students in the lectures effectively removes the need for additional synchronisation. At all times, the PhD student is aware of what is going on in the lectures and knows what the main teacher explained. Hence, additional synchronisation meetings are not needed. Furthermore, there is a lower risk that the PhD student will give explanations in the supervision sessions that are in contrast to the lectures.

Finally, pair lecturing using PhD students facilitates teacher succession. In our case, we had a large staff turnover in the last couple of years. This turnover suddenly left a lot of courses without a teacher. In the case of the software modelling course, the PhD student who was involved in pair lecturing knew about all course elements and the rationale behind them, and could essentially take over the course, or at least easily hand it over to the new teacher. Even if the main teacher kept teaching the course, it was easier to handle issues such as sick leave, business travel, or other cases in which the main teacher could not be present.

Summary

To sum up the entire article: In our experience, pair lecturing (lecturing with two teachers at the same time) using PhD students as part of the lecturing team helps:

  • by giving students multiple explanations to a single problem,
  • by verbalising thought processes and discussions,
  • and by engaging the student audience more than traditional lectures.

Our course evaluations also clearly show that students see these benefits!

With respect to the course setup, pair lecturing helps:

  • by synchronising between the main teacher and the teaching assistants, effectively removing the need for synchronisation meetings,
  • by avoiding contradicting statements between main teachers and teaching assistants,
  • and by effectively creating a “backup teacher” in case the main teacher is not available or even leaves the university/the course.

The full article includes a lot more details, empirical data, and reflections!

LoCo CoCo: Use the full potential of your systems engineering data!

In the previous article, I summarised our findings on communication problems in automotive Requirements Engineering (RE). While our interviewees during that study mentioned several problems, at one of the case companies they also frequently highlighted how it helped them in their daily work to have a tool that connects different kinds of data, e.g., requirements, hardware design, logical (software) design. While this clearly helps them already now to understand how their data is connected, we got the idea that this might as well be used to help them understand how people are or should be connected. This is the idea of LoCo CoCo.

LoCo CoCo (short for Low-Cost Communication and Coordination) is the idea to extract social networks from systems engineering data. That is, to infer the relationships between people from items and relationships in the data. Consider the example where you have a requirement, a hardware design of some sort, and two issues in your repository (see figure below). The different items could relate to each other (e.g., the hardware design contains the requirement or one of the issues relates to the requirement) and could be located in different tools (indicated by the colours). Additionally, different people might have created and/or changed those items. Based on this already existing data, we can derive a graph that connects people instead of work items. This is, in essence, what LoCo CoCo does. Those graphs can then help engineers to find people knowledgeable in a particular area.
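To illustrate the basic idea, here is a minimal sketch of how a people graph could be derived from linked work items. This is not the LoCo CoCo implementation: the function name, the input format, and the item and person names are all hypothetical, and the sketch assumes that two people are connected if they worked on the same item or on two directly linked items.

```python
from collections import defaultdict
from itertools import product


def people_graph(links, contributors):
    """Derive an undirected people graph from work-item data.

    links: iterable of (item_a, item_b) pairs, e.g. "design contains requirement".
    contributors: dict mapping an item to the set of people who created/changed it.
    """
    graph = defaultdict(set)
    # Treat every item as linked to itself so that co-contributors get connected.
    pairs = list(links) + [(item, item) for item in contributors]
    for a, b in pairs:
        for p, q in product(contributors.get(a, ()), contributors.get(b, ())):
            if p != q:
                graph[p].add(q)
                graph[q].add(p)
    return dict(graph)


# Hypothetical example data, loosely following the example in the text.
contributors = {
    "REQ-1": {"alice"},
    "HW-7": {"bob"},
    "ISSUE-3": {"carol", "alice"},
}
links = [("HW-7", "REQ-1"), ("ISSUE-3", "REQ-1")]
graph = people_graph(links, contributors)
```

Note that the sketch only uses data that is already in the tools (links and contributors), with no inference on top of it.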

Clearly, there is related work around in the community. Two of the approaches to be mentioned in this context are Codebook and StakeSource. Ours differs in a number of ways. First, we are providing details on the implementation, whereas related work remains on a rather high level. Secondly, we focus on high-level artefacts, i.e., abstract work items without a focus on (software) development. Hence, LoCo CoCo aims at early communication in an interdisciplinary environment, where code might not or not yet play a central role. Finally, we made the conscious choice not to employ any machine learning or other techniques that could introduce false positives. Instead, we intentionally only expose data that already exists in the tools. Besides avoiding false positives, this also helps to uncover incorrect ownership data in the systems engineering data. For example, an engineer might be listed as the owner of a requirement even though he/she is no longer at the company. In this case, an engineer consulting the social network might realise this and update the data.

We piloted LoCo CoCo at one automotive company based on data in two tools: SystemWeaver (by Systemite) and ClearQuest (by IBM). While we have, so far, not been able to roll out the tool, we asked practitioners at the company about its potential and their impressions.

Our results show that, while LoCo CoCo has great potential, social data in systems engineering tools is often outdated, simply because it is not used. Our interviewees indicated that this might change as soon as the data is used in a meaningful way. For example, if engineers see the benefit of using social networks to contact the “right” people, they might be willing to update social data. Similarly, it could be possible to introduce simple changes in the tools to increase the data quality. For example, changes to systems engineering data are currently not logged at the case company. However, change logging is a common feature in most modern tools and should therefore be easy to enable.

Finally, we discovered a number of ethical issues using social networks that are, to our knowledge, not commonly studied in related work. We found that the social networks created with LoCo CoCo might cause wrong perceptions when used naively. For instance, engineers might tend to believe that a person with few/no connections in a social network is not important or not working hard enough. However, the lack of connections might be related to several other factors, e.g., low data quality, a lack of trace links in the person’s work environment, or simply the role of that person. Hence, even though we only use existing data, highlighting social aspects holds a number of challenges which need to be investigated further in future work.

Based on the initial prototype, we are currently extending LoCo CoCo to extract data via OSLC adapters, OSLC being one of the bigger standards for tool interoperability. Additionally, we are implementing an open source tool based on Gephi, which will include the OSLC adapters and several design considerations outlined in the current article.

(This is a summary of the article published in Information and Software Technology. If you want to know details, plus more numbers and figures, you can read it here!)

Communication in RE: Where does it hurt?

As a part of my research in model-based requirements engineering, we conducted a case study a while ago to investigate how models are/can be used in automotive requirements engineering. While running the interviews at two automotive companies, we discovered that interviewees regularly talked about problems that occur during everyday work. What stood out to us was that they all mentioned communication issues frequently.
We decided to first investigate these issues further, instead of prioritising the original analysis regarding modelling. From our interviews, enhanced with an online survey among practitioners in the automotive industry, we extracted a list of seven problems in automotive RE – all related to communication and coordination:

  • P1: Lack of Product Knowledge: the lack of sufficient knowledge about
    the product in early stages.
  • P2: Lack of Context Knowledge: the lack of context information regarding
    requirements on low levels of abstraction.
  • P3: Unconnected Abstraction Levels: a mismatch between requirements
    on different abstraction levels.
  • P4: Insufficient Communication and Feedback Channels: lacking communication
    with other people within or across the organisation.
  • P5: Lack of Common Interdisciplinary Understanding: the lack of common
    understanding across multiple disciplines.
  • P6: Unclear Responsibilities and Borders: the lack of clear and communicated
    responsibilities between different parts of the organisation.
  • P7: Insufficient Resources for Understanding and Maintaining Requirements:
    the lack of sufficient resources in early phases to get an understanding
    of the needs and to maintain requirements later on.

We found that these problems were the main issues mentioned by multiple interviewees in both our case companies and in the survey we conducted.

Clearly, some of these issues are related to a cost-benefit trade-off. For instance, P7, how to deal with a limited amount of time and money, is omnipresent in daily life and can only be solved by lowering the expectations towards certain quality aspects in parts of the requirements specification. Similarly, P3, abstraction gaps between requirements abstraction levels, can only be reduced by extra effort in both specification and maintenance of requirements (which would then inevitably lead to P7 again).

However, we also find a number of issues that are strongly related to the automotive reality. For example, both the lack of product knowledge (P1) and lack of context knowledge (P2) are related to the large number of sub-contractors in automotive engineering. Sub-contractors are typically very specialised to certain parts of the automobile (their specific expertise), leading to a lack of product knowledge in other areas. Similarly, the lack of context knowledge is typically related to this issue and to concerns regarding intellectual property (i.e., only making very limited parts of the requirements specification available to sub-contractors).

Based on these findings, there are a number of issues that, we believe, should be addressed in future work. First, there is a need for a process that allows for sufficient levels of uncertainty during early phases of RE. This is certainly not a new finding, but the increasing speed of technological change makes it more and more important. Also, while I personally come from the ‘formal world’ and, ideally, would like to specify everything so that you can verify it for correctness (or whatever else comes to mind), it is important to accept reality, i.e., that not everything is certain and can be specified. Secondly, there is a need for an organisation structure that effectively supports interdisciplinary RE, taking into account the central role of software. While automotive companies are often ‘traditional’ and have a purely mechatronic background, software is becoming increasingly important. The companies need to be aware of this and adapt. Conversely, the software engineering community would be strongly advised not to ignore other disciplines. Hardware and mechatronic components in automobiles still have long lead times and often do not adhere to agile practices. This already causes friction between the two worlds (see also our recent paper on challenges in large-scale, agile requirements engineering: Kasauli et al., 2017, “Requirements Engineering Challenges in Large-Scale Agile System Development”, Proceedings of the 25th International Requirements Engineering Conference).

(This is a summary of the article published in the Requirements Engineering journal. If you want to know details, plus more numbers and figures, you can read it here, free of charge!)