MSc project proposals for 2004/5

Mohammad El-Ramly

 

1.   A Software Clustering Platform 

Prerequisites

Working knowledge of Java. Desire to work on an exploratory research project and develop and implement innovative ideas.

Aims of Project

Clustering software systems is the process of grouping software artifacts (files, classes, packages, etc.) together to identify subsystem structures and to produce a higher level of abstraction for software systems. It can be considered as an architecture reconstruction task.

 

This project aims to deploy some of the known software clustering algorithms to discover software structures in COBOL and Visual Basic (VB) software systems. The work will not start from scratch. Instead, the project aims to integrate existing technologies, build the adapters needed to get the various tools to work together, and complete the missing pieces.

 

First, the student will research and assess the applicability of clustering techniques in general, and clustering tools in particular, to COBOL (or VB) programs. The student can look at the various clustering approaches in the bibliography and assess how applicable they and their corresponding tools are to COBOL or VB (for instance, how close to reality the results are for COBOL, what they actually reveal about the code, etc.). S/he can also compare the applicability of the techniques and tools to real COBOL systems.

 

Second, the student will assemble a software clustering platform from existing tools. The tools proposed as a starting point for the project are:

 

1.      ATX L-CARE reengineering platform for reverse engineering and producing higher-level abstractions of software systems.

2.      Bunch tool for code clustering, which implements, in Java, some source code clustering algorithms (see References). Bunch takes as input module dependency files, which L-CARE can provide. However, L-CARE provides XML information on such graphs, while Bunch accepts its own format, so the student will have to convert the L-CARE-generated files to the format accepted by Bunch.

3.      An independent graph visualization tool, e.g., Graphviz, to visualize the graphs produced by clustering (see References).
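The format conversion mentioned in item 2 could look roughly like the sketch below. The L-CARE XML schema is not described here, so the element and attribute names (`dependency`, `from`, `to`, `weight`) are hypothetical placeholders. The output side assumes that Bunch reads a plain-text dependency file with one `source target weight` edge per line; this should be checked against the Bunch documentation. The same edges are also written as Graphviz DOT text for visualization.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real L-CARE schema will differ.
SAMPLE_XML = """
<graph>
  <dependency from="PAYROLL" to="TAXCALC" weight="3"/>
  <dependency from="PAYROLL" to="DBACCESS"/>
  <dependency from="TAXCALC" to="DBACCESS" weight="2"/>
</graph>
"""

def read_edges(xml_text):
    """Return a list of (source, target, weight) tuples from the XML text."""
    root = ET.fromstring(xml_text)
    edges = []
    for dep in root.iter("dependency"):
        weight = int(dep.get("weight", "1"))  # a missing weight defaults to 1
        edges.append((dep.get("from"), dep.get("to"), weight))
    return edges

def to_mdg(edges):
    """One edge per line: 'source target weight' (assumed Bunch input form)."""
    return "\n".join(f"{s} {t} {w}" for s, t, w in edges)

def to_dot(edges):
    """Render the same edges as a Graphviz DOT digraph."""
    lines = ["digraph dependencies {"]
    for s, t, w in edges:
        lines.append(f'  "{s}" -> "{t}" [label="{w}"];')
    lines.append("}")
    return "\n".join(lines)

edges = read_edges(SAMPLE_XML)
print(to_mdg(edges))
print(to_dot(edges))
```

Keeping the edge list as a neutral intermediate form means a different clustering or visualization tool only needs a new writer function, which suits the exploratory nature of the project.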

 

This project is exploratory in nature. The student (in consultation with the supervisor) can switch to better tools or even build his/her own if necessary. So, the specific tools used do not matter; what matters is examining the clustering process.

Learning Outcomes

·        The student will learn software clustering and will try out some of its algorithms and tools.

·        The student will learn how to set up a real software reverse engineering / reengineering environment composed of various tools integrated together to accomplish a non-trivial task. The student will learn the challenges involved in this process.

·        The student will learn about the issues and challenges of exchanging inputs and outputs between different software engineering tools that may use different representation formats for them. The student will develop some adapters to overcome this problem, if needed.

Nature of End-Product

1.      A working implementation of a software clustering environment that integrates various tools together.

2.      An evaluation report on the application of software clustering technology to COBOL and/or VB programs.

References

·        A Survey of some clustering algorithms - Brian Mitchell , Clustering Software Systems to Identify Subsystem Structures

 

·        Bunch tool for software clustering - http://serg.cs.drexel.edu/projects/bunch/

 

·        A paper on the Bunch tool for software clustering - S. Mancoridis, B.S. Mitchell, C. Rorres, Y. Chen and E. R. Gansner, Using Automatic Clustering to Produce High-Level System Organizations of Source Code

 

·        Graphviz - Graph Visualization Software

 

·        More publications on software clustering http://serg.cs.drexel.edu/publications/

 

 

2.   A Survey of Data Mining of Version Control Data

Aims of Project

The project aims to produce a survey and analysis of the current literature on applying data mining methods to the version control data of CVS and other version control systems. It is suggested that the study cover, among other things, the following topics:

o       Identification of the key projects, the important publications and the active researchers in this area.

o       Analysis and categorization of the types of data stored by version control systems

o       Analysis and categorization of the different users of these systems and their interests

o       Analysis and categorization of the data mining and other methods used in analysing the data stored by version control systems

o       Study and categorization of the available literature on the subject according to the purpose of the study.

o       Analysis of the results of these studies with a measure of their success.

o       Possibly, a comparison between different methods and their results either theoretically or practically by implementing some of these methods.
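As one concrete instance of the kind of method the survey would categorize, the sketch below counts how often pairs of files change together, the basic idea behind the "logical coupling" studies listed in the references. It is a minimal illustration, not any particular paper's algorithm. The transactions are hypothetical hand-written data; with real CVS data, per-file log entries would first have to be grouped into transactions, which is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit transactions: each set lists files changed together.
transactions = [
    {"parser.c", "lexer.c", "Makefile"},
    {"parser.c", "lexer.c"},
    {"parser.c", "ast.c"},
    {"lexer.c", "parser.c", "ast.c"},
]

def co_change_counts(transactions):
    """Count how often each unordered pair of files changes together."""
    counts = Counter()
    for files in transactions:
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return counts

def coupled_pairs(transactions, min_support=2):
    """Return pairs that co-changed at least min_support times,
    most frequent first."""
    counts = co_change_counts(transactions)
    frequent = [(pair, n) for pair, n in counts.items() if n >= min_support]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))

for pair, n in coupled_pairs(transactions):
    print(pair, n)
```

A practical comparison of methods, as suggested in the last topic above, could start from a baseline like this and measure how well each method's reported couplings predict later changes.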

Learning Outcomes

Nature of End-Product

·        A survey dissertation on mining version control systems.

·        Possibly, an implementation of some version control mining methods.

·        Possibly, a technical journal paper on the results of the survey, to be submitted for publication.

References

This is a long list of possible references on this topic. This does not mean that every single one is to be used. Since this is a survey project, the student should take some time initially to identify the valuable resources and studies that should be the subject of this study and ignore the trivial ones.

 

Related Workshops

-----------------

 

MSR 2004: International Workshop on Mining Software Repositories

http://msr.uwaterloo.ca/

Proceedings available at: http://plg.uwaterloo.ca/~aeehassa/home/pubs/MSR2004ProceedingsFINAL_IEE_Acrobat4.pdf

 

Some Related Journal Publications

---------------------------------

Daniel M. German, Using software trails to reconstruct the evolution of software, Journal of Software Maintenance and Evolution (JSME), Special Issue on Evolution of Large-scale Industrial Software Applications, Vol. 16 No. 6, p 367-384, 2004.

D. Atkins, T. Ball, T. Graves, and A. Mockus. Using version control data to evaluate the impact of software tools: A case study of the version editor. IEEE Transactions on Software Engineering, 28(7):625-637, July 2002.

 

Stephen G. Eick, Todd L. Graves, Alan F. Karr, J. S. Marron, and Audris Mockus. Does code decay? Assessing the evidence from change management data. IEEE Transactions on Software Engineering, 27(1):1-12, 2001.

 

Todd L. Graves, Alan F. Karr, J. S. Marron, Harvey P. Siy: Predicting Fault Incidence Using Software Change History. IEEE Trans. Software Eng. 26(7): 653-661 (2000)

 

T. Graves and A. Mockus. Identifying productivity drivers by modeling work units using partial data. Technometrics, 43(2):168-179, May 2001.

 

Ahmed E. Hassan and Richard C. Holt: Studying The Chaos of Code Development, Proceedings of WCRE 2003: Working Conference on Reverse Engineering, Victoria, British Columbia, Canada, November 13-16, 2003. (Paper invited to IEEE Transactions Special Issue for WCRE 2003)

 

James D. Herbsleb, Audris Mockus: An Empirical Study of Speed and Communication in Globally Distributed Software Development. IEEE Trans. Software Eng. 29(6): 481-494 (2003)

 

Philip M. Johnson, Carleton A. Moore, Joseph A. Dane, Robert S. Brewer: Empirically Guided Software Effort Guesstimation. IEEE Software 17(6): (2000)

 

Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue: CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code. IEEE Trans. Software Eng. 28(7): 654-670 (2002)

 

Audris Mockus, Roy T. Fielding, James D. Herbsleb: Two case studies of open source software development: Apache and Mozilla. ACM Trans. Softw. Eng. Methodol. 11(3): 309-346 (2002)

 

Audris Mockus and David M. Weiss. Globalization by chunking: a quantitative approach. IEEE Software, 18(2):30-37, March 2001.

 

Dewayne E. Perry, Harvey P. Siy, Lawrence G. Votta: Parallel changes in large-scale software development: an observational case study. ACM Trans. Softw. Eng. Methodol. 10(3): 308-337 (2001)

 

Walt Scacchi. Free/Open Source Software Development Practices in the Computer Game Community, IEEE Software, Special Issue on Open Source Software, (to appear, 2004).

 

Related Workshop and Conference publications

------------------------------------

 

T. Zimmermann, P. Weißgerber, S. Diehl, A. Zeller: Mining Version Histories to Guide Software Changes. Proc. 26th International Conference on Software Engineering (ICSE), Edinburgh, UK, May 2004.

 

T. Zimmermann, S. Diehl, A. Zeller: How History Justifies System Architecture (or not). Proc. International Workshop on Principles of Software Evolution (IWPSE 2003), Helsinki, Finland, September 2003.

 

Michael Fischer, Martin Pinzger, and Harald Gall. Analyzing and Relating Bug Report Data for Feature Tracking. In Proceedings of the 10th Working Conference on Reverse Engineering (WCRE), Victoria, British Columbia, Canada, IEEE CS Press, November 2003.

 

Michael Fischer, Martin Pinzger, and Harald Gall. Populating a Release History Database from Version Control and Bug Tracking Systems. In Proceedings of the 2003 International Conference on Software Maintenance (ICSM), Amsterdam, The Netherlands, IEEE CS Press, September 2003.

 

Harald Gall, Jacek Krajewski, and Mehdi Jazayeri. CVS Release History Data for Detecting Logical Couplings. In Proceedings of the International Workshop on Principles of Software Evolution (IWPSE), Helsinki, Finland, IEEE CS Press, pp. 13-23, September 2003.

 

H. Gall, M. Jazayeri, and C. Riva. Visualizing software release histories: the use of color and third dimension. In International Conference on Software Maintenance (ICSM '99), pages 99-108, Oxford, England, Aug. 1999. IEEE Computer Society Press.

 

H. Gall, K. Hajek, and M. Jazayeri. Detection of logical coupling based on product release history. In International Conference on Software Maintenance (ICSM '98), Washington D.C., Nov. 1998. IEEE Computer Society Press.

 

H. Gall, M. Jazayeri, R. Klösch, and G. Trausmuth. Software evolution observations based on product release history. In M. J. Harrold and G. Visaggio, editors, International Conference on Software Maintenance (ICSM '97), pages 160-166, Bari, Italy, Sep. 1997. IEEE Computer Society Press.

 

Marek Leszak, Dewayne E. Perry, Dieter Stoll: A case study in root cause defect analysis. ICSE 2000: 428-437

 

Dewayne E. Perry, Harvey P. Siy, Lawrence G. Votta: Parallel Changes in Large Scale Software Development: An Observational Case Study. ICSE 1998: 251-260

 

Philip M. Johnson, Hongbing Kou, Joy Agustin, Christopher Chan, Carleton Moore, Jitender Miglani, Shenyan Zhen, William E. J. Doane: Beyond the Personal Software Process: Metrics collection and analysis for the differently disciplined. ICSE 2003: 641-646

 

Mikio Aoyama, Katsuro Inoue, Vaclav Rajlich: Principles of software evolution: 5th international workshop on principles of software evolution (IWPSE 2002). ICSE 2002: 657-658

 

Ahmed E. Hassan and Richard C. Holt, The Chaos of Software Development, Proceedings of IWPSE 2003: International Workshop on Principles of Software Evolution, Helsinki, Finland, September, 1-2, 2003

 

Audris Mockus, Roy T. Fielding, James Herbsleb: A case study of open source software development: the Apache server. ICSE 2000: 263-272

 

James D. Herbsleb, Audris Mockus: Formulation and preliminary test of an empirical theory of coordination in software engineering. ESEC / SIGSOFT FSE 2003: 138-147

 

Audris Mockus, James D. Herbsleb: Expertise browser: a quantitative approach to identifying expertise. ICSE 2002: 503-512

 

James D. Herbsleb, Audris Mockus, Thomas A. Finholt, Rebecca E. Grinter: An Empirical Study of Global Software Development: Distance and Speed. ICSE 2001: 81-90

 

Ivan T. Bowman, Richard C. Holt: Reconstructing Ownership Architectures To Help Understand Software Systems. IWPC 1999: 28-37

 

Richard C. Holt, J. Y. Pak: GASE: visualizing Software Evolution-in-the-Large. WCRE 1996: 163-

 

Y. Liu, E. Stroulia: Reverse Engineering the Process of Small Novice Software Teams, 10th Working Conference on Reverse Engineering, November 13-16, 2003, pp. 102-112, IEEE Press.

 

Davor Cubranic, Gail C. Murphy: Hipikat: Recommending Pertinent Software Development Artifacts. ICSE 2003: 408-418

 

J. Shirabad, Timothy Lethbridge, Stan Matwin: Supporting Software Maintenance by Mining Software Update Records. ICSM 2001: 22-31

 

J. Shirabad, Timothy Lethbridge, Stan Matwin: Mining the Maintenance History of a Legacy Software System. ICSM 2003

 

Jennifer Bevan, E. James Whitehead, Jr., "Identification of Software Instabilities." In Proceedings of the Tenth Working Conference on Reverse Engineering (WCRE 2003), Victoria, British Columbia, Canada, November 13-16, 2003.

 

 

Annie Chen, Eric Chou, Joshua Wong, Andrew Y. Yao, Qing Zhang, Shao Zhang, Amir Michail: "CVSSearch: Searching through Source Code using CVS Comments." International Conference on Software Maintenance, pages 364-373, 2001

 

John Champaign, Andrew Malton, Xinyi Dong, Stability and Volatility in the Linux Kernel, Proceedings of IWPSE 2003: International Workshop on Principles of Software Evolution, Helsinki, Finland, September, 1-2, 2003

 

Dirk Draheim, Lukasz Pekacki: Process-Centric Analytical Processing of Version Control Data, Proceedings of IWPSE 2003: International Workshop on Principles of Software Evolution, Helsinki, Finland, September, 1-2, 2003

 

 

 

3.   Software Architecture Recovery: A Case Study

Aims of Project

The project aims to study and apply the available technology for software architecture recovery and reconstruction to a software system. The purpose is to rebuild the architecture of the software and document it in order to maintain and evolve it. The activities involved are:

o       Review of the existing literature on architecture recovery and reconstruction

o       Preliminary investigation of the target system

o       Choice of suitable methods for the system at hand

o       Applying the chosen methods to the target system to recover and document its architecture

Learning Outcomes

Nature of End-Product

A dissertation on software architecture recovery

Architecture description and software technical documents for the target system

Possibly, a technical journal paper on the results of the study, to be submitted for publication

References

Architecture Reconstruction Guidelines, Third Edition

http://www.sei.cmu.edu/publications/documents/02.reports/02tr034/02tr034.html

 

O'Brien, Liam. Experiences in Architecture Reconstruction at Nokia (CMU/SEI-2002-TN-004). http://www.sei.cmu.edu/publications/documents/02.reports/02tn004.html

 

Software Architecture Reconstruction: Practice Needs and Current Approaches

http://www.sei.cmu.edu/pub/documents/02.reports/pdf/02tr024.pdf

 

Software Architecture Recovery Using Conway’s Law

http://plg.uwaterloo.ca/~itbowman/pub/CASCON98.pdf

 

Software Architecture Reconstruction

http://adam.wins.uva.nl/~x/sar/sar.pdf

 

SEMINAR: Software Architecture: Recovery and Modelling

http://www.bauhaus-stuttgart.de/dagstuhl/#program

http://www.dagstuhl.de/03061/

 

Architecture recovery of Apache 1.3 — A case study http://apache.hpi.uni-potsdam.de/document/documents/architecture_recovery_of_apache.pdf

 

 

Rigi Reverse Engineering Tool http://www.rigi.csc.uvic.ca/index.html

 


4.    Mining GUI Usage Data 

Aims of Project

The project aims to extend previous work on mining recorded traces of interaction between software systems and their users to software systems with graphical user interfaces (GUIs). The process of mining software usage data involves:

o       Recording the actions taken by the software system and the user in response to one another during the system-user dialog, i.e., during the usage of the system through its user interface.

o       Applying data mining methods (e.g., the IPM2 algorithm; see the references below) to discover patterns of interesting user activities in the recorded sequences of actions.

o       Analysis of the discovered patterns and using them for program understanding, reengineering, user interface personalization, re-documentation, reverse engineering, and similar tasks.

This process has been applied successfully to Web-based and legacy systems, as the references below describe. The steps to be followed for this research could be:

o       Study of the available literature and available resources (Code for IPM2 is available)

o       Study of the available GUI recording technology and how it can be utilized or adapted for this application. This should result in selecting some GUI recorder or creating one.

o       Installing a GUI recorder on the machines of a number of users of a GUI based system and collecting traces of their interaction with the system.

o       Collecting the recorded traces and applying data mining to them.

o       Analysis of the discovered patterns and identifying potential uses for them.
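To make the trace-mining step above concrete, the sketch below finds action sequences that recur across recorded sessions. It is not the IPM2 algorithm from the references; it is a much simpler stand-in that counts contiguous subsequences of a fixed length and keeps those occurring in at least a given number of traces. The event names and traces are hypothetical; a real GUI recorder would produce richer events (widget identifiers, timestamps, window titles).

```python
from collections import Counter

# Hypothetical recorded traces: one list of GUI event names per user session.
traces = [
    ["open_file", "select_all", "copy", "new_file", "paste", "save"],
    ["open_file", "select_all", "copy", "close"],
    ["new_file", "paste", "save", "close"],
]

def frequent_sequences(traces, length, min_support):
    """Return contiguous subsequences of the given length that occur in
    at least min_support distinct traces, mapped to their support."""
    support = Counter()
    for trace in traces:
        seen = set()
        for i in range(len(trace) - length + 1):
            seen.add(tuple(trace[i:i + length]))
        support.update(seen)  # count each pattern at most once per trace
    return {seq: n for seq, n in support.items() if n >= min_support}

patterns = frequent_sequences(traces, length=3, min_support=2)
for seq, n in sorted(patterns.items()):
    print(" -> ".join(seq), n)
```

Exact contiguous matching is brittle when users interleave unrelated actions, so a pattern miner that tolerates insertions is one of the things the literature study should look for.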

Learning Outcomes

Nature of End-Product

A dissertation on mining software GUI usage data

An implementation of a prototype for collecting and mining software GUI usage data

Possibly, a technical journal paper on the results of this study, to be submitted for publication

References

Enhancing A GUI Event Recorder To Support The Creation Of User Documentation

http://www-scf.usc.edu/~shenliu/_private/paper_submission_Liu.pdf

 

M. El-Ramly, E. Stroulia, Mining Software Usage Data. Available at: http://plg.uwaterloo.ca/~aeehassa/home/pubs/MSR2004ProceedingsFINAL_IEE_Acrobat4.pdf

 

The following papers are available at www.cs.le.ac.uk/~mer14

Mohammad El-Ramly and Eleni Stroulia.

Analysis of Web-usage Behavior for Focused Web Sites: A Case Study. Journal of Software Maintenance and Evolution, 16(1-2), 129-150 (2004). John Wiley and Sons.

 

Eleni Stroulia, Mohammad El-Ramly, Paul Iglinski and Paul Sorenson.

User Interface Reverse Engineering in Support of Interface Migration to the Web. Automated Software Engineering, 10(3), 271-301 (2003). Kluwer Academic Publishers.

 

Eleni Stroulia, Mohammad El-Ramly and Paul Sorenson.

From Legacy to Web through Interaction Modeling. In Proc. of the 18th International Conference on Software Maintenance (ICSM 2002), 3-6 October 2002, Montreal, Quebec, Canada, 320-329. IEEE Computer Society, ISBN 0-7695-1819-2 (2002).

 

Mohammad El-Ramly, Eleni Stroulia and Paul Sorenson.

From Run-time Behavior to Usage Scenarios: An Interaction-pattern Mining Approach. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), July 23-26, 2002, Edmonton, Canada, 315-324. ACM Press, ISBN 1-58113-567-X (2002).

 

Thesis of M. El-Ramly. Can be requested by emailing mer14@le.ac.uk

 


5.   Java to C# Language Transformer

Prerequisites

Working knowledge of Java. Eagerness to read and learn a lot about Java and C# (read "C sharp") and their specifications, similarities and differences. The project also requires investing time and effort in learning TXL and its use in structural transformations.

Aims of Project

The project aims to produce a language transformer that takes a Java program and produces a corresponding C# program. The transformer will be written in TXL (see References). TXL is a programming language specifically designed to support computer program analysis and source transformation tasks. It is a hybrid functional/rule-based language. The idea of source code transformation with TXL is to write a grammatical description of the source language and a grammatical description of the target language. Then the transformation rules that take a program in the source language (Java) and transform it into a program in the target language (C#) are written and tested. It is expected that the project duration will allow writing only a subset of the C# grammar and of the Java-to-C# transformation rules. I am also open to other language transformers if the student prefers to work on a different pair of source and target languages instead of Java-to-C#. The project is considerably non-traditional in the sense that most of the effort will go into understanding the C# specification, translating it into TXL, and inventing the structural transformations necessary to convert Java language constructs to the equivalent C# ones.
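The idea of a rule set that maps source-language constructs to target-language ones can be illustrated with a toy sketch. This is not TXL and not how the project's transformer would be built: it rewrites text with regular expressions instead of transforming parse trees against a grammar. The three rules are illustrative picks from well-known Java/C# correspondences.

```python
import re

# Each rule is (regular expression over Java source, C# replacement).
# The rule set is illustrative only and far from complete.
RULES = [
    (re.compile(r"\bSystem\.out\.println\b"), "Console.WriteLine"),
    (re.compile(r"\bboolean\b"), "bool"),
    (re.compile(r"\bString\b"), "string"),
]

def java_to_csharp(source):
    """Apply every rewrite rule, in order, to the given source text."""
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source

java = 'boolean ok = true; String msg = "done"; System.out.println(msg);'
print(java_to_csharp(java))
# prints: bool ok = true; string msg = "done"; Console.WriteLine(msg);
```

Text-level rules like these would also rewrite matches inside string literals and comments, and they cannot handle constructs whose translation depends on structure. That limitation is the reason for a grammar-based approach such as TXL.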

Learning Outcomes

Nature of End-Product

A working Java-to-C# language transformer.

Project Timetable

Phase 1: Research on Java and C# language specifications. Learning TXL. Implementing a C# grammar in TXL (A Java grammar in TXL already exists)

References

TXL Website .

http://www.txl.ca/ (2004).

C# Language Specification.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/vclrfcsharpspec_C.asp (2004).

 


6.     Information Gathering Agent using NQL

Aims of Project

The student will use Network Query Language (NQL) to build a software agent for information gathering. Ideally, the user will be able to identify one or more domains of interest and provide the agent with examples of the information of interest to him/her on the web sites related to these domains; alternatively, the agent will observe the navigation behaviour of the user in the domain(s) of interest. From these examples, the agent will learn to gather daily information digests for the user in the domain(s) of interest and present them to him/her every morning. Additionally, the system will have the capability of collecting information from legacy information systems to integrate with information collected from the Web. This definition is broad and flexible. After a period of background research and reading, the student should concretely define the specifications of his/her project.

Learning Outcomes

Nature of End-Product

The end product is an information gathering agent for multiple domains that allows the user to specify the domain of interest (e.g., stocks, news, etc.) and some details of his/her interests within this domain (which may include the information sources of interest and examples of the information of interest or examples of the user’s navigation of the relevant websites). The agent then provides a daily digest of information tailored to the user’s needs and collected from the relevant web sources.

Requirements

This project will require NQL software, which can be purchased/obtained from http://www.nqltech.com/nql.asp. It will need some background research on software agents and programming by demonstration or by example. Some links for starting this research are provided below.

Starting Point References

David Pallmann, Network Query Language (NQL), ISBN: 0-471-20766-7, John Wiley & Sons Inc

http://www.mobilein.com/NQLWhitepaper.pdf

http://www.nqltech.com/nql.asp

http://www-2.cs.cmu.edu/~softagents/intro.htm

http://www.agentbase.com/survey.html

http://citeseer.nj.nec.com/bauer00programming.html

http://web.media.mit.edu/~lieber/PBE/Your-Wish/#ruvini