C to Java Programming Language Translation Jamie McAtamney October 26, 2007 Abstract With the modern emphasis on program portability and the new need to run programs on multiple computers in networks or over the Internet, it would be very useful for C programmers to be able to translate either “legacy” C programs or newly written programs into Java to make them more portable; however, currently translation by hand is seen as too tedious and time-consuming, while computer algorithms to do so are not very accurate. A combination of keyword search/replace and algorithms to translate C structs to Java classes and C “include” modules to Java “import” modules can help alleviate or solve the problem of tedious or inaccurate translations specifically between C and Java. 1 Introduction The C programming language is versatile, powerful, and popular among programmers; Java is also versatile and powerful, but less widely used. The major advantage Java has over C is its modularity, as it is capable of being used on any platform and any operating system, while implementations of C are platform-specific and must be recompiled or sometimes rewritten when moved from one computer to another. There are three major groups of programs that would benefit from translation from C to Java. First are ”legacy” programs that were originally written in C to take advantage of their higher execution speed; as modern computers have more memory and run faster than those of even a few years before, these ”legacy” programs would gain more from added portability than they would from remaining in 1 C. Second are programs wherein the majority of the code implements simple algorithms such as string tokenization, data storage and manipulation, and the like; Java already has several implementations of algorithms such as these built into it, so code could be simplified and shortened. Third are programs that will be used either over a network or the Internet; while C has methods for sending and receiving information between different computers, any programs that require a user interface on the other end of transmissions would benefit greatly from Java’s portability and its already-implemented applet system. 2 Background While the differences among programming languages have been studied extensively in comparative languages courses and otherwise, little progress has been made in the area of automated programming language translation. One company, Jazillian, provides translations among a limited number of languages for a fee, but significant client involvement is required to tailor the algorithm to the program’s intended use. The problems involved with automated translation occur because programming languages are too dissimilar for direct word-for-word translation. Python and Ruby, for example, do not declare variables and use indentation instead of braces-“{” and “}”–to denote blocks of text, in comparison to the C and Java methods of declaring variables and separating code. Even syntactically similar languages such as C and Java have differences that make simple search-and-replace difficult. For example, while the C char arrays have an analog in Java Strings, because they are two different data structures the methods for accessing them are very different, and this discrepancy must be taken into account. A related difficulty is C’s use of pointers: A “string” in C is not simply an array of chars, it is a pointer to an array of chars–expressed as “char*”-which means that string comparison methods, string search methods, and the like are required; one cannot simply copy, compare, or otherwise manipulate strings in the same way one may manipulate ints or chars. Computer scientists A.A. Terekhov and C. Verhoef have written a paper entitled “The Realities of Language Conversions” which describes many of the various problems facing automated language translation systems and upon which I am basing my preliminary work. They describe in detail the some of the differences in structure, implementation, and syntax I mentioned 2 above, and after I progress to the stages where I will have to deal with each problem I will be able to discuss the difficulties in more detail in later versions of this paper. 3 Preliminary Testing At the moment, the translation program is in its first stages. It can read in text from a C file, tokenize each line for C keywords and punctuation, manipulate keywords individually, and output the result to a Java file. Nothing more complex than a simple keyword search-replace has been implemented. More complex translation, such as struct to class translation or manipulation of pointers, are the next step, which will hopefully be completed by the end of the first semester. I initially wrote the translator program in C, as I am most familiar with that language and have coded in C most recently. Unfortunately, my algorithm to tokenize the lines was suffering from segmentation faults and a lack of easy string manipulation methods. I finally had to translate it to Java by hand (irony of ironies) to take advantage of its already-implemented StringTokenizer class, and now that that is finished, I can get back to actual translation coding. Before I tried to tokenize the input file, I was able to implement wholeline translation (for example, the main() method is always ”public static void main(String[] args)” and does not require tokenization to translate), so the basic modular structure of the translator already works. 4 Results and Discussion There are not many results to speak of at this point; the translator has a basic skeleton onto which I will be adding several translation modules later, but right now it is very basic. Only the most rudimentary translations can be made at this point (the whole-line translations mentioned above), but the basic structure is done and from this point on the implementation of more translation modules should be relatively simple. 3 References [1] A.A. Terekhov and C. Verhoef, “The realities of Language Conversions”, IEEE Software, vol. 17, 111-124, 2000. [2] , “Jazillian Design Philosophy”,http://www.jazillian .com/philosophy.html, 2007. 4