C to Java Programming Language Translation Jamie McAtamney October 26, 2007

C to Java Programming Language Translation
Jamie McAtamney
October 26, 2007
With the modern emphasis on program portability and the new
need to run programs on multiple computers in networks or over the
Internet, it would be very useful for C programmers to be able to
translate either “legacy” C programs or newly written programs into
Java to make them more portable; however, currently translation by
hand is seen as too tedious and time-consuming, while computer algorithms to do so are not very accurate. A combination of keyword
search/replace and algorithms to translate C structs to Java classes
and C “include” modules to Java “import” modules can help alleviate
or solve the problem of tedious or inaccurate translations specifically
between C and Java.
The C programming language is versatile, powerful, and popular among programmers; Java is also versatile and powerful, but less widely used. The
major advantage Java has over C is its modularity, as it is capable of being used on any platform and any operating system, while implementations
of C are platform-specific and must be recompiled or sometimes rewritten
when moved from one computer to another. There are three major groups
of programs that would benefit from translation from C to Java. First are
”legacy” programs that were originally written in C to take advantage of
their higher execution speed; as modern computers have more memory and
run faster than those of even a few years before, these ”legacy” programs
would gain more from added portability than they would from remaining in
C. Second are programs wherein the majority of the code implements simple algorithms such as string tokenization, data storage and manipulation,
and the like; Java already has several implementations of algorithms such
as these built into it, so code could be simplified and shortened. Third are
programs that will be used either over a network or the Internet; while C has
methods for sending and receiving information between different computers,
any programs that require a user interface on the other end of transmissions
would benefit greatly from Java’s portability and its already-implemented
applet system.
While the differences among programming languages have been studied extensively in comparative languages courses and otherwise, little progress has
been made in the area of automated programming language translation. One
company, Jazillian, provides translations among a limited number of languages for a fee, but significant client involvement is required to tailor the
algorithm to the program’s intended use. The problems involved with automated translation occur because programming languages are too dissimilar
for direct word-for-word translation. Python and Ruby, for example, do not
declare variables and use indentation instead of braces-“{” and “}”–to denote blocks of text, in comparison to the C and Java methods of declaring
variables and separating code. Even syntactically similar languages such as C
and Java have differences that make simple search-and-replace difficult. For
example, while the C char arrays have an analog in Java Strings, because
they are two different data structures the methods for accessing them are
very different, and this discrepancy must be taken into account. A related
difficulty is C’s use of pointers: A “string” in C is not simply an array of
chars, it is a pointer to an array of chars–expressed as “char*”-which means
that string comparison methods, string search methods, and the like are required; one cannot simply copy, compare, or otherwise manipulate strings in
the same way one may manipulate ints or chars.
Computer scientists A.A. Terekhov and C. Verhoef have written a paper
entitled “The Realities of Language Conversions” which describes many of
the various problems facing automated language translation systems and
upon which I am basing my preliminary work. They describe in detail the
some of the differences in structure, implementation, and syntax I mentioned
above, and after I progress to the stages where I will have to deal with each
problem I will be able to discuss the difficulties in more detail in later versions
of this paper.
Preliminary Testing
At the moment, the translation program is in its first stages. It can read in
text from a C file, tokenize each line for C keywords and punctuation, manipulate keywords individually, and output the result to a Java file. Nothing
more complex than a simple keyword search-replace has been implemented.
More complex translation, such as struct to class translation or manipulation
of pointers, are the next step, which will hopefully be completed by the end
of the first semester.
I initially wrote the translator program in C, as I am most familiar with
that language and have coded in C most recently. Unfortunately, my algorithm to tokenize the lines was suffering from segmentation faults and a
lack of easy string manipulation methods. I finally had to translate it to
Java by hand (irony of ironies) to take advantage of its already-implemented
StringTokenizer class, and now that that is finished, I can get back to actual
translation coding.
Before I tried to tokenize the input file, I was able to implement wholeline translation (for example, the main() method is always ”public static void
main(String[] args)” and does not require tokenization to translate), so the
basic modular structure of the translator already works.
Results and Discussion
There are not many results to speak of at this point; the translator has a
basic skeleton onto which I will be adding several translation modules later,
but right now it is very basic. Only the most rudimentary translations can
be made at this point (the whole-line translations mentioned above), but the
basic structure is done and from this point on the implementation of more
translation modules should be relatively simple.
[1] A.A. Terekhov and C. Verhoef, “The realities of Language Conversions”,
IEEE Software, vol. 17, 111-124, 2000.
, “Jazillian Design Philosophy”,http://www.jazillian
.com/philosophy.html, 2007.