What is DMS Modernization Technology?
The DMS Modernization Technology is a comprehensive tool kit that automates analysis, modification, translation, and generation of source code for software systems containing arbitrary mixtures of languages ("domains").
A very simple model of DMS is that of an extremely generalized compiler, having a parser, a semantic analyzer, a program transformation engine (to do code generation and optimization), and final output formatting components (producing source code rather than binary code).
It is particularly important that the analyzer output can be configured to define the desired transforms.
Unlike a conventional compiler, in which each component is specific to its task of translating one source language to one target machine language, each DMS component is highly configurable, enabling a stunningly wide variety of effects. This means one can change the input language, change the analysis, change the transforms, and change the output in arbitrary ways.
Also, unlike a conventional compiler, DMS can process thousands of files from multiple languages at the same moment, allowing analysis and/or consistent code changes across complex systems of files. (An interesting property is that DMS reads formal descriptions of languages, analysis and transforms, and is consequently used to support itself.)
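The "generalized compiler" model described above can be sketched in a few lines. This is only an illustration and not DMS itself: it borrows Python's standard `ast` module as a stand-in for the parser and output formatter, and the function names (`parse`, `analyze`, `emit`, `run_pipeline`) and the `FoldConstants` transform are hypothetical names chosen for this sketch.

```python
import ast

def parse(source: str) -> ast.Module:
    """Front end: source text -> syntax tree."""
    return ast.parse(source)

def analyze(tree: ast.AST) -> set[str]:
    """Analysis: collect every name that is assigned somewhere in the tree."""
    return {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)}

class FoldConstants(ast.NodeTransformer):
    """Transform: replace int + int and int * int with the computed literal."""
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first
        left, right = node.left, node.right
        if (isinstance(left, ast.Constant) and isinstance(right, ast.Constant)
                and type(left.value) is int and type(right.value) is int):
            if isinstance(node.op, ast.Add):
                return ast.copy_location(ast.Constant(left.value + right.value), node)
            if isinstance(node.op, ast.Mult):
                return ast.copy_location(ast.Constant(left.value * right.value), node)
        return node

def emit(tree: ast.AST) -> str:
    """Back end: syntax tree -> source text (source code, not binary code)."""
    return ast.unparse(ast.fix_missing_locations(tree))

def run_pipeline(source: str, transforms) -> tuple[set[str], str]:
    """Each stage is a replaceable component; `transforms` is the configurable part."""
    tree = parse(source)
    assigned = analyze(tree)
    for transform in transforms:
        tree = transform.visit(tree)
    return assigned, emit(tree)

assigned, output = run_pipeline("x = 2 * 3 + 4\ny = x + 1", [FoldConstants()])
print(output)
```

Swapping the list passed to `run_pipeline` changes what the tool does without touching the other stages, which is the kind of configurability the text describes, here on a very small scale.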
This diagram shows the internal components of DMS Modernization Technology and how the external definitions enable these tools to be precisely configured to meet customers' needs.
How is DMS Modernization Technology different from competitive technologies?
DMS Modernization Technology is a highly configurable tool set that has been developed over the last decade. This technology is unique in the marketplace in its level of flexibility to generate solutions that precisely meet customer needs. Its development was initially evaluated and funded through the National Institute of Standards and Technology - Advanced Technology Program.
Point Solution Language Conversions
This type of conversion technology is very focused and off-the-shelf. This can be seen as a positive if the customer's needs precisely match the tool capabilities. The problem is that a customer's needs rarely match an off-the-shelf solution precisely enough to avoid extensive manual rewriting or functionality that cannot be translated. These systems are also often ad hoc implementations that tend to generate code that is difficult to maintain.
Group of Point Solutions
Some companies have a Group of Point Solutions that allows them to convert several components of an application (or multiple languages), which makes it look as though they have a complete solution. These point solutions individually have the same issues discussed above. An additional problem is that they are not designed to work together, so the output needs significant manual intervention to get everything to work together.
Procedure-based Language Conversions
Procedure-based Language Conversions are another type of competitive technology. These tools are typically hand-coded, much as compilers are. Because they are coded like compilers, they tend to be procedural in nature and do not allow for easy change. This architecture tends to make it more difficult to get the exact result that the customer wants. Both procedural and point solutions can be customized through programming changes to the tools. Because these changes alter the tool itself and need to be debugged just like any new software, they are error-prone.
DMS Modernization Technology is Rule-Based Language Conversion Technology
Rule-Based Language Conversion is very flexible on input and output languages because languages are described explicitly as grammars. Customer-specific target issues are handled directly by configuring customer-specific rules. Because each legacy application is different, it is important to account for customer-specific usage of the language and implementation. These rules are developed and verified during each customer project. Although DMS Modernization Technology is flexible, the customer-specific configuration is typically less than 1% of the total functionality of our technology. No two legacy systems are alike, and therefore the capability to configure the exact type of transformation that each system needs is critical.
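To make the contrast with hand-coded converters concrete, here is a minimal sketch of rules expressed as data and applied by a generic engine. It is not DMS's rule language. The tuple-based tree encoding, the `?name` pattern variables, the node names (`move`, `add_to`, `assign`, `plus`, `throw`) and the `ABEND` convention are all assumptions made up for this illustration.

```python
def match(pattern, tree, bindings):
    """Return extended bindings if `tree` matches `pattern`, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:  # a repeated variable must match the same subtree
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    if isinstance(pattern, tuple) and isinstance(tree, tuple) and len(pattern) == len(tree):
        for p, t in zip(pattern, tree):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None

def substitute(template, bindings):
    """Build the replacement tree by filling in the pattern variables."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template

def rewrite(tree, rules):
    """Rewrite children first, then apply the first matching rule until none applies."""
    if isinstance(tree, tuple):
        tree = tuple(rewrite(child, rules) for child in tree)
    for pattern, replacement in rules:
        bindings = match(pattern, tree, {})
        if bindings is not None:
            return rewrite(substitute(replacement, bindings), rules)
    return tree

# Rules shared by every project (hypothetical legacy-to-target mappings).
base_rules = [
    (("move", "?src", "?dst"), ("assign", "?dst", "?src")),
    (("add_to", "?n", "?dst"), ("assign", "?dst", ("plus", "?dst", "?n"))),
]
# A small customer-specific layer (hypothetical site convention for error exits).
customer_rules = [
    (("call", "ABEND", "?code"), ("throw", ("error", "?code"))),
]

program = ("block", ("move", "0", "TOTAL"), ("add_to", "5", "TOTAL"), ("call", "ABEND", "12"))
result = rewrite(program, customer_rules + base_rules)
print(result)
```

The engine (`match`, `substitute`, `rewrite`) never changes. A new customer convention is handled by adding one entry to `customer_rules`, so there is no tool code to modify and debug.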
Example
This is an example of a single step in the conversion of High Level Assembly to Java.
Transform: Remove Machine Artifacts (Customer Specific)
Eliminate Result-less, Parameter-less subroutines. Do deep flow analysis.
- Make parameters out of assigned values consumed by subroutine.
    VELSW='C'; SELPOL(); ... void SELPOL(){ ... VELSW ... }
    > SELPOL('C'); ... void SELPOL(char VELSW){ ... VELSW ... }
- Make return result of single scalar value assigned by subroutine.
    SELPOL(...); ... SQLCODE ... void SELPOL(...){ ... SQLCODE=...; return; }
    > SQLCODE=SELPOL(...); ... SQLCODE ... int SELPOL(...){ ... SQLCODE=...; return SQLCODE; }
- Similarly, make side-effected parameter of compound values (structures) updated by subroutine.
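The effect of the parameter and return-value rewrites above can be illustrated with a runnable before/after pair. This sketch is written in Python rather than Java, reuses the names `SELPOL`, `VELSW` and `SQLCODE` from the example, and invents a trivial subroutine body, since the original elides what `SELPOL` actually does. It shows only that the two forms behave the same. It does not show how DMS performs the flow analysis.

```python
# Before: a direct translation of the legacy code. The subroutine takes no
# parameters and returns nothing; it communicates through global variables.
VELSW = ""
SQLCODE = 0

def selpol_before():
    global SQLCODE
    # Hypothetical body, invented for this sketch.
    SQLCODE = 0 if VELSW == "C" else 100

def caller_before():
    global VELSW
    VELSW = "C"          # value assigned only so the subroutine can consume it
    selpol_before()
    return SQLCODE       # value read back from the global afterwards

# After: the consumed global becomes a parameter and the single scalar
# assigned by the subroutine becomes its return value.
def selpol_after(velsw: str) -> int:
    sqlcode = 0 if velsw == "C" else 100
    return sqlcode

def caller_after():
    sqlcode = selpol_after("C")
    return sqlcode

print(caller_before(), caller_after())
```

The rewritten form makes the data flow visible in the signature, which is why the transform has to establish through flow analysis that the global is assigned only to feed the call and read only to collect its result.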
DMS Modernization Technology - Industrial Scale
The DMS Modernization Technology has been used for an amazing variety of industrial tasks, including quality analysis, restructuring, automated migration, pretty printing and highly optimized code generation.
DMS is designed to work on large scale source systems
- with up to several million lines of source code or specification
- across tens of thousands of source files
- having multiple languages at the same time
DMS is implemented using our proprietary parallel language to provide computational horsepower consistent with this scale. While DMS runs on a single-processor PC at unit speed, it also runs on symmetric multiprocessor (SMP) workstations with enhanced performance. As an example, the attribute evaluation process is automatically parallelized, and can often provide a linear speedup on an N-way SMP system.
DMS Modernization Technology Details
DMS provides a large set of robust, integrated facilities for building analysis and modification tools:
- Full UNICODE-based parser and lexer generation with automatic error recovery. Standard support is included for reading multiple source files to enable INCLUDE file management and construct suitable preprocessors. The parser technology is based on GLR, and can handle any context-free language, even with ambiguities (much stronger than YACC/LALR). Proven on dozens of real languages.
- Automatic construction of Abstract Syntax Trees (ASTs): non-value-carrying terminals and unit productions are suppressed, and syntax lists are converted into AST list nodes. Literal values (numbers, escaped strings) are converted to native, normalized binary values for fast internal manipulation. Source comments are captured and attached to AST nodes.
- Pretty-printer generation converts ASTs back to nicely formatted legal source file form, according to specified layout information, including source comments. In fidelity-printing mode, comments, spacing and lexical formatting information of unchanged code are preserved. Customizing allows generation of source code in HTML form, or even as obfuscated source text. Trees may be output directly in XML format.
- Multi-pass attribute-evaluator generation from grammar, to allow arbitrary analysis (including name/type analysis procedures) to be specified in terms of the concrete grammar provided.
- Sophisticated symbol-table construction facilities for global, local, inherited, overloaded and other language-dependent name lookup and namespace management rules.
- Control-flow graph construction and data flow analysis framework, to allow data-flow analysis problems to be posed and answered.
- Multiple domains (notations/languages) can be represented at the same time. This enables processing or generating systems composed of parts from more than one domain (COBOL and JCL, C and Makefiles, etc.), and/or translation from one domain language to another.
- Transforms and patterns can be written directly in surface-to-surface domain syntax form. Alternatively, procedural code can implement transforms, or refer to existing transforms and patterns to enable construction of very sophisticated transforms.
- A full Associative/Commutative rewrite engine that operates on trees and DAGs, which can be used to apply sets of transforms.
- An algebraic specification subsystem can be used to specify arbitrary algebras (this is just a DMS domain!). The axioms can be treated as a set of rewrite rules. This allows one to code arbitrary simplification procedures. (We have done simplification on Boolean equations that are essentially 1 million terms in size; we have also modeled optimization of transistor [not gates!] circuits this way).
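As a small illustration of treating Boolean axioms as simplification rules in the presence of associative and commutative operators, the sketch below uses a common textbook technique: normalize each `and`/`or` by flattening nested operands (associativity) and sorting and de-duplicating them (commutativity, idempotence), then apply the identity, annihilator, complement and double-negation laws. This is an assumption-laden toy, not DMS's rewrite engine or its algebraic specification language. The tuple encoding and the `simplify` function are hypothetical.

```python
def simplify(term):
    """Simplify a Boolean term bottom-up.

    Terms are variable names (str), True/False, ("not", t),
    or ("and" | "or", t1, t2, ...).
    """
    if not isinstance(term, tuple):
        return term
    op, *args = term
    args = [simplify(a) for a in args]

    if op == "not":
        (arg,) = args
        if isinstance(arg, bool):
            return not arg                      # not True = False, not False = True
        if isinstance(arg, tuple) and arg[0] == "not":
            return arg[1]                       # double negation
        return ("not", arg)

    identity, annihilator = (True, False) if op == "and" else (False, True)

    # Associativity: splice nested uses of the same operator into one operand list.
    flat = []
    for a in args:
        if isinstance(a, tuple) and a[0] == op:
            flat.extend(a[1:])
        else:
            flat.append(a)

    # Annihilator: x and False = False, x or True = True.
    if any(a is annihilator for a in flat):
        return annihilator

    # Identity, idempotence, commutativity: drop identities, de-duplicate, sort.
    unique = sorted({a for a in flat if a is not identity}, key=repr)

    # Complement: x and not x = False, x or not x = True.
    if any(("not", a) in unique for a in unique):
        return annihilator

    if not unique:
        return identity
    if len(unique) == 1:
        return unique[0]
    return (op, *unique)

print(simplify(("and", "b", ("and", "a", True), "b")))
print(simplify(("or", ("and", "a", ("not", "a")), "b")))
```

Because operands are put in a canonical order, `("or", "a", "b")` and `("or", "b", "a")` simplify to the same term, so no separate rule is needed for each argument order.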