Lossless Compression for Structured Data
Monday, May 21, 2018 7:30 PM
Marist College, Hancock Center (Building 14 on map), Room 2023. Park just north of Hancock Center, or in the parking lot on the southeast corner of Route 9 and Fulton Street. We thank Marist College for hosting the chapter's meetings.
This program is free and open to the public. Attendees should RSVP at Meetup.com.
All are welcome to join us beforehand for dinner at the Palace Diner at 6:00 PM.
For further information, go to Pok.ACM.org.
About the Topic
Examples of structured data are database tables and spreadsheets. Four numbers matter most when measuring a data compression program:
- The time to compress.
- The time to uncompress.
- The compression ratio.
- The time to query compressed data.
This presentation will discuss how the choice of compression methods and data structures trades off among the four measures. It will focus on combinations of methods, including partial use of gzip and gunzip alongside other methods. Special attention will be given to querying compressed data, which is not a usual function of a compression program.
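To make the four measures concrete, here is a minimal sketch in Python that times them for whole-file gzip on a small synthetic table. It is not the speaker's method. The helper names (make_table, measure), the sample columns, and the query are hypothetical and chosen only for illustration. The query is deliberately naive: a plain gzip stream carries no index, so the sketch decompresses the whole file and scans it, which is the baseline that any scheme for querying compressed data has to improve on.

```python
import csv
import gzip
import io
import time


def make_table(rows=20000):
    """Build a small synthetic CSV table (hypothetical data, illustration only)."""
    cities = ["Chicago", "Poughkeepsie", "Albany", "Boston"]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "city", "amount"])
    for i in range(rows):
        writer.writerow([i, cities[i % len(cities)], (i * 37) % 1000])
    return buf.getvalue().encode("utf-8")


def measure(raw, city="Albany"):
    """Return the four measures for whole-file gzip applied to `raw`."""
    t0 = time.perf_counter()
    packed = gzip.compress(raw)
    t1 = time.perf_counter()
    unpacked = gzip.decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == raw  # lossless: the round trip must be exact

    # Naive query: decompress everything, then scan for matching rows.
    t3 = time.perf_counter()
    text = gzip.decompress(packed).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    matches = sum(1 for row in reader if row["city"] == city)
    t4 = time.perf_counter()

    return {
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
        "ratio": len(raw) / len(packed),  # original size / compressed size
        "query_s": t4 - t3,
        "matches": matches,
    }


if __name__ == "__main__":
    for name, value in measure(make_table()).items():
        print(f"{name}: {value}")
```

In this baseline the query time can never be lower than the decompression time, which is why querying the compressed form directly is treated as a separate measure.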
About the Speaker
Luther began programming in 1957 as an actuarial programmer at an insurance company in Chicago. While there, he wrote sorts and, in 1958, implemented what are now called pivot tables. In 1960, wanting closer involvement with computer development, he left the insurance company. He joined IBM in 1961 to work on the 7070 sort and spent the following years working on sorting, indexing, and storage allocation using radix partition trees. This work eventually led to a component of MVS called RPTS. While working on the sort, he developed a method of ensuring program correctness, so that the 8,000-line sort had no errors from its first release onward. In 1968 he attended the graduate school IBM SRI, majoring in probability, statistics, and queueing theory. This led to a paper on semi-Markov processes, published in the IBM Systems Journal in 1970. In 1987 he retired from IBM as a senior programmer. He had been granted many patents by that time, and has received more since. In retirement he builds computers, does contract programming, and develops applications for sorting, pattern matching, and data compression, along with a query language for operating on compressed database files, for a licensed program product.