Poughkeepsie Chapter of the Association for Computing Machinery


Database operations on compressed data


Luther Woodrum


Monday, November 21, 2016, 7:30 PM


Marist College, Hancock Center (Building 16 on map), Room 2023. Park just north of Hancock Center, or in parking lot on south-east corner of Route 9 and Fulton Street. We thank Marist College for hosting the chapter's meetings.

More Information

This program is free and open to the public. Attendees should RSVP at Meetup.com.

All are welcome to join us beforehand for dinner at the Palace Diner at 6:00 PM.
Refreshments are served after the meeting.

For further information, go to Pok.ACM.org,
email Bill Collier, or phone 845-522-1971.

About the Topic

A new data compression method, called mr6, stores large files with compression ratios of as much as 9 to 1, while gzip achieves ratios of 5 or 6 to 1 on the same files. It also reads and decompresses data 50% faster than gzip, using two-thirds of the CPU time. The method stores only one copy of each unique field value in a table column, which allows new ways to do queries such as join and outer join. Pattern matching is much faster because patterns are matched against the unique values rather than against every record. In addition, a query that uses pattern matching can apply millions of patterns to every record in time that is sublinear in the number of patterns. New operations on lists of lists will be presented for queries that operate on compressed data without decompressing it first. A new query language is in progress, and some of its features will be discussed.
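
As a rough illustration of why storing one copy of each distinct value can speed up pattern matching, the sketch below uses plain dictionary encoding of a single column. It is not the mr6 format, whose details this announcement does not give; the function names `encode_column` and `match_rows` and the sample data are hypothetical.

```python
import re

def encode_column(values):
    """Dictionary-encode a column: each distinct value is stored once."""
    uniques = []   # distinct values, in first-seen order
    index = {}     # value -> integer code (its position in uniques)
    codes = []     # one integer code per record
    for v in values:
        code = index.get(v)
        if code is None:
            code = len(uniques)
            index[v] = code
            uniques.append(v)
        codes.append(code)
    return uniques, codes

def match_rows(uniques, codes, pattern):
    """Return row numbers whose value matches the pattern.

    The regular expression runs once per distinct value, not once per
    record; records are then selected by comparing integer codes.
    """
    rx = re.compile(pattern)
    hits = {code for code, v in enumerate(uniques) if rx.search(v)}
    return [row for row, code in enumerate(codes) if code in hits]

# Hypothetical sample column with repeated values.
cities = ["Chicago", "Poughkeepsie", "Chicago", "Albany", "Poughkeepsie", "Chicago"]
uniques, codes = encode_column(cities)
rows = match_rows(uniques, codes, r"^Pough")
```

Here the six records collapse to three stored values, and the pattern is tested three times instead of six. The saving grows with the amount of repetition in the column.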

About the Speaker

Luther began programming in 1957 as an actuarial programmer at an insurance company in Chicago. While there, he wrote sorts and, in 1958, implemented what are now called pivot tables. In 1960, he decided to become more involved in computer development and left the insurance company. He joined IBM in 1961 to work on the 7070 sort and spent the following years working on sorting, indexing, and storage allocation using radix partition trees. Eventually this led to a component of MVS called RPTS. While working on the sort, he developed a method of ensuring program correctness, so that the 8,000-line sort had no errors from its first release onward. In 1968 he attended IBM's graduate school, IBM SRI, where he majored in probability, statistics, and queueing theory. This led to a paper on semi-Markov processes, published in the IBM Systems Journal in 1970. In 1987 he retired from IBM as a senior programmer. He had been granted many patents by that time, and has received more since then. In retirement he builds computers, does contract programming, and develops applications for sorting, pattern matching, and data compression, along with a query language for operating on compressed database files, for a licensed program product.
