对话

• 输入：一个最多包含 n 个正整数的文件，每个数都小于 n，其中 n = $10^7$。没有重复数字。
• 输出：升序排序的列表。
• 约束：1 MB 内存空间，磁盘空间充足，运行时间 10s 以内。

实现概要

1. 将所有位置置为 0。
2. 依次读入整数，在对应的位置上置为 1。
3. 检验每一位，如果等于 1，就输出对应的整数。

习题

1. If memory were not scarce, how would you implement a sort in a language with libraries for representing and sorting sets?

set 集合容器实现了红黑树的平衡二叉检索树的数据结构，它会自动调整二叉树的排列，把元素放到适当的位置。set 容器所包含的元素的值是唯一的，集合中的元素按一定的顺序排列。

2. How would you implement bit vectors using bitwise logical operations (such as and, or and shift)?

3. Run-time efficiency was an important part of the design goal, and the resulting program was efficient enough. Implement the bitmap sort on your system and measure its run time; how does it compare to the system sort and to the sorts in Problem 1? Assume that n is 10,000,000, and that the input file contains 1,000,000 integers.

bitmapsortset

4. If you take Problem 3 seriously, you will face the problem of generating k integers less than n without duplicates. The simplest approach uses the first c positive integers. This extreme data set won’t alter the run time of the bitmap method by much, but it might skew the run time of a system sort. How could you generate a file of k unique random integers between 0 and n-1 in random order? Strive for a short program that is also efficient.

5. The programmer said that he had about a megabyte of free storage, but the code we sketched uses 1.25 megabytes. He was able to scrounge the extra space without much trouble. If the megabyte had been a hard and fast boundary, what would you have recommended? What is the run time of your algorithm?

1. 可以考虑分成两类，分别是以 0 和 1 开头的电话号码，这样的话可以将内存需求降低到 100 万字节。
2. 将所有数字分成两组，每组 500 万字节，排序完然后进行归并排序。

6. What would you recommend to the programmer if, instead of saying that each integer could appear at most once, he told you that each integer could appear at most ten times? How would your solution change as a function of the amount of available storage?

7. [R. Weil] The program as sketched has several flaws. The first is that it assumes that no integer appears twice in the input. What happens if one does show up more than once? How could the program be modified to call an error function in that case? What happens when an input integer is less than zero or greater than or equal to n? What if an input is not numeric? What should a program do under those circumstances? What other sanity checks could the program incorporate? Describe small data sets that test the program, including its proper handling of these and other ill-behaved cases.

8. When the programmer faced the problem, all toll-free phone numbers in the United States had the 800 area code. Toll-free codes now include 800, 877 and 888, and the list is growing. How would you sort all of the toll-free numbers using only a megabyte? How can you store a set of toll-free numbers to allow very rapid lookup to determine whether a given toll-free number is available or already taken?

9. One problem with trading more space to use less time is that initializing the space can itself take a great deal of time. Show how to circumvent this problem by designing a technique to initialize an entry of a vector to zero the first time it is accessed. Your scheme should use constant time for initialization and for each vector access, and use extra space proportional to the size of the vector. Because this method reduces initialization time by using even more space, it should be considered only when space is cheap, time is dear and the vector is sparse.

10. Before the days of low-cost overnight deliveries, a store allowed customers to order items over the telephone, which they picked up a few days later. The store’s database used the customer’s telephone number as the primary key for retrieval (customers know their phone numbers and the keys are close to unique). How would you organize the store’s database to allow orders to be inserted and retrieved efficiently?

11. In the early 1980’s Lockheed engineers transmitted daily a dozen drawings from a Computer Aided Design (CAD) system in their Sunnyvale, California, plant to a test station in Santa Cruz. Although the facilities were just 25 miles apart, an automobile courier service took over an hour (due to traffic jams and mountain roads) and cost a hundred dollars per day. Propose alternative data transmission schemes and estimate their cost.

12. Pioneers of human space flight soon realized the need for writing implements that work well in the extreme environment of space. A popular urban legend asserts that the United States National Aeronautics and Space Administration (NASA) solved the problem with a million dollars of research to develop a special pen. According to the legend, how did the Soviets solve the same problem?