Θεωρητική Πληροφορική ΙΙ: Προχωρημένες Δομές Δεδομένων
εαρινό εξάμηνο 2006-2007

ΓενικάΑνακοινώσειςΥλικό

Γενικά

Διδάσκων

Διαλέξεις

  • Δευτέρα 17:00-21:00, αίθουσα 1.1.29 Κτήριο Ηλεκτρολόγων (ΕΜΠ)

Περιεχόμενο μαθήματος

ADVANCED DATA STRUCTURES: ALGORITHMS AND COMPLEXITY

  1. Basic external memory data structures: B+-tree, Linear and Extensible Hashing
  2. Spatial Index Structures - External Memory: R-trees (R-tree, R+-tree, R*-tree), Grid file
  3. Basic main memory data structures: Splay-trees, Fusion trees, Geometric data structures (kd-trees, Quadtrees, Range trees, Interval trees), Priority R-trees
  4. Persistent Data Structures: Persistent and Partial Persistent data structures, Multi-version B-tree
  5. Randomized and Probabilistic Data Structures and Algorithms: Hash functions, Universal hashing, Perfect hashing, Locality Sensitive Hashing
  6. Embeddings: Metric Embeddings, Random Projections
  7. Sketch based techniques: AMS sketch, Random projections sketch (JL lemma), Count Min sketch, Count sketch, FM sketch and its applications
  8. Data structures for string matching: Suffix trees, Tries, Patricia trees, Suffix arrays, BLAST like methods
  9. Graphs: decompositions, treewidth, pathwidth, cliquewidth, dynamic graphs, small world graphs, social networks, web graph, PageRank
  10. Data structures for data mining and machine learning: Association rules (FP-tree, A-priori, etc), Clustering (CF-tree), Classification (Classification trees)

Βιβλιογραφία

  1. Electronic notes available at the following links:
  2. Books:
    • T. Cormen, C. Leiserson, R. Rivest, and C. Stein: Introduction to Algorithms, 2nd edition, MIT Press, 2001
    • M. de Berg, M. van Kreveld, M.H. Overmars, and O. Schwarzkopf, Computational Geometry, Algorithms and Applications, second edition, Springer-Verlag, 2000
  3. Suggested Reading - Papers to be Presented:
    • Splay trees:
      • Amortized Efficiency of List Update and Paging Rules,Sleator and Tarjan link
    • Fusion trees:
      • Erik Demaine's lecture notes 9 and 10 - Integers link, link
      • M.L. Fredman and D.E. Willard. Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences, 47:424-436, 1993
    • Persistent Data Structures:
      • Peter J. Varman, Rakesh M. Verma: An Efficient Multiversion Access Structure. IEEE Trans. Knowl. Data Eng. 9(3): 391-409 (1997) link
    • Randomized and Probabilistic Data Structures and Algorithms:
      • L. Carter & M. Wegman, Universal hash functions, Journal of computer and system sciences, 1979, pp. 143--154.
      • Universal hashing, Peter Bro Miltersen's notes link
      • Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529 link
      • Mayur Datar, Nicole Immorlica, Piotr Indyk, Vahab S. Mirrokni: Locality-sensitive hashing scheme based on p-stable distributions. Symposium on Computational Geometry 2004: 253-262 link
    • Embeddings
      • Jiri matousek, Chapter 15 from "Lectures on Discrete Geometry" link
    • Sketches
      • P. B. Gibbons. Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports, VLDB'01. link
      • Min-Wise Independent Permutations. Andrei Z. Broder, Moses Charikar, Alan M. Frieze, Michael Mitzenmacher. Journal of Computer and System Sciences 1998. link
      • P. Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. Proc. of the 41st Symposium on Foundations of Computer Science, 2000. link
    • String Search - Suffix Trees, etc.:
      • Erik Demaine's notes on string search link, link

Ανακοινώσεις

  • 15/7/2007: Η εξέταση του μαθήματος θα γίνει στις 20/7, ώρα 12:00, στην αίθουσα 1.1.29 του κτ. Ηλεκτρολόγων.
  • 27/4/2007: Το επόμενο μάθημα (30/4) τελικώς θα γίνει και πάλι στο αμφιθέατρο πολυμέσων της βιβλιοθήκης ΕΜΠ και θα περιλαμβάνει διάλεξη του Η. Κουτσουπιά.
  • 24/4/2007: Το επόμενο μάθημα (και εκτός απροόπτου όλα τα επόμενα μαθήματα) θα γίνει και πάλι στην αίθουσα 1.1.29 του κτ. Ηλεκτρολόγων
  • 20/4/2007: Το επόμενο μάθημα (23/4) θα γίνει στο Αμφιθέατρο Πολυμέσων (κάτω από Βιβλιοθήκη) στις 17:00.
  • 13/4/2007: Το επόμενο μάθημα θα πραγματοποιηθεί τη Δευτέρα 16/4 στις 17:00. Νέα τοποθεσία το (ανοικτό πλέον) κτ. Ηλεκτρολόγων, αίθουσα 1.1.29.
  • 19/3/2007: Τα μαθήματα άρχισαν! Οι διαλέξεις θα γίνονται προς το παρόν κάθε Δευτέρα. Μπορείτε να βλέπετε ανακοινώσεις σχετικές με το μάθημα εδώ ή να ενημερώνεστε μέσω mail (αν δεν έχετε γραφτεί στη mailing list του μαθήματος στείλτε ένα mail στο mlambis/at/corelab/dot/ntua/dot/gr)
  • Ανακοινώθηκαν τα papers που θα αποτελέσουν τη εξεταστέα ύλη του μαθήματος. Είναι τα εξής:
    • Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching, Proc. 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57 link
    • Σημειώσεις Ι. Εμίρη (δείτε κεφάλαια 1 και 10) link
    • Christos Faloutsos and King-Ip (David) Lin FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets ACM SIGMOD, May 1995, San Jose, CA, pp. 163-174. link
    • Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, Vasilis Samoladas: On a model of indexability and its bounds for range queries. J. ACM 49(1): 35-55 (2002) link
    • S. Dasgupta and A. Gupta: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms 2003 link
    • D.D. Sleator and R.E. Tarjan. Self-Adjusting Binary Search Trees. Journal of the ACM 32:3, pages 652-686, 1985. link
    • Bruno Becker, Stephan Gschwind, Thomas Ohler, Bernhard Seeger, Peter Widmayer: An Asymptotically Optimal Multiversion B-Tree. VLDB J. 5(4): 264-275 (1996) link
    • Hans L. Bodlaender. Treewidth: Algorithmic Techniques and Results. Proceedings 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS'97. link
    • Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci. 58(1): 137-147 (1999) link
    • Piotr Indyk, Rajeev Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998: 604-613 link

Υλικό

Σημειώσεις και διαφάνειες διαλέξεων

19/3/2007 Γιώργος Κόλλιος B+-trees and linear hashing Slides Related papers:
  • David B. Lomet: The Evolution of Effective B-tree: Page Organization and Techniques: A Personal Account. SIGMOD Record 30(3): 64-69 (2001)
  • Ricardo A. Baeza-Yates: Fringe Analysis Revisited. ACM Comput. Surv. 27(1): 111-119 (1995)
  • Witold Litwin: Linear Hashing: A New Tool for File and Table Addressing. VLDB 1980: 212-223 (download)
  • Ricardo A. Baeza-Yates, Hector Soza-Pollman: Analysis of Linear Hashing Revisited. Nord. J. Comput. 5(1): (1998)
26/3/2007 Γιώργος Κόλλιος R-trees and grid file
Worst-case performance of the R-tree
Slides Related papers:
  • Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71 (1984)
  • H. V. Jagadish: Linear Clustering of Objects with Multiple Atributes. ACM SIGMOD Conference 1990: 332-342
  • A. Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57
  • Christos Faloutsos and Ibrahim Kamel. Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. Proc. ACM PODS, 1994
  • Yannis Theodoridis and Timos Sellis. A Model for the Prediction of R-tree Performance. Proc. ACM PODS, 1996.
16/4/2007 Γιώργος Κόλλιος και Χριστόδουλος Φραγκουδάκης Segment trees and Interval trees
Priority R-tree
Geometric Data Structures
Slides1
Slides2
Slides3
  • The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree. Lars Arge, Mark de Berg, Herman Haverkort, and Ke Yi.Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD '04), Paris, France, June 2004, 347-358.
  • Chapter 10, More Geometric Data Structures --- Windowing M. de Berg, M. van Kreveld, M.H. Overmars, and O. Schwarzkopf, Computational Geometry, Algorithms and Applications, second edition, Springer-Verlag, 2000.
  • Σημειώσεις Ι. Εμίρη (δείτε κεφάλαια 1 και 10) link
23/4/2007 Γιώργος Κόλλιος και Βασίλης Αθίτσος Embeddings Slides
Slides2
  • Christos Faloutsos and King-Ip (David) Lin FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets ACM SIGMOD, May 1995, San Jose, CA, pp. 163-174. link
  • BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Vassilis Athitsos, Jonathan Alon, Stan Sclaroff, and George Kollios. Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), to appear. link
  • Query-Sensitive Embeddings. Vassilis Athitsos, Marios Hadjieleftheriou, George Kollios, and Stan Sclaroff. ACM Transactions on Database Systems (TODS), 32(2), June 2007. link
30/4/2007 Ηλίας Κουτσουπιάς, Αρης Παγουρτζής, Γιώργος Κόλλιος Indexability Theory
Johnson-Lindenstrauss Lemma
Slides
Slides2
  • Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, Vasilis Samoladas: On a model of indexability and its bounds for range queries. J. ACM 49(1): 35-55 (2002) link
  • Dimitris Achlioptas: Database-friendly random projections. PODS 2001 link
  • S. Dasgupta and A. Gupta: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms 2003 link
7/5/2007 Γεωργία Καούρη, Άρης Τέντες Splay trees
Fusion Trees
Slides1
Slides2
  • Self-Adjusting Binary Search Trees, Sleator and Tarjan link
  • Erik Demaine's lecture notes 3 link
  • Danny Sleator's lecture notes link
14/5/2007 Νίκος Κιούρτης Suffix arrays
OASIS
Slides
  • Udi Manber and Gene Myers (1991). "Suffix arrays: a new method for on-line string searches". SIAM Journal on Computing, Volume 22, Issue 5 (October 1993), pp. 935-948. link
  • Altschul, SF, W Gish, W Miller, EW Myers, and DJ Lipman. Basic local alignment search tool. J Mol Biol 215(3):403-10, 1990 link
21/5/2007 Ματθαίος Δαμίγος, Γιώργος Κόλλιος Suffix trees
Persistent data structures
Slides1
Slides2
  • E. Ukkonen (1995). "On-line construction of suffix trees". Algorithmica 14 (3): 249--260. link
  • Martin Farach (1997). "Optimal suffix tree construction with large alphabets". Foundations of Computer Science, 38th Annual Symposium on: 137--143 link
  • Making Data Structures Persistent, Driscoll, Sarnak, Sleator, and Tarjan link
  • Bruno Becker, Stephan Gschwind, Thomas Ohler, Bernhard Seeger, Peter Widmayer: An Asymptotically Optimal Multiversion B-Tree. VLDB J. 5(4): 264-275 (1996) link
4/6/2007 Γιώργος Πιερράκος, Αντώνης Αχιλλέως Edit-distance and Ulam metric Slides1
Slides2
  • Approximating edit distance efficiently (FOCS 2004) Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer and Ravi Kumar link
  • Embedding the Ulam metric into l_1. Moses Charikar and Robert Krauthgamer
  • Low distortion embeddings for edit distance (STOC 2005) Rafail Ostrovsky and Yuval Rabani
11/6/2007 Μιχάλης Λαμπής, Βάλια Μήτσου Small-world networks
Treewidth
Expander graphs
Slides1
Slides2
Slides3
  • Hans L. Bodlaender. Treewidth: Algorithmic Techniques and Results. Proceedings 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS'97. link
  • Hans L. Bodlaender. A tourist guide through treewidth. Acta Cybernetica 11 (1993) 1-21. link
  • Jon Kleinberg. The Small-World Phenomenon: An Algorithmic Perspective. Proceedings of the 32nd ACM Symposium on Theory of Computing (STOC 2006). link
  • Nati Linial's and Andy Wigderson's lecture notes link
18/6/2007 Δημήτρης Κωστόπουλος, Ελένη Παναγιώτου AMS Sketch
Count-min sketch
Slides1
Slides2
  • Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci. 58(1): 137-147 (1999) link
  • Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. PODS 1999. link
  • G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1): 58--75, April 2005. link
25/6/2007 Βαγγέλης Μπαμπάς, Κάτια Παπακωνσταντινοπούλου Approximate nearest neighbors
Geometric Embeddings
Slides1
Slides2
  • The Geometry of Graphs and Some of Its Algorithmic Applications, Nathan Linial, Eran London, Yuri Rabinovich, Combinatorica 1995. link
  • P. Indyk. Algorithmic Application of Low-Distortion Geometric Embeddings. FOCS 2001
  • Piotr Indyk, Rajeev Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998: 604-613 link