Code Monkey home page Code Monkey logo

algo_datastructures's Introduction

algo_datastructures

Repository which holds different implementation of famous data structures and algorithms. This repository is designed to be taken as a refernce, or to have a quick look at various implementations of algorithms and data structures. The short programs are categorized as below, with their documentation.

language -

Java

java version -

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

Disclaimer

The code written in this repository, are not necessarily the original work of me (except for, some of them). Many of them are referenced from books or tutorial sites such as

Table of Content

  1. Sorting Algorithms
  2. Data Structures
    1. Basic Data Structures
      1. Dynamic Array
      2. Linked List
      3. Stack and Queue
      4. Heap
      5. IndexMinHeap
      6. Binary Search Tree
      7. Balanced Search Tree - Red Black tree
      8. Hashing and Hashtable
    2. Advanced Data Structures
      1. Graphs
      2. Graph Traversal
      3. UnionFind & ConnectedComponents
      4. Spanning Tree and MST
      5. Prim's MST
      6. Kruskal's MST
      7. Cycle Detection - Directed Graph
      8. DFS Order
      9. Topological Sort
      10. Kosaraju's Strong Connected Component algorithm
      11. Dijkstra's SP
  3. Strings
    1. Sorting
      1. Key Indexed Counting
      2. LSD String sort
      3. MSD String sort
      4. Three way string Quick sort
      5. Summary
    2. Tries
      1. R-way Trie ST
Algorithm Time Complexity Best Time Complexity Average Time Complexity Worst Space Complexity Is Stable?
BubbleSort Ω (n) Θ (n^2) Ο (n^2) Ο (1) Yes
SelectionSort Ω (n^2) Θ (n^2) Ο (n^2) Ο (1) No
InsertionSort Ω (n) Θ (n^2) Ο (n^2) Ο (1) Yes
ShellSort Ω (n log(n)) Θ (n log(n)^2) Ο (n log(n)^2) Ο (1) No
MergeSort Ω (n log(n)) Θ (n log(n)) Ο (n log(n)) Ο (n) Yes
QuickSort Ω (n log(n)) Θ (n log(n)) Ο (n^2)) Ο (log(n)) No
HeapSort Ω (n log(n)) Θ (n log(n)) Ο (n log(n)) Ο (1) No

Data structures are a particular way of organizing the data in a computer, so that it can be accessed and modified efficiently.

Dynamic array is an array based data-structure (insert - Ο(1) and deleteAtIndex - Ο(1)), which adjusts its size and rearranges the items (re-indexing), as and when data gets added or deleted.

Linked List is a special data structure of connected items, represented by a special structure called as Node. A Node structure stores the data as well as a link (or memory address) to the next Node item present in the sequence.

In singly linked list, Node will have a pointer to the Node present ahead of it in the sequence (Node next;). In doubly linked list, a Node will be connected to both next and previous Node in the sequence.

Stack is a popular data structure in which items inserted last will sit on top of the structure, pushing the previously inserted items to the bottom. It gives convenient methods to check, what is at the top of the structure (peek())? and to get and delete the item on the top of the structure (pop()), with a time complexity of - Ο(1). It is also called as LIFO data-structure. Following are some different implementations of stack, backed by array, singly linked list and doubly linked list.

Queue too is a popular data structure in which items inserted first will sit on front of the structure, pushing the next items to be inserted to the back. It gives convenient methods to check, what is at the beginning of the structure (peek())? and to get and delete the item on the front of the structure (enQueue()), with a time complexity of - Ο(1). It is also called as FIFO data-structure. Following are some different implementations of queue, backed by array and singly linked list.

Heap is a tree based data structure, where each node follows the heap rule. Heaps can be of types - Min and Max. In Min heap, if a node within the tree is not a leaf node, then its key will always be less than its children. In this way the root of the tree will always be the item, with the least key among all the items present in the heap. Max heap is just the opposite, where the root node's key is the highest, and all non-leaf nodes will always be greater than their children.

IndexMinHeap is an extension of Min Heap implementation which associates a particular Index with the Key. The Key will be inserted and used to maintain the heap order, and the Index will be mapped with the key, to have a 1-1 mapping between them. When a method such as getMin() and delMin() is called, the Index corresponding to the minimum key will be returned by the method.

BinarySearchTree (a.k.a TreeMap or Binary Tree Based Symbol table) is one of the most popular tree based data structure, which provides efficient means of performing search operation on given set of data. BST is a binary tree (Tree, where each node has max of 2 children), with following unique characteristics.

  • All the keys on the left sub-tree of any Node x, are less than the value of x.key
  • All the keys on the right sub-tree of any Node x are greater than the value of x.key

We can use BST's get(key) method to access the data associated with a particular key, and we can use put(key,value), delete(key), deleteMin() and deleteMax() to modify its content. BST provides effective ways to search keys using following methods -

  • min() - Min key
  • max() - Max key
  • floor(key) - Greatest key, smaller than or equal to the passed key parameter
  • ceiling(key) - Smallest key, greater than or equal to the passed key parameter
  • rank(key) - Number of keys, less than the passed key parameter
  • select(rank) - Returns the key with the passed rank value
  • keys() - All the keys in the BST
  • keys(lowKey, hiKey) - Keys whose values are in the range lowKey - hiKey.

Trivia - Along with the above mentioned popular method, various tree traversal techniques also have been implemented.

  • Inorder - Traversal with order, left, root and right. Implementations - inorderR() (Recursion based), inorderI() (Iterative), inorderItr() (Iterator based)
  • Preorder - Traversal with order, root, left and right. Implementations - preorderR() (Recursion based), preorderI() (Iterative), preorderItr() (Iterator based)
  • Postorder - Traversal with order, left, right and root. Implementations - postorderR() (Recursion based), postorderI() (Iterative), postorderItr() (Iterator based)

The problem with Binary search tree is that, if we insert the elements to it in ascending or descending order, the tree, becomes more or less, like a linear list, giving up the performance characteristic it has of O(log n). The balance of BST, heavily depends on the order at which the elements will be inserted. To solve this, we need a tree based data structure, which balances itself, during the insertion / deletion and makes sure constant O(log n) time complexity for most kind of operations.

RedBlackBST does this by defining some nomenclature and simple rules to maintain the tree structure.

  1. The tree can have 2 kinds of Nodes. RED node or BLACK node. The color of a node depends on the color of incoming link from its parent. For exampl, if a Node has parent link of RED, then it is a RED node, or it is a BLACK node.
  2. RED links always lean left.
  3. No node has 2 RED links connected to it
  4. Null links are considered to be BLACK
  5. The tree has perfect BLACK balance. Every path from root to null link has same number of BLACK nodes.
  6. The root of the tree is always a BLACK node

While inserting the nodes, if any of the rules stated above gets violated, then they are fixed using rotation and flipping colors.

  1. If we get a right leaning RED link - then the node representing the root of the right RED leaning tree is left rotated.
  2. If a parent has left and right children as RED nodes, then the color of the children are flipped with parent.
  3. If we get both parent and child are RED nodes leaning towards the left - then the root of the left leaning tree is right rotated, which makes the child new root, having 2 RED children. Now follow step 2.

Deletion in RBT are complicated and involves multiple use-cases, hence it is not covered / implemented here.

Hashtable is an array based symbol table implementation, which relies on Hashing function to distribute the keys uniformly across the array. Once the keys are uniformly distributed, we can search for a key by calculating its hash() value, and converting it into a valid index within the array. The process of converting hash() value into array index is called Modular Hashing.

	int index = hash(key) % M; // where M is size of the array

There are different ways in which Hashtable can be implemented, below are some of them -

Hashing with separate chaining

When we use Modular Hashing, there is a good chance of 2 keys being allocated the same index. This phenomenon is called Hash Collision. In case of collision, we need a way to represent two or more <Key, Value> pairs, hashed to the same index and to solve this, Separate Chaining method uses Array of LinkedLists. If two keys get allocated same index, those two keys will be appended, one after the other into the LinkedList, stored at that index.

Hashing with linear probing
  • Linear probing Hashtable is an array backed, open addressing based Hashing methodology, which solves the issue of Hash Collision by choosing the closest empty index next to the index given by the hash() function.
  • Incase of Hash Collision, the logic is to start with the index given by the hash() function and move forward one index at a time, until an empty slot is encountered.
  • All the keys which are found, starting from the hash index until the empty index are referred as the probes of the new key.
  • While moving forward, if we reach end of the array, then we start over again from the beginning of the array.
  • One of the main criteria for linear probing to work is that - the number of keys N should always be less than the size of the array M.
  • The ratio of number of keys to total size of array (N/M) is called load factor. For an optimal performance, the load factor should be around 0.25 to 0.5 to get efficiency for both memory and running time.
  • For liner probing, load factor cannot be 1 (array cannot be full), because, if that happens, then insertion in case of hash collision will run infinitely.

Graph is a data structure which represents the vertices (data) and the connection between the vertices.

  • A dense graph is a graph, where most of the vertices are connected with each other.
  • A sparse graph is a graph, where only some of the vertices are connected with other vertices.

In real world applications, sparse graphs are common than dense graphs.

There are two ways in which Graphs can be represented.

Graph using Adjacency Matrix - Here the connection between the vertices within the graph are represented in (v * v) boolean matrix, where v represents the number of vertices present in the graph.

Advantage - O(1) access time to check if a particular vertex is connected to another vertex or not.

Disadvantage - The space required to represent the graph is v * v, even though, not all vertices are connected.

Usage - Can be efficiently used to represent dense graphs.

Graph using Adjacency List - Here the graph is represented as array of vertices v[] and each array will have a Dynamic List pointing to the vertices it is connected to.

Advantage - Only O(E) space is required to represent a graph with E edges.

Disadvantage - The time complexity of finding out a connection between vertices is O(E).

Usage - Can be efficiently used to represent sparse graphs

Also, there are 2 kinds of Graphs - Directed and Undirected. Below algorithms are categorized based on the kind of graph they work on.

There are two ways in which the graph's vertices can be traversed for searching purpose.

Depth First Search - In DFS, the child vertices of each vertex are visited first, until all the vertex are covered OR the leaf vertex is reached. Once the vertex has no more children, then backtracking is done and other children are visited / searched.

Breadth First Search - In BFS, first all the children of a vertex are visited one after the other, once finished, then children at the next level are visited.

Cycle Detection -of undirected graph - is the process of finding, whether a cycle exists or not in the graph, by using DFS path search along with parent back tracking.

Theory - If all the adjacent vertex (w) of a particular vertex (v) is visited (marked), then it should be equal to the parent (from which the vertex got searched using dfs). Else, there is a Cycle.

UnionFind (or Disjoint-Set data structure) is a data structure which helps to connect components to form a set, to track disjoint sets and to see if two components are connected (part of a same set) or not.

The application of this data structure varies from, finding if two Computers are connected within a network (either directly or indirectly), or variable name equivalence tracking in some languages (checking if two aliases point to same variable) to name a few. This article gives a good insight into the data structure and how it evolved to become more efficient!

It is also used to detect cycle within the graph, which is used as part of Kruskal's MST algorithm.

ConnectedComponent is another algorithm to find connected vertices in a graph and dividing the vertices within a graph based on which component they belong to. The application of this algorithm is same as UnionFind, but this algorithm is bit slower due to undreneath DFS implementation.

Spanning Tree - Spanning tree of a graph is a sub-graph, where all the vertices of the graph are connected by edges without any cycle.

Minimum Spanning Tree - Is a spanning tree calculated from the undirected weighted graph, where the weight of the edges connecting the spanning tree is minimum.

Prim's algorithm is a Greedy algorithm, which starts with a vertex to build a MST, and then keeps adding it more connections which have the least weight.

Kruskal's algorithm is a simple algorithm, which first orders all the edges of the graph in the ascending order, then adds them to the MST, by making sure that, edge to be added to the mst, does NOT form a cycle with already existing edges in MST. The check of determining cycle is performed using UnionFind data structure.

Cycle detection on the directed graph, is little bit more work compared to Cycle detection on undirected graph. The algorithm is achieved by doing DFS traversal and tracking the recursive call stack.

Theory - While doing DFS, If we ever find a directed edge v -> w to a vertex w that is on that stack, we have found a cycle, since the stack is evidence of a directed path from w to v, and the edge v -> w completes the cycle.

DepthFirstOder - is the ways in which we record the visited vertices during DFS traversal on the directed graph. There are 3 ways in which we can record the encountered vertices.

Pre - In the DFS method, collect the current vertex in a Queue, at the beginning of the method - This records the vertices in the same order as they encounter during the DFS traversal.

Post - In the DFS method, collect the current vertex in a Queue, at the end of the method - This records the vertices in the manner of "Level Order Traversal" from the leaf node to the root node.

Reverse Post - In the DFS method, collect the current vertex in a Stack, at the end of the method - This records the vertices in Topological Order.

Topological Sort is the process of arranging the vertices of a directed graph in such a way that, for all v , w in the graph, if there exists a directed edge v -> w ( v to w) then v will appear before w in the ordered arrangement. In other words, the Topological Order makes sure that, head vertex is always ahead of tail vertex, for all the head-tail edges. To find the order, the graph must not contain Cycles (The ordering only works on DAGs).

In a directed graph, two vertices v and w are said to be strongly connected, if there is a path from v to w in both the directions, directly or indirectly. A **Strongly connected component (SCC) ** of a graph is a subset of vertices with the graph, which have strong connectivity between them. Remember that in a graph having SCC's, there ought to be a Directed Cycle!

KosarajuSCC algorithm helps us to find the SCC within a given directed graph in 3 simple steps.

  1. Calculate the ReversePost order of transpose G(r) of the given digraph G
  2. Iterate through the reverse oder and do the DFS
  3. Increment the counter when one round of DFS ends. The vertices discovered during the round of DFS belong to a single SCC a. Continue the DFS discovery, until all the vertices ordered as per Step 1 exhausts.

Dijkstra's algorithm is an efficient algorithm, which finds shortest path from a source vertex, to all the verties it is connected to (directly / indirectly), by forming a Shortest Path Tree. It takes help of IndexMinHeap to get the next vertex to process (thus achieving the Ο (E log V) ), along with a process called as Edge Relaxation.

Key Indexed Counting - Is one of the basic sorting technique that is used to mostly sort the strings.

  • It maintains a count[] of length 256 (total ASCII characters) to keep track of count of items
  • Count at an index i + 1 , where i represents the index of the string, is updated, for every occurrence of the string (There are multiple ways to derive index from a string (LSD / MSD), which will be discussed later)
  • Count array is updated to set the range (occurrences are transformed into indices)
  • The items are redistributed, as per the index recorded in the count array

LSD sorts are an extension of Key Indexed Counting methodology, which is used to sort the strings of equal length W. The time complexity of this algorithm is O(WN) where N is total number of strings to be sorted.

LSD sorting sorts the array of strings by scanning least significant digit first (that is the right most position of the string) and continues to scan till left most position of the string. For each iteration of the position, it uses the Key Indexed Counting method to sort the elements, as per the relative order of the character at that particular position. The algorithm moves the position one-by-one from right to left, and moves the elements accordingly within the array, as per the order of the character in the strings at that position. This sorting method is useful to sort strings having same number of characters (same length).

MSD sorting (a.k.a Radix sort) - is the best method to sort array of strings having variable length. The process involves scanning the most significant digit first (left most position of the string) and continuing till the right most position of the string or the end of string is reached.

Since we are sorting variable length strings, special care must of taken to handle end of string. Also, the algorithm makes use of special purpose Insertion sort when the sub-arrays reach below certain threshold, to increase the overall efficiency of the algorithm. Again we use the principle of Key Indexed Counting to recursively sort the sub-arrays, until the last character of the longest string is processed or end of string is reached.

3 way Quick sort for strings - is another efficient string sorting algorithm, with performance characteristics similar to MSD sort. One advantage of this algorithm over the latter is the amount of less space it takes, in the overall execution of the program.

Three way Quick sort algorithm, divides the array into 3 partitions, where the first partition will have elements less than pivot, second partition will have elements equal to pivot and third partition will have elements greater than pivot.

Algorithm Time Complexity Space Complexity Is Inplace? Is Stable? Best suited for
LSD Sort Ο (NW) N No Yes short fixed-length strings
MSD Sort (Radix sort) Between N and Nw N + WR No Yes random strings with variable length
Threeway Quick sort Between N and Nw W + logN Yes No general-purpose strings with long prefix matches
  • N - Total number of strings to be sorted
  • W - Max length of the strings
  • w - Average length of the strings
  • R - Total types of characters, that can be used to form a string

Trie is a data structure, which enables us to search a particular string by tracing the path along each characters within the string. There are multiple implementations of this data structure, we will consider some of them.

In R-way trie, each node will have R possible paths, going down from them, where R represents the radix (which is 256 characters present in ASCII character set). In the TrieST, we will have key as concrete String type and value as generic. Following methods are supported by the data structure.

  • get(String key) - Gets the value associated with a particular string key
  • put(String key, V value) - Associates a value with a particular key
  • keys() - Lists all the keys, inserted into the trie.
  • keysWithPrefix(String prefix) - Lists all the keys, which starts with the prefix
  • keysThatMatch(String pattern) - Lists all the keys, which match a wildcard pattern.
  • longestPrefixKeyOf(String s) - Gives a key, which is the longest prefix of given string s
  • delete(String key) - deletes a particular key

Time Complexity - Search hit - O(W), Search miss O(log [base R] N ) - where R is radix, W is length of the string and N is total number of strings in trie.

Space Complexity - Between RN and RNw where w is average length of the strings.

Ternary Search Tries

Ternary Search Tries (TSTs) addresses the shortcoming of the R-way Tries where each node occupies the R-links corresponding to each characters, by having only three links in each node. In addition the node will have a character and a value.The three links corresponds to keys whose current characters are less than, equal to or greater than the node's character.The concept is similar to Binary Search Tree.

Time Complexity - The time complexity of search/insertion operation are O(log N) where N is the average number of keys inserted into the Trie.

Space Complexity - Space required to store N keys with an average length of w is between 3N and 3Nw.

algo_datastructures's People

Contributors

sanbhat avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.