
Comments (11)

JoshDevHub commented on May 30, 2024

Hi @dziubenzo. Thank you for making this issue.

Is the key argument to the set() method really necessary? I think the reason for its inclusion is to be able to edit values already added to the hash table. If that's the only reason, then I would suggest splitting this method into two separate methods: add() and update(). The former would only take a value as an argument, and it would internally generate the hash code using the hash() method to insert the value into the table. The latter would just take a key as a parameter and update/override the existing value as long as the key-value pair is in the table.

Yes, it is absolutely necessary. Consider something like const obj = {};. Can you set a key-value pair in this object without providing a key? Even beyond overwriting, if you wanted to insert a new value 1 into that object, you couldn't do it without a key of some kind (i.e. obj["key"] = 1).

I would mention the requirement to grow buckets size later on (preferably after the values() method) or add it as a separate (private) method to be implemented. To my knowledge (I might be missing something here), implementing buckets resizing requires both length() and values() methods to exist, so to me, it seemed out of place being mentioned in the set() method.

It's very important the growth logic is handled inside of the set() method. If this isn't the case, then the hash can easily overrun the load factor and cause a lot of collisions, which will lead to poor performance.

Now you can potentially offload this to a different (probably private) method that gets called inside of set(). That'd be a great thing to do. But that's also an implementation detail that the project's specifications shouldn't guide you on. It's just reminding you that you have to address the growth of the buckets inside the set() method, which I think is a good reminder.
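To be concrete, here is a rough sketch of what that might look like. This is only illustrative: it assumes a class-based HashMap with arrays as buckets, the 0.75 load factor from the project, and a private #grow() helper whose name (like capacity and size) is mine, not something the project prescribes.

class HashMap {
  constructor() {
    this.loadFactor = 0.75;
    this.capacity = 16;
    this.buckets = Array.from({ length: this.capacity }, () => []);
    this.size = 0;
  }

  hash(key) {
    let hashCode = 0;
    for (const char of key) {
      // reducing modulo capacity on each step keeps the number small
      hashCode = (hashCode * 31 + char.charCodeAt(0)) % this.capacity;
    }
    return hashCode;
  }

  set(key, value) {
    const bucket = this.buckets[this.hash(key)];
    const existing = bucket.find((node) => node.key === key);
    if (existing) {
      existing.value = value; // same key: overwrite, size unchanged
      return;
    }
    bucket.push({ key, value });
    this.size += 1;
    // growth is checked here, inside set(), so the load factor can never be overrun
    if (this.size > this.capacity * this.loadFactor) this.#grow();
  }

  #grow() {
    const oldBuckets = this.buckets;
    this.capacity *= 2;
    this.buckets = Array.from({ length: this.capacity }, () => []);
    this.size = 0;
    // re-insert every entry so it lands in its new bucket
    for (const bucket of oldBuckets) {
      for (const { key, value } of bucket) this.set(key, value);
    }
  }
}

With something like that in place, repeated set() calls transparently double the bucket count as soon as the threshold is crossed, and the caller never has to think about it.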

I would suggest adding an extra method for halving buckets size when the number of hash table entries is small enough to be halved. For example, reducing hash table entries from 15 to 11 should trigger the halving event to decrease the hash table size from 32 to 16 (in line with the load factor of 0.75). To me, halving buckets size makes sense despite space being rather cheap these days, but I don't know if such "optimisation" tricks are employed in real-life situations.

This optimization strategy isn't in common use. A big reason is that you potentially lose certain guarantees around the time complexity of the hash map's operations.

Using your example, it's easy to imagine a scenario where a hash map is moving between, say, 10 and 20 items. In the usual implementation, it would grow from a 16-item capacity to a 32-item capacity a single time and then just sit at the 32-item size. In your implementation, though, the hash map has to be repeatedly resized. Each resize is an O(N) operation. Optimizing these costly operations away is more important than optimizing for memory usage.
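A rough sketch of that oscillation, assuming a HashMap as above with a remove() method like the project describes, and the shrink rule you propose (halve once the count fits back under the smaller size's load factor):

const map = new HashMap();                                // 16 buckets to start
for (let i = 0; i < 10; i++) map.set(`base${i}`, i);      // a steady 10 entries
for (let round = 0; round < 1000; round++) {
  for (let i = 0; i < 10; i++) map.set(`temp${i}`, i);    // 20 entries: crosses 0.75 * 16
  for (let i = 0; i < 10; i++) map.remove(`temp${i}`);    // back down to 10 entries
}
// With shrinking enabled, every round pays for a grow and a shrink, each an
// O(N) rehash. Without it, the table grows to 32 buckets once on the first
// round and every later operation stays O(1).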

This one is just for clarity: I would update the set() method description to include what it should do. The other method descriptions say so, but this one is missing its purpose, even though it is clear from context what it should do.

I could potentially be interested in a change around this. Do you have any suggestions for the wording to use?

I would add some more context as to why a hash set is useful. To me, doing the Extra Credit task seemed kind of boring (for lack of a better word), as it is very similar to the HashMap class but does not store values, so there is not much new code to write. I think adding some more context would be helpful to others (and to me as well).

This could be a good idea. They are very useful for certain things, but I don't guess the curriculum currently gets into them. I'll think about this one and talk to other maintainers and the lesson's author to see what they think.

Let me know if you have any questions about what I've written above.


dziubenzo commented on May 30, 2024

@JoshDevHub, thank you for your time and explanations. As for the wording, I think something simple like this would do the trick:

  1. set(key, value) takes two arguments: the first is a key, and the second is a value that is assigned to this key. It inserts the key-value pair/node into the hash map. If the key already exists, the old value is overwritten.
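For example (purely illustrative usage, assuming the class is called HashMap and also has a get() method):

const map = new HashMap();
map.set("apple", "red");
map.set("banana", "yellow");
map.set("apple", "green"); // "apple" already exists, so "red" is overwritten
map.get("apple");          // "green"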

On second thought, it is perhaps best to mention the grow buckets requirement in the set() method rather than later on because mentioning it after implementing the values() method would potentially guide students to how buckets resizing should be implemented. However, the point I tried to make with mentioning it further in the project was not to avoid growing buckets altogether in the set() method (it is clear that it is very important) but rather to maintain the order in which methods for students to implement are introduced. Right now, to fully implement point 3 (set()), you need to implement point 7 (length()) and point 10 (values()).

Have a great day!


silenceinspace commented on May 30, 2024

Hi guys. I have not completed the project yet; however, I felt a strong need to briefly comment on this issue as I go (I will very likely come back to add more thoughts once I am done).

I agree this lesson feels a bit incomplete, in the sense that more information could be added, mostly for clarity. The obvious example is set(). For some reason, it causes a lot of confusion when building a hash map, specifically around how it is supposed to handle collisions.

set() takes two arguments, the first is a key and the second is a value that is assigned to this key. If a key already exists, then the old value is overwritten.

To me it sounds as though we simply don't care about resolving collisions and instead just overwrite the values assigned to keys. Say there is a key "1": "Jack". Based on the ambiguous wording, I naturally asked myself, "but what about a linked list? Shouldn't we learn how to use it in this project?" Since the instruction literally stated "overwrite the old value", I interpreted it as "1": "Ben" (another value with the key "1" comes up later, which means "Jack" is gone).

The problem only became more apparent when I needed to implement the get(), has(), and remove() methods. I was confused about how the hell this could be an algorithm with O(1) time complexity when, because of my wrong implementation of the main bucket (due to the ambiguity of the set() description), I had O(n) even without a linked list. I couldn't access the value "Jack" directly because it was an object stored as an element of an array, so to find the key I first had to iterate over the array's elements and check each one for my key. Am I missing something, or is there really no way to make these operations constant time in our case? (For example, a default JS object comes to mind, because there we do get constant time.)

I know it's a very new lesson so it is expected that things will be improved. That's why I will try to see what other suggestions I could mention.

UPD: thanks to one person, I figured out where my logic went wrong. I don't know if it was due to a lack of clarity in this project's assignment or in the previous lesson on Hash Maps, but I will try to pin down what the main struggle was and come back here to suggest some things.
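For reference, a minimal sketch of the lookup that stays close to O(1), assuming the outer array is indexed by the hash code and each bucket is a short chain of { key, value } nodes (the helper names here are just illustrative):

// Only the one bucket the key hashes to is searched, not the whole table,
// so the scan is over a short chain rather than over every stored entry.
function get(buckets, hash, key) {
  const index = hash(key) % buckets.length;
  const node = buckets[index].find((entry) => entry.key === key);
  return node ? node.value : null;
}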


silenceinspace commented on May 30, 2024

@JoshDevHub Alright, as promised, I am back after finishing my own hash map:D

I have taken a bit of time to discuss things that have been rather vague not only for me but for a couple of other students too. Here are the parts where, in my book, there is a need for extra clarification:

  1. For some reason, a "key" often gets interpreted as a hash code. So some people, including myself, tried to access data in the main array by the key itself rather than by the hash code it produces, and that immediately led to building the wrong hash map skeleton. I suggest mentioning one more time that a key is what gets hashed, never what is used to access the array directly. It is just a pointer.

  2. It would be nice if the set() method, where it states to overwrite the old value, emphasized what a collision is one more time. For example:

Remember, a collision is when TWO DIFFERENT keys sit inside the same bucket. When it's the same key just with different values, it is NOT a collision.

  3. Inside the hash() method, or even in the previous lesson, it is important to bring up how to handle hashing larger inputs. Due to the limits of JS, if a hash code grows beyond 2^53 - 1, any operations with it lose their accuracy. So when we later use the modulo operator to put our key into one of the array's buckets, it will always end up at hashMap[0]. My suggestion:

In our previous lesson, we did not consider large keys. However, it is important to keep in mind that there is a number limit in JavaScript: the largest safe integer is 2^53 - 1, or 9,007,199,254,740,991 (Number.MAX_SAFE_INTEGER). Operations on numbers greater than that value can produce incorrect results. Thus, we want to take care of it either by limiting the size of a key or by finding a technique that keeps our large keys accurate.

  4. With this initial mention of the JS array's limitation, I got extremely confused. I understand why it is mentioned, and it is indeed important. But I think the assignment misses the case where the main bucket (the outer array) is accessed with an index to reach a "smaller" bucket. Let's say the size of our hash map is 16. If for some reason we want to access "folder" 20 (hashMap[20]), nothing prevents us from doing so; it will just return undefined. I realize that it is not as easy to implement the restriction here as it is for the smaller buckets, because it would probably require modifying the array's prototype, which sounds rather ambitious. But in that case, I would appreciate more specificity about where to put that restriction function, and a note that it is just an "imitation" that has no real effect inside our project.

Use the following snippet whenever you access a bucket through an index. We want to throw an error if we try to access an out-of-bounds index:

if (index < 0 || index >= buckets.length) {
   throw new Error("Trying to access index out of bound");
}
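To make point 3 above concrete, here is a rough illustration of how an unbounded accumulator drifts past Number.MAX_SAFE_INTEGER (the prime multiplier 31, the key, and the table size of 16 are all just examples):

let hashCode = 0;
const key = "a key that is not even particularly long";
for (const char of key) {
  // multiplying by 31 adds roughly five bits per character, so precision
  // runs out after only a dozen or so characters
  hashCode = hashCode * 31 + char.charCodeAt(0);
}
console.log(hashCode > Number.MAX_SAFE_INTEGER); // true
console.log(hashCode % 16); // 0: the low bits were rounded away, so long keys all collapse into bucket 0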

Anyways, it looks like most of these issues with clarity come from the previous lesson. I don't know if it's better to add extra info/resources there or in this project. But either way, I am sure these points resonate with more people. Curious to see what your thoughts are :)

P.S. if there are small improvements, such as writing set(key, value) instead of set() in the assignment, I can just open a separate pull request for them, right?


JoshDevHub commented on May 30, 2024

Apologies for temporarily forgetting about this issue.

@dziubenzo

On second thought, it is perhaps best to mention the grow buckets requirement in the set() method rather than later on because mentioning it after implementing the values() method would potentially guide students to how buckets resizing should be implemented. However, the point I tried to make with mentioning it further in the project was not to avoid growing buckets altogether in the set() method (it is clear that it is very important) but rather to maintain the order in which methods for students to implement are introduced. Right now, to fully implement point 3 (set()), you need to implement point 7 (length()) and point 10 (values()).

Yeah I see what you're saying, but I'm also not sure of a good way to solve it? Maybe there could be a note about how those other methods can help with implementing the growth operation?

I would probably accept a PR on that.

@silenceinspace

For some reason, a "key" often gets interpreted as a hash code. So some people, including myself, tried to access data in the main array by the key itself rather than by the hash code it produces, and that immediately led to building the wrong hash map skeleton. I suggest mentioning one more time that a key is what gets hashed, never what is used to access the array directly. It is just a pointer.

Sure, that sounds reasonable.

Inside the hash() method, or even in the previous lesson, it is important to bring up how to handle hashing larger inputs. Due to the limits of JS, if a hash code grows beyond 2^53 - 1, any operations with it lose their accuracy. So when we later use the modulo operator to put our key into one of the array's buckets, it will always end up at hashMap[0]. My suggestion:

I initially didn't think this would be an issue but just looking and wow it is pretty easy to run into this problem with JavaScript using the given hashing strategy in the lesson content 🙃

I think your suggestion is mostly okay, but I might want to go even further on this and just give people a direct solution to the problem. I don't think we really want learners going down the rabbit hole on 64-bit floating point number precision. The curriculum hasn't given learners any tools/knowledge around understanding the binary representation of numbers.

I'll think on this a bit more and talk to some other people and get back to you. Nice job spotting this problem.

With this initial mention of the JS array's limitation, I got extremely confused. I understand why it is mentioned, and it is indeed important. But I think the assignment misses the case where the main bucket (the outer array) is accessed with an index to reach a "smaller" bucket. Let's say the size of our hash map is 16. If for some reason we want to access "folder" 20 (hashMap[20]), nothing prevents us from doing so; it will just return undefined. I realize that it is not as easy to implement the restriction here as it is for the smaller buckets, because it would probably require modifying the array's prototype, which sounds rather ambitious. But in that case, I would appreciate more specificity about where to put that restriction function, and a note that it is just an "imitation" that has no real effect inside our project.

It should never look at the bucket 20 inside a hash map of size 16. Your function that hashes to an index within the buckets array should prevent this 100% of the time. This is why hashCode % bucketLength should be used. I'm definitely open to suggestions if you feel that's not clear enough in the lesson and/or project, but I think the project is correct to say an error should be thrown if the generated index is above the bounds of the bucket array.

P.S. if there are small improvements, such as writing set(key, value) instead of set() in the assignment, I can just open a separate pull request for them, right?

That's a good observation. The other CS projects provide parameter information like that, so this one probably should too. Feel free to make a PR for that.


silenceinspace commented on May 30, 2024

@JoshDevHub It is totally fine, no worries.

Yeah I see what you're saying, but I'm also not sure of a good way to solve it? Maybe there could be a note about how those other methods can help with implementing the growth operation?

I am not the one who dug in that direction, but here is my take: a note mentioning that the later methods could be of some help in achieving the resizing functionality sounds decent to me.

I initially didn't think this would be an issue but just looking and wow it is pretty easy to run into this problem with JavaScript using the given hashing strategy in the lesson content 🙃

Yes, using the modulo operator only at the end of our accumulated hash easily causes that issue, which is why at first I was not sure if we were even supposed to provide such large inputs in the first place. There was one interesting solution by Sokolan in the solution section of the project. That person avoided the bug by applying % on each iteration, which means our hash can never get larger than a value in the range from 0 to the map length. It is interesting how the modulo at the end and the modulo on each iteration do not differ in what they output. I am not sure why it works though:D
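For reference, this is roughly what that per-iteration variant looks like (the prime 31 and the default capacity of 16 are just placeholders). Mathematically, reducing at every step changes nothing: ((a % m) * b + c) % m equals (a * b + c) % m, so you get the same bucket index the single final % would give if precision never ran out, but the running number stays small and never loses accuracy:

function hash(key, capacity = 16) {
  let hashCode = 0;
  for (const char of key) {
    // hashCode stays in [0, capacity), so it can never exceed
    // Number.MAX_SAFE_INTEGER no matter how long the key is
    hashCode = (hashCode * 31 + char.charCodeAt(0)) % capacity;
  }
  return hashCode;
}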

Fair enough, bringing up 64-bit floating point number precision is unnecessary. I personally like the solution with the BigInt() function and some conversions. It is one of those rare occasions where I really got to appreciate it, and the JS BigInt data type overall, which is usually listed alongside the other six primitives but which I had never had a chance to run into before. So I'm looking forward to hearing from you on this one.

It should never look at the bucket 20 inside a hash map of size 16. Your function that hashes to an index within the buckets array should prevent this 100% of the time. This is why hashCode % bucketLength should be used. I'm definitely open to suggestions if you feel that's not clear enough in the lesson and/or project, but I think the project is correct to say an error should be thrown if the generated index is above the bounds of the bucket array.

I agree. The usage of hashCode % bucketLength is crucial here as it does prevent the type of overflow 100% of the time. But in that case, even for imitating purposes, I consider putting that restriction function, which will never throw an error, rather strange. So I am really debating if it should even be there in the first place.

Gotcha. I am definitely going to open a pull request to make a couple more small improvements of this kind.


JoshDevHub commented on May 30, 2024

I'm going to create a bit of a todo list here so I can keep track of progress on the work with this:

Tasks

  • Clarify the definition of a "key" and its relationship with a hash code.
  • In the instructions for the set() method, mention that some of the following methods may be useful helpers while implementing it
  • In the instructions for the set() method, reiterate the definition of a collision and the difference between situations where you overwrite a value vs. resolve a collision.
  • Add parameter values to the JavaScript version of the project #27203
  • Add parameter values to the Ruby version of the project
  • Handle numbers exceeding the maximum safe size on 64 bit floating point numbers (JavaScript only)


JoshDevHub commented on May 30, 2024

Yes, using the modulo operator only at the end of our accumulated hash easily causes that issue, which is why at first I was not sure if we were even supposed to provide such large inputs in the first place. There was one interesting solution by Sokolan in the solution section of the project. That person avoided the bug by applying % on each iteration, which means our hash can never get larger than a value in the range from 0 to the map length. It is interesting how the modulo at the end and the modulo on each iteration do not differ in what they output. I am not sure why it works though:D

Been thinking about this some, and I think this way is probably my favorite because it solves the problem and requires minimal explanation around why it has to be there.

I was going to recommend using Number.MAX_SAFE_INTEGER with a little extra logic to simulate integer overflow, but this way is much simpler lol.

I'm a bit disinclined to use BigInt because it performs poorly compared to plain Numbers.

I agree. The usage of hashCode % bucketLength is crucial here as it does prevent the type of overflow 100% of the time. But in that case, even for imitating purposes, I consider putting that restriction function, which will never throw an error, rather strange. So I am really debating if it should even be there in the first place.

Sometimes errors are used defensively. The error is never thrown if everything is working as intended. Good. That means we know for sure that things are working as intended.

If you ever get into writing TypeScript, you'll see that kind of pattern regularly.


If either of you want to pick up items on the task list, feel free to post here and let me know that you're working on it. Also let me know if I overlooked something when making the task list (bound to happen).


silenceinspace commented on May 30, 2024

@JoshDevHub Creating a todo in this thread to keep track of progress on improvements is a great idea. If possible, I would like to work on all the items on the todo. (Going to the Ruby course and simply adding parameters inside function declarations does not seem like a lot of work, so even though I don't know any Ruby, I am willing to take that one too.)

Speaking of the overlooked changes that were brought up in this thread earlier:

  1. For some reason, a "key" often gets interpreted as a hash code. So some people, including myself, tried to access data in the main array by the key itself rather than by the hash code it produces, and that immediately led to building the wrong hash map skeleton. I suggest mentioning one more time that a key is what gets hashed, never what is used to access the array directly. It is just a pointer.
  2. It would be nice if the set() method, where it states to overwrite the old value, emphasized what a collision is one more time. For example: Remember, a collision is when TWO DIFFERENT keys sit inside the same bucket. When it's the same key just with different values, it is NOT a collision.

So I am waiting for your final confirmation that these should be worked on as well, and for your go-ahead to assign me to the original todo items.


I was going to recommend using Number.MAX_SAFE_INTEGER with a little extra logic to simulate integer overflow, but this way is much simpler lol.

Yes, using Number.MAX_SAFE_INTEGER with a bit of extra logic is something that crossed my mind too. But if we can minimally tweak the hash function from the previous lesson by simply applying % earlier than at the end, why not go for it? Just like you said:D

It is interesting that you marked the 64-bit floating point number precision problem as "JavaScript only". From what I have briefly gathered from googling, it seems that Ruby does not suffer from big numbers and their potential inaccuracy because, under the hood, it uses the BigInt() constructor. Is that true? Also, you mentioned that BigInt performs poorly. In that case, is Ruby able to handle that performance hit better than JavaScript?

Sometimes errors are used defensively. The error is never thrown if everything is working as intended. Good. That means we know for sure that things are working as intended.
If you ever get into writing TypeScript, you'll see that kind of pattern regularly.

Fair enough. Sometimes things end up working well on the first try and you are happy because of that, but it is important to be aware that at any moment it can all go down. So I may not be able to appreciate this habit right now, but since I plan to get into TypeScript eventually, I will keep it in mind. In fact, I am currently diving into TDD with JavaScript, and I think testing somewhat supports your point.


JoshDevHub commented on May 30, 2024

@silenceinspace Those other two are great ideas. I'll add them to the todo list.

It is interesting that you marked the 64-bit floating point number precision problem as "JavaScript only". From what I have briefly gathered from googling, it seems that Ruby does not suffer from big numbers and their potential inaccuracy because, under the hood, it uses the BigInt() constructor. Is that true? Also, you mentioned that BigInt performs poorly. In that case, is Ruby able to handle that performance hit better than JavaScript?

It's true that Ruby eventually moves over to using Bignum under the hood, but it only has to do so at much larger values than the point where JS runs into trouble. The thing that's unusual about JS is that even its "integers" are modeled as 64-bit floating point numbers. In a float, some bits are reserved for the sign and exponent, leaving only 53 bits of significand for the integer value. That creates a lower ceiling for the largest exactly-representable integer relative to most other languages (2^53 instead of 2^64). In the context of the project, it means it doesn't take a key with very many characters before there's trouble in JS.
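A quick way to see that ceiling from a JS console:

console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991, i.e. 2^53 - 1
console.log(2 ** 53 === 2 ** 53 + 1);  // true: adjacent integers can no longer be told apart
console.log((2 ** 53 + 1) % 16);       // 0 instead of 1, which is exactly the bucket-index bug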

We could probably change the Ruby implementation as well though. It's a really simple fix and shouldn't require much explanation.


JoshDevHub commented on May 30, 2024

Also you're absolutely welcome to pick any of those items up and submit a PR for them.

