alexandru / scala-best-practices Goto Github PK

View Code? Open in Web Editor NEW

4.4K 4.4K 621.0 184 KB

A collection of Scala best practices

scala-best-practices's Introduction

Scala Best Practices

A collection of best practices, friendly to people that want to contribute.

Version: 1.2
Updated at: 2016-06-08

Contribute

Open an issue to make suggestions, or create a pull request ;-)

scala-best-practices's People

Contributors

Stargazers

Watchers

Forkers

pekermert bigdragon1977 iulianu kornel abbi-gaurav alexsisu gotska guxg drkarl shafiahmed arnfred jamesoram qz267 jamerlan leothekim alexislitool gokhanayhan38 dvorobiov rodrigoparada lishenghz daishichao abcijkxyz flowerbear yoyojade xwd0520 magj2006 rudyli sxfmol fbukevin l-yang nsfc sunguo wangwenchao giserh mmallad cloudlander azureplus yuwentao coderxiang frankswu afthill caohongwei programmer-util lb1987 dursunkoc umit bhagatrawat ultimape 0lcnxr1 hjforever viewtiful jxlin ucichuck baidushine guanxiaochao heguangwu sanshi0518 jasonpanacea suntofly ilovejs t-koshi hcchen adam-singer tangboy jfmwz jiayuzhou zazrivec realmeh bunyod szanlin bigxing aliten shijinkui yuyongyang800 watermen defaultrobot jgenso pingxumeng fhsplau suninus hongbinzuo newsky nermin patpenetrante lhfei behrad modalsoul koodex hectorgool ufricke ptanh007 d3rzky abdheshkumar erimerturk viettn85 f1llon julienlafont-tabmo weixu02 noordinaryguy rocketlawyer

scala-best-practices's Issues

Avoid type inference for public methods' return type

At least for public methods, prefer this:

def someFunction: SomeType = ...

to this:

def someFunction = ...

in order to avoid:

the need to rely on IDE or to inspect implementation in order to understand what is the actual return type
unintended actual return type due to some implementation bug. E.g. one logic branch returns SomeType, another logic branch returns SomeOtherType => actual return type may be AnyRef, while SomeType was actually intended. If return type is specified explicitly, such a situation would result in a compilation error, otherwise not.

garbage-collector

https://github.com/alexandru/scala-best-practices/blob/master/sections/2-language-rules.md#215-must-be-mindful-of-the-garbage-collector

that's bullshit

If profiling suggest a refactoring, one could use http://scala-blitz.github.io/ or Streams

It is tempting, but you should almost never define nested case classes inside another object/class because it messes with serialization. The reason is that when you serialize a case class it closes over the "this" pointer and serializes the whole object, which if you are putting in your App object means for every instance of a case class you serialize the whole world. Prefer flat hierachies.

In the actors section you are nesting case classes within your actors, where they are quite likely to get serialized :)
Much like calling the scheduler from within your actor, case classes should be broken out into their own scope

Scala `return` actually throws `NonLocalReturnControl[A]` now

The section 2.1. MUST NOT use "return" states:

In Scala, a return statement inside a nested anonymous function is implemented by throwing and catching a NonLocalReturnException. It says so in the Scala Language Specification, section 6.20.

Actually the NonLocalReturnException was renamed as NonLocalReturnControl, and related content in the pdf is not up to date.

refs: https://issues.scala-lang.org/browse/SI-4862

Add a rule about best / bad practices for implicits and context bounds

I feel like there should definitely be some guidelines about this, as it can be (and is) very easily abused.

I've been playing around with code to help demonstrate some ideas, hopefully I can share the insights in the not so far future. Also, if you guys have any input on this, please do share.

-Stas

Make a rule on not using Option.get

Advice on not using Option.get is specified as part of 2.9, but it's not enough - it should be its own rule.

2.6 issues

I have 2 issues here, the first is a that the sentence "And this freedom is important, because a val for example restricts the way this value can get initialized - only at construction time." is not totally correct: An abstract val may be implemented by a lazy val which is initialized on first use.

The second and more important is about interface design. Yes a parameter-less def gives more flexibility in implementing. But on the other hand, it makes less of a promise to the interface user. Sometimes I really do want to promise that a property is "stable" (or immutable), as in:

trait Foo {
 val value: String
}

def run(foo: Foo): Unit = {
  val prompt = foo.value + " >"   // or something more expensive
  while(true) {
    print(prompt)
    // read & do something
  }
}

Here Foo promises a stable value, which may not change. Therefore I do not need to re-calculate prompt on every iteration of the loop.

I suggest adding "... unless your interface needs to promise a stable value" to the rule.

Add // bad example - for clearly marking bad code samples

Add rule on avoiding optional function parameters

2.3 Clarity

Hi, I am new to scala and stumbled on this repo. So far its great.

I am trying to replicate the advice found in 2.3 and can't get a working code sample. This part could be more noob friendly if possible.

For example, I created a list

def main(args: Array[String]): Unit = {
    var myList: Array[Double] = Array(1.9, 42.0, 25.0, 11.0)

    // summing all elements java way
    var sum = 0.0
    for (num <- myList) {
        sum += num
    }
    println(sum) // 6.0

    val sum2 = myList.map(_.value).sum // does not work...
    println(sum2)

    val sum3 = myList.foldLeft(0.0)((a, b) => a + b) 
    println(sum3) // 6.0

The value is not recognized by my IDE. Thoughts?

2.17 Nested case classes?

SHOULD NOT define case classes nested in other classes

Are nested case classes a problem? There's quite a few instances where you might need them, e.g. representing database rows that may have serialized columns.

Update rule 2.11 to discourage use of joda-time in favour of java.time

Rule 2.11 states:

2.11. MUST NOT use Java's Date or Calendar, instead use Joda-Time or JSR-310

JSR-310 has now been implemented as Java 8's new java.time API and even jodatime's website is asking users to migrate:

Joda-Time is the de facto standard date and time library for Java prior to Java SE 8. Users are now asked to migrate to java.time (JSR-310).

As Java 8 has become more and more common, it's probably time to deemphasise joda-time and recommend the java.time API as the standard approach, with joda-time as a fallback for people stuck in pre-Java 8 land.

Usage of underscore

I would like to see some convergence about best practices involving underscore.
IMHO underscore should be read by humans as "I don't care", "it does not care", "it does not matter", "ignored" or "irrelevant".

So, if you find a snippet of code like this:

  val parseFile: Transformer[File, Reading] = _.concatMap { file => ... }

... you should read as "it does not care".concatMap or "ignored".contactMap or "irrelevant".concatMap, right? Well... this is obvious that the underscore in this case is key for understanding what the code does, which means that I should care about the underscore, it cannot be ignored and it is definitely relevant.

The compiler may be plenty able to identify what the underscore is about (in microseconds!). However, humans may take several minutes to understand what this code is about.

Humans should write code primarily for other humans understand.

A best practice would be writing clean code (no magic!) targetting a human audience.

Rule 5.1 needs a good example

5.1. SHOULD evolve the state of actors only in response to messages received from the outside

This rule shows an example of the anti-pattern but gives insufficient guidance on how to do better. This is exacerbated by the back-pressure rule, 5.4, which does precisely the anti-pattern in its code sample, albeit with an admonishing comment.

Use if-else vs pattern matching for booleans

The general guideline for Scala developers is to prefer pattern matching over if-else which is true. However the compiler generates lot of code under the hood if pattern matching used for booleans. For booleans, plain old if-else is still more efficient.

Reference: http://stackoverflow.com/questions/9266822/pattern-matching-vs-if-else

Add a rule: case classes SHOULD be final

As extending a case class would lead to unexpected behaviour, a rule should be added that case classes should be final.
Why case classes should be final

Rule 3.1 example?

Would be nice to demonstrate a best practice instead of a cake pattern implementation.

Reference to JMH, Scala Meter

While highly agree that benchmarking is hard and proper tools need to be used. I have some question regarding a reference to Google Calibre. I don't want to open flame war but would it be better to reference a JMH and Scala Meter projects here?

Google Calibre vs JMH discussion:
https://groups.google.com/forum/#!msg/mechanical-sympathy/m4opvy4xq3U/7lY8x8SvHgwJ

Add a rule for NOT using Seq.head

What about a rule for NOT using Seq.head?
http://www.bks2.com/blog/2012/12/31/head_vs_headOption/

Awesome project by the way 😄

3.4. SHOULD be mindful of the garbage collector — Incorrect example

https://github.com/alexandru/scala-best-practices/blob/master/sections/3-architecture.md#34-should-be-mindful-of-the-garbage-collector

someCollection
 .filter(Set(a,b,c).contains)
 .map(_.name)

First of all, this creates a Set every single time, on each element of our collection.

This is not true. Set is created only once and then its contains function is passes as an argument to filter method. This function is then invoked per each element of someCollection.

Compare to similar code snippet:

someCollection
 .filter(Set(a,b,c).contains(_))
 .map(_.name)

In this case a wrapping function is be created around Set(a,b,c).contains, something like x:Int => Set(a,b,c).contains(x) and this function is then passed as an argument for filter. Set is created inside wrapping function, so it's creation is performed once per element of someCollection.

Also, in this item it is worth to mention eagerness of monadic operations on scala collections.
In provided example:

  .filter(bySomething)
  .map(toSomethingElse)
  .filter(again)
  .headOption

by default each step produces new collection (opposite to laziness of C#'s linq), which produces lots of garbage when chain is long.

Easy way (though no obvious) is to add .view at the beginning of the chain. This makes whole chain lazy.

Rule 5.2 could mention possible memory leak

If I'm correct:
The unbecome feature is awesome, but it allows to travel back in time. The given example has the complete history of Sets that in the order the items were added.

Writing effective tests?

What do you suggest about writing effective and quick unit tests?
Your suggestion about not to use cake pattern (or any DI technique) basically helps us abstract things so that code can be isolated and tested by mocking its dependencies.

Lets take an example of a Play controller.

We need to dependencies to build a controller: "Service" and "Repository" where service handles the business logic and deals only with domain objects. The repository talks to the DB. The controller talks to the services, converts the domain object to json, xml, html etc.

Now, if I need to test only the service, I need to mock the repository which has a defined interface. How do you suggest that should be dealt with?

Testing controller can be an integration test but service tests about be unit tests and should take <4 seconds to complete.

I am coming from a background of constructor injection (http://misko.hevery.com/2009/02/19/constructor-injection-vs-setter-injection) and trying to implement the credit card processor example the scala way.

explicit section/subsection numbers make it harder to contribute

The explicit section/subsection numbers make it harder to contribute.

I realized this when I wanted to make a contribution to a section, while the same section had an ongoing pull request using a new subsection number. This leaves me with two choices:

Pick the same section number as the other pull request. Someone will then need to rebase, depending on which pull request gets merged sooner.
Pick the next section number based on the other pull request. This assumes, that the other pull request will be accepted and I would need to base my change on that pull requests branch.

Both choices are sub-optimal.

I would like to bring the decision up for discussion to remove all explicit section numbering from all markdown sources. I think, we don't need them. Removing them has the following advantages:

less fuss, easier contributions

As explained above.
no broken links

It may happen, that a section or subsection gets removed. Then, first, you would need to change all following numbers. All saved/bookmarked/referenced hyperlinks would be broken (especially outside of the project!), because the section number and subsection number is in the hyperlink, e.g. https://github.com/alexandru/scala-best-practices/blob/master/sections/2-language-rules.md#29-must-not-use-null
less visual noise

Looking at the TOC in the README, the section numbers don't serve a real purpose. You can find your way around just as well without them. The order of the sections is important, e.g. preface before hygienic before language, etc. You don't need section numbers for that and, in my opinion, they don't add any value.

Also note, that the content this project provides can only be viewed within the confines of GitHub at the moment. I am a strong advocate of GitHub pages and similar tools that automatically convert your stuff to nice looking web pages with less visual noise. This provides a very structured way of creating full-fledged web pages to view the content without all of the visual noise of the GitHub Web UI.

As an example, take a look at a small git knowledge base I prepared for my users at work: this is the repository and this is the resulting, automatically created, GitHub page. The GitHub page looks a lot nicer, doesn't it? It also has way less visual noise so it is less distracting and thus easier to read and learn.

As a teaser: I think, it is even possible that GitHub pages and friends have stuff that allow you to create multilingual pages within a single repository more easily. I don't know for sure, though, as I have not yet done that myself.

Suggestion SHOULD NOT use scala.App

Basic reasoning is:

DelayedInit is being deprecated.
The syntax is difficult to read.
Any variable declarations become field accessors, which is not advisable in most cases.

Basically scala.App feels like an anti-pattern when compared to def main(args: Array[String]) = {}.

2.17. SHOULD NOT define case classes nested in other classes

I saw a lot of people using this for akka-actor messages, like

class MyActor extends Actor {
...
}

object MyActor {
   case class MessageA
   case object MessageB
}

Would this be okay (as the case classes are static) or implies 2.17 that MyActorMessageA would be a better choice?

Add rule on not doing TODO comments

Rule 4.3 needs clarifications

Clarify when it's OK to wrap CPU-bound stuff in the Future apply builder
Add rule for using 'Future.successful` instead of the Future apply builder for instantiating Futures with already known constants or things that are easy to calculate

[question] Usage of return

One question I had was what if you explicitly want control flow to return to the calling function?

For example, in a web controller function I have the following:

def index(): Action[AnyContent] = Action.async { implicit rq: Request[AnyContent] => 
  rq.getQueryString("permission-field") match {
    case Some(pf) => 
      if (!permissioned.contains(pf)) return Action(Forbidden)
      else // do other stuff
    case None => 
      Future { BadRequest }
  }
}

IMPORTANT: Rewriting history of repository

This is of special interest to @Peranikov, @piruty-joy, seeing that they are working on a Japanese translation. And for anybody else with active forks.

I've rewritten the whole history of the project to hide my emails behind [email protected]. This is because GitHub has become a spam problem for me and I do not want to expose my real emails anymore. I've followed the tutorial here and have only touched my own commits: https://help.github.com/articles/changing-author-info/

Existing forks have to rebase off the new master, otherwise your branch is going to be divergent. The recipe for doing that goes something like this:

BACKUP
DO NOT start with a git pull
git fetch
git rebase --onto origin/master master <local_branch> (for local each branch)
git pull --force

Sorry for the inconvenience and let me know if you need help.

Rule 4.4 example

I think the example can be simplified to this:

import java.util.concurrent.Executors
import scala.concurrent.{Await, Future, ExecutionContext}
import scala.concurrent.duration._

implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
Await.result(Future(Await.result(Future("foo"), Duration.Inf)), Duration.Inf)

Which will wait forever, but if the thread pool is increased to 2 threads it will return "foo" immediately.

It is not necessary to spawn several futures from inside a future (addOne() twice). It is awaiting (blocking) on a future inside another future that causes the lock (exhausts the pool). This isn't as clear as it could be in the example.

Usage of blocking in this situation will not help however since it is a fixed thread pool, for which there is no remedy.

jsr-310 as alternative to joda-time

I suggest to also point to java 8 date/time api (jsr-310) as it's supposed to be even better than joda-time. http://blog.joda.org/2009/11/why-jsr-310-isn-joda-time_4941.html

MUST NOT use capital letters but for the most important things

Seriously, MUST NOT / SHOULD NOT makes the overview hard to read.

Rule about updating a var for loops or conditionals

Specifically, "2.3. SHOULD NOT update a var using loops or conditions"

I think it would be good to create some examples of when this should be done other than the toy examples provided. A good example of this is a problem I'm wrestling with now, where I want to read the records from a relational database table in batches, and on each iteration I need to look at the last id returned from the previous result and check to see if there are any records and I should continue to iterate. I've asked this question all over the place and haven't gotten a suggestion that fixes the problem and is performant.

I was able to avoid the vars using Stream the memory utilization is just awful on large tables, but I'd like to see some guidance in a best practices document for dealing with these sort of real-world situations, or some reinforcement that this is a case where you need to observe the "SHOULD" of the "SHOULD NOT".