Comments (13)

enjoy-digital avatar enjoy-digital commented on August 18, 2024

Awesome work @jiegec! Thanks for sharing the issues and the process; this has been very useful for understanding your changes to the codebase. I also read your blog post (if you don't mind, I added a link to it at https://github.com/enjoy-digital/litex/wiki/Tutorials-Resources#useful-resources).

The VCU128 and Clamshell LiteDRAM/LiteX support seems fine and we'll be able to merge it. Before doing so, I just want to think a bit about the Clamshell support in LiteDRAM and see whether we could reduce the changes to the codebase. I'll get back to you in a few days.

from litex-boards.

jiegec avatar jiegec commented on August 18, 2024

I isolated the module 0 via:

                pads             = PHYPadsReducer(platform.request("ddram"), [0]),

The calibration result:

Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
  tCK equivalent taps: 560
  Cmd/Clk scan (0-280)
  |11111  |11111  |11111  |11111| best: 0
  Setting Cmd/Clk delay to 0 taps.
  Data scan:
  m0: |111111111111111111111111| delay: 00
Write latency calibration:
m0:2
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |00000000000000000000000000000000| delays: -
  m0, b04: |00000000000000000000000000000000| delays: -
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b00 delays: -
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB
   Read: 0x40000000-0x40200000 2.0MiB
  bus errors:  136/256
  addr errors: 0/8192
  data errors: 524288/524288
Memtest KO

jiegec avatar jiegec commented on August 18, 2024

According to the datasheet, modules 0, 2, 4, and 7 correspond to U74 and U73, which are located on the bottom side of the clam-shell.

I see: there is a dedicated CS_B pin for the bottom half in the xdc:

set_property PACKAGE_PIN BK48       [get_ports "PL_DDR4_BOT_CS_B"] ;# Bank  66 VCCO - DDR4_VDDQ_1V2 - IO_L7P_T1L_N0_QBC_AD13P_66
set_property IOSTANDARD  SSTL12_DCI [get_ports "PL_DDR4_BOT_CS_B"] ;# Bank  66 VCCO - DDR4_VDDQ_1V2 - IO_L7P_T1L_N0_QBC_AD13P_66
set_property PACKAGE_PIN BP49     [get_ports "PL_DDR4_CS_B"] ;# Bank  66 VCCO - DDR4_VDDQ_1V2 - IO_L1N_T0L_N1_DBC_66
set_property IOSTANDARD  SSTL12   [get_ports "PL_DDR4_CS_B"] ;# Bank  66 VCCO - DDR4_VDDQ_1V2 - IO_L1N_T0L_N1_DBC_66

However, adding BK48 to cs_n does not work:

        Subsignal("cs_n",      Pins("BP49 BK48"), IOStandard("SSTL12_DCI")), # Clam-shell fashion

Clam-shell is not equivalent to dual-rank DIMMs: it is still single-rank, but it has two CS pins.

jiegec avatar jiegec commented on August 18, 2024

I tried to replicate the cs_n signal to two pins:

# in litex-boards
        Subsignal("cs_n",      Pins("BP49"), IOStandard("SSTL12_DCI")),
        Subsignal("bot_cs_n",  Pins("BK48"), IOStandard("SSTL12_DCI")), # Clam-shell fashion
# in litedram
            commands = {
                # Pad name: (DFI name,   Pad type (required or optional))
                "reset_n" : ("reset_n", "optional"),
                "cs_n"    : ("cs_n",    "optional"),
                "bot_cs_n"    : ("cs_n",    "optional"),
                "a"       : ("address", "required"),
                pads_ba   : ("bank"   , "required"),
                "ras_n"   : ("ras_n"  , "required"),
                "cas_n"   : ("cas_n"  , "required"),
                "we_n"    : ("we_n"   , "required"),
                "cke"     : ("cke"    , "optional"),
                "odt"     : ("odt"    , "optional"),
                "act_n"   : ("act_n",   "optional"),
            }

But it still does not work.

jiegec avatar jiegec commented on August 18, 2024

After more investigation, I found that address mirroring is applied to the bottom DRAMs: https://support.xilinx.com/s/question/0D52E00006tceu6SAA/ultrascalempsoc-pl-ddr4-clamshell-address-mirroring?language=en_US

This means we have to mirror some pins when accessing the bottom half. But does it cause the calibration to fail? UPDATE: Yes, it interferes with the Mode Register Set command.

jiegec avatar jiegec commented on August 18, 2024

I manually swapped the address lines:

        Subsignal("a", Pins(
            # "BF50 BD51 BG48 BE50 BE49 BE51 BF53 BG50",
            # "BF51 BG47 BF47 BG49 BF48 BF52"),
            # Swap A3A4, A5A6, A7A8, A11A13
            "BF50 BD51 BG48 BE49 BE50 BF53 BE51 BF51",
            "BG50 BG47 BF47 BF52 BF48 BG49"),
            IOStandard("SSTL12_DCI")),
        # Subsignal("ba",        Pins("BE54 BE53"), IOStandard("SSTL12_DCI")),
        # Swap BA0/1
        Subsignal("ba",        Pins("BE53 BE54"), IOStandard("SSTL12_DCI")),
        Subsignal("cs_n",  Pins("BK48"), IOStandard("SSTL12_DCI")), # Clam-shell fashion
        #Subsignal("cs_n",      Pins("BP49"), IOStandard("SSTL12_DCI")),
        #Subsignal("bot_cs_n",  Pins("BK48"), IOStandard("SSTL12_DCI")), # Clam-shell fashion

This way, the bottom DRAMs can be calibrated successfully. However, the top and bottom ones cannot be used at the same time.

jiegec avatar jiegec commented on August 18, 2024

To sum up, litedram needs to be extended to support:

  1. Mirrored pins: A3/A4, A5/A6, A7/A8, A11/A13, BA0/BA1, BG0/BG1. Only the Load Mode Register command needs special handling, possibly only in software.
  2. Clam-shell topology: it resembles two ranks, except that each half has private data pins; it also resembles one rank, except that it needs special care.
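
To make the mirroring in item 1 concrete, here is a minimal Python sketch of the bit swaps. The pair list comes from this thread; the helper names `swap_bits` and `mirror_command` are hypothetical and are not litedram functions. Since each swap is its own inverse, the same function can pre-swap a Load Mode Register value in software so that the bottom chip sees the intended bits after the board mirrors them.

```python
# Bit pairs swapped on the bottom chips, as listed in this thread.
ADDR_PAIRS = [(3, 4), (5, 6), (7, 8), (11, 13)]
BANK_PAIRS = [(0, 1)]   # BA0 <-> BA1
GROUP_PAIRS = [(0, 1)]  # BG0 <-> BG1


def swap_bits(value, pairs):
    """Return value with each (i, j) bit pair exchanged."""
    for i, j in pairs:
        bit_i = (value >> i) & 1
        bit_j = (value >> j) & 1
        if bit_i != bit_j:
            # Flipping both bits exchanges them when they differ.
            value ^= (1 << i) | (1 << j)
    return value


def mirror_command(a, ba, bg):
    """Hypothetical helper: address/bank/group as seen by a bottom chip."""
    return (
        swap_bits(a, ADDR_PAIRS),
        swap_bits(ba, BANK_PAIRS),
        swap_bits(bg, GROUP_PAIRS),
    )
```

For example, `mirror_command(0b1000, 0b01, 0b10)` moves A3 to A4 and exchanges the bank and bank-group bits.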

jiegec avatar jiegec commented on August 18, 2024

litedram already handles address inversion for RDIMMs:

            if phy_settings.is_rdimm:
                assert phy_settings.memtype == "DDR4"
                # JESD82-31A page 38
                #
                # B-side chips have certain usually-inconsequential address and BA
                # bits inverted by the RCD to reduce SSO current. For mode register
                # writes, however, we must compensate for this. BG[1] also directs
                # writes either to the A side (BG[1]=0) or B side (BG[1]=1)
                #
                # The 'ba != 7' is because we don't do this to writes to the RCD
                # itself.
                if ba != 7:
                    invert_masks.append((0b10101111111000, 0b1111))

We can handle it in a similar way for address mirroring.
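
As a rough illustration of what that mask encodes, the sketch below decodes it and applies it by XOR. That the masks are applied by XOR is my reading of the excerpt, and `set_bits` and `apply_invert_mask` are hypothetical names, not litedram functions.

```python
A_MASK, BA_MASK = 0b10101111111000, 0b1111  # values from the excerpt above


def set_bits(mask):
    """List the bit positions that are set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def apply_invert_mask(a, ba, a_mask=A_MASK, ba_mask=BA_MASK):
    """Hypothetical helper: pre-invert the bits that get inverted again later."""
    return a ^ a_mask, ba ^ ba_mask
```

`set_bits(A_MASK)` gives address bits 3 to 9, 11, and 13. One difference to keep in mind: inversion is a fixed XOR, while mirroring is a permutation of bits, so mirroring can reuse the same hook but not the same mask arithmetic.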

jiegec avatar jiegec commented on August 18, 2024

The DFII interface replicates the cs_n bits:

                phase.cs_n.eq(Replicate(~self._command.fields.cs, len(phase.cs_n))),
                phase.we_n.eq(~self._command.fields.we),
                phase.cas_n.eq(~self._command.fields.cas),
                phase.ras_n.eq(~self._command.fields.ras)

We need to expose the two cs_n bits to software during calibration. Everywhere else, the two cs_n signals can be driven as one.

jiegec avatar jiegec commented on August 18, 2024

I am working on a dirty fix:

  • Change csr_dfi in DFIInjector to allow software to control the top and bottom cs_n separately
  • In all other cases, drive both cs_n pins with the same signal from BankMachine
  • Swap the address bits in software and send the same commands to the top and bottom chips separately
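
The plan above can be modelled in plain Python, as a sketch only. The one-hot `TOP`/`BOTTOM` encoding, the pin order, and all function names are assumptions made for illustration; bank-group bits are left out for brevity, and `ba` is taken to be 2 bits wide.

```python
MIRROR_PAIRS = [(3, 4), (5, 6), (7, 8), (11, 13)]  # address pairs from this thread
TOP, BOTTOM = 0b01, 0b10  # hypothetical one-hot chip-select encoding


def mirror_address(a):
    """Swap the mirrored address bit pairs."""
    for i, j in MIRROR_PAIRS:
        if ((a >> i) ^ (a >> j)) & 1:
            a ^= (1 << i) | (1 << j)
    return a


def cs_n_pins(cs_select, n_pins=2):
    """Active-low pin levels for a chip-select mask (bit k selects pin k)."""
    return [0 if (cs_select >> k) & 1 else 1 for k in range(n_pins)]


def mode_register_writes(a, ba):
    """Hypothetical sequence: one write per half; the bottom gets swapped bits."""
    mirrored_ba = ((ba & 1) << 1) | ((ba >> 1) & 1)  # BA0 <-> BA1
    return [
        (cs_n_pins(TOP), a, ba),
        (cs_n_pins(BOTTOM), mirror_address(a), mirrored_ba),
    ]
```

Outside of mode register writes, selecting both halves at once (`cs_n_pins(TOP | BOTTOM)`) corresponds to driving the two cs_n pins as one.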

jiegec avatar jiegec commented on August 18, 2024

I made it!

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
  tCK equivalent taps: 556
  Cmd/Clk scan (0-278)
  |00011  |111111111  |111111111  |111111111| best: 334
  Setting Cmd/Clk delay to 334 taps.
  Data scan:
  m0: |000111111111111111111000| delay: 37
  m1: |011111111111111111100000| delay: 05
  m2: |001111111111111111110000| delay: 26
  m3: |001111111111111111110000| delay: 31
  m4: |000011111111111111111100| delay: 64
  m5: |000000011111111111111111| delay: 106
  m6: |000011111111111111111000| delay: 53
  m7: |000011111111111111111000| delay: 54
Write latency calibration:
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |11100000000000000000000000000000| delays: 22+-22
  m0, b04: |00000000111111111111100000000000| delays: 217+-102
  m0, b05: |00000000000000000000000001111111| delays: 454+-56
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b04 delays: 215+-104
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |11111100000000000000000000000000| delays: 42+-42
  m1, b04: |00000000001111111111111000000000| delays: 255+-103
  m1, b05: |00000000000000000000000000001111| delays: 475+-35
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b04 delays: 255+-103
  m2, b00: |00000000000000000000000000000000| delays: -
  m2, b01: |00000000000000000000000000000000| delays: -
  m2, b02: |00000000000000000000000000000000| delays: -
  m2, b03: |11100000000000000000000000000000| delays: 15+-15
  m2, b04: |00000001111111111111000000000000| delays: 203+-105
  m2, b05: |00000000000000000000000001111111| delays: 447+-64
  m2, b06: |00000000000000000000000000000000| delays: -
  m2, b07: |00000000000000000000000000000000| delays: -
  best: m2, b04 delays: 203+-107
  m3, b00: |00000000000000000000000000000000| delays: -
  m3, b01: |00000000000000000000000000000000| delays: -
  m3, b02: |00000000000000000000000000000000| delays: -
  m3, b03: |00000000000000000000000000000000| delays: -
  m3, b04: |00011111111111111000000000000000| delays: 147+-108
  m3, b05: |00000000000000000000011111111111| delays: 416+-95
  m3, b06: |00000000000000000000000000000000| delays: -
  m3, b07: |00000000000000000000000000000000| delays: -
  best: m3, b04 delays: 149+-111
  m4, b00: |00000000000000000000000000000000| delays: -
  m4, b01: |00000000000000000000000000000000| delays: -
  m4, b02: |00000000000000000000000000000000| delays: -
  m4, b03: |00000000000000000000000000000000| delays: -
  m4, b04: |11111111111110000000000000000000| delays: 102+-102
  m4, b05: |00000000000000000111111111111100| delays: 373+-105
  m4, b06: |00000000000000000000000000000000| delays: -
  m4, b07: |00000000000000000000000000000000| delays: -
  best: m4, b05 delays: 375+-104
  m5, b00: |00000000000000000000000000000000| delays: -
  m5, b01: |00000000000000000000000000000000| delays: -
  m5, b02: |00000000000000000000000000000000| delays: -
  m5, b03: |00000000000000000000000000000000| delays: -
  m5, b04: |11111111111000000000000000000000| delays: 89+-89
  m5, b05: |00000000000000001111111111111000| delays: 354+-104
  m5, b06: |00000000000000000000000000000000| delays: -
  m5, b07: |00000000000000000000000000000000| delays: -
  best: m5, b05 delays: 354+-106
  m6, b00: |00000000000000000000000000000000| delays: -
  m6, b01: |00000000000000000000000000000000| delays: -
  m6, b02: |00000000000000000000000000000000| delays: -
  m6, b03: |00000000000000000000000000000000| delays: -
  m6, b04: |00000111111111111000000000000000| delays: 169+-97
  m6, b05: |00000000000000000000000111111111| delays: 432+-79
  m6, b06: |00000000000000000000000000000000| delays: -
  m6, b07: |00000000000000000000000000000000| delays: -
  best: m6, b04 delays: 172+-100
  m7, b00: |00000000000000000000000000000000| delays: -
  m7, b01: |00000000000000000000000000000000| delays: -
  m7, b02: |00000000000000000000000000000000| delays: -
  m7, b03: |00000000000000000000000000000000| delays: -
  m7, b04: |11111111111100000000000000000000| delays: 93+-93
  m7, b05: |00000000000000000111111111111000| delays: 365+-102
  m7, b06: |00000000000000000000000000000000| delays: -
  m7, b07: |00000000000000000000000000000000| delays: -
  best: m7, b05 delays: 363+-101
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB
   Read: 0x40000000-0x40200000 2.0MiB
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
  Write speed: 108.8MiB/s
   Read speed: 93.6MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
No boot medium found

--============= Console ================--

litex> 

jiegec avatar jiegec commented on August 18, 2024

You can find my changes here:

https://github.com/litex-hub/litex-boards/compare/master...jiegec:litex-boards:vcu128?expand=1

https://github.com/enjoy-digital/litex/compare/master...jiegec:litex:vcu128?expand=1

https://github.com/enjoy-digital/litedram/compare/master...jiegec:litedram:vcu128?expand=1

I will upstream my changes next.

enjoy-digital avatar enjoy-digital commented on August 18, 2024

The different PRs are now merged, thanks a lot!
