The aim of this project is to develop a single-cycle RISC-V processor integrated with a hierarchical cache system to reduce memory access latency. The design includes:
- A functional RV32I processor core with basic arithmetic, logic, control, and memory instructions.
 - L1 and L2 caches with distinct mapping policies (L1: direct-mapped, L2: 4-way set associative).
 - Implementation of write-back and no-write-allocate policies with Least Recently Used (LRU) replacement.
 - Testing and verification of a balanced set of Load/Store and ALU instructions on the integrated system.
 
The RISC-V architecture is an open-source instruction set architecture (ISA) known for its simplicity and flexibility. Originally developed at the University of California, Berkeley, it is part of the fifth generation of RISC processors.
A Cache Controller serves as an interface between the processor and memory, executing read and write requests (Load/Store instructions), and managing data flow across cache levels and main memory.
This project focuses on implementing a two-level cache system with a Single-Cycle RISC-V processor, offering hands-on experience in digital design and microprocessor architecture.
Tools Used:
- Xilinx Vivado IDE
 - Ripes RISC-V Simulator
 - GTKWave (debugging)
 - Languages: Verilog HDL, RISC-V Assembly
 
Tools Description:
- Xilinx Vivado: FPGA design suite for synthesis, implementation, and verification
- Ripes: Visual simulator for RISC-V that generates binary .dat files for instruction memory
- GTKWave: Waveform viewer for efficient debugging
 
- Implementation and comparison of different cache mappings
 - Accel: Cache simulator
 - Cache architecture studies
 
- Memory Hierarchy Understanding: Studied spatial and temporal locality to guide the cache design.
- AMAT (Average Memory Access Time): AMAT = Hit time + Miss rate × Miss penalty (a worked example using the project's latencies appears after the main-memory configuration below)
- Write Policy Analysis: Compared write-through vs. write-back policies
- Developed the RV32I processor core in Verilog HDL, structured around the five classic stages of the datapath:
- Instruction Fetch (IF)
 - Instruction Decode (ID)
 - Execute (EX)
 - Memory Access (MEM)
 - Write Back (WB)
 
- Used structural modeling to define the individual modules and to integrate the datapath with the control path (a minimal sketch of one such module appears after this list).
 
- Clock Rate: Cache operates ~5× faster than the processor for optimal AMAT.
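
As referenced in the structural-modeling item above, the sketch below shows the style in which individual datapath modules can be written. The module and port names are illustrative, not the project's actual identifiers.

```verilog
// Illustrative register-file module in the structural style described above
// (names are hypothetical; the project's actual module may differ).
module regfile (
    input  wire        clk,
    input  wire        we,            // write enable from the control path
    input  wire [4:0]  rs1, rs2, rd,  // source/destination register numbers
    input  wire [31:0] wd,            // write-back data
    output wire [31:0] rd1, rd2       // read ports feeding the ALU
);
    reg [31:0] regs [0:31];

    // x0 is hard-wired to zero in RV32I
    assign rd1 = (rs1 == 5'd0) ? 32'd0 : regs[rs1];
    assign rd2 = (rs2 == 5'd0) ? 32'd0 : regs[rs2];

    always @(posedge clk) begin
        if (we && rd != 5'd0)          // writes to x0 are ignored
            regs[rd] <= wd;
    end
endmodule
```

A structural top level would instantiate modules like this one alongside the ALU, immediate generator, and control unit, wiring them together rather than describing the whole datapath behaviorally.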
 
L1 Cache (Direct-Mapped)
- Size: 64 bytes
 - Delay: 1 cycle
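
The L1 block size is not stated here, so purely as an illustration assume 16-byte (4-word) blocks; the 64-byte L1 then has 4 lines, and a 32-bit address splits into byte offset addr[3:0], index addr[5:4], and tag addr[31:6]. Under that assumption, a minimal tag-check sketch (hypothetical, not the project's actual module) looks like this:

```verilog
// Illustrative direct-mapped L1 tag check, assuming 16-byte blocks (4 lines).
module l1_tag_check (
    input  wire        clk,
    input  wire        fill,       // assert when a missing block is brought into L1
    input  wire [31:0] addr,
    output wire        hit1
);
    reg [25:0] tag_array   [0:3];  // one stored tag per line
    reg        valid_array [0:3];

    wire [1:0]  index = addr[5:4];
    wire [25:0] tag   = addr[31:6];

    assign hit1 = valid_array[index] && (tag_array[index] == tag);

    integer i;
    initial begin                  // start with every line invalid
        for (i = 0; i < 4; i = i + 1) valid_array[i] = 1'b0;
    end

    always @(posedge clk) begin
        if (fill) begin            // on a fill, remember the new block's tag
            tag_array[index]   <= tag;
            valid_array[index] <= 1'b1;
        end
    end
endmodule
```

A hit signal of this kind (the hit1 signal examined in the waveforms later) is what the controller consults first on every access.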
 
L2 Cache (4-Way Set Associative)
- Size: 512 bytes
 - Delay: 4 cycles
 - Replacement Policy: LRU
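
To make the LRU policy concrete: if the four ways of a set have been touched in the order way2, way0, way3, way1 (most to least recent) and way3 is then accessed, the recency order becomes way3, way2, way0, way1; on the next miss to that set, way1, being the least recently used, is the one evicted.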
 
Main Memory
- Size: 4KB
 - Delay: 10 cycles
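
Plugging these latencies into the AMAT formula from the design study gives a sense of the hierarchy's benefit. With an L1 hit time of 1 cycle, an L2 delay of 4 cycles, a main-memory delay of 10 cycles, and illustrative (not measured) miss rates of 10% for L1 and 20% for L2: AMAT = 1 + 0.10 × (4 + 0.20 × 10) = 1.6 cycles, compared with 10 cycles if every access went straight to main memory.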
 
Policies Implemented:
- Write-Back
 - No Write-Allocate
 
- Check Mode: Ensure the controller isn’t busy via the wait signal
- Read Operation (a minimal Verilog sketch of this hit-priority selection follows the list below):
- Check L1 Cache
 - L1 Hit: Return data to processor
 - L1 Miss: Check L2
 - L2 Hit: Delay 2 cycles, promote block to L1
 - L2 Miss: Fetch from main memory (10-cycle delay)
 - Promotions: L2 → L1 with evictions and write-backs if needed
 
- Write Operation:
- L1 Hit: Modify in L1
 - L1 Miss: Check and modify in L2 if found
 - L2 Miss: Modify directly in main memory
 - Policy: No promotion on write, no eviction on write (No Write-Allocate)
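
As noted under Read Operation, the hit-priority selection can be sketched as a small combinational block. The module and signal names below are illustrative, and the multi-cycle L2/memory delays are abstracted into busy inputs, so this is a simplification rather than the project's actual controller:

```verilog
// Illustrative read-path priority: L1 first, then L2, then main memory.
// Signal names are hypothetical; the real controller also sequences the
// L2 -> L1 promotion and the write-back of evicted dirty blocks.
module read_path_select (
    input  wire        hit1, hit2,          // tag-check results from L1 and L2
    input  wire        l2_busy, mem_busy,   // high while the slower level is still working
    input  wire [31:0] l1_data, l2_data, mem_data,
    output reg  [31:0] read_data,           // data returned to the processor
    output reg         stall                // drives the processor's wait signal
);
    always @(*) begin
        if (hit1) begin
            read_data = l1_data;    // L1 hit: return data immediately
            stall     = 1'b0;
        end else if (hit2) begin
            read_data = l2_data;    // L1 miss, L2 hit: wait out the L2 delay
            stall     = l2_busy;
        end else begin
            read_data = mem_data;   // miss in both: fetch from main memory
            stall     = mem_busy;
        end
    end
endmodule
```

In the full design, this priority would sit inside a state machine that also handles the promotions, evictions, and write-backs described above.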
 
 
Test Program:
addi x5, x0, 0    # x5 = 0 (base register)  
addi x6, x0, 0    # x6 = 0  
addi x7, x0, 4    # x7 = 4 (test value to store)  
addi x6, x5, 0    # x6 = base + 0 = address 0  
sw x7, 0(x6)      # store x7 to address 0  
lw x7, 0(x6)      # load back from address 0  
addi x6, x5, 4    # x6 = address 4  
lw x7, 0(x6)      # load from address 4  
addi x6, x5, 8    # x6 = address 8 (next test address)  

Results:
- Processor Speed: 11.9 MHz (84 ns period)
- Cache Speed: 500 MHz (2 ns period)
- Speedup (after L1 is filled): 3.75
- Observation point: PC = 0x4A; check the hit1, hit2, and wait signals
 
The two-level cache controller significantly reduced memory latency and increased performance in the RISC-V system. Through integration with the RV32I core, substantial throughput gains were achieved compared to a baseline design.
- Branch Prediction: Reduce instruction fetch penalties
 - Advanced Cache Policies: Write-through, write-allocate, and a possible L3 cache
 - Multicore Coherence: Implement MESI/MOESI for shared caches
 - Adaptive Replacement: Use DRRIP or ARC for better miss handling
 - Prefetching Mechanisms: To reduce compulsory misses
 - FPGA Implementation: Synthesize the full design to obtain power, area, and timing reports on hardware