Retrieving Historical Transactions by TxHash #1794
Replies: 4 comments 10 replies
Regarding data size of tx_envelope/result/etc... averages for 2025.
Other notes:
None of this is super accurate. More for ballparking and vibes.
Understatement of the year 😂 a full reingest of the network, even off of a data lake, would take weeks. To add to the pros of (2), it's worth noting that you would have extremely fast lookups in S3 (and the smallest storage footprint) because it's a 1:1 mapping vs. other options that need to build indices "on top of" the table. As in, there's no need to do prefix trie-ing or anything special, it's just a single folder with every single txhash in it as a file by name. This makes lookups trivial (just don't ever …
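A minimal sketch of the flat 1:1 layout described above, under assumptions not stated in the thread (the bucket and prefix names here are made up): each transaction hash is its own object key, so a lookup is a single GET with no index structure built on top.

```python
# Hypothetical sketch of the flat 1:1 layout: one S3 object per
# transaction, keyed directly by its hash. Names are illustrative only.

def tx_key(tx_hash: str, prefix: str = "txs") -> str:
    """Build the object key for a transaction hash (normalized to lowercase hex)."""
    return f"{prefix}/{tx_hash.lower()}"

# A lookup is then a single GetObject, e.g. with boto3 (not run here):
#   s3.get_object(Bucket="my-datalake", Key=tx_key(h))

print(tx_key("DEADBEEF"))  # txs/deadbeef
```

Because keys map 1:1 to hashes, there is no trie, manifest, or secondary index to maintain; the trade-off is object count, not lookup complexity.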
I did some non-trivial math on S3 storage/write/read costs, based on my understanding of S3 pricing, for Option 2 (data lake side index) and Option 3 (data lake side tx store). This is predicated on having 1 file per transaction with a potential naming scheme like -

Some Numbers

S3 Billing Details

S3 Storage Classes and Minimum Object Sizes
For this analysis, we'll use S3 Standard (no minimum object size penalty).

Storage Billing for S3 Standard
S3 Standard charges $0.023 per GB per month (us-east-1) with NO minimum object size.

Example: 9 billion × 4-byte files
Example: 9 billion × 2,560-byte files
Storage Efficiency: 100% for both scenarios (no waste on S3 Standard)

Request Billing: Per-Request Pricing
NOTE: Request costs depend on the NUMBER of requests, NOT the data size.
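The "no minimum object size penalty" point is worth making concrete. Assuming the published us-east-1 rates (my assumption, consistent with the $0.023/GB figure above): S3 Standard bills actual bytes, while S3 Standard-IA, despite a lower $0.0125/GB-month rate, bills every object as if it were at least 128 KB, which is catastrophic for billions of tiny files.

```python
# Why storage class choice matters for 9 billion 4-byte objects.
# Rates assumed: Standard $0.023/GB-mo (actual bytes);
# Standard-IA $0.0125/GB-mo with a 128 KB minimum billable object size.

N = 9_000_000_000

standard_gb = N * 4 / 1e9                # 36 GB actually stored
standard_cost = standard_gb * 0.023      # ≈ $0.83/month

ia_billed_gb = N * 128 * 1024 / 1e9      # ~1.18 million GB billed
ia_cost = ia_billed_gb * 0.0125          # ≈ $14,746/month
```

So the cheaper-per-GB class would cost roughly four orders of magnitude more here, which is why the analysis sticks to S3 Standard.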
Examples:

Why file size doesn't matter for request costs:

Write 1 billion files:

Read 1 billion files:

Data Transfer Billing
For small files (<10 MB), data transfer costs are typically negligible compared to request costs.

Data Transfer OUT (to internet):
For this analysis: with 500 queries/sec × 2.5 KB ≈ 3.2 TB/month transferred, the cost would be ~$280/month.

Use Case Parameters

Transaction Volumes
Storage Options
Query Pattern
SCENARIO 1: Store 4-Byte Ledger Sequence

Initial Historical Ingestion (9 Billion Transactions)

Storage Calculation
One-Time Write Cost
Note: This is a ONE-TIME cost to upload historical data. Takes approximately 4 weeks (?) with parallel ingestion.

Ongoing Monthly Costs (500 tx/sec)

Monthly Write Costs
Monthly Read Costs
Monthly Storage Growth
10-Year Cost Projection (Scenario 1)
Key Insight: Storage cost grows from $0.83 to $15.16/month over 10 years, but request costs dwarf storage costs at $6,998.40/month.

Summary Costs (Scenario 1 - 4 bytes)
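The Scenario 1 arithmetic can be reproduced directly. The per-request rates are my assumption ($0.005 per 1,000 PUTs, $0.0004 per 1,000 GETs, standard us-east-1 pricing), but they are consistent with the $6,998.40 total above; 30-day months are assumed throughout.

```python
# Scenario 1: 500 writes/sec + 500 reads/sec, 4-byte objects.
# Assumed rates: $0.005/1k PUTs, $0.0004/1k GETs, $0.023/GB-month.
SECONDS_PER_MONTH = 30 * 86_400
monthly_requests = 500 * SECONDS_PER_MONTH           # 1,296,000,000/month

write_cost = monthly_requests / 1_000 * 0.005        # $6,480.00/month
read_cost = monthly_requests / 1_000 * 0.0004        # $518.40/month
request_cost = write_cost + read_cost                # $6,998.40/month

# 9 billion historical transactions at 4 bytes each = 36 GB
initial_storage = 9_000_000_000 * 4 / 1e9 * 0.023    # ≈ $0.83/month

# After 10 years (120 months) of ongoing writes, storage is still tiny:
tx_10y = 9_000_000_000 + 120 * monthly_requests
storage_10y = tx_10y * 4 / 1e9 * 0.023               # ≈ $15/month
```

The 10-year storage figure lands within pennies of the $15.16 in the table above (differences are rounding), confirming that requests, not storage, dominate this scenario.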
SCENARIO 2: Store 2.5 KB Transaction Data

Initial Historical Ingestion (9 Billion Transactions)

Storage Calculation
One-Time Write Cost
Note: Same as Scenario 1 - cost is per request, not per byte!

Ongoing Monthly Costs (500 tx/sec)

Monthly Write Costs

Note: Same as Scenario 1 - file size doesn't affect request cost!

Monthly Read Costs

Note: Same as Scenario 1 - file size doesn't affect request cost!

Monthly Storage Growth
10-Year Cost Projection (Scenario 2)
Key Insight: Storage cost grows from $530 to $9,667/month over 10 years. Storage becomes a major cost factor, unlike Scenario 1.

Summary Costs (Scenario 2 - 2.5 KB)
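The Scenario 2 storage and transfer math, under the same assumed rates ($0.023/GB-month storage, ~$0.09/GB egress, 30-day months). The results land in the same ballpark as the $9,667 and ~$280 figures above; the small gaps come from rounding and tiered egress pricing.

```python
# Scenario 2: same request rates as Scenario 1, but 2,560-byte objects,
# so storage and egress dominate instead of being negligible.
SECONDS_PER_MONTH = 30 * 86_400
monthly_tx = 500 * SECONDS_PER_MONTH                  # 1.296 billion/month

initial_gb = 9_000_000_000 * 2_560 / 1e9              # 23,040 GB (~23 TB)
initial_storage = initial_gb * 0.023                  # ≈ $530/month

tx_10y = 9_000_000_000 + 120 * monthly_tx
storage_10y = tx_10y * 2_560 / 1e9 * 0.023            # ≈ $9,700/month

# Egress for 500 reads/sec of ~2.5 KB responses:
egress_gb = 500 * 2_500 * SECONDS_PER_MONTH / 1e9     # 3,240 GB ≈ 3.2 TB
egress_cost = egress_gb * 0.09                        # ≈ $290/month
```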
Side-by-Side Comparison

Cost Comparison Table

Storage vs Request Cost Balance

Scenario 1 (4 bytes):
Scenario 2 (2.5 KB):
Value Analysis
Key Insights

1. S3 Standard Has NO Minimum Object Size

2. Request Costs Are Identical
Both scenarios:

3. Storage Costs Differ Dramatically

Scenario 1 (4 bytes):
Scenario 2 (2.5 KB):
@overcat and @tmosleyIII, what are your thoughts on the above options?
What
Stellar RPC has limited "archival" node support as of protocol 23. This was implemented as Option 1 as articulated in https://github.com/orgs/stellar/discussions/1717, but as of yet the only endpoint that supports history is getLedgers. This limited support satisfies several key use cases around backfilling/reingesting full history via RPC.

It was always the intent to expand this functionality to the full set of endpoints that RPC offers, including:

getTransaction(hash) (stellar-rpc#492)
getTransactions (stellar-rpc#493)
getEvents (stellar-rpc#494)

Among these, getTransaction by tx_hash has emerged as the most in-demand endpoint, based on feedback from ecosystem partners and operators that have already started running the archival node. This aligns with common developer expectations across chains: being able to retrieve a transaction by its hash, regardless of age, is considered useful for debugging, compliance, and auditing.

Options
There's been some investigation done into the options here: stellar/stellar-rpc#492
At this point, we'd like to align on how this should be implemented, as there are some pretty significant trade-offs in balancing operator cost/burden and end-user ergonomics. This discussion should be focused on the trade-offs between those options. A couple of points to keep in mind when assessing the pros/cons of each of these options:
Option 1: RPC-side Index
BLUF: super fast for users, but makes RPC operators store a lot more data.
Implementation
A tx_hash → ledger_sequence index across full history.

Pros
Cons
(getTransaction(hash), stellar-rpc#492)

Option 2: Datalake-side Index
BLUF: cheaper for RPC operators, but slower for users and makes the data lake more complicated.
Implementation
A tx_hash -> ledger_sequence index ingested into the datalake.

Pros
Cons
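To make the Option 2 trade-off concrete, here is a hedged sketch of its read path (all names here are hypothetical; the actual datalake layout and client APIs are not specified in this thread): resolving a hash costs two datalake round-trips, one for the index object and one for the ledger it points at, which is where the "slower for users" BLUF comes from.

```python
# Option 2 read path sketch: two datalake round-trips per lookup.
# fetch_index and fetch_ledger stand in for whatever lake client is used.

def get_transaction(tx_hash, fetch_index, fetch_ledger):
    seq = fetch_index(tx_hash)            # hop 1: tx_hash -> ledger_sequence
    if seq is None:
        return None                       # unknown hash
    ledger = fetch_ledger(seq)            # hop 2: fetch the full ledger
    for tx in ledger["transactions"]:     # locate the tx inside the ledger
        if tx["hash"] == tx_hash:
            return tx
    return None

# Toy usage with in-memory stand-ins for the lake:
index = {"abc123": 7}
ledgers = {7: {"transactions": [{"hash": "abc123", "ledger": 7}]}}
print(get_transaction("abc123", index.get, ledgers.get))
```

Option 3 collapses the two hops into one by storing the transaction itself at the hash-keyed path, trading lookup latency for the much larger storage footprint discussed in the cost analysis above.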
Option 3: Datalake-side transaction store
BLUF: faster than Option 2 (and maybe Option 1), but blows up data storage even more.
Implementation
Pros
Cons