@SalvadorN323 SalvadorN323 commented Jul 31, 2025

Summary
This PR adds detailed setup and build instructions to help contributors initialize the Crawlee project locally. It documents required dependencies, Yarn installation via Corepack, and guidance on using yarn build successfully.

Key Changes
Documentation Additions:

  • Added Crawlee Project Pre-requisites section listing required Node.js and Yarn versions.
  • Included a Crawlee Installation and Building guide with Corepack instructions and yarn commands.

Important Notes:

  • These updates aim to streamline the developer onboarding and build process.

Contributors:

  1. Salvador Nunez: @SalvadorN323
  2. Alexander Manalad: @axmanalad
  3. Bao Truong: @baotruong04

CONTRIBUTING.md Outdated
Comment on lines 39 to 60
### "Module not found" Fix

**Note: Be sure to build the project in its original state first before contributing new code changes and following this.**

If you rebuild the project from the project root directory with `yarn build` after making changes and run into an error about a missing `index.js` module in the relevant `dist` directory, the cause may be how `rimraf` resolves relative paths from the current location. This can be fixed as follows:

1. Navigate into `package.json` under the relevant folder with code changes.
(e.g. If your new code changes are in `./packages/core`, go into that `package.json`)

2. Update "scripts" for `yarn clean` and give it the actual relative path.

Example:
```json
"clean": "rimraf packages/core/dist",
```
**Note: Do not include this as a commit.**

3. If there is a `dist` folder within the relative path, delete it manually.

4. Try rebuilding the Crawlee project again using `yarn build`.

If the fix is successful, you should be able to build the project without any issues. You do not need to manually delete the generated `dist` folder again.

Member

this sounds like something to fix instead of documenting it. can you describe what exactly happened? rimraf dist should work just fine even if the folder is not present, our CI would fail if that were the case.

CONTRIBUTING.md Outdated

- [Node.js](https://nodejs.org/en) >= 16.0.0 (recommended: v22.17.0)

- [Yarn](https://yarnpkg.com/) >= 4.0.0 (recommended: v4.8.1)

Member

we don't "recommend" a version, users need to use the exact version provided in the packageManager field in package.json.
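
The pinned version referred to here can be read directly from `package.json`. A minimal sketch, assuming a POSIX shell and that the `packageManager` field sits on a single line; the helper name `pinned_package_manager` is hypothetical and not part of the Crawlee tooling:

```shell
# Hypothetical helper (not part of Crawlee): print the value of the
# "packageManager" field from a package.json file.
# Assumes the field is written on a single line.
pinned_package_manager() {
  sed -n 's/.*"packageManager"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "${1:-package.json}"
}

# Usage: run from the project root and compare the result with `yarn --version`.
if [ -f package.json ]; then
  pinned_package_manager package.json
fi
```

If the two values differ, the locally active Yarn is not the one the repository pins.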

CONTRIBUTING.md Outdated

```shell
corepack enable
corepack prepare yarn@stable --activate
```

Member

afaik this shouldn't be needed, we do it in the CI but locally i don't think i even had to run this myself

SalvadorN and others added 2 commits August 4, 2025 14:28
…nd building instructions

- Update Node.js and Yarn version requirements to be more accurate

- Simplified corepack setup instructions
@axmanalad

> this sounds like something to fix instead of documenting it. can you describe what exactly happened? rimraf dist should work just fine even if the folder is not present, our CI would fail if that were the case.

Hi @B4nan,

Thanks for reviewing! Each of us ran into a similar build issue when rebuilding the project after any change: gen-esm-wrapper reported that it could not find index.js in the dist folder. For instance, if I only insert a console.log("Hello World"); line inside a core package TS file (like enqueue_links.ts), running yarn build in the project root would miss the core build cache, and the dist folder would either never be built or be incomplete.

I also thought that rimraf ./dist should work as expected, since it does delete the folder in the frontend. I believe it has something to do with how the path is resolved; possibly yarn compile points to the deleted dist folder? It could also mean that rimraf ./dist does not fully delete the dist folder during compile time.

However, another fix I found that works, at the cost of more steps, is the following:

  1. When receiving the error, change into the directory where the error occurs (e.g. packages/core).
  2. Run yarn build.
  3. Change back to the project root.
  4. Run yarn clean and yarn build.
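
The four steps above can be sketched as one shell function. This is only an illustration, assuming it is run from the project root and that the failing package is `packages/core`; the function name `rebuild_via_package` is hypothetical:

```shell
# Hypothetical helper: build the failing package on its own first,
# then clean and rebuild everything from the project root.
rebuild_via_package() {
  pkg_dir="${1:-packages/core}"       # directory named in the error
  (cd "$pkg_dir" && yarn build) &&    # steps 1-3: the subshell leaves the caller in the root
    yarn clean &&                     # step 4, first half
    yarn build                        # step 4, second half
}
```

Running the package build in a subshell avoids having to change back to the root by hand.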

I can open a new issue with the error log included if you are interested. When we found the rimraf fix, we did not know whether to propose it as a change to the codebase.

@B4nan
Member

B4nan commented Aug 5, 2025

> Thanks for reviewing! Each of us ran into a similar build issue when we attempted to rebuild the project with any change in general, with an error related to gen-esm-wrapper not finding the index.js in the dist folder. For instance, if I only insert a console.log("Hello World"); line inside the core package TS file (like enqueue_links.ts), running yarn build in the root project would miss the core build cache and either never build the dist folder or it would be incomplete.

This feels like something weird happened on your end, and you are trying to randomly find the culprit (so which one is it, gen-esm-wrapper, tsc build, build not working at all, or being incomplete?). I kinda doubt there is an issue like this (if there is, it would have to be in one of the libraries like tsc or turbo).

I'd need to see a complete reproduction - exact steps, not "either that or that happened, or maybe that". Right now, I am not convinced we need to update the contributing guide. Your changes there could likely confuse people rather than help them.

Reading this again and again, I actually think I know what is happening to you: it sounds like the tsc build cache, which wasn't properly ignored some time ago (and we managed to include one tsbuildInfo file in git). We fixed that already via #3035, maybe you just faced that issue because you cloned the project earlier.

SalvadorN323 changed the title from "docs(contributing): add setup instructions and local build troubleshooting guide" to "docs(contributing): add setup instructions" on Aug 5, 2025
@axmanalad

axmanalad commented Aug 5, 2025

I updated the documentation to not include the fix.

> This feels like something weird happened on your end, and you are trying to randomly find the culprit (so which one is it, gen-esm-wrapper, tsc build, build not working at all, or being incomplete?). I kinda doubt there is an issue like this (if there is, it would have to be in one of the libraries like tsc or turbo).

> Reading this again and again, I actually think I know what is happening to you: it sounds like the tsc build cache, which wasn't properly ignored some time ago (and we managed to include one tsbuildInfo file in git). We fixed that already via #3035, maybe you just faced that issue because you cloned the project earlier.

The project was cloned and tested after the fix you mentioned. Even the normal steps result in the same error. Starting fresh, the normal steps are:

  1. Set up Corepack with corepack enable.
  2. Run yarn install.
  3. Run yarn build (succeeds).
  4. Add console.log("Hello World"); at line 520 of enqueue_links.ts in the core package.
  5. Run yarn build with the new code change (fails).
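
To make a reproduction like the one above easier to share, each build's full output and exit status can be captured in a file. A rough sketch, assuming a POSIX shell; the helper name `build_with_log` and the log file names are hypothetical:

```shell
# Hypothetical helper: run `yarn build`, keep the complete output in a log
# file, and report the exit status so a failure is unambiguous.
build_with_log() {
  log_file="${1:-build.log}"
  if yarn build > "$log_file" 2>&1; then
    status=0
  else
    status=$?
  fi
  echo "yarn build exited with status $status (full output in $log_file)"
  return "$status"
}

# Steps 1-3 (fresh clone):  corepack enable && yarn install && build_with_log build-before.log
# Step 4: edit enqueue_links.ts by hand.
# Step 5:                   build_with_log build-after.log
```

Comparing the two logs shows exactly which task fails after the change.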

You are correct, however, that it is tied to a local issue of mine. I tried the following checks:

  • Uninstalled my global installations of TypeScript and Turbo.
  • Verified that every version, including Node.js and Yarn, is correct.
  • Ran yarn clean to clear the cache.
  • Deleted node_modules and reinstalled with yarn install.
  • Deleted the generated tsconfig.build.tsbuildinfo manually (somehow this works).

Today I once again tried the normal steps for rebuilding the project. You are also correct that it has something to do with TypeScript's incremental build cache in my local environment: tsconfig.build.tsbuildinfo seems to end up out of sync or corrupted. In other words, oddly enough, tsconfig.build.tsbuildinfo is not updated when I make a new build with new code changes, which forces me to delete it manually. Unfortunately, I am not sure what causes the "out of sync" issue, so I do not have an automatic local fix for it. If you would like to look into the log, it is in the attached document.
log.txt
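
The manual deletion described above could be scripted as a local workaround. A minimal sketch, assuming a POSIX shell and that the cache files sit under `packages`; the function name `clear_tsbuildinfo` is hypothetical and this is not part of the Crawlee build scripts:

```shell
# Hypothetical helper: delete stale TypeScript incremental build caches
# under a directory (default: packages), skipping anything in node_modules.
clear_tsbuildinfo() {
  find "${1:-packages}" -type d -name node_modules -prune -o \
    -type f -name 'tsconfig.build.tsbuildinfo' -print -exec rm -f {} +
}
```

Running `clear_tsbuildinfo` from the project root before `yarn build` prints each cache file it removes, which also records which packages were affected.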
