Skip to content

Conversation

@antgonza
Copy link
Member

No description provided.

@antgonza antgonza changed the title simplifying code and adding resources.yml [WIP]: simplifying code and adding resources.yml Oct 28, 2025
@antgonza antgonza changed the title [WIP]: simplifying code and adding resources.yml simplifying code and adding resources.yml Nov 7, 2025
Copy link
Collaborator

@lucaspatel lucaspatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor concerns about testing but otherwise looks good to me.

cd ../step-2/${sample_name}_split
# making a copy of the small_LCG before they are removed
mkdir -p {{output}}/step-2/${sample_name}_small_LCG
find . -maxdepth 1 -type f -size -512k -print0 | xargs -0 -r cp -t ../${sample_name}_small_LCG
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So small_LCG is defined by files in size < 512kb? Probably important to note in the documentation for the PacBio workflow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting small_LCG to be defined by total genome size

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianshu93, can you comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by file size for now. Can be optimized, they are proportational to total genome size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approximate 515,000 bases (half a million), because one character takes one byte approximately.

else:
full = full.concat(loaded)

with h5py.File(f"{base}/{rank}", "w") as out:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm reading this right, but wouldn't these lines here indicate that you will write full to out on every iteration of the loop, thus rewriting the same path over and over?


@click.command()
@click.option("--base", type=click.Path(exists=True), required=True)
def biom_merge(base):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of logic to merge some BIOM. Is this borrowed from qp-woltka? Do you have tests supporting this function in particular?

"qiita-files@https://github.com/qiita-spots/qiita-files/archive/master.zip",
"qiita_client@https://github.com/qiita-spots/qiita_client/archive/master.zip",
"woltka@git+https://github.com/qiyunzhu/woltka.git#egg=woltka",
"micov@git+https://github.com/biocore/micov.git#egg=micov",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but you can probably get micov from pip now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we can install from bioconda

Copy link
Collaborator

@jianshu93 jianshu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the database option. We will probably update it very soon. wol3 or even GTDB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants