Reduce Typosquatting Harm via Social Distancing for Top PyPI Packages #9527

Open
@jspeed-meyers

What's the problem this feature will solve?

Reduce the total harm typosquatting causes to PyPI users.

Describe the solution you'd like

Block users from uploading new packages with a similar name to any of the current top packages.

  • similar = close as measured by Levenshtein distance or a comparable edit-distance metric.
    • As a starting point, the cutoff would be an edit distance of two or less.
  • top packages = the 1000 most downloaded packages.
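
To make the rule concrete, here is a minimal sketch of how such a check might look. It is only an illustration under stated assumptions, not Warehouse code: the function names, the tiny hard-coded `top` list standing in for the real top-1000 list, and the choice to normalize names first (lowercasing and collapsing runs of `-`, `_`, and `.`, as PEP 503 does) are all hypothetical choices made for this example.

```python
import re

MAX_DISTANCE = 2  # the proposed starting cutoff: an edit distance of two or less


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into '-' (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def conflicting_top_packages(new_name, top_packages, max_distance=MAX_DISTANCE):
    """Return the top packages whose names are too close to new_name.

    Distance 0 is skipped on purpose: an identical name is an existing
    project, not a NEW one, so it falls outside this rule.
    """
    candidate = normalize(new_name)
    return [
        top for top in top_packages
        if 0 < levenshtein(candidate, normalize(top)) <= max_distance
    ]


# Hypothetical stand-in for the 1000 most downloaded packages.
top = ["requests", "numpy", "urllib3"]
print(conflicting_top_packages("reqeusts", top))      # ['requests']
print(conflicting_top_packages("django-utils", top))  # []
```

Note that plain Levenshtein distance counts a swap of two adjacent letters as two edits, so `reqeusts` sits exactly at the proposed cutoff. A non-empty result would mean the upload is refused, ideally with a message naming the conflicting package.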

Additional context

While similar solutions have been proposed before (see below), this particular solution is not a malware check or a predictive model. It is a simple rule: users cannot upload any NEW package whose name is similar to that of a top package. Full stop. This rule will not stop all typosquatters, but it will likely reduce harm because typosquatting the top packages becomes harder. Typosquatters would be left with two options: target less popular packages, and therefore harm fewer users, or use names at a greater edit distance, which users are less likely to type by mistake, and so also likely harm fewer users.

For reference, past analysis by @bentztozer and me found that eighteen of forty (45%) past documented typosquatting attacks on PyPI had an edit distance of two or less. Similarly, twenty-nine of the forty (72.5%) typosquatted packages that were among the 1000 most downloaded.

I should also mention that this proposed feature is not meant to replace the other ongoing, related efforts to reduce the harm caused by malware on PyPI. Finally, like all approaches to reducing harm from malware on PyPI, this one has pros and cons. All debate, critique, and suggested revisions are welcome.

Some parties I know will be interested: @di, @ewdurbin, @xmunoz, @benjaoming, @hannob
Some parties who could be interested: @ewjoachim, @brainwane, @pradyunsg, @ncoghlan, @dstufft

Relevant issues and PRs:

Implement a More Robust Malware Detector - Issue #7748

Detect Packages Being Published with Typo-ish Names - Issue #4998

@brainwane rightly mentioned that if this approach were part of a malware check, it would be virtually guaranteed to generate many false positives. This proposal is distinct: it simply calls for a rule that restricts package name selection, which might technically be called “preclusive namespacing” but informally be called “social distancing for top PyPI packages.” PyPI administrators can therefore avoid adjudicating whether a certain package is malware or not. PyPI would simply prevent any user from registering such a package name, ideally with an explanation that the name is not allowed because the rule aims to reduce aggregate typosquatting harm.

Post-registration Alerts for Packages with Similar Names (Typosquatting) - Issue #2268

Monitor New Packages that Might be Typosquats - PR #5001

PSF Fundables - Productionize Malware Detection - Issue #38

Labels: feature request, malware-detection, security, squatting
