Searching for fake (suspicious) reviews in 2gis.
This project is absolutely unofficial, not related to 2GIS.ru.
pipx install git+https://github.com/yaroslaff/antifraud2gis
af2gis
- main program for most user operationsaf2dev
- program for developersaf2web
- webserver (probably you do not need it)af2worker
- background worker (probably you do not need it too)
If you want just to run fraud detection on companies by 2GIS object-id, you need to use only af2gis
.
# by OID:
af2gis fraud 141265769338187
# by alias
af2gis fraud nskzoo
# list all aliases:
af2gis aliases
Next call to fraud detection will show old result unless --overwrite
option given.
You need to have at least few users already crawled (from previous fraud detections). Explore will submit jobs to redis queue, and you need to have af2worker running to process queue.
af2dev explore -t Новосибирск
-t
will limit only to specific town.
"Explore" may look slow or even hangs, but you may check summary or queue status in other tab every 3-5 minutes to see progress. (It really hangs only if af2worker is not running)
$ af2gis summary
2025-05-04 16:15:35 SUMMARY Companies: total=3, nerr=0 ncalc=3 nncalc=0
2025-05-04 16:15:35 SUMMARY LMDB prefixes: {'object': 12053, 'user': 900}
Companies (on first line) are companies which are downloaded, objects (on second line) - companies which are known (but not downloaded).
Use af2dev queue
(or just q
).
$ af2dev q
Queue report
Worker status: started...
Dramatiq queue: 0
Tasks (0): []
Trusted (5/20):
141265771582384 СберБанк, Банки (3.6) None
70000001025692296 Нск Шашлык, Быстрое питание (4.8) None
70000001026944015 Дивногорский, управляющая компания (3) None
70000001023639031 Сибкарт моторспорт, картинг-центр (4.7) None
141265771632879 Подорожник, доступная кофейня (2.2) None
Untrusted (5/11)
70000001091636066 Культура, рюмочная (5) ['risk_users 84% (222 / 262)']
70000001091636066 Культура, рюмочная (5) ['risk_users 84% (222 / 262)']
70000001083008185 Рыдзинский, рюмочная-караоке (4.9) ['risk_users 41% (179 / 432)']
5348552838629866 Юсуповский Дворец на Мойке, музей (4.9) ['sametitle_rel 20% (5 of 1)']
5348552838629866 Юсуповский Дворец на Мойке, музей (4.9) ['sametitle_rel 20% (5 of 1)']
SQLite database has data about all companies after fraud-detect. Company itself and top of company's neighbour (related) companies are added to database.
$ af2gis search арн -t Новосибирск
{'oid': '70000001075178717', 'title': 'Пекарня, Пекарни', 'address': 'Новосибирск, Урожайная, 8', 'town': 'Новосибирск', 'searchstr': 'новосибирск пекарня, пекарни', 'rating_2gis': 4.7, 'trusted': 1, 'nreviews': 31, 'detections': ''}
{'oid': '70000001041433989', 'title': 'Пекарня, Пекарни', 'address': 'Новосибирск, Чигорина, 3', 'town': 'Новосибирск', 'searchstr': 'новосибирск пекарня, пекарни', 'rating_2gis': 4.6, 'trusted': 1, 'nreviews': 52, 'detections': ''}
LMDB database has data about all ever seen companies (even if company was once seen in one user reviews)
mir ~/repo/antifraud2gis $ af2dev lmdb дог
object:70000001007492207
{
"name": "Хот-дог райский, киоск фастфудной продукции",
"address": "Новосибирск, проспект Дзержинского, 23"
}
object:70000001017423547
{
"name": "Хот-дог Мастер, фуд-киоск",
"address": "Новосибирск, проспект Карла Маркса, 4а"
}
...
Settings variables can be overriden in environment or .env file.
You can always see accurate current defaults in settings.py.
If company has less then MIN_REVIEWS
(20) it will be considered trusted without any other checks.
Only reviews with age under MAX_REVIEW_AGE
(730 days) are processed. (Older reviews are counted as discarded
)
SKIP_OIDS
and DEBUG_UIDS
are space-separated list of UID/OIDs to ignore/debug. (only for debugging purposes, not for users).
Definition:
Relation between companies A and B is high if it has many hits (>= RISK_HIT
).
Relation is happy if avg rating for both A/B companies in relation are >= RISK_HIGHRATE
For sametitle check: check applied only if company has more then SAMETITLE_REL
(3) 'happy' relations, and total different titles in relations ( / total happy relations) are under SAMETITLE_RATIO
(percents).
For happy long relations check: check applied if number of towns are >= HAPPY_LONG_REL_MIN_TOWNS
(3) and happy ratio (ratio of happy hight relations to all high relations) is over HAPPY_LONG_REL
(50%). happy_long_rel calculated as unique_towns / happy_high_relations. (If each happy high relation belongs to different town, happy_long_rel will be 1). Detection will be if this value is >= then HAPPY_LONG_REL
(10).
Risk users are users in such risk (high and happy) relations. If ratio of risk users are over RISK_USER
(30), detection happens.
User is empty if it has no visible reviews for other companies (user is external or has private profile) or user have just one review for this company only.
Test applied only if company has more then APPLY_EMPTY_USER
reviews from empty users and from non-empty users. Fraud detected if company has more then EMPTY_USER
(75%) empty users and avg rating about empty users are more then RATING_DIFF
(1.2) higher then rating among non-empty users.
Simple/cheap bots often have very few reviews. Check is applied only if processed (= from not empty users) more then APPLY_MEDIAN_RPU
(20) reviews. Detection happens if median number of reviews per user is lower then MEDIAN_RPU
(5) and rating amount low-RPU users and high-RPU users are over RATING_DIFF
.
User age is age in days between a first review of user and a current review. (so, each user has at least one review with user age = 0). Check is applied only if processed more then APPLY_MEDIAN_UA
(20) reviews, if median user age is under MEDIAN_USER_AGE
(30) days, there is at least MEDIAN_USER_AGE_NUSERS
(10) young users and rating among young users are more then RATING_DIFF
if compare with old users.
Console utility shows all high relations between companies, SHOW_HIT
used to display relations with lower hit count. Can be overriden with -s N
option.
RISK_MEDIAN
will highlight median number of user reviews if it's under this value. (Bots often has low number of reviews).
Neither SHOW_HIT
nor RISK_MEDIAN
do not affect detections, it's used only for displaying information.