Add check_escape.py quality checker for issue #149. #193

dliu04 · 2025-08-02T20:02:15Z

Implements the check_escape.py script to detect "Wrong escape" issues, as specified in GitHub issue #149. The script identifies:

Incomplete LaTeX commands (\c, \p, \l)
Invalid escape characters (-)
Improper quote escaping
Unknown escape sequences
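
For illustration, the checks listed above can be tried on single lines with a small helper. This is a minimal sketch: the patterns are copied from the check_escape.py diff in this PR, while the function name find_escape_issues and the sample journal lines are hypothetical and not part of the script. The math-mode skip from the diff is left out here to keep the example short.

```python
import re

# Patterns and descriptions copied from the check_escape.py diff in this PR.
PATTERNS = [
    (r'\\c(?![primeyrd])', 'incomplete LaTeX command - should be \\cyr or \\cprime'),
    (r'\\p(?!olhk)', 'incomplete LaTeX command - should be \\polhk'),
    (r'\\l(?!dots|asp)', 'incomplete LaTeX command'),
    (r'\\"[^,"]', 'improper quote escaping'),
    (r'\\(?![\\"/nrt$&-]|sp|rm|circledR|cprime|cyr|polhk|cdprime|ldots|lasp)[a-zA-Z]+',
     'unknown escape sequence'),
]


def find_escape_issues(line):
    """Return (description, matched_text) pairs for one CSV line.

    Hypothetical helper for illustration; the PR's script runs the same
    patterns inline inside its file loop.
    """
    issues = []
    for pattern, description in PATTERNS:
        for match in re.finditer(pattern, line):
            issues.append((description, match.group(0)))
    return issues


if __name__ == "__main__":
    # Made-up sample lines, not taken from any journal list.
    samples = [
        r'Zh. \cyr Fiz.,Zh. Fiz.',
        r'Ann. \p Math.,Ann. Math.',
        r'J. \foo Sci.,J. Sci.',
    ]
    for sample in samples:
        print(sample, '->', find_escape_issues(sample))
```

With these patterns, a bare \p is reported twice, once as an incomplete command and once as an unknown escape sequence, because both patterns match it. Whether that double report is intended is a question for the PR author.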

koppor · 2025-08-03T13:47:50Z

Please wire into https://github.com/JabRef/abbrv.jabref.org/blob/main/.github/workflows/checks.yml

koppor · 2025-08-14T13:50:12Z

Review hint: Does this PR update existing fetched lists? If yes, discuss alternatives: do not fail the PR, just report the findings in the workflow summary, and also report them to the upstream list provider.
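
One way to sketch the report-only alternative mentioned here: GitHub Actions exposes the path of the job summary file in the GITHUB_STEP_SUMMARY environment variable, and Markdown appended to that file shows up on the workflow run page. The function names and the shape of the findings list below are assumptions for illustration; the PR's script collects file names in errFileNames and may need a different structure.

```python
import os
import sys


def format_summary(findings):
    """Render findings as a Markdown section for the job summary.

    `findings` is a hypothetical list of (file_name, line_number, description)
    tuples; it is not the data structure used in the PR.
    """
    lines = ["## Escape check report", ""]
    if findings:
        lines += ["| File | Line | Problem |", "| --- | --- | --- |"]
        lines += [f"| {name} | {number} | {problem} |"
                  for name, number, problem in findings]
    else:
        lines.append("No escape issues found.")
    return "\n".join(lines) + "\n"


def report_findings(findings):
    """Write the report without failing the job; always returns exit code 0."""
    text = format_summary(findings)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        # Append, since earlier steps may already have written to the summary.
        with open(summary_path, "a", encoding="utf-8") as summary:
            summary.write(text)
    else:
        # Outside GitHub Actions, fall back to standard output.
        sys.stdout.write(text)
    return 0
```

A script could end with sys.exit(report_findings(findings)) to stay green while still surfacing the problems. Reporting to the upstream list provider would remain a manual step in this sketch.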

subhramit · 2025-08-14T15:08:12Z

scripts/check_escape.py

+for file in fileNames:
+    if (file.endswith(".csv")):
+        # For each .csv file in the folder, open in read mode
+        with open(PATH_TO_JOURNALS + file, "r", encoding='utf-8', errors='ignore') as f:
+            for i, line in enumerate(f):
+                # Look for specific problematic patterns
+                problematic_patterns = [
+                    (r'\\c(?![primeyrd])', 'incomplete LaTeX command - should be \\cyr or \\cprime'),
+                    (r'\\p(?!olhk)', 'incomplete LaTeX command - should be \\polhk'),
+                    (r'\\l(?!dots|asp)', 'incomplete LaTeX command'),
+                    (r'\\"[^,"]', 'improper quote escaping'),
+                    (r'\\(?![\\"/nrt$&-]|sp|rm|circledR|cprime|cyr|polhk|cdprime|ldots|lasp)[a-zA-Z]+', 'unknown escape sequence'),
+                ]
+
+                for pattern, description in problematic_patterns:
+                    matches = re.finditer(pattern, line)
+                    for match in matches:
+                        # Skip if we're inside a mathematical expression (between $ signs)
+                        line_before_match = line[:match.start()]
+                        line_after_match = line[match.end():]
+                        dollar_count_before = line_before_match.count('$')
+                        dollar_count_after = line_after_match.count('$')
+
+                        # If we have an odd number of $ before and after, we're inside math - allow it
+                        if (dollar_count_before % 2 == 1) and (dollar_count_after % 2 == 1):
+                            continue
+
+                        errFileNames.append(file)


Suggested change

-for file in fileNames:
-    if (file.endswith(".csv")):
+for fileName in fileNames:
+    if (fileName.endswith(".csv")):
         # For each .csv file in the folder, open in read mode
-        with open(PATH_TO_JOURNALS + file, "r", encoding='utf-8', errors='ignore') as f:
+        with open(PATH_TO_JOURNALS + fileName, "r", encoding='utf-8', errors='ignore') as f:
             for i, line in enumerate(f):
                 # Look for specific problematic patterns
                 problematic_patterns = [
                     (r'\\c(?![primeyrd])', 'incomplete LaTeX command - should be \\cyr or \\cprime'),
                     (r'\\p(?!olhk)', 'incomplete LaTeX command - should be \\polhk'),
                     (r'\\l(?!dots|asp)', 'incomplete LaTeX command'),
                     (r'\\"[^,"]', 'improper quote escaping'),
                     (r'\\(?![\\"/nrt$&-]|sp|rm|circledR|cprime|cyr|polhk|cdprime|ldots|lasp)[a-zA-Z]+', 'unknown escape sequence'),
                 ]

                 for pattern, description in problematic_patterns:
                     matches = re.finditer(pattern, line)
                     for match in matches:
                         # Skip if we're inside a mathematical expression (between $ signs)
                         line_before_match = line[:match.start()]
                         line_after_match = line[match.end():]
                         dollar_count_before = line_before_match.count('$')
                         dollar_count_after = line_after_match.count('$')

                         # If we have an odd number of $ before and after, we're inside math - allow it
                         if (dollar_count_before % 2 == 1) and (dollar_count_after % 2 == 1):
                             continue

-                        errFileNames.append(file)
+                        errFileNames.append(fileName)
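
The math-mode skip in the loop above can be looked at in isolation. The sketch below pulls the dollar-sign parity test into a helper; the name inside_math and the sample lines are hypothetical, and the simple `\\[a-zA-Z]+` pattern is used only to locate a command for the demonstration.

```python
import re


def inside_math(line, start, end):
    """Return True when line[start:end] has an odd number of `$` on both sides.

    Mirrors the parity test in the PR's loop: an odd count before and after
    the match is taken to mean the match sits between a pair of `$` signs.
    """
    dollars_before = line[:start].count('$')
    dollars_after = line[end:].count('$')
    return dollars_before % 2 == 1 and dollars_after % 2 == 1


if __name__ == "__main__":
    # Made-up sample lines, not taken from any journal list.
    for sample in [r'Lect. Notes $\mathbb{R}$ Anal.', r'Lect. Notes \mathbb Anal.']:
        match = re.search(r'\\[a-zA-Z]+', sample)
        print(sample, '->', inside_math(sample, match.start(), match.end()))
```

Because str.count('$') counts every dollar sign, an escaped \$ elsewhere on the line also changes the parity, so this test may misclassify lines that contain a literal dollar.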

Add check_escape.py quality checker for issue JabRef#149.

ca118af

dliu04 added 2 commits August 3, 2025 08:54

Wired escape sequences check to checks.yml.

a3c8f17

Fix valid escape sequence detection for hyphens.

74f5c62

subhramit reviewed Aug 14, 2025
