Regex meta char escape using preg_quote #59
I've seen this in some rare cases but unfortunately never had the time to investigate it. This is indeed a bug.
Regex is not my area of expertise, but could this be as simple as using a character that is not valid in URLs as the delimiter instead of "@"?
rawurlencode()-ing paths, as is currently done, is a good approach, I think, since a URL may contain any character code. Here is a reproduction:
require_once(__DIR__ . '/vendor/autoload.php');
$parser = new \RobotsTxtParser('User-agent: webcrawler
Disallow: /(
Disallow: /)
Disallow: /.
');
var_dump($parser->isAllowed('/%5C.', 'webcrawler') == true); // bool(false), expected bool(true)
var_dump($parser->isAllowed('/(', 'webcrawler') == false); // bool(false), expected bool(true)
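To isolate the two failures above from the parser, here is a minimal sketch using preg_match() alone. It assumes the rule is wrapped as '@' . $rule . '@' with no anchor, as in the checkBasicRule() template from this thread; the variable names are only for illustration.

```php
<?php
// Sketch: the two failures above reproduced with preg_match() alone.
// Assumption: the rule is wrapped as '@' . $rule . '@' with no anchor,
// as in the checkBasicRule() template from this thread.

$path = '/%5C.'; // '/' . rawurlencode('\\') . '.'

// Unescaped, '.' matches any character, so "/." matches the "/%" prefix.
$unescapedDot = preg_match('@/.@', $path);
// Escaped, '.' is literal, so the rule no longer matches.
$escapedDot = preg_match('@' . preg_quote('/.', '@') . '@', $path);

// Unescaped, '(' opens a group that is never closed: compilation fails
// and preg_match() returns false, so the rule silently never matches.
// The '@' error-control operator only hides the warning here.
$unescapedParen = @preg_match('@/(@', '/(');
// Escaped, '(' is literal and the disallowed path matches.
$escapedParen = preg_match('@' . preg_quote('/(', '@') . '@', '/(');

var_dump($unescapedDot, $escapedDot, $unescapedParen, $escapedParen);
// int(1), int(0), bool(false), int(1)
```

Both wrong results in the reproduction follow from the rule reaching PCRE unescaped: one over-matches, the other fails to compile and therefore never matches.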
I just took another look at the issue. I was unable to fix it (for now), but here is something for the next person who tries to fix it to build on:
private function checkBasicRule($rule, $path)
{
    $rule = $this->encode_url($rule);
    // Escape regex metacharacters; the second argument also escapes
    // the '@' delimiter used in the pattern below.
    $rule = preg_quote($rule, '@');
    // NOTE: preg_quote() also escapes the robots.txt wildcards '*' and '$',
    // so from here on they are matched literally.
    // match result
    if (preg_match('@' . $rule . '@', $path)) {
        if (mb_stripos($rule, '$') !== false) {
            if (mb_strlen($rule) - 1 == mb_strlen($path)) {
                return true;
            }
        } else {
            $this->log[] = "Rule match: Path";
            return true;
        }
    }
    return false;
}
I'm not sure what the problem is, but I think this template is a good place to start...
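One possible direction for the template above: escape the whole rule with preg_quote($rule, '@') first, then re-enable the two robots.txt wildcards. This is only a sketch, not the library's code. robotsRuleToPattern() and robotsRuleMatches() are hypothetical helpers, the matching semantics (prefix match, '*' for any sequence, a trailing '$' for end of path) are assumed from RFC 9309, and encode_url() is not called, so rules and paths are assumed to be encoded the same way already.

```php
<?php
// Hypothetical helpers, not part of t1gor/robots-txt-parser.
// Escape the whole rule first, then re-enable the robots.txt wildcards:
// '*' (any sequence of characters) and a trailing '$' (end of path).
function robotsRuleToPattern(string $rule): string
{
    $anchoredEnd = substr($rule, -1) === '$';
    if ($anchoredEnd) {
        $rule = substr($rule, 0, -1);
    }

    // Escapes all regex metacharacters plus the '@' delimiter.
    $quoted = preg_quote($rule, '@');

    // preg_quote() turned every '*' into '\*'; turn those back into '.*'.
    $quoted = str_replace('\*', '.*', $quoted);

    // '^' makes the rule a prefix match on the path.
    return '@^' . $quoted . ($anchoredEnd ? '$' : '') . '@';
}

function robotsRuleMatches(string $rule, string $path): bool
{
    return preg_match(robotsRuleToPattern($rule), $path) === 1;
}

var_dump(robotsRuleMatches('/(', '/('));             // bool(true)
var_dump(robotsRuleMatches('/.', '/%5C.'));          // bool(false)
var_dump(robotsRuleMatches('/*.php$', '/index.php')); // bool(true)
```

Escaping first and un-escaping only the wildcards afterwards means no other character from the rule can reach PCRE unescaped, which covers both the wrong match on '.' and the compilation warnings for '(' and ')'.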
When the pattern is built as preg_match('@...@'), the input should be escaped with preg_quote($rule, '@').
Currently, one of the following warnings is raised when a path contains a regex metacharacter:
PHP Warning: preg_match(): Compilation failed: missing ) at offset 15 in /path/to/vendor/t1gor/robots-txt-parser/source/robotstxtparser.php on line 836
PHP Warning: preg_match(): Compilation failed: unmatched parentheses at offset 1 in /path/to/vendor/t1gor/robots-txt-parser/source/robotstxtparser.php on line 836
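To illustrate why the second argument matters: preg_quote() does not escape '@' by default, because '@' is not a regex metacharacter, so a rule containing '@' ends the pattern early. A small sketch; the rule value is made up for illustration.

```php
<?php
// Illustrative rule value; any rule containing '@' behaves the same way.
$rule = '/user@example';

$withoutDelimiter = preg_quote($rule);      // '@' is left as-is
$withDelimiter    = preg_quote($rule, '@'); // '@' becomes '\@'

// Unescaped, the second '@' closes the pattern, and the remaining
// characters are parsed as modifiers, which fails: a warning plus false.
// The '@' error-control operator only hides that warning here.
$broken = @preg_match('@' . $withoutDelimiter . '@', '/user@example');
$fixed  = preg_match('@' . $withDelimiter . '@', '/user@example');

var_dump($withoutDelimiter, $withDelimiter, $broken, $fixed);
```

Note that '/' is not escaped either unless it is passed as the delimiter, which is why the delimiter argument has to match whatever character wraps the pattern.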