
Nginx - Rewrite Module

Posted: 2016-08-09 00:16:56


Initially, the purpose of this module (as the name suggests) is to perform URL rewriting. This mechanism allows you to get rid of ugly URLs containing multiple parameters, for instance, http://example.com/article.php?id=1234&comment=32; such URLs are particularly uninformative and meaningless for a regular visitor. Instead, links to your website will contain useful information that indicates the nature of the page you are about to visit. The URL given in the example becomes http://website.com/article-1234-32-US-economy-strengthens.html. This solution is not only more interesting for your visitors, but also for search engines: URL rewriting is a key element of Search Engine Optimization (SEO).

The principle behind this mechanism is simple — it consists of rewriting the URI of the client request after it is received, before serving the file. Once rewritten, the URI is matched against location blocks in order to find the configuration that should be applied to the request. The technique is further detailed in the coming sections.
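As a minimal sketch of this principle, the following configuration maps a readable URL of the kind shown in the introduction back onto the script that serves it. The regular expression and the article.php target are assumptions derived from the example URLs, not a configuration given by the original text.

```nginx
server {
    server_name website.com;

    # Assumed mapping, for illustration only:
    #   /article-1234-32-some-title.html
    #   -> /article.php?id=1234&comment=32
    # $1 and $2 hold the two numeric groups captured by the pattern.
    rewrite ^/article-([0-9]+)-([0-9]+)-.*\.html$ /article.php?id=$1&comment=$2 last;
}
```

Once the URI has been rewritten, Nginx searches for the location block that matches /article.php, as described above.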

 

Reminder on Regular Expressions

First and foremost, this module requires a certain understanding of regular expressions, also known as regexes or regexps. Indeed, URL rewriting is performed by the rewrite directive, which accepts a pattern followed by the replacement URI.

Purpose

The first question we must answer is: what's the purpose of regular expressions? To put it simply, the main purpose is to verify that a string matches a pattern. The pattern is written in a particular language that allows you to define extremely complex and accurate rules.

String   Pattern   Matches?   Explanation
hello    ^hello$   Yes        The string begins with the character h (^h), followed by e, l, l, and ends with o (o$).
hell     ^hello$   No         The string begins with the character h (^h), followed by e, l, l, but does not end with o.
Hello    ^hello$   Depends    If the engine performing the match is case-sensitive, the string doesn't match the pattern.
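In Nginx, whether a match is case-sensitive depends on the modifier chosen for the regular expression. The sketch below uses the ~ (case-sensitive) and ~* (case-insensitive) location modifiers; the response bodies are purely illustrative.

```nginx
server {
    server_name website.com;

    # ~ performs a case-sensitive match: /hello matches, /Hello does not.
    location ~ ^/hello$ {
        return 200 "case-sensitive match\n";
    }

    # ~* performs a case-insensitive match: /Hello and /HELLO end up here.
    location ~* ^/hello$ {
        return 200 "case-insensitive match\n";
    }
}
```

Regular expression locations are tested in their order of appearance, so a request for /hello is handled by the first block and never reaches the second.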

This concept becomes a lot more interesting when complex patterns are employed, such as one that validates an e-mail address: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$ (to be used with case-insensitive matching, since only uppercase letters are listed). Checking programmatically that an e-mail address is well formed would require a great deal of code, while all of the work can be done with a single regular expression.

PCRE Syntax

The syntax that Nginx employs originates from the Perl Compatible Regular Expression (PCRE) library. It's the most commonly used form of regular expression, and nearly everything you learn here remains valid for other language variations.

In its simplest form, a pattern is composed of one character, for example, x. We can match strings against this pattern. Does example match the pattern x? Yes, example contains the character x. It can be more than one specific character — the pattern [a-z] matches any character between a and z, or even a combination of letters and digits: [a-z0-9]. In consequence, the pattern hell[a-z0-9] validates the following strings: hello and hell4, but not hell or hell!.

You probably noticed that we employed the characters [ and ]. These are called metacharacters and have a special effect on the pattern. There are a total of 11 metacharacters, and all play a different role. If you want to actually create a pattern containing one of these characters, you need to escape them with the \ character.

 Metacharacter  Description

^

Beginning

The entity after this character must be found at the beginning.

Example pattern: ^h

Matching strings: hello, h, hh

Non-matching strings: character, ssh

$

End

The entity before this character must be found at the end.

Example pattern: e$

Matching strings: sample, e, file

Non-matching strings: extra, shell

.

Any

Matches any character.

Example pattern: hell.

Matching strings: hello, hellx, hell5, hell!

Non-matching strings: hell, helo

[ ]

Set

Matches any character within the specified set.

Syntax: [a-z] for a range, [abcd] for a set, and [a-z0-9] for two ranges. Note that if you want to include the literal - character in a set, you need to insert it right after the [ or just before the ].

Example pattern: hell[a-y123-]

Matching strings: hello, hell1, hell2, hell3, hell-

Non-matching strings: hellz, hell4, heloo, he-llo

[^ ]

Negate set

Matches any character that is not within the specified set.

Example pattern: hell[^a-np-z0-9]

Matching strings: hello, hell;

Non-matching strings: hella, hell5

|

Alternation

Matches the entity placed either before or after the |.

Example pattern: hello|welcome

Matching strings: hello, welcome, helloes, awelcome

Non-matching strings: hell, ellow, owelcom

( )

Grouping

Groups a set of entities, often to be used in conjunction with |.

Example pattern: ^(hello|hi) there$

Matching strings: hello there, hi there

Non-matching strings: hey there, ahoy there

\

Escape

Allows you to escape special characters.

Example pattern: Hello\.

Matching strings: Hello., Hello. How are you?, Hi! Hello...

Non-matching strings: Hello, Hello, how are you?
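The metacharacters above are usually combined. As an illustrative sketch (the file extensions and the expires value are assumptions, not taken from the text), the following location relies on \, ., ( ), | and $ together:

```nginx
server {
    server_name website.com;

    # \.   a literal dot (escaped, so it no longer means "any character")
    # ( )  groups the alternatives gif, jpg and png, separated by |
    # $    the extension must be found at the end of the URI
    location ~ \.(gif|jpg|png)$ {
        expires 30d;  # assumed caching policy, for illustration only
    }

    # Without the escape, the . would match any character, so this
    # pattern would also match a URI such as /archive/xgif:
    # location ~ .(gif|jpg|png)$ { ... }
}
```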

Quantifiers

So far, you are able to express simple patterns with a limited number of characters. Quantifiers allow you to extend the amount of accepted entities:

Quantifier Description

*

0 or more times

The entity preceding * must be found 0 or more times.

Example pattern: he*llo

Matching strings: hllo, hello, heeeello

Non-matching strings: hallo, ello

+

1 or more times

The entity preceding + must be found 1 or more times.

Example pattern: he+llo

Matching strings: hello, heeeello

Non-matching strings: hllo, helo

?

0 or 1 time

The entity preceding ? must be found 0 or 1 time.

Example pattern: he?llo

Matching strings: hello, hllo

Non-matching strings: heello, heeeello

{x}

x times

The entity preceding {x} must be found x times.

Example pattern: he{3}llo

Matching strings: heeello, oh heeello there!

Non-matching strings: hello, heello, heeeello

{x,}

At least x times

The entity preceding {x,} must be found at least x times.

Example pattern: he{3,}llo

Matching strings: heeello, heeeeeeello

Non-matching strings: hllo, hello, heello

{x,y}

x to y times

The entity preceding {x,y} must be found between x and y times.

Example pattern: he{2,4}llo

Matching strings: heello, heeello, heeeello

Non-matching strings: hello, heeeeello

As you probably noticed, the { and } characters in regular expressions conflict with the block delimiters of the Nginx configuration file syntax. If you want to write a regular expression pattern that includes curly brackets, you need to place the pattern between quotes (single or double quotes):

rewrite hel{2,}o /hello.php; # invalid
rewrite "hel{2,}o" /hello.php; # valid
rewrite 'hel{2,}o' /hello.php; # valid
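The same quoting rule applies to more realistic patterns. As a hedged sketch (the /archive/ URL scheme and the archive.php script are hypothetical), a rewrite using the {x} quantifier might look like this:

```nginx
server {
    server_name website.com;

    # Quoted because the pattern contains { and }.
    # Matches /archive/2016/08 and /archive/2016/08/ (the trailing slash
    # is optional thanks to ?), capturing a 4-digit year and a 2-digit month.
    rewrite "^/archive/([0-9]{4})/([0-9]{2})/?$" /archive.php?year=$1&month=$2 last;
}
```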

Captures

One last feature of the regular expression mechanism is the ability to capture sub-expressions. Whatever text is placed between parentheses ( ) is captured and can be used after the matching process.

Here are a couple of examples to illustrate the principle:

Pattern                              String         Captured
^(hello|hi) (sir|mister)$            hello sir      $1 = hello, $2 = sir
^(hello (sir))$                      hello sir      $1 = hello sir, $2 = sir
^(.*)$                               nginx rocks    $1 = nginx rocks
^(.{1,3})([0-9]{1,4})([?!]{1,2})$    abc1234!?      $1 = abc, $2 = 1234, $3 = !?

Named captures are also supported:

^/(?<folder>[^/]*)/(?<file>.*)$      /admin/doc     $folder = admin, $file = doc
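In a configuration file, a named capture becomes a variable of the same name. A minimal sketch, in which the response body is purely illustrative:

```nginx
server {
    server_name website.com;

    # A request for /admin/doc sets $folder to "admin" and $file to "doc".
    location ~ ^/(?<folder>[^/]*)/(?<file>.*)$ {
        return 200 "folder=$folder file=$file\n";
    }
}
```

Named captures tend to be easier to maintain than $1 and $2, because adding a group to the pattern does not renumber them.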

When you use a regular expression in Nginx, for example, in the context of a location block, the text that you capture can be employed in later directives:

server {
  server_name website.com;
  location ~* ^/(downloads|files)/(.*)$ {
    add_header Capture1 $1;
    add_header Capture2 $2;
  }
}

In the preceding example, the location block will match the request URI against a regular expression. A couple of URIs that would apply here: /downloads/file.txt, /files/archive.zip, or even /files/docs/report.doc. Two parts are captured: $1 will contain either downloads or files and $2 will contain whatever comes after /downloads/ or /files/. Note that the add_header directive is employed here to append arbitrary headers to the client response for the sole purpose of demonstration.
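Outside of demonstrations, captures are more commonly used to build paths or URIs. As a sketch under stated assumptions (the /var/www/shared directory is hypothetical), the second capture can select the file to serve through the alias directive:

```nginx
server {
    server_name website.com;

    location ~* ^/(downloads|files)/(.*)$ {
        # Both /downloads/file.txt and /files/file.txt are served from
        # /var/www/shared/file.txt; $2 is whatever followed the first
        # path segment.
        alias /var/www/shared/$2;
    }
}
```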

 

Internal requests

Nginx differentiates external and internal requests. External requests directly originate from the client; the URI is then matched against possible location blocks:

 


Original article: http://www.cnblogs.com/huey/p/5734177.html
