101 lines
3.5 KiB
Markdown
101 lines
3.5 KiB
Markdown
# referer-parser Python library
|
|
|
|
This is the Python implementation of [referer-parser] [referer-parser], the library for extracting search marketing data from referer _(sic)_ URLs.
|
|
|
|
The implementation uses the shared 'database' of known referers found in [`referers.yml`] [referers-yml] (converted to a `referers.json` file,
|
|
see below).
|
|
|
|
The Python version of referer-parser is maintained by [Don Spaulding] [donspaulding].
|
|
|
|
## Installation
|
|
|
|
$ pip install referer_parser
|
|
|
|
## Usage
|
|
|
|
Create a new instance of a Referer object by passing in the url you want to parse:
|
|
|
|
```python
|
|
from referer_parser import Referer
|
|
|
|
referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'
|
|
|
|
r = Referer(referer_url)
|
|
```
|
|
|
|
The `r` variable now holds a Referer instance. The important attributes are:
|
|
|
|
```python
|
|
print(r.known) # True
|
|
print(r.referer) # 'Google'
|
|
print(r.medium) # 'search'
|
|
print(r.search_parameter) # 'q'
|
|
print(r.search_term) # 'gateway oracle cards denise linn'
|
|
print(r.uri) # ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='q=gateway+oracle+cards+denise+linn&hl=en&client=safari', fragment='')
|
|
```
|
|
|
|
Optionally, pass in the current URL as well, to handle internal referers
|
|
|
|
```python
|
|
from referer_parser import Referer
|
|
|
|
referer_url = 'http://www.snowplowanalytics.com/about/team'
|
|
curr_url = 'http://www.snowplowanalytics.com/account/profile'
|
|
|
|
r = Referer(referer_url, curr_url)
|
|
```
|
|
|
|
The attributes would be
|
|
|
|
```python
|
|
print(r.known) # True
|
|
print(r.referer) # None
|
|
print(r.medium) # 'internal'
|
|
print(r.search_parameter) # None
|
|
print(r.search_term) # None
|
|
print(r.uri) # ParseResult(scheme='http', netloc='www.snowplowanalytics.com', path='/about/team', params='', query='', fragment='')
|
|
```
|
|
|
|
The `uri` attribute is an instance of ParseResult from the standard library's `urlparse` module.
|
|
|
|
## referers.json
|
|
|
|
Unlike the other ports, the Python version of referer-parser uses a `referers.json` file, generated from the main `referers.yml` file. This is for two reasons:
|
|
|
|
1. Python's standard library includes a JSON parser but not a YAML parser
|
|
2. Loading from JSON in Python is significantly faster than loading from YAML
|
|
|
|
To support the `referers.json` file, the distribution process for Python looks like this:
|
|
|
|
$ ./sync_yaml.sh
|
|
$ cd python/referer_parser && python build_json.py
|
|
$ python setup.py sdist bdist_wininst upload
|
|
|
|
## Contributing
|
|
|
|
1. Fork it
|
|
2. Create your feature branch (`git checkout -b my-new-feature`)
|
|
3. Commit your changes (`git commit -am 'Add some feature'`)
|
|
4. Push to the branch (`git push origin my-new-feature`)
|
|
5. Create new Pull Request
|
|
|
|
## Copyright and license
|
|
|
|
The referer-parser Python library is copyright 2012-2013 Don Spaulding.
|
|
|
|
Licensed under the [Apache License, Version 2.0] [license] (the "License");
|
|
you may not use this software except in compliance with the License.
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
|
|
[referer-parser]: https://github.com/snowplow/referer-parser
|
|
[referers-yml]: https://github.com/snowplow/referer-parser/blob/master/referers.yml
|
|
|
|
[donspaulding]: https://github.com/donspaulding
|
|
|
|
[license]: http://www.apache.org/licenses/LICENSE-2.0
|