embeddings package

embeddings.embedding module
class embeddings.embedding.Embedding
Bases: object

static download_file(url, local_filename)
Downloads a file from a URL to a local file.
Parameters:
- url (str) – url to download the file from.
- local_filename (str) – path to save the downloaded file to.
Returns: file name of the downloaded file.
Return type: str
static ensure_file(name, url=None, force=False, logger=<RootLogger root (WARNING)>, postprocess=None)
Ensures that the requested file exists in the cache, downloading it if it does not.
Parameters:
- name (str) – name of the file.
- url (str) – url to download the file from, if it does not exist.
- force (bool) – whether to force the download, regardless of whether the file exists.
- logger (logging.Logger) – logger to log results.
- postprocess (function) – a function that, if given, is applied after the file is downloaded. The function has the signature f(fname).
Returns: file name of the downloaded file.
Return type: str
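The cache-then-download pattern that ensure_file describes can be sketched in isolation. This is a minimal standalone sketch, not the package's implementation: the function name and the injectable `fetch` parameter are assumptions for illustration.

```python
import os
import tempfile
import urllib.request

def ensure_file_sketch(path, url=None, force=False, fetch=urllib.request.urlretrieve):
    """Return path, downloading it from url first if the file is missing or force is set.

    Hypothetical sketch of the cache-or-download logic; not the package's API.
    """
    if force or not os.path.isfile(path):
        if url is None:
            raise FileNotFoundError(path)
        # Only hit the network when the cached copy is absent or a refresh is forced.
        fetch(url, path)
    return path
```

On a second call with the file already cached, no download occurs, which is the point of the cache check.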
static initialize_db(fname)
Parameters: fname (str) – location of the database.
Returns: a SQLite3 database with an embeddings table.
Return type: db (sqlite3.Connection)
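A hedged sketch of what creating such a database might look like. The (word, emb) TEXT schema below is an assumption for illustration, not necessarily the package's actual table layout.

```python
import sqlite3

def initialize_db_sketch(fname):
    """Open (or create) a SQLite database with an embeddings table.

    One row per word; the (word TEXT, emb TEXT) schema is a hypothetical
    example, not the package's documented layout.
    """
    db = sqlite3.connect(fname)
    db.execute('CREATE TABLE IF NOT EXISTS embeddings (word TEXT PRIMARY KEY, emb TEXT)')
    db.commit()
    return db
```

Passing ':memory:' as fname gives a throwaway in-memory database, which is convenient for experimenting.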
insert_batch(batch)
Parameters: batch (list) – a list of embeddings to insert, each of which is a tuple (word, embedding).
Example:

e = Embedding()
e.db = e.initialize_db(e.path('mydb.db'))
e.insert_batch([
    ('hello', [1, 2, 3]),
    ('world', [2, 3, 4]),
    ('!', [3, 4, 5]),
])
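A self-contained sketch of what a batch insert and lookup against such a table might look like. The schema, the JSON serialization, and the lookup helper are assumptions for illustration, not the package's API.

```python
import json
import sqlite3

# Hypothetical stand-in for the database initialize_db would return.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE embeddings (word TEXT PRIMARY KEY, emb TEXT)')

def insert_batch_sketch(db, batch):
    # JSON-serialize each vector so it fits in a TEXT column (an assumption;
    # the real package may store vectors differently).
    db.executemany('INSERT INTO embeddings VALUES (?, ?)',
                   [(word, json.dumps(vec)) for word, vec in batch])
    db.commit()

def lookup_sketch(db, word):
    # Return the stored vector for word, or None if the word is absent.
    row = db.execute('SELECT emb FROM embeddings WHERE word = ?', (word,)).fetchone()
    return json.loads(row[0]) if row else None

insert_batch_sketch(db, [('hello', [1, 2, 3]), ('world', [2, 3, 4]), ('!', [3, 4, 5])])
```

executemany inserts the whole batch in one call, which is the reason a batch API like this exists at all.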
embeddings.fasttext module

class embeddings.fasttext.FastTextEmbedding(lang='en', show_progress=True, default='none')
Bases: embeddings.embedding.Embedding
Reference: https://arxiv.org/abs/1607.04606
__init__(lang='en', show_progress=True, default='none')
Parameters:
- lang (str) – language of the embeddings to load.
- show_progress (bool) – whether to print progress.
- default (str) – how to embed words that are out of vocabulary.

Note: default can use zeros, return None, or generate random values between [-0.1, 0.1].
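The three out-of-vocabulary behaviours named in the note can be sketched as one helper. The option strings 'zero'/'none'/'random' here are assumptions for illustration; only the three behaviours themselves come from the docs.

```python
import random

def oov_default_sketch(d_emb, default='none'):
    """Produce a vector for an out-of-vocabulary word.

    Hypothetical sketch of the documented choices: zeros, None,
    or random values in [-0.1, 0.1].
    """
    if default == 'zero':
        return [0.0] * d_emb
    if default == 'random':
        # Uniform noise keeps OOV vectors distinct from one another.
        return [random.uniform(-0.1, 0.1) for _ in range(d_emb)]
    return None
```

Returning None lets the caller distinguish a genuinely unknown word from a word whose embedding happens to be all zeros.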
d_emb = 300

sizes = {'en': 1}

url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.{}.zip'
embeddings.glove module

class embeddings.glove.GloveEmbedding(name='common_crawl_840', d_emb=300, show_progress=True, default='none')
Bases: embeddings.embedding.Embedding
Reference: http://nlp.stanford.edu/projects/glove
class GloveSetting(url, d_embs, size, description)
Bases: tuple

url – alias for field number 0
d_embs – alias for field number 1
size – alias for field number 2
description – alias for field number 3
__init__(name='common_crawl_840', d_emb=300, show_progress=True, default='none')
Parameters:
- name – name of the embedding to retrieve.
- d_emb – embedding dimensions.
- show_progress – whether to print progress.
- default – how to embed words that are out of vocabulary. Can use zeros, return None, or generate random values between [-0.1, 0.1].
settings = {
    'common_crawl_48': GloveSetting(url='http://nlp.stanford.edu/data/glove.42B.300d.zip', d_embs=[300], size=1917494, description='48B token common crawl'),
    'common_crawl_840': GloveSetting(url='http://nlp.stanford.edu/data/glove.840B.300d.zip', d_embs=[300], size=2195895, description='840B token common crawl'),
    'twitter': GloveSetting(url='http://nlp.stanford.edu/data/glove.twitter.27B.zip', d_embs=[25, 50, 100, 200], size=1193514, description='27B token twitter'),
    'wikipedia_gigaword': GloveSetting(url='http://nlp.stanford.edu/data/glove.6B.zip', d_embs=[50, 100, 200, 300], size=400000, description='6B token wikipedia 2014 + gigaword 5'),
}
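Each settings entry is a named tuple, so the field names documented on GloveSetting are aliases for tuple positions. A minimal reconstruction for illustration (the real class lives in embeddings.glove):

```python
from collections import namedtuple

# Reconstructed for illustration; field order matches the documented aliases.
GloveSetting = namedtuple('GloveSetting', ['url', 'd_embs', 'size', 'description'])

twitter = GloveSetting(url='http://nlp.stanford.edu/data/glove.twitter.27B.zip',
                       d_embs=[25, 50, 100, 200], size=1193514,
                       description='27B token twitter')
```

Because GloveSetting subclasses tuple, twitter.url and twitter[0] are the same value, which is exactly what "alias for field number 0" means.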
embeddings.kazuma module

class embeddings.kazuma.KazumaCharEmbedding(show_progress=True)
Bases: embeddings.embedding.Embedding
Reference: https://www.logos.t.u-tokyo.ac.jp/~hassy/publications/arxiv2016jmt/
d_emb = 100

size = 874474

url = 'https://www.logos.t.u-tokyo.ac.jp/~hassy/publications/arxiv2016jmt/jmt_pre-trained_embeddings.tar.gz'
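Character-level embeddings like these are typically built from the character n-grams of a word. A hedged sketch of one common way to extract those n-grams; the boundary-marker choice ('<' and '>') is an assumption for illustration, not this package's documented behaviour.

```python
def char_ngrams_sketch(word, n):
    # Pad with boundary markers so prefix and suffix n-grams are distinct
    # from word-internal ones; the '<'/'>' markers are an assumption.
    padded = '<' + word + '>'
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

A word's vector can then be derived from the vectors of its n-grams, which is what lets character-level models embed words never seen during training.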