deeppavlov.models.squad

class deeppavlov.models.squad.squad.SquadModel(word_emb: numpy.ndarray, char_emb: numpy.ndarray, context_limit: int = 450, question_limit: int = 150, char_limit: int = 16, train_char_emb: bool = True, char_hidden_size: int = 100, encoder_hidden_size: int = 75, attention_hidden_size: int = 75, keep_prob: float = 0.7, min_learning_rate: float = 0.001, noans_token: bool = False, **kwargs)[source]

SquadModel predicts the start and end positions of the answer in a given context for a given question.

High-level architecture: Word embeddings -> Contextual embeddings -> Question-Context Attention -> Self-attention -> Pointer Network

If the noans_token flag is True, a special noans_token is appended to the output of the self-attention layer. The Pointer Network can then select noans_token when the context contains no answer.

Parameters:
  • word_emb – pretrained word embeddings
  • char_emb – pretrained char embeddings
  • context_limit – max context length in tokens
  • question_limit – max question length in tokens
  • char_limit – max number of characters in a token
  • train_char_emb – whether character embeddings are trained along with the model or kept frozen
  • char_hidden_size – hidden size of charRNN
  • encoder_hidden_size – hidden size of encoder RNN
  • attention_hidden_size – size of projection layer in attention
  • keep_prob – dropout keep probability
  • min_learning_rate – minimal learning rate; used in learning rate decay
  • noans_token – whether to use the special noans_token, allowing the model to decline to answer a question
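
A minimal instantiation sketch. The embedding matrices here are random numpy stand-ins for pretrained embeddings, and the save_path/load_path keyword arguments are assumptions about what the underlying serializable model expects via **kwargs; in practice the model is normally built from a DeepPavlov pipeline config rather than constructed directly.

    import numpy as np
    from deeppavlov.models.squad.squad import SquadModel

    # Random stand-ins for pretrained embeddings: rows are vocabulary
    # entries, columns are embedding dimensions.
    word_emb = np.random.rand(10000, 300).astype(np.float32)
    char_emb = np.random.rand(200, 64).astype(np.float32)

    model = SquadModel(
        word_emb=word_emb,
        char_emb=char_emb,
        context_limit=450,
        question_limit=150,
        char_limit=16,
        noans_token=False,
        # Assumed kwargs: DeepPavlov serializable models usually take
        # save_path/load_path; exact requirements depend on the version.
        save_path='squad_model/model',
        load_path='squad_model/model',
    )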
__call__(c_tokens: numpy.ndarray, c_chars: numpy.ndarray, q_tokens: numpy.ndarray, q_chars: numpy.ndarray, *args, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray, List[float]][source]

Predicts answer start and end positions for a given context and question.

Parameters:
  • c_tokens – batch of tokenized contexts
  • c_chars – batch of tokenized contexts, each token split into characters
  • q_tokens – batch of tokenized questions
  • q_chars – batch of tokenized questions, each token split into characters
Returns:

answer start positions, answer end positions, and answer logits representing the model's confidence
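
A sketch of a prediction call, reusing model from the instantiation sketch above. The array shapes and integer-id encoding are assumptions; in the DeepPavlov SQuAD pipeline these arrays are produced by upstream preprocessing and vocabulary components, padded to the configured limits.

    import numpy as np

    batch_size, context_limit, question_limit, char_limit = 2, 450, 150, 16

    # Assumed encoding: token and character ids, zero-padded to the limits.
    c_tokens = np.zeros((batch_size, context_limit), dtype=np.int32)
    c_chars = np.zeros((batch_size, context_limit, char_limit), dtype=np.int32)
    q_tokens = np.zeros((batch_size, question_limit), dtype=np.int32)
    q_chars = np.zeros((batch_size, question_limit, char_limit), dtype=np.int32)

    # One (start, end, confidence) triple per example in the batch.
    answer_start, answer_end, logits = model(c_tokens, c_chars, q_tokens, q_chars)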

train_on_batch(c_tokens: numpy.ndarray, c_chars: numpy.ndarray, q_tokens: numpy.ndarray, q_chars: numpy.ndarray, y1s: Tuple[List[int], ...], y2s: Tuple[List[int], ...]) → float[source]

This method is called by the trainer to perform one training step on one batch.

Parameters:
  • c_tokens – batch of tokenized contexts
  • c_chars – batch of tokenized contexts, each token split into characters
  • q_tokens – batch of tokenized questions
  • q_chars – batch of tokenized questions, each token split into characters
  • y1s – batch of ground truth answer start positions
  • y2s – batch of ground truth answer end positions
Returns:

value of the loss function on the batch
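
A schematic training loop. Here batches is a hypothetical iterable yielding preprocessed arrays shaped as in the prediction sketch above, plus y1s/y2s with token-level ground-truth positions; in normal use the DeepPavlov trainer drives this loop rather than user code.

    # `batches` is a hypothetical iterable of preprocessed training batches.
    for c_tokens, c_chars, q_tokens, q_chars, y1s, y2s in batches:
        loss = model.train_on_batch(c_tokens, c_chars, q_tokens, q_chars, y1s, y2s)
        print(f'batch loss: {loss:.4f}')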

process_event(event_name: str, data) → None[source]

Processes events sent by the trainer. Implements learning rate decay.

Parameters:
  • event_name – name of the event sent by the trainer
  • data – event payload sent by the trainer (number of examples, epochs done, metrics)
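
An illustrative trainer-side call, reusing model from the sketches above. The event name and payload keys below are assumptions for illustration only; the actual events and their contents are defined by the DeepPavlov trainer.

    # Hypothetical event name and payload; consult the trainer for the
    # real values it emits.
    model.process_event(
        'after_validation',
        {'examples': 87599, 'epochs_done': 1,
         'metrics': {'f1': 79.2, 'exact_match': 70.5}},
    )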