Weights and Regularization | 29 posts by 4 authors |
b_m...@live.com | 13-10-10 |
Here is the basic script I am using (I am using a data set from UCI regarding wine ratings).
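####CODE#####################################################################
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import EpochCounter
# ds is the training dataset, built before this snippet (not shown here)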
# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)
# create hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)
# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)
# create Stochastic Gradient Descent trainer that runs for x epochs
trainer = sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200))
layers = [hidden_layer,hidden_layer2,output_layer] #according to the code, the last layer will be considered the output
# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, ds)
# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=ds)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break
####END CODE####################################################################
My questions:
I. Weights. How do I see the weights from the trained model? I *think* I am adding a second hidden layer above, but when I look at ann.get_weights(), the dimensions of the resulting object do not change if I remove the second hidden layer, so I question whether I am looking at the right thing. Ultimately I want to see the final weights so that I can visualize the network outside pylearn.
II. Regularization. How do I use regularization? Specifically, how do I adjust the above code to use 1) dropout and then 2) the L2 norm?
Thanks!
Brian
b_m...@live.com | 13-10-12 |
Through the ann.get_param_values() call I am now able to see the weight and bias values and, knowing the net architecture, answer question I.
I would still like some quick help on how to use regularization (especially dropout) and then how to predict new cases with such a model (does the ann.fprop(theano.shared(testMatrix, name='test')).eval() call still work?).
Thanks!
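To make the "knowledge of the net architecture" step concrete, here is a minimal NumPy sketch of how a flat list of parameter arrays could be grouped into per-layer (weights, bias) pairs. It assumes, without checking against pylearn2 itself, that weight matrices are stored as (inputs, outputs) arrays, that biases are 1-D, and that both appear in layer order in the list. The helper name group_params_by_layer and the fake parameter list are hypothetical; in practice the list would come from ann.get_param_values().

```python
import numpy as np


def group_params_by_layer(param_values):
    """Pair the i-th 2-D array (weights) with the i-th 1-D array (bias).

    Assumption: weights and biases each appear in layer order, although
    the order of W and b within a single layer is not relied upon.
    """
    weights = [np.asarray(p) for p in param_values if np.ndim(p) == 2]
    biases = [np.asarray(p) for p in param_values if np.ndim(p) == 1]
    if len(weights) != len(biases):
        raise ValueError("expected one bias vector per weight matrix")
    layers = []
    for W, b in zip(weights, biases):
        if W.shape[1] != b.shape[0]:
            raise ValueError("bias length does not match weight columns")
        layers.append((W, b))
    # Consecutive layers must chain: outputs of one are inputs of the next.
    for (W_prev, _), (W_next, _) in zip(layers, layers[1:]):
        if W_prev.shape[1] != W_next.shape[0]:
            raise ValueError("layer shapes do not chain")
    return layers


# Stand-in for ann.get_param_values() on the 11 -> 5 -> 2 -> 2 network.
rng = np.random.RandomState(0)
sizes = [11, 5, 2, 2]
fake_params = []
for n_in, n_out in zip(sizes, sizes[1:]):
    fake_params.append(rng.uniform(-0.1, 0.1, size=(n_in, n_out)))
    fake_params.append(np.ones(n_out))

layers = group_params_by_layer(fake_params)
layer_shapes = [(W.shape, b.shape) for W, b in layers]
for i, (w_shape, b_shape) in enumerate(layer_shapes):
    print("layer %d: weights %s, bias %s" % (i, w_shape, b_shape))
```

If the grouped shapes show three layers, the second hidden layer really is part of the model, whatever a single-layer accessor reports.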
Kyle Kastner | 13-10-12 |
--
You received this message because you are subscribed to the Google Groups "pylearn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylearn-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
b_m...@live.com | 13-10-13 |
Kyle,
I am not getting an error; instead, I am looking to learn/confirm the proper method to train an MLP using regularization (L2 as well as dropout) and then get predictions on a new data set. I am not using yaml, though; I want a way to use pylearn2 directly in Python through its functions.
I referenced a blog that showed how to train an MLP without regularization (only a number of epochs) and then predict new data using ann.fprop, where ann is the trained MLP. I *think* I can use dropout simply by adding the cost to the SGD call like this:
sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200), cost=Dropout())
and then to predict new data I *think* I just need to call dropout_fprop instead of fprop, like this (where X_s is the new test set):
test_preds=ann.dropout_fprop(theano.shared(X_s, name='test')).eval()
But I am hoping one of the developers will confirm this is correct and explain how to add an L2 penalty, as that is escaping me currently. I am not very experienced with Python yet, so following the code is a challenge.
Kyle Kastner | 13-10-14 |
b_m...@live.com | 13-10-14 |
Hey Kyle,
I. I see this description of dropout_fprop from models/mlp.py so I am not sure:
def dropout_fprop(self, state_below, default_input_include_prob=0.5,
                  input_include_probs=None, default_input_scale=2.,
                  input_scales=None, per_example=True):
    """
    state_below: The input to the MLP
    Returns the output of the MLP, when applying dropout to the input and intermediate layers.
II. regarding L2, I would not be using both, just want to see how to do it as another option.
I saw that class. I also am thinking that here https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py
there is this:
class WeightDecay(Cost):
    """
    coeff * sum(sqr(weights))
    for each set of weights.
    """
    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
        to multiply with the cost defined by the squared L2 norm of the weights
        for each layer.
and this
class L1WeightDecay(Cost):
    """
    coeff * sum(abs(weights))
    for each set of weights.
    """
    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
        to multiply with the cost defined by the L1 norm of the
        weights(lasso) for each layer.
which might be the way to go for L1 and L2 reg.
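As a rough illustration of what the two docstrings above describe, the following NumPy sketch computes coeff * sum(sqr(weights)) and coeff * sum(abs(weights)) summed over layers. It is not the pylearn2 implementation; the function names and the toy weight matrices are made up for the example.

```python
import numpy as np


def l2_weight_decay(weights, coeffs):
    """Sum over layers of coeff * sum(W ** 2)."""
    if len(weights) != len(coeffs):
        raise ValueError("need exactly one coefficient per layer")
    return sum(c * np.sum(np.square(W)) for W, c in zip(weights, coeffs))


def l1_weight_decay(weights, coeffs):
    """Sum over layers of coeff * sum(|W|)."""
    if len(weights) != len(coeffs):
        raise ValueError("need exactly one coefficient per layer")
    return sum(c * np.sum(np.abs(W)) for W, c in zip(weights, coeffs))


# Two toy layers with hand-checkable values.
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]),
           np.array([[3.0], [-1.0]])]
coeffs = [0.1, 0.01]

l2 = l2_weight_decay(weights, coeffs)  # 0.1 * 5.25 + 0.01 * 10.0
l1 = l1_weight_decay(weights, coeffs)  # 0.1 * 3.5 + 0.01 * 4.0
print(l2, l1)
```

Note that neither penalty looks at the inputs or targets; both depend only on the weights.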
Kyle Kastner | 13-10-14 |
b_m...@live.com | 13-10-14 |
I could not figure out how to use the WeightDecay class in my code (I am not using yaml). I tried this with no success:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100, monitoring_dataset={'test': dt_test}, termination_criterion=EpochCounter(5000), cost=WeightDecay(coeffs=[0.005, 0.005, 0.005]))
With yaml, are you able to make predictions on a new dataset (and get the probabilities, not just the predicted class)?
On Monday, October 14, 2013 9:28:46 AM UTC-4, Kyle Kastner wrote:
> As far as dropout_fprop goes, I think that description matches my thoughts. You use dropout_fprop during the training stage to apply dropout at each layer, which *effectively* creates many separate neural networks, each trained on one example. Then, once the training is all done, you can use a regular fprop, which will *effectively* give you the bagged decision result from all of these networks by making a decision using all of the weights (see http://arxiv.org/pdf/1207.0580.pdf)
>
>
>
> In short, I think that dropout_fprop is largely internal/used during training - while fprop is used for predictions with a trained net.
>
>
> I did not see the WeightDecay/L1WeightDecay classes - I agree that those seem like the way to go. If I can get those working in my own code I will let you know.
>
>
> Kyle
>
>
>
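The split described in the quoted message (a dropout forward pass while training, a plain forward pass for prediction) can be sketched for a single sigmoid layer in NumPy. This is an illustration only, not pylearn2 code: the function names are hypothetical, and the include probability of 0.5 and input scale of 2 are simply the defaults shown in the dropout_fprop signature earlier in the thread.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer_fprop(x, W, b):
    """Plain forward pass, as used for prediction."""
    return sigmoid(np.dot(x, W) + b)


def layer_dropout_fprop(x, W, b, rng, include_prob=0.5, scale=2.0):
    """Training-time pass: randomly drop inputs, rescale the survivors."""
    mask = rng.binomial(n=1, p=include_prob, size=x.shape)
    return sigmoid(np.dot(x * mask * scale, W) + b)


rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(3, 4))
W = rng.uniform(-0.1, 0.1, size=(4, 2))
b = np.ones(2)

train_out = layer_dropout_fprop(x, W, b, rng)  # noisy, differs per call
test_out = layer_fprop(x, W, b)                # deterministic

# With include_prob=0.5 and scale=2, the scaled mask averages to about 1,
# so the plain pass sees inputs of the same expected magnitude.
masks = rng.binomial(n=1, p=0.5, size=(20000, 4)) * 2.0
mean_scaled_mask = masks.mean()
print(mean_scaled_mask)
```

Under these assumptions the noisy pass is only useful for computing the training cost; predictions on new data come from the deterministic pass.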
Ian Goodfellow | 13-10-15 |
b_m...@live.com | 13-10-15 |
Ian,
I. So, cost=Dropout() in the sgd call takes care of dropout, and then I use just fprop to predict the new test set?
II. How do you use just L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log likelihood?
b_m...@live.com | 13-10-15 |
I mean for I. that a user never calls dropout_fprop directly, correct? They just add the cost=Dropout() argument to the sgd call?
Ian Goodfellow | 13-10-15 |
b_m...@live.com | 13-10-15 |
Thanks! Last follow-up: how would I actually accomplish II?
I tried this but received a NotImplementedError.
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
Ian Goodfellow | 13-10-15 |
b_m...@live.com | 13-10-15 |
Traceback (most recent call last):
  File "
  File "C:\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\training_algorithms\sgd.py", line 314, in train
    "data_specs: %s" % str(data_specs))
NotImplementedError: Unable to train with SGD, because the cost does not actually use data from the data set. data_specs: (CompositeSpace(), ())
I can post the entire script (it is a simple 2 hidden layer mlp) if need be.
Ian Goodfellow | 13-10-15 |
b_m...@live.com | 13-10-15 |
import theano
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import MonitorBased, EpochCounter
from pylearn2.costs.mlp.dropout import Dropout
from pylearn2.costs.cost import SumOfCosts, MethodCost
from pylearn2.models.mlp import WeightDecay, L1WeightDecay
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
import numpy as np
from random import randint
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Binarizer
import pandas as pd
X=np.loadtxt(open("C:UsersDesktoppylearn2wine.csv"), delimiter=';',usecols=range(0, 11), skiprows=1) #first 11 cols
X=np.array(X)
y=np.loadtxt(open("C:UsersDesktoppylearn2wine.csv"), delimiter=';',usecols=(11,12), skiprows=1)
y=np.array(y)
#train
X_t=X[:2500,:] #stop at 2500 so the training rows do not overlap the validation rows
y_t=y[:2500,:]
#valid
X_v=X[2500:3000,:]
y_v=y[2500:3000,:]
#test
X_s=X[3000:,:]
y_s=y[3000:,:]
#center and scale inputs
scaler=StandardScaler()
scaler.fit(X_t)
X_t=scaler.transform(X_t)
X_v=scaler.transform(X_v)
X_s=scaler.transform(X_s)
class datMake(DenseDesignMatrix): #inherits from DenseDesignMatrix
    def __init__(self,X,y):
        super(datMake, self).__init__(X=X, y=y)
dt_train=datMake(X_t,y_t)
dt_valid=datMake(X_v,y_v)
dt_test=datMake(X_s,y_s)
# An epoch is one complete run through the data: if the training set is 2000
# records and the batch size is 100, there are 20 batches in an epoch.
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,
                  monitoring_dataset={'test': dt_test},
                  termination_criterion=EpochCounter(5000),
                  cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005, 0.005, 0.005])]))
# according to the code, the last layer will be considered the output
layers = [hidden_layer, hidden_layer2, output_layer]
# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, dt_train)
ann.get_params()
ann.get_param_values()
#predict the test set
test_preds=ann.fprop(theano.shared(X_s, name='test')).eval()
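On the earlier question about getting probabilities rather than just the predicted class: a softmax output layer yields one probability per class per row, so a matrix like test_preds can be kept as probabilities or reduced to labels. The sketch below uses a hand-written stand-in matrix and assumes the targets are one-hot encoded (the script loads two target columns, but the thread does not confirm the encoding); the helper names are hypothetical.

```python
import numpy as np


def probs_to_labels(probs):
    """Pick the most probable class in each row."""
    return np.argmax(probs, axis=1)


def accuracy(probs, y_onehot):
    """Fraction of rows whose predicted class matches the one-hot target."""
    return float(np.mean(probs_to_labels(probs) == np.argmax(y_onehot, axis=1)))


# Stand-in for the matrix returned by the fprop(...).eval() call.
example_probs = np.array([[0.9, 0.1],
                          [0.3, 0.7],
                          [0.6, 0.4],
                          [0.2, 0.8]])
example_y = np.array([[1, 0],
                      [0, 1],
                      [0, 1],
                      [0, 1]])

labels = probs_to_labels(example_probs)
acc = accuracy(example_probs, example_y)
print(labels, acc)
```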
Ian Goodfellow | 13-10-15 |
b_m...@live.com | 13-10-15 |
I placed the file here: https://docs.google.com/file/d/0B9dsnio60wRoRHptdHlTZjk2RU0/edit?usp=sharing
thanks Ian!
Pascal Lamblin | 13-10-15 |
Ian Goodfellow | 13-10-15 |
Brian Miner | 13-10-15 |
Ian Goodfellow | 13-10-16 |
b_m...@live.com | 13-10-16 |
What I am struggling with, and perhaps did not explain well enough, is how to add the weight decay to the default cost that results from a call to sgd without any cost parameter. I don't want to combine weight decay with dropout. I want the output layer to dictate the cost, to which the weight decay term is then added.
For example, this call
sgd.SGD(learning_rate=0.005,batch_size=100,termination_criterion=EpochCounter(5000))
has some default cost. I expect it is the negative log likelihood derived from the choice of output layer.
So, my question is simply how to add this cost to the weight decay (within SumOfCosts). There is a NegativeLogLikelihood in supervised_cost, but that seems to be deprecated.
Thanks for the time!
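The objective being asked for here is a data-dependent term plus a weight penalty. A minimal NumPy sketch for a single softmax layer follows; it is not pylearn2 code, the function names are hypothetical, and it assumes one-hot targets. It also shows why a cost made only of the penalty cannot be trained on data, which is what the NotImplementedError earlier in the thread complains about: the penalty never touches X or y.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def negative_log_likelihood(probs, y_onehot):
    """Mean over examples of -log p(correct class)."""
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))


def total_cost(X, y_onehot, W, b, coeff):
    """Data term (uses X and y) plus L2 penalty (uses only W)."""
    data_term = negative_log_likelihood(softmax(np.dot(X, W) + b), y_onehot)
    penalty = coeff * np.sum(np.square(W))
    return data_term + penalty, data_term, penalty


rng = np.random.RandomState(0)
X = rng.normal(size=(6, 3))
y = np.eye(2)[[0, 1, 0, 1, 1, 0]]  # one-hot targets for two classes
b = np.zeros(2)

# All-zero weights: both classes get probability 0.5, penalty is zero.
cost0, data0, pen0 = total_cost(X, y, np.zeros((3, 2)), b, coeff=0.005)

# All-one weights: logits are still equal per row, but the penalty is not zero.
cost1, data1, pen1 = total_cost(X, y, np.ones((3, 2)), b, coeff=0.005)
print(cost0, cost1)
```

In this sketch the optimizer would minimize the sum, so the data term pulls the weights toward fitting the targets while the penalty pulls them toward zero.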
Ian Goodfellow | 13-10-16 |
Pascal Lamblin | 13-10-16 |
Ian Goodfellow | 13-10-16 |
Brian | 13-10-16 |
Ian Goodfellow | 13-10-16 |