data-mining

How do I data mine text?

Here's the problem. I have a bunch of large text files with paragraphs and paragraphs of written matter. Each paragraph contains references to a few people (names) and documents a few topics (places, objects). How do I data mine this pile to assemble some

Clustering of sparse matrix in python and scipy

I'm trying to cluster some data with Python and SciPy, but the following code does not work for a reason I do not understand: from scipy.sparse import * matrix = dok_matrix((en,en), int) for pub in pubs: authors = pub.split(";") for auth1 in authors

Using a Geo Distance Function on ELKI

I am using ELKI to mine some geospatial data (lat, long pairs) and I am quite concerned about using the right data types and algorithms. In the parameterizer of my algorithm, I tried to replace the default distance function with a geo function (LngLatDistan

Data Mining - Predictive Analysis

We are looking at acquiring data mining software, primarily to run predictive analysis processes. How does the SQL Server Data Mining solution compare to other solutions like SPSS from IBM? Since SQL Server DM is included in the SQL Server Enterprise license -

itemFrequency of two items together

Assume that I have the following transactions: B C A F H F E C H E D B A C H F E F A D H B E C F B D A H C E G A E B H E I read the transactions in R with the read.transactions function of the arules library. I need the item frequency of specific items. For example for

Extracting information from millions of simple but inconsistent text files

We have millions of simple txt documents containing various data structures that we extracted from PDFs. The text is printed line by line, so all formatting is lost (when we tried tools that maintain the format, they just messed it up). We need to extr

How to determine topic of given document (text)? [closed]

I know how to classify texts with Weka: I can load a folder of texts in the Weka GUI and, by trying different algorithms, it can show me whether one of the texts is positive/negative toward some topic. Now I need something different: I want to build an application

Detecting users' emotions by analyzing the text [closed]

I'm doing a project that will detect users' emotions through text. I'm new to this area and am still looking for the best algorithm to detect emotions from text. Can you suggest a good method to do this? You can do it in a simple way: Build a list of, let's say, 300 qu

How to handle nominal data in scikit learn, python?

I am new to data mining. I have a data set which includes directors' names. What is the right way to convert them to something that scikit-learn estimators can use without problems? From what I found on the internet I thought that sklearn.preprocessing

Why Adtree has more accuracy than C4.5 [closed]

I've been working on a data mining project lately, and it confuses me a lot that the alternating decision tree seems to be more accurate than WEKA's built-in J48 algorithm. I don't know much about how these two algorithms are implemented; I hope som

Algorithms for mapping data in data mining

I need to scrape some webpages and extract content from them. I'm planning to select some specific keywords and map the data that has some relationship between them, but I have no idea how I could do that. Could anyone suggest some algorithms for doi

KDD1999 dataset Features explanation

I'm using the KDD1999 dataset to prevent intrusion, but I have some questions about the features: can someone explain the meaning of the flags to me? Here is the list of the flags used in the KDD1999 dataset: 'flag' { 'OTH', 'REJ', 'RSTO', 'RST

How could I use graph mining method to get a multi-node graph?

I am currently using the Apriori algorithm for a data mining project, and I get results such as: item1 <=> item2, item2 <=> item3, ... I want to use graph mining to generate a graph containing many nodes and illustrating the relations between these nodes, like this: I h

How would you group/cluster these three areas in arrays in python?

So you have an array: 1 2 3 60 70 80 100 220 230 250. For a better understanding: How would you group/cluster the three areas in arrays in Python (v2.6), so you get three arrays, in this case containing [1 2 3] [60 70 80 100] [220 230 250]? Background: y-axis i
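
One possible way to get the grouping asked for above is to sort the values and cut at the largest gaps between neighbours. This is only a minimal sketch, not an answer taken from the original thread: the function name `split_at_largest_gaps` and the `n_groups` parameter are made up for illustration, and it assumes the number of groups (three here) is known in advance.

```python
def split_at_largest_gaps(values, n_groups):
    """Split 1-D values into n_groups by cutting at the widest gaps.

    Hypothetical helper for illustration; assumes n_groups is known.
    """
    values = sorted(values)
    if n_groups <= 1 or len(values) <= 1:
        return [values]

    # Gap between each value and its right-hand neighbour, with its index.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]

    # Indices after which to cut: the n_groups - 1 widest gaps, in order.
    widest = sorted(gaps, reverse=True)[:n_groups - 1]
    cut_after = sorted(i for _, i in widest)

    groups = []
    start = 0
    for i in cut_after:
        groups.append(values[start:i + 1])
        start = i + 1
    groups.append(values[start:])
    return groups


data = [1, 2, 3, 60, 70, 80, 100, 220, 230, 250]
print(split_at_largest_gaps(data, 3))
```

For the array in the question this cuts after 3 and after 100. If the number of groups is not known up front, a fixed gap threshold or a general clustering method would be needed instead.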

Applying K-means clustering on Z-score Normalized Data

I've been working to understand how to apply k-means clustering to a small set of data for a list of companies. The mean and standard deviation are given so that I can determine the normalized data. For example, I have the following: From my understandin

Add a constant value to a numerical attribute in Rapid Miner

I am working with RapidMiner. I have a dataset with a numerical field (attribute), and I simply want to add a constant (e.g. 1) to all values of this feature. How may I do this? I have not found anything straightforward so far. Use Data Transformation >

Correcting a known bias in collected data

Ok, so here is a problem analogous to my problem (I'll elaborate on the real problem below, but I think this analogy will be easier to understand). I have a strange two-sided coin that only comes up heads (randomly) 1 in every 1,001 tosses (the remain

The meaning/implication of the matrices generated by Singular Value Decomposition (SVD) for Latent Semantic Analysis (LSA)

SVD is used in LSA to get the latent semantic information. I am confused about the interpretation of the SVD matrices. We first build a document-term matrix, and then use SVD to decompose it into 3 matrices. For example: The doc-term matrix M1 is M x

How to do column wise intersection with itertools

When I calculate the Jaccard similarity between each pair of my (m) training examples, each with 6 features (Age, Occupation, Gender, Product_range, Product_cat and Product), forming an (m*m) similarity matrix, I get a different outcome for matr

How to fit linear model after principal component analysis

I've done a princomp analysis on four columns of my dataframe, and found that the first component is overwhelmingly more important than the other three. Now I want to fit a linear model using the first component only. How do I get the new data made up