Python Read Text File Into Numpy Array
Welcome to another module on numpy. In our previous module, we got insights into numpy in Python. But the task becomes hard when dealing with text or CSV files, as a single file can hold a huge amount of information. To make this task easier, we will use the numpy module in Python. If you have not studied numpy, I recommend reading my previous tutorial to understand it.
Introduction
One of the difficult tasks when working with data is loading it properly. The most common way data is formatted is CSV. You might wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that we do in R programming.
Why is the CSV file format used?
CSV is a plain-text format that makes data manipulation easier and is easy to import into a spreadsheet or database. For example, you might want to export certain statistics to a CSV file and then import it into a spreadsheet for further data analysis. It also makes working with the data programmatically in Python very easy. Python supports text file and string manipulation with CSV files directly.
Ways to load CSV file in python
There are many ways to load a CSV file in Python. Four common approaches are the following:
- Load CSV using numpy.
- Using Standard Library function.
- Load CSV using pandas.
- Using PySpark.
Of these, today we will focus on reading a CSV file using numpy, and touch on the others along the way. Moving ahead, let's see how Python natively uses CSV.
Reading of a CSV file with numpy in python
As mentioned before, numpy is used extensively by data scientists and machine learning engineers because they have to work with a lot of data that is mostly stored in CSV files. numpy makes it a lot easier for them to work with CSV files. This article walks through six ways to read a CSV file in Python, two of which use numpy directly:
- Without using any library.
- numpy.loadtxt() function
- Using numpy.genfromtxt() function
- Using the CSV module.
- Use a pandas DataFrame.
- Using PySpark.
1. Without using any library
Sounds unreal, right? But with the help of Python, we can achieve anything. Python provides a built-in function called 'open' through which we can read any CSV file. Iterating over the opened file gives everything in the CSV file as strings, one line at a time. Let us go to the syntax to make this clearer.
Syntax:-
open('File_name')
Parameter
All we need to do is pass the file name as a parameter to the open built-in function.
Return value
It returns a file object; iterating over it yields each line of the file as a string.
Let's do some coding.
# Open the file and print it line by line
file_data = open('sample.csv')
for row in file_data:
    print(row)
file_data.close()
OUTPUT:-
Name,Hire Date,Salary,Sick Days Left
Graham Bell,03/15/19,50000.00,10
John Cleese,06/01/18,65000.00,8
Kimmi Chandel,05/12/20,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/20,48000.00,7
Michael Palin,05/23/20,66000.00,8
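To go from printed lines to usable values, each line can be split on the delimiter. The following is a minimal sketch, not the only way to do it: the sample rows are a shortened, hypothetical stand-in for sample.csv, written to a temporary file so the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample data standing in for sample.csv
csv_text = (
    "Name,Salary,Sick Days Left\n"
    "Graham Bell,50000.00,10\n"
    "John Cleese,65000.00,8\n"
)

# Write the sample to a temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(csv_text)
    path = tmp.name

rows = []
with open(path) as file_data:  # the with block closes the file for us
    header = next(file_data).strip().split(",")  # first line holds the column names
    for line in file_data:
        rows.append(line.strip().split(","))     # every field is still a string

os.remove(path)  # clean up the temporary file

print(header)
print(rows)
```

Note that a plain split(",") does not handle quoted fields that themselves contain commas; the csv module covered later in this article deals with that case.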
2. Using numpy.loadtxt() function
It is used to load text file data in python. numpy.loadtxt( ) is similar to the function numpy.genfromtxt( ) when no data is missing.
Syntax:
numpy.loadtxt(fname)
The default data type (dtype) parameter for numpy.loadtxt( ) is float.
import numpy as np
# delimiter="," is needed for a comma-separated file; the default delimiter is whitespace
data = np.loadtxt("sample.csv", delimiter=",", dtype=int)
print(data)  # text file data converted to integer data type
OUTPUT:-
[[1 2 3]
 [4 5 6]]
Explanation of the code
- Imported the numpy library with the alias np.
- Loaded the CSV file and converted the file data to the integer data type by using dtype.
- Printed the data variable to get the desired output.
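As a hedged illustration of a few other loadtxt arguments, the sketch below reads from an in-memory string instead of a file. The sample text and its header row are assumptions made for the example; delimiter, skiprows, usecols, and dtype are existing numpy.loadtxt parameters.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV content with a header row, held in memory for the example
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# skiprows=1 skips the header line; dtype=int converts every field to an integer
data = np.loadtxt(StringIO(csv_text), delimiter=",", skiprows=1, dtype=int)
print(data)

# usecols picks columns by zero-based index; dtype is left at its default (float)
first_and_last = np.loadtxt(
    StringIO(csv_text), delimiter=",", skiprows=1, usecols=(0, 2)
)
print(first_and_last)
```

loadtxt accepts a file name or any file-like object, so the same calls work with "sample.csv" in place of the StringIO object.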
3. Using numpy.genfromtxt() function
The genfromtxt() function is used quite frequently to load data from text files in Python. We can read data from CSV files using this function and store it in a numpy array. This function has many arguments available, making it a lot easier to load the data in the desired format. We can specify the delimiter, deal with missing values, delete specified characters, and specify the data type using the different arguments of this function.
Let's write some code to make the concept clearer.
Syntax:
numpy.genfromtxt(fname)
Parameter
The parameter is usually the CSV file name that you want to read. Other than that, we can specify delimiter, names, etc. The other optional parameters are the following:
Name | Description |
fname | File, file name, or list to read. |
dtype | The data type of the resulting array. If None, the data type is determined by the content of each column. |
comments | The character marking the start of a comment. All characters occurring on a line after a comment are discarded. |
delimiter | The string used to separate values. By default, any consecutive whitespace acts as a delimiter. |
skip_header | The number of lines to skip at the beginning of a file. |
skip_footer | The number of lines to skip at the end of a file. |
missing_values | The set of strings corresponding to missing data. |
filling_values | The set of values to use when some data is missing. |
usecols | The columns that should be read, counting from 0. For instance, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. |
Return value
It returns an ndarray.
from numpy import genfromtxt
# skip_header=1 skips the first line of the file
data = genfromtxt('sample.csv', delimiter=',', skip_header=1)
print(data)
OUTPUT:
[[1. 2. 3.]
 [4. 5. 6.]]
Explanation of the code
- Imported genfromtxt from the numpy package.
- Stored the result in the variable data; genfromtxt returns an ndarray when we pass the file name, delimiter, and skip_header as parameters.
- Printed the variable to get the output.
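The missing-data arguments from the parameter table can be sketched as follows. The sample text, including the empty field and the "N/A" marker, is a made-up assumption for illustration; the call reads from an in-memory string rather than a file.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV content: one empty field and one "N/A" marker
csv_text = "a,b,c\n1,,3\n4,5,N/A\n"

# Treat "N/A" (and empty fields) as missing and replace them with 0
data = np.genfromtxt(
    StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    missing_values="N/A",
    filling_values=0,
)
print(data)

# usecols restricts the read to the first two columns (zero-based indices)
first_two = np.genfromtxt(
    StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    usecols=(0, 1),
    filling_values=0,
)
print(first_two)
```

If filling_values is left out, missing entries in a float array come back as nan, which is often the safer choice when 0 is a legitimate value in the data.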
4. Using the csv module in Python
The csv module is used to read and write data to CSV files more efficiently in Python. This method reads the data from a CSV file using this module and stores it in a list. It then converts this list to a numpy array.
The code below explains this.
import csv
import numpy as np
with open('sample.csv', 'r') as f:
    # the delimiter must match the file; a comma-separated file is assumed here
    data = list(csv.reader(f, delimiter=","))
# csv.reader yields strings, so convert to float while building the array
data = np.array(data, dtype=float)
print(data)
OUTPUT:-
[[1. 2. 3.]
 [4. 5. 6.]]
Explanation of the code
- Imported the csv module.
- Imported numpy, as we want to use the numpy.array feature.
- Opened the file sample.csv in read mode, as indicated by 'r'.
- After separating the values using a delimiter, we store the data in array form using numpy.array.
- Printed the data to get the desired output.
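When the file starts with a header row, the same approach can skip it before building the array. This is a small sketch under the assumption of a comma-separated file with one header line; the sample text is hypothetical and read from memory.

```python
import csv
from io import StringIO

import numpy as np

# Hypothetical CSV content with a header row
csv_text = "x,y,z\n1,2,3\n4,5,6\n"

reader = csv.reader(StringIO(csv_text), delimiter=",")
header = next(reader)  # consume the header row so only data rows remain

# csv.reader yields lists of strings; dtype=float converts them on the way in
data = np.array(list(reader), dtype=float)

print(header)
print(data)
```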
5. Use a pandas DataFrame in Python
We can use a pandas DataFrame to read CSV data into an array in Python. We do this with the values attribute: first read the file into a DataFrame, then convert it into a numpy array through its values attribute.
from pandas import read_csv
df = read_csv('sample.csv')
data = df.values  # .values is an attribute, not a function call
print(data)
OUTPUT:-
[[1 2 3]
 [4 5 6]]
To show some of the power of pandas CSV capabilities, I've created a slightly more complicated file to read, called hrdataset.csv. It contains data on company employees:
hrdataset CSV file
Name,Hire Date,Salary,Sick Days Left
Graham Bell,03/15/19,50000.00,10
John Cleese,06/01/18,65000.00,8
Kimmi Chandel,05/12/20,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/20,48000.00,7
Michael Palin,05/23/20,66000.00,8
import pandas
dataframe = pandas.read_csv('hrdataset.csv')
print(dataframe)
OUTPUT:-
            Name  Hire Date   Salary  Sick Days Left
0    Graham Bell   03/15/19  50000.0              10
1    John Cleese   06/01/18  65000.0               8
2  Kimmi Chandel   05/12/20  45000.0              10
3    Terry Jones   11/01/13  70000.0               3
4  Terry Gilliam   08/12/20  48000.0               7
5  Michael Palin   05/23/20  66000.0               8
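A mixed table like this one cannot become a single numeric array as-is, because the name and date columns are text. One possible approach, sketched below with a shortened in-memory copy of the data, is to select the numeric columns and convert only those with DataFrame.to_numpy(). The column selection is an illustrative choice, not something the original example prescribes.

```python
from io import StringIO

import pandas as pd

# Shortened, in-memory stand-in for hrdataset.csv
csv_text = (
    "Name,Hire Date,Salary,Sick Days Left\n"
    "Graham Bell,03/15/19,50000.00,10\n"
    "John Cleese,06/01/18,65000.00,8\n"
)

df = pd.read_csv(StringIO(csv_text))

# Keep only the numeric columns, then convert them to a float numpy array
numeric = df[["Salary", "Sick Days Left"]].to_numpy(dtype=float)

print(numeric)
print(numeric.shape)
```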
6. Using PySpark in Python
Reading and writing data in Spark is an important task. More often than not, it is the starting point for any form of big data processing. There are different ways to read a CSV file using PySpark in Python, but it helps to know the core syntax for reading data before moving on to the specifics.
Syntax:-
spark.read.format("...").option("key", "value").schema(…).load()
Parameters
DataFrameReader is the foundation for reading data in Spark; it can be accessed via the spark.read attribute.
- format: specifies the file format, as in CSV, JSON, parquet, or TSV. The default is parquet.
- option: a set of key-value configurations. It specifies how to read data.
- schema: optional; specifies the schema explicitly rather than inferring it from the data source.
Three ways to read a CSV file using PySpark in Python:
1. df = spark.read.format("csv").option("header", "true").load(filepath)
2. df = spark.read.format("csv").option("inferSchema", "true").load(filepath)
3. df = spark.read.format("csv").schema(csvSchema).load(filepath)
Let's do some coding to understand.
# Parentheses let the method chain span several lines
diamonds = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")
)
Conclusion
This article has covered the different ways to read data from a CSV file using the numpy module. This brings us to the end of our article, "How to read CSV File in Python using numpy." I hope you are clear on all the concepts related to CSV, how to read it, and the different parameters used. If you understand the basics of reading CSV files, you won't ever be caught flat-footed when importing data.
Make sure you practice as much as possible and gain more experience.
Got a question for us? Please mention it in the comments section of this "6 ways to read CSV File with numpy in Python" article, and we will get back to you as soon as possible.
FAQs
1. How do I skip the first line of a CSV file in Python?
Ans:- Use csv.reader() and next() if you are not using any external library. Let's code to understand.
Let us consider the following sample.csv file.
sample.csv
fruit,count
apple,1
banana,2
import csv
with open('sample.csv', newline='') as file:
    csv_reader = csv.reader(file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        print(row)
OUTPUT:-
['apple', '1']
['banana', '2']
As you can see, the first line, which had fruit,count, is eliminated.
2. How do I count the number of rows in a CSV file?
Ans:- Use len() and list() on a csv reader to count the number of lines.
Let's use this sample.csv data:
1,2,3
4,5,6
7,8,9
import csv
with open("sample.csv", newline="") as file_data:
    reader = csv.reader(file_data)
    count_lines = len(list(reader))  # one list entry per row
print(count_lines)
OUTPUT:-
3
As you can see, the sample.csv file has three rows, and that count is what the len() function returns.
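For large files, building a full list only to count it can use a lot of memory. A hedged alternative, sketched here with in-memory sample data, is to count rows as the reader yields them.

```python
import csv
from io import StringIO

# In-memory stand-in for sample.csv
csv_text = "1,2,3\n4,5,6\n7,8,9\n"

reader = csv.reader(StringIO(csv_text))

# Add 1 for each row as it is read, without storing the rows
row_count = sum(1 for _ in reader)

print(row_count)
```

The result is the same as len(list(reader)); the difference is only that no intermediate list of rows is kept.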
Source: https://www.pythonpool.com/numpy-read-csv/